From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p3PAaGfg009855 for ; Mon, 25 Apr 2011 12:36:16 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AloCAOtNtU3U4366kWdsb2JhbACET6BlFAEBAQEJCwsHFAMisDqQGwKBJ4NQfQSJE4kq X-IronPort-AV: E=Sophos;i="4.64,265,1301868000"; d="scan'208";a="93742547" Received: from moutng.kundenserver.de ([212.227.126.186]) by mail4-smtp-sop.national.inria.fr with ESMTP; 25 Apr 2011 12:36:10 +0200 Received: from office1.lan.sumadev.de (dslb-094-219-210-240.pools.arcor-ip.net [94.219.210.240]) by mrelayeu.kundenserver.de (node=mrbap3) with ESMTP (Nemesis) id 0M5P83-1Ptkv700r9-00zSc0; Mon, 25 Apr 2011 12:36:10 +0200 Received: from [192.168.1.111] (546BE640.cm-12-4d.dynamic.ziggo.nl [84.107.230.64]) by office1.lan.sumadev.de (Postfix) with ESMTPA id A966E5F702; Mon, 25 Apr 2011 12:36:09 +0200 (CEST) From: Gerd Stolpmann To: Jon Harrop Cc: Caml List In-Reply-To: <013e01cc02d7$e23b8f00$a6b2ad00$@ffconsultancy.com> References: <013e01cc02d7$e23b8f00$a6b2ad00$@ffconsultancy.com> Content-Type: text/plain; charset="UTF-8" Date: Mon, 25 Apr 2011 12:36:08 +0200 Message-ID: <1303727768.3782.66.camel@thinkpad> Mime-Version: 1.0 X-Mailer: Evolution 2.28.1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:+RnAuzj9QSbBbWVTttMZrczkW2/u7/IqWmjhUZZYx26 cGQdbeX+zK6Qznv0HuPe82awKkrptkI0KbP8htSAYfTwTvdlJ6 LZNstUDWZ1olGUFZk88sta0TLvVHrE7Uy0trmGeKvB9dkqcXoC SuSzy9CVvZZ9X5JHy3ghoMug4rqxJqq0URO9NrkO3Nb/j1mCPk pUcbFACZyOi9QQCFYoAyw== Subject: Re: [Caml-list] Parallel n-queens solver Am Montag, den 25.04.2011, 00:32 +0100 schrieb Jon Harrop: > Gerd recently posted this article about parallelizing an n-queens solver in > OCaml: > > http://blog.camlcity.org/blog/multicore3.html > > The idea is to reuse the same basic definitions and parallelize the program > simply by changing the "run" function. I have tried to translate this OCaml > to F# without benefit of being able to run the original OCaml code but I > have checked the results against the known solutions to this problem. > > Here is Gerd's sequential OCaml: > > module Sequential = struct > let run n = > let t0 = Unix.gettimeofday() in > let ht = Hashtbl.create 91 in > for k = 0 to n-1 do > solve k n > (fun b -> > if not (Hashtbl.mem ht b) then ( > let b = Array.copy b in > List.iter > (fun b' -> > Hashtbl.add ht b' () > ) > (transformations b); > print b > ) > ) > done; > let t1 = Unix.gettimeofday() in > printf "Number solutions: %n\n%!" (Hashtbl.length ht / 8); > printf "Time: %.3f\n%!" (t1-.t0) > end > > My equivalent F#: > > module Sequential = > let run n = > let timer = System.Diagnostics.Stopwatch.StartNew() > let solve k = > let sols = ResizeArray() > solve k n (transformations >> Seq.min >> sols.Add) > sols.ToArray() > let sols = > Array.init n solve > |> Seq.concat > |> Seq.distinct > |> Seq.length > printfn "Number solutions: %n" sols > printfn "Time: %.3f" timer.Elapsed.TotalSeconds > > Gerd's sequential OCaml run on his 4-core Opteron and my sequential F# run > on a 4-core W3520 have very similar performance characteristics. > > Now for parallelization. Gerd's fastest parallel OCaml solution (aka MP2) is > 287 lines long. In contrast, the F# can be parallelized by adding just 12 > characters to the source code of the sequential implementation: > > module Parallel = > let run n = > let timer = System.Diagnostics.Stopwatch.StartNew() > let solve k = > let sols = ResizeArray() > solve k n (transformations >> Seq.min >> sols.Add) > sols.ToArray() > let sols = > Array.Parallel.init n solve > |> PSeq.concat > |> PSeq.distinct > |> PSeq.length > printfn "Number solutions: %n" sols > printfn "Time: %.3f" timer.Elapsed.TotalSeconds > > I just replaced Array.init with Array.Parallel.init and Seq with PSeq. > > Moreover, this trivially parallelized F# implementation also achieves > performance on this machine comparable to Gerd's parallel OCaml on his > machine. In particular, the absolute performance results for my parallel F# > are better in every single case. However, it is not clear how scalable these > parallel solutions are. On an 8-core E5405 Xeon I get only 5x speedup > compared to 3.8x speedup on 4 cores. > > There can be little doubt that this solution is vastly more readable and > maintainable though. Well admitted. Netmulticore is an add-on to an existing compiler and runtime, which explains a lot of the additional verbosity. Also, it is right now focusing on the core mechanisms for parallelization, so there is nothing like a parallel array initializer. Actually, I'm quite happy that a quite complex approach like MP2 (using 3 processes communicating with each other, and a bit over-engineered) can be worked out in only 287 lines. Without having checked in detail, I'm quite sure a number of parallel design patterns can be supported by higher-level constructs. Because of the "fork-style" process creation, not everything is well-suited for this, e.g. a parallel Array.init makes only sense when the array lives in shared memory (which exists as Netmcore_array). In some sense, the challenge is how well three ideas can be brought together: - fork-style multi-processing - value heaps in shared memory that support mutation - the right mix of functional and imperative programming that is best for the problem to solve Generally, I think FP data structures are easier to support in this context, e.g. with parallel list or stream operations (imagine parallelized map and filter operations). Let's see how this evolves - Netmulticore is at its beginning, and there is still a lot of room for improvement. Gerd > > -- > Dr Jon Harrop, Flying Frog Consultancy Ltd. > http://www.ffconsultancy.com > > -- ------------------------------------------------------------ Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de Phone: +49-6151-153855 Fax: +49-6151-997714 ------------------------------------------------------------