caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Parallel n-queens solver
@ 2011-04-24 23:32 Jon Harrop
  2011-04-25 10:36 ` Gerd Stolpmann
  0 siblings, 1 reply; 5+ messages in thread
From: Jon Harrop @ 2011-04-24 23:32 UTC (permalink / raw)
  To: Caml List

Gerd recently posted this article about parallelizing an n-queens solver in
OCaml:

  http://blog.camlcity.org/blog/multicore3.html

The idea is to reuse the same basic definitions and parallelize the program
simply by changing the "run" function. I have tried to translate this OCaml
to F# without benefit of being able to run the original OCaml code but I
have checked the results against the known solutions to this problem.

Here is Gerd's sequential OCaml:

module Sequential = struct
  let run n =
    let t0 = Unix.gettimeofday() in
    let ht = Hashtbl.create 91 in
    for k = 0 to n-1 do
      solve k n
	(fun b ->
	   if not (Hashtbl.mem ht b) then (
	     let b = Array.copy b in
	     List.iter
	       (fun b' ->
		  Hashtbl.add ht b' ()
	       )
	       (transformations b);
	     print b
	   )
	)
    done;
    let t1 = Unix.gettimeofday() in
    printf "Number solutions: %n\n%!" (Hashtbl.length ht / 8);
    printf "Time: %.3f\n%!" (t1-.t0)
end

My equivalent F#:

module Sequential =
  let run n =
    let timer = System.Diagnostics.Stopwatch.StartNew()
    let solve k =
      let sols = ResizeArray()
      solve k n (transformations >> Seq.min >> sols.Add)
      sols.ToArray()
    let sols =
      Array.init n solve
      |> Seq.concat
      |> Seq.distinct
      |> Seq.length
    printfn "Number solutions: %n" sols
    printfn "Time: %.3f" timer.Elapsed.TotalSeconds

Gerd's sequential OCaml run on his 4-core Opteron and my sequential F# run
on a 4-core W3520 have very similar performance characteristics.

Now for parallelization. Gerd's fastest parallel OCaml solution (aka MP2) is
287 lines long. In contrast, the F# can be parallelized by adding just 12
characters to the source code of the sequential implementation:

module Parallel =
  let run n =
    let timer = System.Diagnostics.Stopwatch.StartNew()
    let solve k =
      let sols = ResizeArray()
      solve k n (transformations >> Seq.min >> sols.Add)
      sols.ToArray()
    let sols =
      Array.Parallel.init n solve
      |> PSeq.concat
      |> PSeq.distinct
      |> PSeq.length
    printfn "Number solutions: %n" sols
    printfn "Time: %.3f" timer.Elapsed.TotalSeconds

I just replaced Array.init with Array.Parallel.init and Seq with PSeq.

Moreover, this trivially parallelized F# implementation also achieves
performance on this machine comparable to Gerd's parallel OCaml on his
machine. In particular, the absolute performance results for my parallel F#
are better in every single case. However, it is not clear how scalable these
parallel solutions are. On an 8-core E5405 Xeon I get only 5x speedup
compared to 3.8x speedup on 4 cores.

There can be little doubt that this solution is vastly more readable and
maintainable though.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Parallel n-queens solver
  2011-04-24 23:32 [Caml-list] Parallel n-queens solver Jon Harrop
@ 2011-04-25 10:36 ` Gerd Stolpmann
  2011-04-25 16:36   ` Frédéric Gava
  0 siblings, 1 reply; 5+ messages in thread
From: Gerd Stolpmann @ 2011-04-25 10:36 UTC (permalink / raw)
  To: Jon Harrop; +Cc: Caml List

Am Montag, den 25.04.2011, 00:32 +0100 schrieb Jon Harrop:
> Gerd recently posted this article about parallelizing an n-queens solver in
> OCaml:
> 
>   http://blog.camlcity.org/blog/multicore3.html
> 
> The idea is to reuse the same basic definitions and parallelize the program
> simply by changing the "run" function. I have tried to translate this OCaml
> to F# without benefit of being able to run the original OCaml code but I
> have checked the results against the known solutions to this problem.
> 
> Here is Gerd's sequential OCaml:
> 
> module Sequential = struct
>   let run n =
>     let t0 = Unix.gettimeofday() in
>     let ht = Hashtbl.create 91 in
>     for k = 0 to n-1 do
>       solve k n
> 	(fun b ->
> 	   if not (Hashtbl.mem ht b) then (
> 	     let b = Array.copy b in
> 	     List.iter
> 	       (fun b' ->
> 		  Hashtbl.add ht b' ()
> 	       )
> 	       (transformations b);
> 	     print b
> 	   )
> 	)
>     done;
>     let t1 = Unix.gettimeofday() in
>     printf "Number solutions: %n\n%!" (Hashtbl.length ht / 8);
>     printf "Time: %.3f\n%!" (t1-.t0)
> end
> 
> My equivalent F#:
> 
> module Sequential =
>   let run n =
>     let timer = System.Diagnostics.Stopwatch.StartNew()
>     let solve k =
>       let sols = ResizeArray()
>       solve k n (transformations >> Seq.min >> sols.Add)
>       sols.ToArray()
>     let sols =
>       Array.init n solve
>       |> Seq.concat
>       |> Seq.distinct
>       |> Seq.length
>     printfn "Number solutions: %n" sols
>     printfn "Time: %.3f" timer.Elapsed.TotalSeconds
> 
> Gerd's sequential OCaml run on his 4-core Opteron and my sequential F# run
> on a 4-core W3520 have very similar performance characteristics.
> 
> Now for parallelization. Gerd's fastest parallel OCaml solution (aka MP2) is
> 287 lines long. In contrast, the F# can be parallelized by adding just 12
> characters to the source code of the sequential implementation:
> 
> module Parallel =
>   let run n =
>     let timer = System.Diagnostics.Stopwatch.StartNew()
>     let solve k =
>       let sols = ResizeArray()
>       solve k n (transformations >> Seq.min >> sols.Add)
>       sols.ToArray()
>     let sols =
>       Array.Parallel.init n solve
>       |> PSeq.concat
>       |> PSeq.distinct
>       |> PSeq.length
>     printfn "Number solutions: %n" sols
>     printfn "Time: %.3f" timer.Elapsed.TotalSeconds
> 
> I just replaced Array.init with Array.Parallel.init and Seq with PSeq.
> 
> Moreover, this trivially parallelized F# implementation also achieves
> performance on this machine comparable to Gerd's parallel OCaml on his
> machine. In particular, the absolute performance results for my parallel F#
> are better in every single case. However, it is not clear how scalable these
> parallel solutions are. On an 8-core E5405 Xeon I get only 5x speedup
> compared to 3.8x speedup on 4 cores.
> 
> There can be little doubt that this solution is vastly more readable and
> maintainable though.

Well admitted. Netmulticore is an add-on to an existing compiler and
runtime, which explains a lot of the additional verbosity. Also, it is
right now focusing on the core mechanisms for parallelization, so there
is nothing like a parallel array initializer. Actually, I'm quite happy
that a quite complex approach like MP2 (using 3 processes communicating
with each other, and a bit over-engineered) can be worked out in only
287 lines.

Without having checked in detail, I'm quite sure a number of parallel
design patterns can be supported by higher-level constructs. Because of
the "fork-style" process creation, not everything is well-suited for
this, e.g. a parallel Array.init makes only sense when the array lives
in shared memory (which exists as Netmcore_array). In some sense, the
challenge is how well three ideas can be brought together:

- fork-style multi-processing
- value heaps in shared memory that support mutation
- the right mix of functional and imperative programming that is best
  for the problem to solve

Generally, I think FP data structures are easier to support in this
context, e.g. with parallel list or stream operations (imagine
parallelized map and filter operations). Let's see how this evolves -
Netmulticore is at its beginning, and there is still a lot of room for
improvement.

Gerd

> 
> -- 
> Dr Jon Harrop, Flying Frog Consultancy Ltd.
> http://www.ffconsultancy.com
> 
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Parallel n-queens solver
  2011-04-25 10:36 ` Gerd Stolpmann
@ 2011-04-25 16:36   ` Frédéric Gava
  2011-04-25 19:30     ` Jon Harrop
  0 siblings, 1 reply; 5+ messages in thread
From: Frédéric Gava @ 2011-04-25 16:36 UTC (permalink / raw)
  To: caml-list

Dear all,

> Without having checked in detail, I'm quite sure a number of parallel
> design patterns can be supported by higher-level constructs.

It is the well known "skeleton paradigm" and BSP model (twice, since 1990)

Skeletons:
http://en.wikipedia.org/wiki/Algorithmic_skeleton
http://homepages.inf.ed.ac.uk/mic/Skeletons/index.html

BSP:
http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
http://www.bsp-worldwide.org/

Frédéric Gava

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [Caml-list] Parallel n-queens solver
  2011-04-25 16:36   ` Frédéric Gava
@ 2011-04-25 19:30     ` Jon Harrop
  2011-04-25 23:33       ` Eray Ozkural
  0 siblings, 1 reply; 5+ messages in thread
From: Jon Harrop @ 2011-04-25 19:30 UTC (permalink / raw)
  To: frederic.gava, caml-list

Frédéric wrote:
> It is the well known "skeleton paradigm" and BSP model (twice, since 1990)
> 
> Skeletons:
> http://en.wikipedia.org/wiki/Algorithmic_skeleton
> http://homepages.inf.ed.ac.uk/mic/Skeletons/index.html
> 
> BSP:
> http://en.wikipedia.org/wiki/Bulk_Synchronous_Parallel
> http://www.bsp-worldwide.org/

Very true. Subsequent fiddling turned up the following F# solution that uses a sequential pair of calls to Parallel.init (i.e. direct bulk synchronous parallelism) instead of the data flow parallelism:

  let run n =
    let timer = System.Diagnostics.Stopwatch.StartNew()
    let solve k =
      let sols = Array.init (n*n) (fun _ -> ResizeArray())
      solve k n (fun b ->
        let b = transformations b
        sols.[b.[1]*n + b.[0]].Add b)
      sols |> Array.map (fun sols -> sols.ToArray())
    let sols =
      let sols = Array.Parallel.init n solve
      Array.Parallel.init (n*n) (fun i ->
        let s = System.Collections.Generic.HashSet(HashIdentity.Structural)
        for sols in sols do
          let sols = sols.[i]
          for i=0 to sols.Length-1 do
            s.Add sols.[i] |> ignore
        s.Count)
      |> Array.sum
    printfn "%d solutions %d-queens problem in %gs" sols n timer.Elapsed.TotalSeconds

That old "invoke" combinator that evaluates a function application on a forked process and marshals the result back could be used to implement a naïve Parallel.init and, therefore, this solution could be translated to OCaml quite easily. However, it would still lack load balancing. That turns out to be ok in this particular case because the generated data is quite evenly distributed (and we assume that nothing else is running on the machine) but real applications might not be so lucky...

Turns out you can also use data flow parallelism entirely by replacing Array.Parallel.init with PSeq.init. In theory, this allows the second phase to occur concurrently with the first phase, being updated as new results are generated. In practice, I don't see a speedup.

Cheers,
Jon.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Parallel n-queens solver
  2011-04-25 19:30     ` Jon Harrop
@ 2011-04-25 23:33       ` Eray Ozkural
  0 siblings, 0 replies; 5+ messages in thread
From: Eray Ozkural @ 2011-04-25 23:33 UTC (permalink / raw)
  To: Jon Harrop; +Cc: frederic.gava, caml-list

[-- Attachment #1: Type: text/plain, Size: 1033 bytes --]

On Mon, Apr 25, 2011 at 10:30 PM, Jon Harrop <jon@ffconsultancy.com> wrote:

> Frédéric wrote:
> That old "invoke" combinator that evaluates a function application on a
> forked process and marshals the result back could be used to implement a
> naïve Parallel.init and, therefore, this solution could be translated to
> OCaml quite easily. However, it would still lack load balancing. That turns
> out to be ok in this particular case because the generated data is quite
> evenly distributed (and we assume that nothing else is running on the
> machine) but real applications might not be so lucky...
>
> Turns out you can also use data flow parallelism entirely by r
>
>
Compare implementations of the adaptive quadrature algorithm. That doesn't
balance so easily :)
http://en.wikipedia.org/wiki/Adaptive_quadrature

Cheers,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct

[-- Attachment #2: Type: text/html, Size: 1557 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-04-25 23:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-24 23:32 [Caml-list] Parallel n-queens solver Jon Harrop
2011-04-25 10:36 ` Gerd Stolpmann
2011-04-25 16:36   ` Frédéric Gava
2011-04-25 19:30     ` Jon Harrop
2011-04-25 23:33       ` Eray Ozkural

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).