caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: multicore wish
Date: Tue, 22 Dec 2009 19:12:50 +0000	[thread overview]
Message-ID: <200912221912.51017.jon@ffconsultancy.com> (raw)
In-Reply-To: <87oclrrw8o.fsf@frosties.localdomain>

On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> > 1. The array "a" is just an ordinary array of any type of values on the
> > shared heap in F# but, for generality in OCaml, this must be both the
> > underlying ordinary data and a manually-managed shared big array of
> > indices into the ordinary data where the indices get sorted while the
> > original data remain in place until they are permuted at the end.
>
> Unless you have a primitive type that isn't a pointer.

In OCaml, you would need to write a custom quicksort optimized for that 
particular type. In F#, the generic version just works and works efficiently.

> The advantage with ocaml though is that you never have pointers into a
> structure. Makes thinks a lot simpler for the GC and avoids large
> overheads in memory.

I don't understand what you mean by OCaml "never has pointers into a 
structure". Half the problem with OCaml is that OCaml almost always uses 
pointers and the programmer has no choice, e.g. for complex numbers.

> > 2. The sorted subarrays are contiguous in memory and, at some
> > subdivision, will fit into L2 cache. So F# offers optimal locality. In
> > contrast, there is no locality whatsoever in the OCaml code and most
> > accesses into the unsorted original array will incur cache misses right
> > up to main memory. So the OCaml approach does not scale as well and will
> > never see superlinear speedup because it cannot be cache friendly.
>
> On the other hand swapping two elements in the array has a constant
> cost no matter what size they have. At some size there will be a break
> even point where copying the data costs more than the cache misses and
> with increasing size the cache won't help F# so much either.

In theory, yes. In practice, that threshold is far larger than any value type 
that a real program would use so it is of no practical concern. Moreover, F# 
gives the programmer control over whether data are unboxed (value types) or 
boxed (reference types) anyway. In contrast, OCaml is tied to a few value 
types that are hard-coded into the GC.

> > 3. Child tasks are likely to be executed on the same core as their parent
> > and use a subset of their parent's data in F#, so they offer the best
> > possible locality. In contrast, child processes are likely to be executed
> > on another core in OCaml and offer the worst possible locality.
>
> But, if I understood you right, you first fork one process per
> core.

No, in OCaml I fork every child. That is the only transparent way to give the 
child a coherent view of the heap but it is extremely slow (~1ms):

  "F# can do 60MFLOPS of computation in the time it takes OCaml 
to fork a single process" -
http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html

> Those should then also each pin themself to one core.

You have no idea which core a forked child process will run on.

> Each process then has a work queue which is works through. So they will 
> always use the local data. Only when a queue runs dry they steal from
> another process and ruin locality.

You are correctly describing the efficient solution based upon work-stealing 
queues that F# uses but OCaml cannot express it.

> So I don't see where your argument fits. You are not creating childs
> on the fly. Only at the start and they run till all the work is done.
> At least in this example.

No, every recursive invocation of the parallel quicksort spawns another child 
on-the-fly. That's precisely why it parallelizes so efficiently when you have 
wait-free work-stealing task deques and a shared heap. In general, you 
rewrite algorithms into this recursive divide and conquer form and 
parallelize when possible. You can parallelize a *lot* of problems 
efficiently that way.

> > 4. Task deques can handle an arbitrary number of tasks limited only by
> > memory whereas processes are a scarce resource and forking is likely to
> > fail, whereupon the "invoke" combinator will simply execute sequentially.
> > So it is much easier to write reliable and performant code in F# than
> > OCaml.
>
> Why would you fork in invoke?

Fork is currently the only transparent way to implement "invoke" but it is 
extremely slow and unreliable.

> > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude
> > slower than pushing a closure onto a concurrent task deque in F#.
> >
> > 6. The "threshold" value is a magic number derived from measurements on a
> > given machine in my OCaml code but is dynamically adjusted in a
> > machine-independent way by the "invoke" combinator in my F# code using
> > atomic operations and real time profiling of the proportion of time spent
> > spawning tasks vs doing actual work.
>
> 5+6 seem to be an implementation detail of some specific
> implementation you are talking about.

Yes. I'm talking about today's OCaml and F# implementations.

> I don't see anything in the theory that would require that.

If by "in theory" you mean that a new performant concurrent GC for OCaml would 
solve these problems then yes. But I doubt OCaml is ever going to get one.

> > The same basic principles apply to many algorithms. Although it can
> > sometimes be tricky to figure out how best to use this technology to
> > parallelize a given algorithm (e.g. successive over relaxation), I have
> > found that a great many algorithms can be parallelized effectively using
> > this approach when you have a suitable foundation in place (like the
> > TPL). Moreover, the ability to use ordinary constructs in F# instead of
> > hacks like type-specific shared memory big arrays in OCaml makes it a lot
> > easier to parallelize programs. My parallel Burrows-Wheeler Transform
> > (BWT), for example, took 30 minutes to develop in F# and 2 days in OCaml.
>
> It might be true that ocaml lacks some primitives for multi-core
> operations. But I would say that is mostly because so far it hasn't
> supported multi-core.

Well, yes. :-)

> If F# has great support for this then that might be a good place to steal
> some ideas from. 

The infrastructure for this kind of shared memory parallel programming is 
really very simple. You just need a GC that can handle a shared heap (which 
HLVM already has) and work-stealing task deques. Then you can easily write 
parallel programs that leverage multicores and run a *lot* faster than 
anything that can be written in OCaml. You can even make this accessible to 
OCaml programmers as a DSL with automatic interop to make high performance 
parallel programming as easy as possible from OCaml.

> But so far have heart nothing that would make F# fundamentally more capable
> than ocaml could become.

In theory, OCaml could catch up with F#. In practice, the core of OCaml's 
implementation is so heavily optimized in another direction (and that will 
not change because it is OCaml's raison d'être) that it is worth starting 
from scratch. Modern libraries like LLVM make this comparatively painless and 
the improvements are vast. And besides, it is great fun! :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


  reply	other threads:[~2009-12-22 17:59 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-19 19:38 OCaml is broken Jeff Shaw
2009-12-20  4:43 ` [Caml-list] " Jon Harrop
2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
2009-12-20 13:22     ` Martin Jambon
2009-12-20 13:47     ` Yaron Minsky
2009-12-20 16:01       ` Gerd Stolpmann
2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
2009-12-22 12:04         ` Erik Rigtorp
2009-12-22 12:27           ` Mihamina Rakotomandimby
2009-12-22 13:27           ` Gerd Stolpmann
2009-12-23 11:25             ` Erik Rigtorp
2009-12-29 12:07         ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones
2009-12-20 14:27     ` Dario Teixeira
2009-12-20 21:14       ` Jon Harrop
2009-12-21  1:08         ` Gerd Stolpmann
2009-12-21  4:30           ` Jon Harrop
2009-12-21  3:58             ` Yaron Minsky
2009-12-21  5:32             ` Markus Mottl
2009-12-21 13:29               ` Jon Harrop
2009-12-26 17:08           ` orbitz
2009-12-20 19:38     ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop
2009-12-21 12:26       ` Mihamina Rakotomandimby
2009-12-21 14:19         ` general question, was " Keyan
2009-12-21 14:40           ` [Caml-list] " rixed
2009-12-21 14:42           ` Gerd Stolpmann
2009-12-21 15:25             ` Eray Ozkural
2009-12-21 14:50           ` Philip
2009-12-21 15:01             ` Keyan
2009-12-21 15:13               ` Stefano Zacchiroli
2009-12-21 15:27               ` Dario Teixeira
2009-12-21 15:46                 ` Jacques Carette
2009-12-21 18:50             ` Jon Harrop
2009-12-21 18:48           ` Jon Harrop
2010-01-03 10:49           ` Sylvain Le Gall
2010-01-03 20:06             ` [Caml-list] " Jon Harrop
2009-12-21 13:07     ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez
2009-12-21 13:31   ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow
2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
2009-12-21 16:15       ` [Caml-list] " Fischbacher T.
2009-12-21 17:42       ` Dario Teixeira
2009-12-21 18:43       ` Jon Harrop
2009-12-21 19:53     ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop
2009-12-22 13:09       ` multicore wish Goswin von Brederlow
2009-12-22 19:12         ` Jon Harrop [this message]
2009-12-22 18:02           ` [Caml-list] " Edgar Friendly
2009-12-22 19:20             ` Jon Harrop
2009-12-24 12:58               ` Goswin von Brederlow
2009-12-24 16:51                 ` Jon Harrop
2009-12-24 13:19           ` Goswin von Brederlow
2009-12-24 17:06             ` Jon Harrop
2009-12-27 12:45               ` Goswin von Brederlow
2009-12-27 16:37                 ` Jon Harrop
2009-12-28 12:28                 ` Gerd Stolpmann
2009-12-28 15:07                   ` Anil Madhavapeddy
2009-12-28 18:05                   ` Xavier Leroy
2009-12-29 16:44                     ` Gerd Stolpmann
2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200912221912.51017.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).