From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 71D2CBBAF for ; Thu, 24 Dec 2009 16:52:42 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtECAGYdM0vUnwdji2dsb2JhbACCGZlAAQEBCgsKBxEGuHyEMwQ X-IronPort-AV: E=Sophos;i="4.47,449,1257116400"; d="scan'208";a="39288700" Received: from relay.pcl-ipout01.plus.net ([212.159.7.99]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 24 Dec 2009 16:52:42 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmcFAGYdM0tUXebq/2dsb2JhbACCGdJ8hDME Received: from relay07.plus.net ([84.93.230.234]) by relay.pcl-ipout01.plus.net with ESMTP; 24 Dec 2009 15:52:41 +0000 Received: from [87.114.35.173] (helo=leper.local) by relay07.plus.net with esmtp (Exim) id 1NNpzV-0001ZX-Aj for caml-list@yquem.inria.fr; Thu, 24 Dec 2009 15:52:41 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Re: multicore wish Date: Thu, 24 Dec 2009 17:06:51 +0000 User-Agent: KMail/1.9.9 References: <4B2D2BC1.6020204@msu.edu> <200912221912.51017.jon@ffconsultancy.com> <87r5qk5x1j.fsf@frosties.localdomain> In-Reply-To: <87r5qk5x1j.fsf@frosties.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200912241706.51917.jon@ffconsultancy.com> X-Plusnet-Relay: 0b4c83ada647ccc16920dfea8f96f746 X-Spam: no; 0.00; ocaml:01 ocaml:01 locality:01 ocaml's:01 recursive:01 invocation:01 deques:01 recursive:01 interop:01 2009:98 dry:98 ruin:98 frog:98 invoke:01 invoke:01 On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote: > Jon Harrop writes: > > No, in OCaml I fork every child. That is the only transparent way to give > > the child a coherent view of the heap but it is extremely slow (~1ms): > > So if you add a (sleep 60) to the ocaml code then ocaml gets even > more worse than F#? You are comparing an implementation that is > specifically optimized to use things F# is good at and ocaml is > bad. To give a fair comparison you need to optimize the implementation > to the language. Post a better OCaml solution if you can. > >> Each process then has a work queue which is works through. So they will > >> always use the local data. Only when a queue runs dry they steal from > >> another process and ruin locality. > > > > You are correctly describing the efficient solution based upon > > work-stealing queues that F# uses but OCaml cannot express it. > > You mean that you didn't implement that way. No, I mean OCaml cannot express it. > I can easily express that with closures and pre-forked worker threads. OCaml's threads do not run in parallel so that will not work. > >> So I don't see where your argument fits. You are not creating childs > >> on the fly. Only at the start and they run till all the work is done. > >> At least in this example. > > > > No, every recursive invocation of the parallel quicksort spawns another > > child on-the-fly. That's precisely why it parallelizes so efficiently > > when you have wait-free work-stealing task deques and a shared heap. In > > general, you rewrite algorithms into this recursive divide and conquer > > form and parallelize when possible. You can parallelize a *lot* of > > problems efficiently that way. > > That seems awfully ineficient. Then you have a million children > running on maybe 4 cores and each job queue holds no job. > > I think we mean different things by children. By children I mean what > the kernel sees as children. Newly forked threads. I think you mean > jobs that get put into the queues. There is no reason to fork a new > system thread every time the parallel quicksort splits an array in > two. The children are lightweight tasks, not threads or processes. > >> Why would you fork in invoke? > > > > Fork is currently the only transparent way to implement "invoke" but it > > is extremely slow and unreliable. > > No, it is the insane way. Use closures. Please demonstrate. > > parallel programs that leverage multicores and run a *lot* faster than > > anything that can be written in OCaml. You can even make this accessible > > to OCaml programmers as a DSL with automatic interop to make high > > performance parallel programming as easy as possible from OCaml. > > Or just implement it properly in ocaml. The problem part is the > GC. The rest is easy. No kidding. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e