Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures
From: Philippe Wang
To: Pascal Cuoq
Cc: caml-list@yquem.inria.fr
Date: Thu, 24 Sep 2009 18:30:56 +0200

On Sep 24, 2009, at 18:02 GMT+02:00, Pascal Cuoq wrote:

> On Sep 24, 2009, at 5:47 PM, Philippe Wang wrote:
>
>>> Is the copy operation parallelized?
>>
>> Nope. When the world is stopped for the collection, everything is
>> done sequentially until the world is resumed.
>> I don't think it's relevant to parallelize the copy operation (hell
>> to implement & debug, and I don't think the performance would be
>> very interesting because we would probably need a write mutex on
>> the destination heap).
>
> Well, you could start copying to the bottom of the next heap with
> one thread going up and to the top of it with another going down.
> Assume optimistically that the two threads will not reach the same
> cacheline at the end of the copies, and you don't need any
> synchronisation at all between them, except joining at the end.
>
> After checking, if they have reached the same cacheline,
> you need to reallocate the destination heap anyway.
>
> You still get a single unfragmented free block as a result.
>
> Even better: stop the world just before there remains less than one
> cacheline of free space and you don't need to check if the two
> threads have met. You still need to reallocate the destination heap
> sometimes though.

A concurrent copy would mean significant overhead on a single core. It
would also make memory bandwidth the bottleneck: memory copy operations
are quickly limited by that bandwidth, not by the CPU.
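(To make the bandwidth point concrete, here is a rough toy
micro-benchmark, not OC4MC code and with arbitrary buffer sizes: N
threads each repeatedly memcpy a large private buffer, and the
aggregate MB/s typically stops scaling well before the number of cores
does, because the memory bus saturates.)

/* Toy micro-benchmark (not OC4MC code): each thread repeatedly copies
   a large private buffer; compare the aggregate MB/s for 1, 2, 4, ...
   threads to see where the memory bus becomes the limit. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES (64 * 1024 * 1024)   /* 64 MB per thread (arbitrary) */
#define REPEAT    8

static void *copy_worker(void *arg)
{
  (void)arg;
  char *src = malloc(BUF_BYTES), *dst = malloc(BUF_BYTES);
  memset(src, 1, BUF_BYTES);
  for (int i = 0; i < REPEAT; i++)
    memcpy(dst, src, BUF_BYTES);       /* pure copy work, no computation */
  free(src);
  free(dst);
  return NULL;
}

int main(int argc, char **argv)
{
  int n = (argc > 1) ? atoi(argv[1]) : 1;   /* number of copying threads */
  pthread_t *tids = malloc(n * sizeof *tids);
  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < n; i++)
    pthread_create(&tids[i], NULL, copy_worker, NULL);
  for (int i = 0; i < n; i++)
    pthread_join(tids[i], NULL);
  clock_gettime(CLOCK_MONOTONIC, &t1);
  double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
  double mb   = (double)n * REPEAT * BUF_BYTES / (1024.0 * 1024.0);
  printf("%d thread(s): %.0f MB in %.2f s = %.0f MB/s\n", n, mb, secs, mb / secs);
  free(tids);
  return 0;
}

(Compile with something like "cc -O2 -pthread membw.c"; the file name
is of course arbitrary.)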
That may hopefully change in a few years, but hardware manufacturers
don't seem very interested in it; they seem to prefer doing their
marketing on the number of cores. Look at GPUs: they have very fast
graphics RAM, but also a huge number of processing units. I don't
really see the point of that (i.e. having such a huge number of PUs)
anyway, except "marketing".

Ok, back to GC stuff.

A stop&copy algorithm needs a set of roots from which to copy the live
values. Each thread has its own stack, so it has its own subset of
roots. Then what? Parallelize the copy from each thread? Then we have
to determine the best number of copying threads according to the
number of cores, but more importantly according to the memory
bandwidth available per core. (What a nightmare!)

Then there are shared values (in the shared heap for instance, but
what if there are lateral pointers due to mutable values?). (We are
leaving the nightmare for hell! But some people have been there.)

Copying a live value means that if you later encounter something
pointing to its old address, you have to know the new one. This means
writing the forwarding address at the old address. (There is a small
sketch of what I mean at the end of this mail.)

So I don't see how we could do something very interesting *today* with
a concurrent copy in a stop&copy algorithm. I believe (but I'm *not* a
GC expert at all) that concurrent GCs are not based on a stop&copy
algorithm but rather on some mark&{do-some-stuff-such-as-"sweep"}
scheme.

> Oh, and I meant to say, but everyone else was faster than me:
> well done!

Thank you, and thanks to everyone else who appreciates this work. :-)

Philippe Wang
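P.S. To illustrate the "writing at the old address" point, here is a
tiny, purely illustrative sketch of forwarding addresses in a
stop&copy evacuation (hypothetical layout and names, not the actual
OC4MC runtime):

/* Sketch of the "write the new address at the old address" trick used
   by stop&copy evacuation (hypothetical layout, not OC4MC code). */
#include <stdio.h>
#include <string.h>

typedef struct block {
  size_t size;              /* payload size, in words */
  struct block *forward;    /* NULL, or new address once evacuated */
  long payload[2];          /* toy payload */
} block;

static block to_space[8];            /* toy destination heap */
static block *to_alloc = to_space;   /* bump-pointer allocator into it */

/* Copy a live block into the destination heap, or return the address
   of its copy if it has already been moved. */
static block *evacuate(block *old)
{
  if (old->forward != NULL)          /* seen before: reuse the new copy */
    return old->forward;
  block *copy = to_alloc++;          /* bump-allocate in the destination */
  memcpy(copy, old, sizeof *copy);   /* move header + payload */
  copy->forward = NULL;
  old->forward = copy;               /* the new address, left at the old one */
  return copy;
}

int main(void)
{
  block b = { 2, NULL, { 42, 43 } };
  block *first  = evacuate(&b);      /* copies b into to_space */
  block *second = evacuate(&b);      /* follows the forwarding address */
  printf("same copy: %s, payload: %ld\n",
         first == second ? "yes" : "no", first->payload[0]);
  return 0;
}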