There seem to be solutions in theory. I think a colleague had pointed out one of the papers below, so there is indeed something like a "lock-free garbage collector". Then, why do we worry about much synchronization overhead? I don't quite understand.

Maurice Herlihy, J. Eliot B. Moss: Lock-Free Garbage Collection for Multiprocessors. IEEE Trans. Parallel Distrib. Syst. 3(3): 304-311 (1992)
http://doi.ieeecomputersociety.org/10.1109/71.139204

Hui Gao, Jan Friso GrooteWim H. Hesselink: Lock-free parallel and concurrent garbage collection by mark&sweep. Sci. Comput. Program. 64(3): 341-374 (2007) 
http://portal.acm.org/citation.cfm?id=1223239

Java's new garbage collector is lock-free? At any rate, we really needn't fall behind a mega-lame language like Java :) The first paper is from 1992, enough time for the knowledge to diffuse. The second 2007 paper is probably what Jon was referring to earlier.

In my mind, you can use one of these, and use special pool allocation algorithms for small objects, and also use static lifetime analysis to bypass the garbage collection in many cases. Since there are many runtime designers here, I wonder, is there a language runtime that does all three of these?

Cheers,

Eray
 

On Wed, Nov 17, 2010 at 6:34 PM, David Allsopp <dra-news@metastack.com> wrote:
Edgar Friendly wrote:
> It looks like high-performance computing of the near future will be built
> out of many machines (message passing), each with many cores (SMP).  One
> could use message passing for all communication in such a system, but a
> hybrid approach might be best for this architecture, with use of shared
> memory within each box and message passing between.  Of course the best
> choice depends strongly on the particular task.

Absolutely - and the problem in OCaml seems to be that shared memory parallelism is just branded as evil and ignored...

> In the long run, it'll likely be a combination of a few large, powerful
> cores (Intel-CPU style w/ the capability to run a single thread as fast as
> possible) with many many smaller compute engines (GPGPUs or the like,
> optimized for power and area, closely coupled with memory) that provides
> the highest performance density.

I think the central thing that we can be utterly sure about is that desktops will always have *> 1* general purpose CPU. Maybe not be an ever-increasing number of cores but definitely more than one.

> The question of how to program such an architecture seems as if it's being
> answered without the functional community's input. What can we contribute?

It has often seemed to me when SMP has been discussed in the past on this list that it almost gets dismissed out of hand because it doesn't look future-proof or because we're worried about what's round the corner in technology terms.

To me the principal question is not about whether a parallel/thread-safe GC will scale to 12, 16 or even the 2048 cores on something like http://www.hpc.cam.ac.uk/services/darwin.html but whether it will hurt a single-threaded application - i.e. whether you will still be able to implement message passing libraries and other scalable techniques without the parallel GC getting in the way of what you're doing. A parallel/thread-safe GC should be aiming to provide the same sort of contract as the present one - it "just works" for most things and in a few borderline cases (like HPC - yes, it's a borderline case) you'll need to tune your code or tweak GC parameters because it's causing some problems or because in your particular application squeezing every cycle out of the CPU is important. As long as the GC isn't (hugely) slower than the present one in OCaml then we can continue to use libraries, frameworks and technologies-still-to-come on top of a parallel/thread-safe GC which simply ignores shared memory thread-level parallelism just by not instantiating threads.

The argument always seems to focus on utterly maxing out all possible available resources (CPU time, memory bandwidth, etc.) rather than on whether it's simply faster than what we're doing able to do at the moment on the same system. Of course, it may be that the only way to do that is to have different garbage collectors - one invoked when threads.cmxa is linked and then the normal one otherwise (that's so easy to type out as a sentence, summarising a vast amount of potential work!!)

Multithreading in OCaml seems to be focused on jumping the entire width of the river of concurrency in one go, rather than coming up with stepping stones to cross it in bits...


David

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



--
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct