There seem to be solutions in theory. I think a colleague had pointed out one of the papers below, so there is indeed something like a "lock-free garbage collector". Then, why do we worry about much synchronization overhead? I don't quite understand.
Edgar Friendly wrote:Absolutely - and the problem in OCaml seems to be that shared memory parallelism is just branded as evil and ignored...
> It looks like high-performance computing of the near future will be built
> out of many machines (message passing), each with many cores (SMP). One
> could use message passing for all communication in such a system, but a
> hybrid approach might be best for this architecture, with use of shared
> memory within each box and message passing between. Of course the best
> choice depends strongly on the particular task.
I think the central thing that we can be utterly sure about is that desktops will always have *> 1* general purpose CPU. Maybe not be an ever-increasing number of cores but definitely more than one.
> In the long run, it'll likely be a combination of a few large, powerful
> cores (Intel-CPU style w/ the capability to run a single thread as fast as
> possible) with many many smaller compute engines (GPGPUs or the like,
> optimized for power and area, closely coupled with memory) that provides
> the highest performance density.
It has often seemed to me when SMP has been discussed in the past on this list that it almost gets dismissed out of hand because it doesn't look future-proof or because we're worried about what's round the corner in technology terms.
> The question of how to program such an architecture seems as if it's being
> answered without the functional community's input. What can we contribute?
To me the principal question is not about whether a parallel/thread-safe GC will scale to 12, 16 or even the 2048 cores on something like http://www.hpc.cam.ac.uk/services/darwin.html but whether it will hurt a single-threaded application - i.e. whether you will still be able to implement message passing libraries and other scalable techniques without the parallel GC getting in the way of what you're doing. A parallel/thread-safe GC should be aiming to provide the same sort of contract as the present one - it "just works" for most things and in a few borderline cases (like HPC - yes, it's a borderline case) you'll need to tune your code or tweak GC parameters because it's causing some problems or because in your particular application squeezing every cycle out of the CPU is important. As long as the GC isn't (hugely) slower than the present one in OCaml then we can continue to use libraries, frameworks and technologies-still-to-come on top of a parallel/thread-safe GC which simply ignores shared memory thread-level parallelism just by not instantiating threads.
The argument always seems to focus on utterly maxing out all possible available resources (CPU time, memory bandwidth, etc.) rather than on whether it's simply faster than what we're doing able to do at the moment on the same system. Of course, it may be that the only way to do that is to have different garbage collectors - one invoked when threads.cmxa is linked and then the normal one otherwise (that's so easy to type out as a sentence, summarising a vast amount of potential work!!)
Multithreading in OCaml seems to be focused on jumping the entire width of the river of concurrency in one go, rather than coming up with stepping stones to cross it in bits...
David
_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs