Hi,

I can understand why you are all excited about parallelism in OCaml, and most of your questions/suggestions seem relevant.
However, as for the technical questions about implementation choices, it is far too early for us to answer them.

About the bugs in library bindings, I'm afraid you have spotted a real problem. For example, if I remember correctly, GTK's C functions must all be called from the same thread (which is a smart way for them to say that their code isn't reentrant), and this may cause problems if we add real parallelism to OCaml. But you will understand that rewriting lablgtk and/or the programs using it is out of scope.

You also noticed that the task is not trivial, and are concerned about its feasibility. Indeed, we most probably won't have the time to find the best solution.
So our proposal is to make this project "a first reusable step toward parallelism in OCaml" rather than "a parallel OCaml".
More practically, we propose the following subtasks:
  1. To strip down the current run-time library, rewriting the parts that depend too heavily on the current GC.
  2. To clean up the (small) parts of the compiler that prevent us from changing the allocator (for example, OCaml inlines some allocations by directly emitting code which modifies the heap pointer).
  3. To define a clean and documented interface for adding new GCs, ideally adding a run-time switch to choose the GC.
  4. To reinject the current GC, or a simpler sequential GC we already wrote for another project, through this interface to validate the approach.
  5. To design a first parallel GC, simple enough for us to test and benchmark before the end of the project, and to implement it within our interface.
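
To make subtask 3 a bit more concrete, here is a minimal sketch in C of what such a GC plugging interface might look like. Everything in it is an assumption made for illustration: the names gc_ops, gc_select and the "bump" allocator are hypothetical and do not exist in the OCaml run-time, and the toy allocator never reclaims memory. It only shows the shape of the idea: the runtime calls the allocator through a table of function pointers, and the table is picked by name at run time.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical GC interface: none of these names exist in the OCaml runtime. */
typedef struct gc_ops {
    const char *name;
    void  (*init)(size_t heap_bytes);
    void *(*alloc)(size_t bytes);   /* returns NULL when the heap is exhausted */
    void  (*collect)(void);
} gc_ops;

/* Toy sequential "GC": a bump allocator that never reclaims anything. */
static char *bump_base, *bump_ptr, *bump_end;

static void bump_init(size_t heap_bytes) {
    bump_base = malloc(heap_bytes);
    bump_ptr  = bump_base;
    bump_end  = bump_base ? bump_base + heap_bytes : NULL;
}

static void *bump_alloc(size_t bytes) {
    if (bump_base == NULL || bytes > SIZE_MAX - 7u) return NULL;
    size_t aligned = (bytes + 7u) & ~(size_t)7u;   /* round up to 8 bytes */
    if (aligned > (size_t)(bump_end - bump_ptr)) return NULL;
    void *p = bump_ptr;
    bump_ptr += aligned;   /* the "heap pointer" update, now behind the interface */
    return p;
}

static void bump_collect(void) { /* intentionally a no-op in this sketch */ }

static const gc_ops bump_gc = { "bump", bump_init, bump_alloc, bump_collect };

/* Registry of available GCs; a parallel GC would be added here later. */
static const gc_ops *const gc_registry[] = { &bump_gc };

/* Run-time switch: look a GC up by name, falling back to the first entry. */
const gc_ops *gc_select(const char *name) {
    size_t n = sizeof gc_registry / sizeof gc_registry[0];
    for (size_t i = 0; i < n; i++)
        if (name != NULL && strcmp(gc_registry[i]->name, name) == 0)
            return gc_registry[i];
    return gc_registry[0];
}
```

The name passed to gc_select could come, for instance, from an environment variable or a command-line option read at startup; which mechanism would actually be used is left open here. Note that going through function pointers is exactly what conflicts with the inlined allocations mentioned in subtask 2, which is why that cleanup comes first.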

With such an approach, we believe that the project has a much greater chance of surviving and perhaps of being integrated upstream.
Also, with such a generic GC plugging interface, libraries will be able to provide specific GCs: for example, GCs including some tricks to run gtk-like non-reentrant C calls, or GCs dedicated to tasks which are problematic with the current allocation mechanism, like MLGMP.

We'll probably open a blog or wiki to keep you informed of our progress, and to collect suggestions and concerns.

Regards, and many thanks for your interest.
  Benjamin Canou.

On Saturday 19 April 2008 at 10:46 +0200, Berke Durak wrote:
The concurrent GC is a great idea.  A few questions.

- How "stoppy" would a stop-the-world parallel GC be in practice?  The more parallelism
you have, the more work is done, the higher the frequency of a major collection.

- Would major allocations be serialized?  What about other serialization points?

- I'm afraid true concurrency will introduce an awful lot of bugs in native bindings.  Thread-unsafe libraries will have to be replaced (Str, etc.)  Also what would be the CPU
and memory costs?  Don't concurrent GCs require extra colors?

- In case of performance impacts, will the old single-threaded mode still be available?

The argument that "you'll get the same old performance if you run it in single-threaded mode"
is not valid IMHO.  Many people will use a thread here or there and then you won't realistically be able to run in single-threaded mode.

But then we can't pretend multi-core doesn't exist.  A suggestion: making the parallel GC available only on 64-bit seems a reasonable restriction (if that's ever needed.)

Also, Damien Doligez (in addition to Xavier Leroy) certainly has nice things to say about all this.
--
Berke Durak
