caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* "ok with parallel threads" GC (aka ocaml for multicore)
@ 2009-04-10 17:04 Philippe Wang
  2009-04-10 20:52 ` [Caml-list] " Jon Harrop
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Philippe Wang @ 2009-04-10 17:04 UTC (permalink / raw)
  To: caml-list; +Cc: Philippe Wang

Hello list,

Mathias and Adrien have just started their internship (for their  
Master's degree requirements).
Thus they have some time to spend on this project. Moreover, Mathias'  
internship is strongly related to this project.
=> man power dramatically increased

We are currently searching for the last remaining bugs.

Our thread library is restricted, it contains:
Thread : create, join, yield, id, self, delay
Mutex : full module
Condition : full module


Our alternative garbage collector
  - uses a Stop(the world)&Copy algorithm
  - has memory pages for threads (each thread takes a page at its  
creation)
  - has a shared heap for shared values and for old generation from  
pages (i.e. memory pages are flushed to this heap)
  - should be not to hard to replace.
Blocking sections such as I/O operations or mutex locks do not prevent  
garbage collection.

We currently do *not* support POSIX signals (let's say their behaviour  
is not specified).

We should make a release soon, but before:
  - some code has to be cleaned
  - some benchmarks have to be done
  - some documentation has to be completed
  - an installation script still has to be written.
Thus not a lot is left to do before the release :-)

We are writing test programs to search for the last remaining bugs but  
also to measure performances.

So far, as long as there are not too many concurrent memory accesses,  
it is not too hard to go n times faster with a n-core CPU;
though intense memory accesses generate page faults and divide memory  
bandwidth by the number of concurrent accesses,
and intense memory consuming programs show our GC is not as performant  
as INRIA's, of course.

Cheers,

--
Philippe Wang
   Philippe.Wang \at/ lip6.fr


PS: Sorry for taking so much time, debugging parallel threads in  
shared memory style is hell (you can give it a try).


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-10 17:04 "ok with parallel threads" GC (aka ocaml for multicore) Philippe Wang
@ 2009-04-10 20:52 ` Jon Harrop
  2009-04-10 23:20 ` Jerome Benoit
       [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr>
  2 siblings, 0 replies; 11+ messages in thread
From: Jon Harrop @ 2009-04-10 20:52 UTC (permalink / raw)
  To: caml-list

On Friday 10 April 2009 18:04:12 Philippe Wang wrote:
> We should make a release soon, but before:
>   - some code has to be cleaned
>   - some benchmarks have to be done
>   - some documentation has to be completed
>   - an installation script still has to be written.
> Thus not a lot is left to do before the release :-)

This is the best news I've heard since OCaml 3.11 made it into testing and 
Batteries went into beta!

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-10 17:04 "ok with parallel threads" GC (aka ocaml for multicore) Philippe Wang
  2009-04-10 20:52 ` [Caml-list] " Jon Harrop
@ 2009-04-10 23:20 ` Jerome Benoit
       [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr>
  2 siblings, 0 replies; 11+ messages in thread
From: Jerome Benoit @ 2009-04-10 23:20 UTC (permalink / raw)
  To: Philippe Wang; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

Le Fri, 10 Apr 2009 19:04:12 +0200,
Philippe Wang <Philippe.Wang@lip6.fr> a écrit :

Hello,

> We should make a release soon, but before:
>   - some code has to be cleaned
>   - some benchmarks have to be done
>   - some documentation has to be completed
>   - an installation script still has to be written.

Is there a public SCM repository I can pull off the source ?

Thanks. 

-- 
Jérôme Benoit aka fraggle
La Météo du Net - http://grenouille.com
OpenPGP Key ID : 9FE9161D
Key fingerprint : 9CA4 0249 AF57 A35B 34B3 AC15 FAA0 CB50 9FE9 161D

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
       [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr>
@ 2009-04-14 10:21   ` Philippe Wang
  2009-04-14 14:18     ` xclerc
  2009-04-16  9:45     ` Philippe Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Philippe Wang @ 2009-04-14 10:21 UTC (permalink / raw)
  To: forum; +Cc: Philippe Wang, caml-list

On Apr 10, 2009, at 20:36 CEDT, forum@x9c.fr wrote:

> Would it be correct to assume that the current state of the project
> implies that you have patched the OCaml runtime to make it
> fully reentrant ?

Is this following partial answer relevant enough ?
The garbage collector is clearly separated from the rest of the  
runtime library: the GC is contained in a libgc.a and our patched  
ocamlopt links programs to this GC. The GC variables are known by the  
GC only.

> If so, is this code/patch available for download ?

Officially, not yet (and not before April 20th).

We did not expect the debugging part to be so complex and hard, and  
taking so long.
The man power dramatically decreased in late September : the 2  
Master's students went back to Master's courses, and the 3 supervisors  
had to do research in parallel with teaching.
Some major bug fixes were made in February/March, a lot of major bug  
fixes were made in April (yes, these last 2 weeks).
You know, bugs hiding other bugs... however we are hopefully getting  
close to the fix point: today there is no known bug ! :-)
Unsupported features are - of course - not considered as bugs. For  
instance, posix signals are (currently) not supported. And, as  
parallel computing *potentially* requires quite a lot more memory,  
some programs can easily end up in a blocked state when the heap  
becomes full: our GC (currently) uses (parameterized) fixed size pages  
and heap.

The next days, we will concentrate on making benchmarks (if you have  
some relevant testing programs, they are welcome), and if we don't  
discover new bugs then we will focus on (finishing) writing a  
documentation and a building script, for the release.
If we release as such now, we will have too much support to do because  
of the lack of documentation. So it's not quite a good idea...
When we have the minimal-but-sufficient documentation, we will make  
the release :-)

In parallel, we try to make it work with OS X Leopard 64 bit and/or  
ocaml 3.11 (currently we only support 3.10.2 - Linux x86_64).

> Anyway, wholehearted  respect for undertaking such a complex project.
> Good luck in your bug-chasing tasks !

Thanks.

--
Philippe Wang
    Philippe.Wang \at/ lip6.fr

N.B. I hope we will not discover new bugs in our amd64.S, our assembly  
guru is enjoying (abroad with no www) his "vacances de pâques"...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-14 10:21   ` Philippe Wang
@ 2009-04-14 14:18     ` xclerc
  2009-04-16  9:45     ` Philippe Wang
  1 sibling, 0 replies; 11+ messages in thread
From: xclerc @ 2009-04-14 14:18 UTC (permalink / raw)
  To: caml-list


Le 14 avr. 09 à 12:21, Philippe Wang a écrit :

> On Apr 10, 2009, at 20:36 CEDT, forum@x9c.fr wrote:
>
>> Would it be correct to assume that the current state of the project
>> implies that you have patched the OCaml runtime to make it
>> fully reentrant ?
>
> Is this following partial answer relevant enough ?
> The garbage collector is clearly separated from the rest of the  
> runtime library: the GC is contained in a libgc.a and our patched  
> ocamlopt links programs to this GC. The GC variables are known by  
> the GC only.

Well, this is not what I had in mind, but I realize that my question
was not clear. A better question would have been:
	Is your implementation still based on a global runtime lock ?

A negative answer would imply that you patched the OCaml
runtime to make it reentrant. To illustrate my point, I will take
the example of the file "byterun/compare.c". In this file, the
code for the comparison of values makes use of a global
variable (named "caml_compare_unordered").

So, allowing multiple threads to execute primitives from the
runtime at the same time may result in incorrect outcomes.
Unless you patch the runtime to make it reentrant, hence my
original (foolishly indirect) question.

If you patched the runtime to allow multiple threads to use
it concurrently, I would also be interested by the strategy
used: is the problem solved by additional parameters ?
by the use of thread-local storage ? by any other mean ?


Xavier Clerc


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-14 10:21   ` Philippe Wang
  2009-04-14 14:18     ` xclerc
@ 2009-04-16  9:45     ` Philippe Wang
  2009-04-17 22:15       ` Philippe Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Philippe Wang @ 2009-04-16  9:45 UTC (permalink / raw)
  To: caml-list

> Le 14 avr. 09 à 12:21, Philippe Wang a écrit :
>
>>     The garbage collector is clearly separated from the rest of the  
>> runtime library: the GC is contained in a libgc.a and our patched  
>> ocamlopt links programs to this GC. The GC variables are known by  
>> the GC only.
>
> Well, this is not what I had in mind, but I realize that my question
> was not clear. A better question would have been:
>        Is your implementation still based on a global runtime lock ?

No, it isn't. (And it would probably too often prevent parallel  
threads, wouldn't it.)

> A negative answer would imply that you patched the OCaml
> runtime to make it reentrant. To illustrate my point, I will take
> the example of the file "byterun/compare.c". In this file, the
> code for the comparison of values makes use of a global
> variable (named "caml_compare_unordered").

It should be, unless bugs are still hiding under that.
It is supposed to have been done for a while. We'll make one more check.

> If you patched the runtime to allow multiple threads to use
> it concurrently, I would also be interested by the strategy
> used: is the problem solved by additional parameters ?
> by the use of thread-local storage ? by any other mean ?

- local variable instead of global variable in functions
- some functions are parameterized by thread identifier (that is, one  
more parameter than "before") (e.g. in amd64.S)

Philippe Wang

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-16  9:45     ` Philippe Wang
@ 2009-04-17 22:15       ` Philippe Wang
  2009-04-17 22:20         ` Joel Reymont
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Philippe Wang @ 2009-04-17 22:15 UTC (permalink / raw)
  To: caml-list; +Cc: Philippe Wang


On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote:

>> A negative answer would imply that you patched the OCaml
>> runtime to make it reentrant. To illustrate my point, I will take
>> the example of the file "byterun/compare.c". In this file, the
>> code for the comparison of values makes use of a global
>> variable (named "caml_compare_unordered").
>
> It should be, unless bugs are still hiding under that.
> It is supposed to have been done for a while. We'll make one more  
> check.
>
>> If you patched the runtime to allow multiple threads to use
>> it concurrently, I would also be interested by the strategy
>> used: is the problem solved by additional parameters ?
>> by the use of thread-local storage ? by any other mean ?
>
> - local variable instead of global variable in functions
> - some functions are parameterized by thread identifier (that is,  
> one more parameter than "before") (e.g. in amd64.S)

Well, we went back into runtime code implementation.

This is what can be said rapidly :
- compare.c contains no global variables anymore, we use local  
variables instead
- if a Caml-C or C-Caml interface uses caml_compare_unordered, we  
don't know what can happen with parallel threads
- we have many global mutex locks with small scopes
- we do use an "enter/leave blocking section" mechanism to prevent the  
GC from waiting on a blocking operation such as I/O or mutex locking  
etc.
- we don't support weak values (not sure whether they don't work or  
they became strong, if they dont work anymore, they can be back in 2  
minutes as strong values anyway)
- serialisation of values is a little bit tricky, though it should work
- most important : many global variables do not exist anymore because  
they are irrelevant in our implementation
- we do not support unofficial-features of ocaml 3.10, e.g. the new  
features that come with 3.11 but actually have their roots in previous  
versions
~ it is almost sad to see all the based-on-"one-thread-at-a-time"  
optimisations removed...
+ (it looks like it works just fine)


I hope there are no "hidden" bad global variables.

Is it fully reentrant ? Hmmmm... maybe.

We use a stop-the-world GC (which means no one is running is parallel  
with the GC), that is actually like original ocaml, that comes with  
its inconveniences : C calls not declared as blocking sections (which  
has quite a cost) may prevent other threads from running when the heap  
is full.

Graphics module, for instance, is not reentrant at all (anyhow it's  
not part of the runtime). Same for Str.
Funny thing is we can open several windows by launching parallel  
threads (though only one is useful at the end).


Anyway, thank you for your questions and interest, they have helped us  
find&fix some bugs.


--
Philippe Wang
   http://www-apr.lip6.fr/~pwang/


PS.  We tried to switch to 3.11, but it seems to need too much time,  
it's far from being a piece of cake.
We have tried to make it work on Leopard (actually, I failed the 1st  
time - half the way, I may try again if I have time).

=====================================
A free very personal advice that may save you some headaches:
do not program in concurrent shared memory style, especially when you  
can replace "concurrent" by "parallel".
Even if it may have a future, even if it may sound great, even if it  
sounds exciting, even if it helps you go faster, even if <put-here- 
whatever-you-want>,
it is **awful**. Well, if you really really don't have a choice, never  
mind what I said.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-17 22:15       ` Philippe Wang
@ 2009-04-17 22:20         ` Joel Reymont
  2009-04-17 22:29           ` Philippe Wang
  2009-04-17 23:05         ` forum
  2009-04-23 16:49         ` Philippe Wang
  2 siblings, 1 reply; 11+ messages in thread
From: Joel Reymont @ 2009-04-17 22:20 UTC (permalink / raw)
  To: Philippe Wang; +Cc: caml-list


On Apr 17, 2009, at 11:15 PM, Philippe Wang wrote:

> PS.  We tried to switch to 3.11, but it seems to need too much time,  
> it's far from being a piece of cake.
> We have tried to make it work on Leopard (actually, I failed the 1st  
> time - half the way, I may try again if I have time).


What was the problem with Leopard?

---
Mac hacker with a performance bent
http://linkedin.com/in/joelreymont




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-17 22:20         ` Joel Reymont
@ 2009-04-17 22:29           ` Philippe Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Philippe Wang @ 2009-04-17 22:29 UTC (permalink / raw)
  To: Joel Reymont; +Cc: Philippe Wang, caml-list


On Apr 18, 2009, at 00:20 CEDT, Joel Reymont wrote:

>
> On Apr 17, 2009, at 11:15 PM, Philippe Wang wrote:
>
>> PS.  We tried to switch to 3.11, but it seems to need too much  
>> time, it's far from being a piece of cake.
>> We have tried to make it work on Leopard (actually, I failed the  
>> 1st time - half the way, I may try again if I have time).
>
>
> What was the problem with Leopard?

We have our own amd64.S which is based on but quite different from  
ocaml 3.10 amd64.S.
ocaml 3.11 has its amd64.S changed for Leopard.
*And* we have made some small changes to emit.ml.
Then, building the whole thing is not trivial.

Building ocaml 3.10 with our modified runtime is not trivial with  
Linux x86_64 : we use an intern script that does it. Without it, it  
would need hours to build it.
Adapting the script for Leopard is not trivial to do. But we think  
it's quite possible.

Besides, it was late at night and I was sleepy and tired when I tried.

Another answer is Leopard needs the changes made to amd64.S from 3.10  
to 3.11, which mainly fix an alignment problem.


--
Philippe Wang
   Philippe.Wang@lip6.fr
   http://www-apr.lip6.fr/~pwang/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-17 22:15       ` Philippe Wang
  2009-04-17 22:20         ` Joel Reymont
@ 2009-04-17 23:05         ` forum
  2009-04-23 16:49         ` Philippe Wang
  2 siblings, 0 replies; 11+ messages in thread
From: forum @ 2009-04-17 23:05 UTC (permalink / raw)
  To: caml users


Le 18 avr. 09 à 00:15, Philippe Wang a écrit :

>
> On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote:
>
>>> A negative answer would imply that you patched the OCaml
>>> runtime to make it reentrant. To illustrate my point, I will take
>>> the example of the file "byterun/compare.c". In this file, the
>>> code for the comparison of values makes use of a global
>>> variable (named "caml_compare_unordered").
>>
>> It should be, unless bugs are still hiding under that.
>> It is supposed to have been done for a while. We'll make one more  
>> check.
>>
>>> If you patched the runtime to allow multiple threads to use
>>> it concurrently, I would also be interested by the strategy
>>> used: is the problem solved by additional parameters ?
>>> by the use of thread-local storage ? by any other mean ?
>>
>> - local variable instead of global variable in functions
>> - some functions are parameterized by thread identifier (that is,  
>> one more parameter than "before") (e.g. in amd64.S)
>
> Well, we went back into runtime code implementation.
>
> This is what can be said rapidly :
> - compare.c contains no global variables anymore, we use local  
> variables instead
> - if a Caml-C or C-Caml interface uses caml_compare_unordered, we  
> don't know what can happen with parallel threads
> - we have many global mutex locks with small scopes
> - we do use an "enter/leave blocking section" mechanism to prevent  
> the GC from waiting on a blocking operation such as I/O or mutex  
> locking etc.
> - we don't support weak values (not sure whether they don't work or  
> they became strong, if they dont work anymore, they can be back in 2  
> minutes as strong values anyway)
> - serialisation of values is a little bit tricky, though it should  
> work
> - most important : many global variables do not exist anymore  
> because they are irrelevant in our implementation
> - we do not support unofficial-features of ocaml 3.10, e.g. the new  
> features that come with 3.11 but actually have their roots in  
> previous versions
> ~ it is almost sad to see all the based-on-"one-thread-at-a-time"  
> optimisations removed...
> + (it looks like it works just fine)
>
>
> I hope there are no "hidden" bad global variables.
>
> Is it fully reentrant ? Hmmmm... maybe.
>
> We use a stop-the-world GC (which means no one is running is  
> parallel with the GC), that is actually like original ocaml, that  
> comes with its inconveniences : C calls not declared as blocking  
> sections (which has quite a cost) may prevent other threads from  
> running when the heap is full.
>
> Graphics module, for instance, is not reentrant at all (anyhow it's  
> not part of the runtime). Same for Str.


I was indeed mostly worried with the runtime itself. I wanted to have  
a fully reentrant runtime for the OCaml-Java
project (to be able to execute several programs in the very same JVM)  
and remember that it implied primitives
from "compare.c", "hash.c", "intern.c" and "extern.c" among others to  
be written differently for this purpose.

Out of curiosity: you state that your GC is of "stop-the-world"  
nature, what about finalizers ?
Are they executed by the GC thread when the world is stopped or  
concurrently with
application threads ?
Not sure this question really matters, just curious (I mean, it is  
doubtful that one would write finalizers with
a long execution time).


Keep up the good work !

Xavier Clerc

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)
  2009-04-17 22:15       ` Philippe Wang
  2009-04-17 22:20         ` Joel Reymont
  2009-04-17 23:05         ` forum
@ 2009-04-23 16:49         ` Philippe Wang
  2 siblings, 0 replies; 11+ messages in thread
From: Philippe Wang @ 2009-04-23 16:49 UTC (permalink / raw)
  To: caml-list; +Cc: Philippe Wang

> On Sat, Apr 18, 2009 at 1:05 AM, forum@x9c.fr <forum@x9c.fr> wrote:
>     I was indeed mostly worried with the runtime itself. I wanted to  
> have a fully reentrant runtime for the OCaml-Java
>     project (to be able to execute several programs in the very same  
> JVM) and remember that it implied primitives
>     from "compare.c", "hash.c", "intern.c" and "extern.c" among  
> others to be written differently for this purpose.

Indeed. We have made them reentrant but we haven't made much stress  
testing on that.
Reentrance on those are not free (they have a cost), and the way we  
chose is the "simplest or quickest way".

>     Out of curiosity: you state that your GC is of "stop-the-world"  
> nature, what about finalizers ?
>     Are they executed by the GC thread when the world is stopped or  
> concurrently with
>     application threads ?
>     Not sure this question really matters, just curious (I mean, it  
> is doubtful that one would write finalizers with
>     a long execution time).

Finalizers are supposed to be called by the thread that does the  
garbage collection, so there is no concurrency with finalizers as the  
rest of the world is meant to be stopped when garbage collecting.
(Our garbage collector does not try to be as smart as the original one  
on many many things)

By the way, we are late on writing the documentation for our future  
release...
but we have just implemented a (simple) experimental growing heap.

Here is a quote from wikipedia (http://en.wikipedia.org/wiki/Speedup):
<<
Sometimes a speedup of more than N when using N processors is observed  
in parallel computing, which is called super linear speedup. Super  
linear speedup rarely happens and often confuses beginners, who  
believe the theoretical maximum speedup should be N when N processors  
are used.
 >>
Well, we *have* observed that on a matrix multiplication :-)

--
Philippe Wang
  Philippe.Wang@lip6.fr
  http://www-apr.lip6.fr/~pwang/


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-04-23 16:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-10 17:04 "ok with parallel threads" GC (aka ocaml for multicore) Philippe Wang
2009-04-10 20:52 ` [Caml-list] " Jon Harrop
2009-04-10 23:20 ` Jerome Benoit
     [not found] ` <BB3C091F-3BE9-4883-A7CF-9D672CDDF829@x9c.fr>
2009-04-14 10:21   ` Philippe Wang
2009-04-14 14:18     ` xclerc
2009-04-16  9:45     ` Philippe Wang
2009-04-17 22:15       ` Philippe Wang
2009-04-17 22:20         ` Joel Reymont
2009-04-17 22:29           ` Philippe Wang
2009-04-17 23:05         ` forum
2009-04-23 16:49         ` Philippe Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).