Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)

caml-list - the Caml user's mailing list

* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
       [not found]         ` <fa.+OkqNL3AB4+5LA8wOnQD9WS59QQ@ifi.uio.no>
@ 2010-11-30 14:04           ` Stephan Houben
  2010-11-30 14:22             ` Gerd Stolpmann
  2010-11-30 14:29             ` oliver
  0 siblings, 2 replies; 16+ messages in thread
From: Stephan Houben @ 2010-11-30 14:04 UTC
  To: oliver; +Cc: caml-list

On 11/30/2010 12:55 PM, oliver@first.in-berlin.de wrote:
> There is one problem with this... when you have forked, then
> you obviously have separated processes and also in each process
> your own OCaml program with its own GC running...

...neatly sidestepping the problem that the GC needs to lock out all threads...

> .with such a mem-mapping trick (never used Bigarray, so I'm astounded it uses
> mmap) you then have independent processes, working on shared mem without
> synchronisation.

> This is a good possibility to get corrupted data, and therefore unreliable behaviour.

Well, no more than is inherent in any code that updates a shared data structure
in parallel. It is certainly not the case that the independently executing GCs in both
processes are causing data corruption, since the GC only operates on unshared memory.
Note that the GC doesn't move the Bigarray data around.

(I am not sure if this was in particular your worry or that it was just the lack of
  synchronisation mechanisms which you bring up next.
  Apologies if I am addressing some non-concern.)

> So you somehow have to create a way for these processes to communicate.
>
> This already is easily done in the Threads-module, because synchronisation
> mechanisms are bound there to the OCaml API and can be used easily.
>
> In the Unix module there is not much of this IPC stuff...

In fact there is the Unix.pipe function which can be used for message passing
communication between processes.
A pipe can also be used as a semaphore:
operation V corresponds to writing a byte to the pipe, operation P corresponds to reading a byte.
It's a bit heavy since it always makes a kernel call even in the uncontended case, but
otherwise it works perfectly.
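A minimal sketch of the pipe-as-semaphore idea, using only the Unix library and assuming OCaml 4.12 or later; `sem_create`, `sem_p` and `sem_v` are illustrative names, not an existing API. Each byte sitting in the pipe counts as one permit, and because the descriptors survive fork(), parent and children share the semaphore.

```ocaml
(* A counting semaphore built from a Unix pipe (hypothetical names).
   Each byte sitting in the pipe is one available permit. *)
type sem = { rd : Unix.file_descr; wr : Unix.file_descr }

(* V / signal: put one permit into the pipe. *)
let sem_v s =
  let written = Unix.write_substring s.wr "x" 0 1 in
  if written <> 1 then failwith "sem_v: short write"

(* P / wait: take one permit out, blocking while the pipe is empty. *)
let rec sem_p s =
  let buf = Bytes.create 1 in
  match Unix.read s.rd buf 0 1 with
  | 1 -> ()
  | _ -> failwith "sem_p: all writers closed the pipe"
  | exception Unix.Unix_error (Unix.EINTR, _, _) -> sem_p s

(* [initial] must stay well below the pipe buffer size (often 64 KiB on
   Linux), otherwise the writes below would block. *)
let sem_create initial =
  let rd, wr = Unix.pipe () in
  let s = { rd; wr } in
  for _ = 1 to initial do sem_v s done;
  s
```

For example, a parent can create the semaphore with zero permits, fork, and call `sem_p` to wait until the child has called `sem_v`.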

For many purposes (e.g. something "embarrassingly parallel" like computing the Mandelbrot set)
you can just divide the work up-front and only rely on the implicit synchronization given
by waitpid.
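A sketch of the "divide up front, then waitpid" pattern, assuming OCaml 4.12 or later (for `Unix._exit`). It uses `Unix.map_file`, which replaced the `Bigarray.Genarray.map_file` call used in Stephan's /dev/zero helper quoted later in the thread (that function was deprecated in OCaml 4.06), and it maps an unlinked temporary file instead of /dev/zero, since a MAP_SHARED mapping of a regular file is specified by POSIX. `create_shared_floats`, `parallel_fill` and `compute` are hypothetical names; `compute` stands for the per-element work, e.g. an escape count per Mandelbrot pixel.

```ocaml
(* A float64 Bigarray backed by an unlinked temporary file. The mapping is
   MAP_SHARED, so children forked afterwards write into the same pages. *)
let create_shared_floats n =
  let path = Filename.temp_file "shared" ".bin" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o600 in
  Unix.unlink path; (* the mapping keeps the storage alive *)
  let ga = Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |] in
  Unix.close fd;
  Bigarray.array1_of_genarray ga

(* Split [0, n) into [workers] contiguous slices decided up front. Each child
   fills only its own slice, so no locking is needed; waitpid is the only
   synchronisation before the parent reads the results. *)
let parallel_fill ~workers n (compute : int -> float) =
  let out = create_shared_floats n in
  let chunk = (n + workers - 1) / workers in
  let pids =
    List.init workers (fun w ->
        let lo = min n (w * chunk) and hi = min n ((w + 1) * chunk) in
        match Unix.fork () with
        | 0 ->
            for i = lo to hi - 1 do out.{i} <- compute i done;
            Unix._exit 0 (* skip at_exit handlers and stdio flushing *)
        | pid -> pid)
  in
  List.iter (fun pid -> ignore (Unix.waitpid [] pid)) pids;
  out
```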

If you allow me a final observation/rant: I personally feel that the use of fork() and
pipes as a way to exploit multiple CPUs is underrated. When appropriate (lots of computation
and not so much synchronisation/communication) it works really great and is very robust because
all data is process-private by default, as opposed to threading, where everything is shared
and you have to stand on your head to get a thread-local variable. Performance can also be better
since you don't run into cache coherency issues.

I am not sure why it is not used more; possibly because it is not supported on Windows.

Stephan


* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 14:04           ` Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?) Stephan Houben
@ 2010-11-30 14:22             ` Gerd Stolpmann
  2010-11-30 14:29             ` oliver
  1 sibling, 0 replies; 16+ messages in thread
From: Gerd Stolpmann @ 2010-11-30 14:22 UTC
  To: Stephan Houben; +Cc: oliver, caml-list

On Tuesday, 30.11.2010, at 15:04 +0100, Stephan Houben wrote:
> On 11/30/2010 12:55 PM, oliver@first.in-berlin.de wrote:
> > There is one problem with this... when you have forked, then
> > you obviously have separated processes and also in each process
> > your own OCaml program with its own GC running...
> 
> ...neatly sidestepping the problem that the GC needs to lock out all threads...
> 
> > .with such a mem-mapping trick (never used Bigarray, so I'm astounded it uses
> > mmap) you then have independent processes, working on shared mem without
> > synchronisation.
> 
> > This is a good possibility to get corrupted data, and therefore unreliable behaviour.
> 
> Well, no more than is inherent in any code that updates a shared data structure
> in parallel. It is certainly not the case that the independently executing GCs in both
> processes are causing data corruption, since the GC only operates on unshared memory.
> Note that the GC doesn't move the Bigarray data around.
> 
> (I am not sure if this was in particular your worry or that it was just the lack of
>   synchronisation mechanisms which you bring up next.
>   Apologies if I am addressing some non-concern.)
> 
> > So you somehow have to create a way for these processes to communicate.
> >
> > This already is easily done in the Threads-module, because synchronisation
> > mechanisms are bound there to the OCaml API and can be used easily.
> >
> > In the Unix module there is not much of this IPC stuff...
> 
> In fact there is the Unix.pipe function which can be used for message passing
> communication between processes.
> A pipe can also be used as a semaphore:
> operation V corresponds to writing a byte to the pipe, operation P corresponds to reading a byte.
> It's a bit heavy since it always makes a kernel call even in the uncontended case, but
> otherwise it works perfectly.
> 
> For many purposes (e.g. something "embarrassingly parallel" like computing the Mandelbrot set)
> you can just divide the work up-front and only rely on the implicit synchronization given
> by waitpid.
> 
> If you allow me a final observation/rant: I personally feel that the use of fork() and
> pipes as a way to exploit multiple CPUs is underrated. When appropriate (lots of computation
> and not so much synchronisation/communication) it works really great and is very robust because
> all data is process-private by default, as opposed to threading, where everything is shared
> and you have to stand on your head to get a thread-local variable. Performance can also be better
> since you don't run into cache coherency issues.
> 
> I am not sure why it is not used more; possibly because it is not supported on Windows.

I don't think this is the reason. Many people can ignore Windows,
actually.

The problem is rather that your whole program then needs to be
restructured - multi-processing implies a process model (which is the
master, which are the workers). With multi-threading you can start
threads at any time without having to worry about that (i.e. it supports
"programming without design" if you want to take that as a negative
point).

This is what I want to fix with my Netmulticore library - it defines a
framework allowing you to start new processes at any time without having
to worry about the process hierarchy.

Also, many practical problems are only O(n log n), at most. The cost for
serialization of data through a pipe cannot be neglected here. This
makes shared memory attractive, even if it is only available in a
restricted form (like write once memory).
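To get a feel for the serialization cost, a rough sketch (not a benchmark) that round-trips an array through `Marshal`, which is roughly what a pipe-based design does for every message before the bytes are even copied through the kernel. `marshal_roundtrip` is a hypothetical helper and the array size is arbitrary.

```ocaml
(* Marshal a float array to bytes and back, returning the encoded size and
   the elapsed wall-clock time. A pipe-based design pays this (plus the copy
   through the kernel) per message, which matters when the computation on
   the data is itself only O(n log n). *)
let marshal_roundtrip (data : float array) =
  let t0 = Unix.gettimeofday () in
  let encoded = Marshal.to_bytes data [] in
  let (decoded : float array) = Marshal.from_bytes encoded 0 in
  let elapsed = Unix.gettimeofday () -. t0 in
  assert (decoded = data);
  (Bytes.length encoded, elapsed)

let () =
  let n = 1_000_000 in
  let size, secs = marshal_roundtrip (Array.init n float_of_int) in
  Printf.printf "%d floats -> %d bytes, %.3f s round trip\n" n size secs
```

A shared Bigarray avoids both steps, which is why even a restricted form of shared memory is attractive here.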

Gerd


> Stephan
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 14:04           ` Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?) Stephan Houben
  2010-11-30 14:22             ` Gerd Stolpmann
@ 2010-11-30 14:29             ` oliver
  2010-11-30 15:17               ` Eray Ozkural
  1 sibling, 1 reply; 16+ messages in thread
From: oliver @ 2010-11-30 14:29 UTC
  To: caml-list

On Tue, Nov 30, 2010 at 03:04:32PM +0100, Stephan Houben wrote:
> On 11/30/2010 12:55 PM, oliver@first.in-berlin.de wrote:
> >There is one problem with this... when you have forked, then
> >you obviously have separated processes and also in each process
> >your own OCaml program with its own GC running...
> 
> ...neatly sidestepping the problem that the GC needs to lock out all threads...
> 
> >.with such a mem-mapping trick (never used Bigarray, so I'm astounded it uses
> >mmap) you then have independent processes, working on shared mem without
> >synchronisation.
> 
> >This is a good possibility to get corrupted data, and therefore unreliable behaviour.
> 
> Well, no more than is inherent in any code that updates a shared data structure
> in parallel. It is certainly not the case that the independently executing GCs in both
> processes are causing data corruption, since the GC only operates on unshared memory.
> Note that the GC doesn't move the Bigarray data around.
> 
> (I am not sure if this was in particular your worry or that it was just the lack of
>  synchronisation mechanisms which you bring up next.
>  Apologies if I am addressing some non-concern.)

You addressed a non-concern... I just meant that there are no synchronisation mechanisms
for that kind of shared memory.

But your non-concern raises a question that deserves attention:
how does the GC handle that shared memory?

Will it handle only a reference to the shared memory?
And if the GC frees the shared memory, will it only
stop sharing that memory with the others?
Will the released shared memory also be munmap'ed by OCaml?
(That should be done by Bigarray...)

> 
> >So you somehow have to create a way for these processes to communicate.
> >
> >This already is easily done in the Threads-module, because synchronisation
> >mechanisms are bound there to the OCaml API and can be used easily.
> >
> >In the Unix module there is not much of this IPC stuff...
> 
> In fact there is the Unix.pipe function which can be used for message passing
> communication between processes.

OK, yes, pipe is a way to go.
But some other IPC stuff may also be very helpful.

[...]
> If you allow me a final observation/rant: I personally feel that the use of fork() and
> pipes as a way to exploit multiple CPUs is underrated.

I have no problem with fork.

But if one wants to use fork(), one also has to use the IPC stuff,
and regarding that, it seems pipe is the only way available in OCaml
for now.

And: if you want to use fork and spread the work that way,
why not use Camlp3 or similar tools? I assume it also relies on fork,
but makes such programming much easier.

As we started with threads in mind, I just followed that way.

> When appropriate (lots of computation
> and not so much synchronisation/communication) it works really great

Yes, for just spreading the work, fork()/exec() is fine.

> and is very robust because
> all data is process-private by default, as opposed to threading, where everything is shared
> and you have to stand on your head to get a thread-local variable.

Yes, but OCaml provides its own ways of making things more robust.
In C your argument would carry more weight. In OCaml, variable clashes
are not a problem.

But with threads, the global lock is a show-stopper: splitting work
across independent threads is not possible.

And here I see a thread-specific GC as a solution.

It seems to me that this approach has not been considered before;
people thought about changing the GC so that it can handle multiple threads.
What I mean instead: each thread that is not the global thread gets its own
thread-specific GC.

Maybe that can be implemented much more easily.
But I've not looked into the OCaml internals enough to say either "yes, this
can be done comparatively easily" or "oh no, that's more complex than changing
the GC to make it handle all the threads from the main thread".

I would just assume that separate threads would be easier to handle.

But maybe there are other restrictions in the language or the compiler
that block this attempt.

> Performance can also be better
> since you don't run into cache coherency issues.
> 
> I am not sure why it is not used more; possibly because it is not supported on Windows.

;-)

Ciao,
   Oliver


* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 14:29             ` oliver
@ 2010-11-30 15:17               ` Eray Ozkural
  0 siblings, 0 replies; 16+ messages in thread
From: Eray Ozkural @ 2010-11-30 15:17 UTC
  To: oliver; +Cc: caml-list


On Tue, Nov 30, 2010 at 4:29 PM, <oliver@first.in-berlin.de> wrote:
>
> And here I see a thread-specific GC as a solution.
>
> It seems to me that this approach has not been considered before;
> people thought about changing the GC so that it can handle multiple threads.
> What I mean instead: each thread that is not the global thread gets its own
> thread-specific GC.
>
> Maybe that can be implemented much more easily.
> But I've not looked into the OCaml internals enough to say either "yes, this
> can be done comparatively easily" or "oh no, that's more complex than changing
> the GC to make it handle all the threads from the main thread".
>
> I would just assume that separate threads would be easier to handle.
>
> But maybe there are other restrictions in the language or the compiler
> that block this attempt.
>
>
Nothing that I have found yet, but I am curious about the opinion of the
runtime designers as well. There seem to be some global variables in the
runtime, which isn't the best way to write it anyway; if all variables are
local, it becomes easier to embed, thread and fork as you like. That is to
say, such a level of virtualization allows the global lock to be replaced
by whatever locking the system allocation functions already do, i.e. when
many malloc requests overlap.

Apart from that, the user has to synchronise memory accesses explicitly
anyway, so I don't think there could be any problem. And no other part of
the library is thread-safe either (which is great for performance!). So I
think each thread can have its own GC. Perhaps there is an obstacle I have
not yet noticed?

Best,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct



* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 21:13                 ` Jon Harrop
@ 2010-11-30 21:28                   ` Christophe Raffalli
  0 siblings, 0 replies; 16+ messages in thread
From: Christophe Raffalli @ 2010-11-30 21:28 UTC
  To: Jon Harrop, OCaml


Le 30/11/10 22:13, Jon Harrop a écrit :
> What would be responsible for collecting the shared heap?
Reference counting: if there are no pointers crossing the shared heap
boundary (I mean pointers to and from the shared heap), this should be
quite easy via a finaliser ...

For more than that, use reference counting via a finaliser plus a real GC
for the shared heap itself: this heap is seen from OCaml as a C region,
but since it uses OCaml's memory representation, it could be managed by
your own GC.

So overall, I think writing a C program that manages an OCaml shared heap
with its own GC, plus reference counting of the pointers held by OCaml
threads, is quite feasible ...
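A single-process OCaml sketch of the finaliser-based reference counting described here. The shared region is simulated by a counter and a flag; `attached`, `region_freed`, `free_region`, `acquire` and `release` are hypothetical names. A real implementation would keep the count inside the shared segment and update it atomically (or under a lock) from every process or thread; `Gc.finalise` only serves as a safety net for handles dropped without an explicit release.

```ocaml
(* Stand-ins for state that would really live in the shared segment. *)
let attached = ref 0
let region_freed = ref false

(* Hypothetical: where the region would be unmapped or handed back. *)
let free_region () = region_freed := true

(* One handle per OCaml-side reference to the shared heap. The record has a
   mutable field, so it is heap-allocated, as Gc.finalise requires. *)
type handle = { mutable live : bool }

(* Idempotent, so an explicit release followed by the finaliser is safe. *)
let release h =
  if h.live then begin
    h.live <- false;
    decr attached;
    if !attached = 0 then free_region ()
  end

let acquire () =
  let h = { live = true } in
  incr attached;
  (* If the owner forgets to call [release], the GC eventually does. *)
  Gc.finalise release h;
  h
```

Note that this does nothing for the first problem listed next: a process or thread that dies never runs its finalisers, so its count stays incremented.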

The following problems remain:
- an unexpectedly dying OCaml thread would leave its reference count
increased forever ... It is not clear we have to deal with that ...
- what syntax to use for allocating in the shared heap (some camlpN
(N = 4 or 5) magic?)
- something has to be done about pointers from the shared heap to some
OCaml heap: they have to be forbidden, and maybe you would like to rule
out their appearance at compile time by some static analysis ... Anyway,
they will be detected at runtime by a page violation when an OCaml thread
tries to follow a pointer to the heap of another thread. At least this
page violation should be transformed into an OCaml exception. They will
also be detected by the GC of the shared heap ... And here it is not
clear what to do ... Just ignore them?

Cheers,
Christophe




* RE: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 13:06               ` Eray Ozkural
@ 2010-11-30 21:13                 ` Jon Harrop
  2010-11-30 21:28                   ` Christophe Raffalli
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Harrop @ 2010-11-30 21:13 UTC
  To: 'Eray Ozkural', caml-list

What would be responsible for collecting the shared heap?

Cheers,
Jon.

Eray wrote:
> Seconded, why is this not possible? That is to say, why cannot each thread
> maintain a separate GC, if so desired?




* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 16:07             ` Gerd Stolpmann
@ 2010-11-30 17:40               ` oliver
  0 siblings, 0 replies; 16+ messages in thread
From: oliver @ 2010-11-30 17:40 UTC
  To: caml-list

On Tue, Nov 30, 2010 at 05:07:31PM +0100, Gerd Stolpmann wrote:
> On Tuesday, 30.11.2010, at 16:30 +0100, Stephan Houben wrote:
> > On 11/30/2010 02:22 PM, Gerd Stolpmann wrote:
> > > I don't think this is the reason. Many people can ignore Windows,
> > > actually.
> > >
> > > The problem is rather that your whole program then needs to be
> > > restructured - multi-processing implies a process model (which is the
> > > master, which are the workers). With multi-threading you can start
> > > threads at any time without having to worry about that (i.e. it supports
> > > "programming without design" if you want to take that as a negative
> > > point).
> > >
> > > This is what I want to fix with my Netmulticore library - it defines a
> > > framework allowing you to start new processes at any time without having
> > > to worry about the process hierarchy.
> > 
> > I have in fact read with much interest your blog at
> > http://blog.camlcity.org/blog/parallelmm.html .
> > 
> > Your approach there is to really have separate programs for
> > server and client. However, one nice thing about fork is that you don't
> > have to restructure your program; you can just call fork down somewhere
> > in some subroutine where you decide it is convenient, start doing some
> > multicore computation, finish and return, and the caller need never know
> > that you did that. So you can indeed program without design using fork.
> 
> Well, I would not recommend that in all cases: fork duplicates all
> memory, and if this is a lot, you can end up consuming a lot of RAM.
[...]

Hence the fork-early recommendation, which should be a well-known "rule".
But of course, as you pointed out, this makes "programming without design"
problematic.

But the same also holds for threaded programs, or programs in general.

Ad-hoc code is often a good way to just start something, but after a while
one should start designing... and I mean... long before release. ;)
And doing the design after starting with ad-hoc code works even better
with a language that makes refactoring easy. (And this is the reason
why we are on the Caml-list :))

Some people might even start with the design... and I think a rigid type system
like OCaml's supports design, because starting with types can give a good
starting point for a design.
In my experience, starting with types (and interfaces) can make things
very clear from the beginning. But of course this only helps if you already
know clearly what you want to achieve (a clear, distinct task).
But this is getting off-topic and would be a separate discussion.

[...]
> > Of course, the advantage of your approach is that you can now distribute
> > the work over multiple machines. So I guess there is an appropriate
> > place for all of these techniques.
> 
> I was recently working on an improved fork machinery, with some
> indirection between the request for creating a new worker process, and
> the actual fork. That's this netmulticore library I'm talking about. The
> new process is not a child of the process requesting the creation, but
> always a child of a common master process.
[...]

So, you ask for new sisters and brothers....
...and the parent can be slim.

Ciao,
   Oliver


* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 15:30           ` Stephan Houben
@ 2010-11-30 16:07             ` Gerd Stolpmann
  2010-11-30 17:40               ` oliver
  0 siblings, 1 reply; 16+ messages in thread
From: Gerd Stolpmann @ 2010-11-30 16:07 UTC
  To: Stephan Houben; +Cc: caml-list

On Tuesday, 30.11.2010, at 16:30 +0100, Stephan Houben wrote:
> On 11/30/2010 02:22 PM, Gerd Stolpmann wrote:
> > I don't think this is the reason. Many people can ignore Windows,
> > actually.
> >
> > The problem is rather that your whole program then needs to be
> > restructured - multi-processing implies a process model (which is the
> > master, which are the workers). With multi-threading you can start
> > threads at any time without having to worry about that (i.e. it supports
> > "programming without design" if you want to take that as a negative
> > point).
> >
> > This is what I want to fix with my Netmulticore library - it defines a
> > framework allowing you to start new processes at any time without having
> > to worry about the process hierarchy.
> 
> I have in fact read with much interest your blog at
> http://blog.camlcity.org/blog/parallelmm.html .
> 
> Your approach there is to really have separate programs for
> server and client. However, one nice thing about fork is that you don't
> have to restructure your program; you can just call fork down somewhere
> in some subroutine where you decide it is convenient, start doing some
> multicore computation, finish and return, and the caller need never know
> that you did that. So you can indeed program without design using fork.

Well, I would not recommend that in all cases: fork duplicates all
memory, and if this is a lot, you can end up consuming a lot of RAM.
Even worse, the GC of the forked subprocess has to manage all of the
RAM, including the part that is not required for doing the computation
in the subprocess (the copy-on-write optimization of the OS gets you
nothing here).

Also, there can be subtle interactions between the parent and the child,
e.g. file descriptors are inherited, affecting whether closed
descriptors can be recognized.
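A small demonstration of the descriptor-inheritance pitfall, assuming OCaml 4.12 or later on a POSIX system; `eof_visible` and `demo` are hypothetical names and the timings are arbitrary. A pipe reports end-of-file only once every copy of its write end is closed, and a forked child silently holds such a copy.

```ocaml
(* True if [rd] becomes readable within [timeout] seconds and the read
   returns 0 bytes, i.e. end-of-file. *)
let eof_visible rd ~timeout =
  match Unix.select [ rd ] [] [] timeout with
  | [], _, _ -> false (* still blocked: some copy of the write end is open *)
  | _ ->
      let buf = Bytes.create 1 in
      Unix.read rd buf 0 1 = 0

let demo () =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* The child never writes, but it inherited a copy of [wr]. *)
      Unix.close rd;
      Unix.sleepf 0.5;
      Unix._exit 0
  | pid ->
      Unix.close wr; (* the parent's own copy is gone ... *)
      let before = eof_visible rd ~timeout:0.1 in (* ... but no EOF is seen *)
      ignore (Unix.waitpid [] pid);
      let after = eof_visible rd ~timeout:1.0 in (* child exited: EOF *)
      Unix.close rd;
      (before, after)
```

Calling `demo ()` is expected to return `(false, true)`: the parent only sees end-of-file after the child, which never touched the pipe, has exited.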

So, use with care, and not without design. Forking in the middle of a
bigger program can be quite disastrous.

> Of course, the advantage of your approach is that you can now distribute
> the work over multiple machines. So I guess there is an appropriate
> place for all of these techniques.

I was recently working on an improved fork machinery, with some
indirection between the request for creating a new worker process, and
the actual fork. That's this netmulticore library I'm talking about. The
new process is not a child of the process requesting the creation, but
always a child of a common master process. This avoids all the problems
(memory issues, file descriptor issues, and a few more), at the cost of
having to transmit state to the new process.

> > Also, many practical problems are only O(n log n), at most. The cost for
> > serialization of data through a pipe cannot be neglected here. This
> > makes shared memory attractive, even if it is only available in a
> > restricted form (like write once memory).
> 
> Well, the original context was one of a benchmark which had an
> arbitrary rule that you can only use functions from the bundled libraries.
> And my proposal was to use the pipe for synchronisation and the shared memory
> for bulk communication.
> 
> If we drop the arbitrary rule there are faster options than pipes.
> (e.g. POSIX semaphores in a shared memory segment).

Yes, it's really arbitrary. All remaining solutions are very restricted
then, and don't have much to do with what you would choose for solving a
real-world problem. This makes this benchmark irrelevant.

Gerd


* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
       [not found]         ` <fa.LGfjfIGKcYLW6PBxy7aMsEnvy/w@ifi.uio.no>
@ 2010-11-30 15:30           ` Stephan Houben
  2010-11-30 16:07             ` Gerd Stolpmann
  0 siblings, 1 reply; 16+ messages in thread
From: Stephan Houben @ 2010-11-30 15:30 UTC
  To: Gerd Stolpmann; +Cc: caml-list

On 11/30/2010 02:22 PM, Gerd Stolpmann wrote:
> I don't think this is the reason. Many people can ignore Windows,
> actually.
>
> The problem is rather that your whole program then needs to be
> restructured - multi-processing implies a process model (which is the
> master, which are the workers). With multi-threading you can start
> threads at any time without having to worry about that (i.e. it supports
> "programming without design" if you want to take that as a negative
> point).
>
> This is what I want to fix with my Netmulticore library - it defines a
> framework allowing you to start new processes at any time without having
> to worry about the process hierarchy.

I have in fact read with much interest your blog at
http://blog.camlcity.org/blog/parallelmm.html .

Your approach there is to really have separate programs for
server and client. However, one nice thing about fork is that you don't
have to restructure your program; you can just call fork down somewhere
in some subroutine where you decide it is convenient, start doing some
multicore computation, finish and return, and the caller need never know
that you did that. So you can indeed program without design using fork.
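A sketch of this "fork inside a subroutine" style: the caller sees an ordinary function, while the work is spread over forked children that send their slices back through pipes with `Marshal`. `parallel_map` is a hypothetical name; it assumes OCaml 4.12 or later, that `f` has no side effects the parent needs to see, and that results contain no functional values (`Marshal` rejects closures unless the `Closures` flag is given). The memory-duplication and inherited-descriptor caveats raised in Gerd's reply still apply.

```ocaml
(* Map [f] over [input] using [workers] forked children. The caller never
   learns that processes were involved. *)
let parallel_map ?(workers = 4) (f : 'a -> 'b) (input : 'a array) : 'b array =
  let n = Array.length input in
  let chunk = (n + workers - 1) / workers in
  let spawn lo hi =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
        Unix.close rd;
        let result = Array.init (hi - lo) (fun i -> f input.(lo + i)) in
        let oc = Unix.out_channel_of_descr wr in
        Marshal.to_channel oc result [];
        close_out oc;
        Unix._exit 0
    | pid ->
        (* Close the write end at once, so later children do not inherit it. *)
        Unix.close wr;
        (pid, rd)
  in
  let jobs =
    List.init workers (fun w ->
        let lo = min n (w * chunk) in
        spawn lo (min n (lo + chunk)))
  in
  (* Read the slices back in order; a child whose output exceeds the pipe
     buffer simply blocks until the parent gets to it. *)
  let parts =
    List.map
      (fun (pid, rd) ->
        let ic = Unix.in_channel_of_descr rd in
        let (part : 'b array) = Marshal.from_channel ic in
        close_in ic;
        ignore (Unix.waitpid [] pid);
        part)
      jobs
  in
  Array.concat parts
```

If `f` raises in a child, that child exits without writing and the parent's `Marshal.from_channel` fails with `End_of_file`; real code would want to report that more gracefully.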

Of course, the advantage of your approach is that you can now distribute
the work over multiple machines. So I guess there is an appropriate
place for all of these techniques.

> Also, many practical problems are only O(n log n), at most. The cost for
> serialization of data through a pipe cannot be neglected here. This
> makes shared memory attractive, even if it is only available in a
> restricted form (like write once memory).

Well, the original context was one of a benchmark which had an
arbitrary rule that you can only use functions from the bundled libraries.
And my proposal was to use the pipe for synchronisation and the shared memory
for bulk communication.

If we drop the arbitrary rule there are faster options than pipes.
(e.g. POSIX semaphores in a shared memory segment).
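A sketch of that split between synchronisation and bulk data, assuming OCaml 4.12 or later on a POSIX system: one long-lived worker writes its results into a shared Bigarray, while two pipes carry only one-byte "go" and "done" signals. `create_shared_ints`, `notify`, `await` and `run_rounds` are hypothetical names, and the shared array is backed by an unlinked temporary file so that the MAP_SHARED mapping is POSIX-specified.

```ocaml
(* Int Bigarray backed by an unlinked temporary file, shared across fork(). *)
let create_shared_ints n =
  let path = Filename.temp_file "bulk" ".bin" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o600 in
  Unix.unlink path;
  let ga = Unix.map_file fd Bigarray.int Bigarray.c_layout true [| n |] in
  Unix.close fd;
  Bigarray.array1_of_genarray ga

(* One-byte signals; the bulk data never goes through the pipes. *)
let notify fd = ignore (Unix.write_substring fd "!" 0 1)

(* [false] means end-of-file: the other side closed its end. *)
let await fd = Unix.read fd (Bytes.create 1) 0 1 = 1

(* For each of [rounds] rounds the parent says "go", the worker fills the
   shared array with round * i and answers "done", and the parent sums the
   array. Returns the per-round sums. *)
let run_rounds ~rounds n =
  let data = create_shared_ints n in
  let go_rd, go_wr = Unix.pipe () in
  let done_rd, done_wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      Unix.close go_wr;
      Unix.close done_rd;
      let round = ref 0 in
      while await go_rd do
        incr round;
        for i = 0 to n - 1 do data.{i} <- !round * i done;
        notify done_wr
      done;
      Unix._exit 0
  | pid ->
      Unix.close go_rd;
      Unix.close done_wr;
      let sums =
        List.init rounds (fun _ ->
            notify go_wr;
            if not (await done_rd) then failwith "worker exited early";
            (* The pipe round trip orders the worker's stores before these
               loads (in practice, via the kernel's locking of the pipe). *)
            let s = ref 0 in
            for i = 0 to n - 1 do s := !s + data.{i} done;
            !s)
      in
      Unix.close go_wr; (* EOF on go_rd tells the worker to stop *)
      ignore (Unix.waitpid [] pid);
      Unix.close done_rd;
      sums
```

Only two bytes cross the pipes per round regardless of `n`, which is the point of keeping the bulk data in shared memory.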

Stephan


* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 12:55             ` oliver
  2010-11-30 13:06               ` Eray Ozkural
@ 2010-11-30 14:09               ` Gerd Stolpmann
  1 sibling, 0 replies; 16+ messages in thread
From: Gerd Stolpmann @ 2010-11-30 14:09 UTC
  To: oliver; +Cc: caml-list

On Tuesday, 30.11.2010, at 13:55 +0100, oliver@first.in-berlin.de wrote:
> On Tue, Nov 30, 2010 at 09:10:36AM +0100, Stephan Houben wrote:
> > On 11/29/2010 04:33 PM, Oliver Bandel wrote:
> > >Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
> > >
> > >>On Monday, 29.11.2010, at 17:12 +0100, Oliver Bandel wrote:
> > >>>Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
> > >>>
> > 
> > >>>You use shared mem(?), but you link only to *.ml files,
> > >>>and I see no *.c there.
> > 
> > >>>How can this be done?
> > >>>
> > >>>At least not via the libs that are shipped with OCaml?!
> > 
> > Actually it can be done using the libs that ship with OCaml
> > (Unix and Bigarray), although it is not 100% POSIX :
> > 
> > let create_shared_genarray kind layout dims =
> >   let fd = Unix.openfile "/dev/zero" [Unix.O_RDWR] 0
> >   in let ar = Bigarray.Genarray.map_file fd kind layout true dims
> >   in Unix.close fd; ar
> > 
> > 
> > The resulting bigarray object is shared among subsequent forks.
> 
> Hmhhh... we started talking about Threads and SharedMem.
> You mean even fork.... hmhhh

Independent processes are right now the only way to use several cores.
You can organize shared memory between processes, but it is tricky.
That's what I try to ease with my Netmulticore library.

> > This relies on the fact that mmap-ing /dev/zero is equivalent
> > to an anonymous mmap.
> > 
> > http://en.wikipedia.org/wiki//dev/zero
> > 
> > Well, at least it works on Linux.
> 
> In APUE it's mentioned that memory mapped regions are inherited
> by a child, when forking it. So it should work on all Unix-systems too.

Yes, but POSIX does not define what mapping /dev/zero means.

> There is one problem with this... when you have forked, then
> you obviously have separated processes and also in each process
> your own OCaml program with its own GC running...
> 
> ..with such a mem-mapping trick (never used Bigarray, so I'm astounded it uses
> mmap) 

Bigarrays can use any memory with fixed addresses. That's the essence
here: Bigarrays are not moved around by the GC.

> you then have independent processes, working on shared mem without
> synchronisation.
> 
> This is a good possibility to get corrupted data, and therefore unreliable behaviour.
> 
> So you somehow have to create a way for these processes to communicate.

So you need inter-process synchronization primitives, like POSIX
semaphores.

> This already is easily done in the Threads-module, because synchronisation
> mechanisms are bound there to the OCaml API and can be used easily.
> 
> In the Unix module there is not much of this IPC stuff...

But in Ocamlnet's netsys module.

> (A thread-specific GC for thread-specific variables would help here,
>  making global locks only necessary when accessing globally used variables.
>  But I don't know if such a way would be possible without changing the GC-stuff
>  itself.)

The global lock does not protect user variables, but the Ocaml runtime,
e.g. the state of the memory manager/garbage collector. Also it eases
code generation - the memory image need not be in a consistent state
all the time (i.e. all pointers meaningful), but only when the runtime
gets a hand on it. Removing this lock has far-reaching consequences.

The oc4mc (ocaml for multicore) project used a separate minor heap per
thread, which actually eases the task a lot - memory is in most cases
allocated in the minor heap anyway. Many variables keeping the state of
the runtime are then thread-local.

Gerd

> 
> 
> Ciao,
>    Oliver
> 





* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30 12:55             ` oliver
@ 2010-11-30 13:06               ` Eray Ozkural
  2010-11-30 21:13                 ` Jon Harrop
  2010-11-30 14:09               ` Gerd Stolpmann
  1 sibling, 1 reply; 16+ messages in thread
From: Eray Ozkural @ 2010-11-30 13:06 UTC (permalink / raw)
  To: oliver; +Cc: caml-list


On Tue, Nov 30, 2010 at 2:55 PM, <oliver@first.in-berlin.de> wrote:
>
>
> (A thread-specific GC for thread-specific variables would help here,
>  making global locks only necessary when accessing globally used variables.
>  But I don't know if such a way would be possible without changing the GC-stuff
>  itself.)
>
>
Seconded. Why is this not possible? That is to say, why can't each thread
maintain a separate GC, if so desired?

Best,

-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct



* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-30  8:10           ` Stephan Houben
@ 2010-11-30 12:55             ` oliver
  2010-11-30 13:06               ` Eray Ozkural
  2010-11-30 14:09               ` Gerd Stolpmann
  0 siblings, 2 replies; 16+ messages in thread
From: oliver @ 2010-11-30 12:55 UTC (permalink / raw)
  To: caml-list

On Tue, Nov 30, 2010 at 09:10:36AM +0100, Stephan Houben wrote:
> On 11/29/2010 04:33 PM, Oliver Bandel wrote:
> >Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
> >
> >>On Monday, 29.11.2010, 17:12 +0100, Oliver Bandel wrote:
> >>>Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
> >>>
> 
> >>>You use shared mem(?), but you link only to *.ml files,
> >>>and I see no *.c there.
> 
> >>>How can this be done?
> >>>
> >>>At least not via the libs that are shipped with OCaml?!
> 
> Actually it can be done using the libs that ship with OCaml
> (Unix and Bigarray), although it is not 100% POSIX:
> 
> let create_shared_genarray kind layout dims =
>   let fd = Unix.openfile "/dev/zero" [Unix.O_RDWR] 0
>   in let ar = Bigarray.Genarray.map_file fd kind layout true dims
>   in Unix.close fd; ar
> 
> 
> The resulting bigarray object is shared among subsequent forks.

Hmhhh... we started talking about threads and shared memory.
You mean even with fork... hmhhh

> This relies on the fact that mmap-ing /dev/zero is equivalent
> to an anonymous mmap.
> 
> http://en.wikipedia.org/wiki//dev/zero
> 
> Well, at least it works on Linux.

APUE (Advanced Programming in the UNIX Environment) mentions that
memory-mapped regions are inherited by a child when forking, so it
should work on all Unix systems too.

There is one problem with this... once you have forked, you obviously
have separate processes, each running its own OCaml program with its
own GC...

...with such a memory-mapping trick (I never used Bigarray, so I'm astonished
it uses mmap) you then have independent processes working on shared memory
without synchronisation.

This is a likely way to get corrupted data, and therefore unreliable behaviour.

So you somehow have to create a way for these processes to communicate.

This is already easy with the Threads module, because its synchronisation
mechanisms are bound to the OCaml API and are easy to use.
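As a sketch of what the Threads module offers for this, here is one worker thread consuming jobs from a Queue guarded by a Mutex and a Condition. The None stop marker and all names are illustrative assumptions, not a library API.

```ocaml
(* Sketch: producer/consumer with Mutex + Condition.
   Needs the threads library. *)
let jobs : int option Queue.t = Queue.create ()
let jobs_lock = Mutex.create ()
let nonempty = Condition.create ()

let push job =
  Mutex.lock jobs_lock;
  Queue.push job jobs;
  Condition.signal nonempty;
  Mutex.unlock jobs_lock

let pop () =
  Mutex.lock jobs_lock;
  (* re-check after every wakeup: Condition.wait may return spuriously *)
  while Queue.is_empty jobs do Condition.wait nonempty jobs_lock done;
  let job = Queue.pop jobs in
  Mutex.unlock jobs_lock;
  job

(* only the worker writes [sum]; the main thread reads it after join *)
let sum = ref 0

let worker () =
  let rec loop () =
    match pop () with
    | Some n -> sum := !sum + n; loop ()
    | None -> () (* None tells the worker to stop *)
  in
  loop ()

let () =
  let t = Thread.create worker () in
  for i = 1 to 100 do push (Some i) done;
  push None;
  Thread.join t;
  Printf.printf "sum = %d\n" !sum
```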

In the Unix module there is not much of this IPC stuff...
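The Unix module does at least provide Unix.pipe, which is enough for simple message passing between forked processes. A minimal sketch, assuming Marshal as the wire format and a hypothetical squares_in_child helper: the child sends n integers and the parent reads them back in order.

```ocaml
(* Sketch: message passing over Unix.pipe with Marshal framing.
   Needs the unix library; Unix._exit needs OCaml >= 4.12. *)
let squares_in_child n =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* child: write n marshalled ints, then close to signal EOF *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      for i = 1 to n do Marshal.to_channel oc (i * i) [] done;
      close_out oc;
      Unix._exit 0
  | pid ->
      (* parent: close its copy of the write end, then read n values *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let rec read k acc =
        if k = 0 then List.rev acc
        else read (k - 1) ((Marshal.from_channel ic : int) :: acc)
      in
      let results = read n [] in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      results

let () = List.iter (Printf.printf "%d\n") (squares_in_child 5)
```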

(A thread-specific GC for thread-specific variables would help here,
 making global locks only necessary when accessing globally used variables.
 But I don't know if such a way would be possible without changing the GC-stuff
 itself.)

Ciao,
   Oliver


* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
       [not found]         ` <fa.srfZThtnO8lApSpMeW3POD462Xg@ifi.uio.no>
@ 2010-11-30  8:10           ` Stephan Houben
  2010-11-30 12:55             ` oliver
  0 siblings, 1 reply; 16+ messages in thread
From: Stephan Houben @ 2010-11-30  8:10 UTC (permalink / raw)
  To: caml-list

On 11/29/2010 04:33 PM, Oliver Bandel wrote:
> Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
>
>> On Monday, 29.11.2010, 17:12 +0100, Oliver Bandel wrote:
>>> Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
>>>

>>> You use shared mem(?), but you link only to *.ml files,
>>> and I see no *.c there.

>>> How can this be done?
>>>
>>> At least not via the libs that are shipped with OCaml?!

Actually it can be done using the libs that ship with OCaml
(Unix and Bigarray), although it is not 100% POSIX:

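let create_shared_genarray kind layout dims =
   (* mmap-ing /dev/zero gives zero-filled, anonymous memory *)
   let fd = Unix.openfile "/dev/zero" [Unix.O_RDWR] 0
   (* shared = true (MAP_SHARED): later forks see the same pages;
      on OCaml >= 4.06 this function is Unix.map_file *)
   in let ar = Bigarray.Genarray.map_file fd kind layout true dims
   (* the mapping stays valid after the descriptor is closed *)
   in Unix.close fd; ar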


The resulting bigarray object is shared among subsequent forks.
This relies on the fact that mmap-ing /dev/zero is equivalent
to an anonymous mmap.

http://en.wikipedia.org/wiki//dev/zero

Well, at least it works on Linux.
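Since OCaml 4.06 the mapping function is Unix.map_file, and the Bigarray module that now ships in the standard library has no map_file. Below is a sketch of the same helper with the newer API, plus a hypothetical fork check: the child writes into the array and the parent reads the values after waitpid. Like the original, it assumes /dev/zero can be mapped shared, as on Linux.

```ocaml
(* Sketch: Stephan's helper on the post-4.06 API.
   Needs the unix library; Unix._exit needs OCaml >= 4.12. *)
let create_shared_genarray kind layout dims =
  let fd = Unix.openfile "/dev/zero" [Unix.O_RDWR] 0 in
  (* shared = true requests MAP_SHARED, so forked children share pages *)
  let ar = Unix.map_file fd kind layout true dims in
  Unix.close fd;
  ar

(* Hypothetical check: the child writes, the parent reads afterwards. *)
let child_writes_parent_reads () =
  let a =
    Bigarray.array1_of_genarray
      (create_shared_genarray Bigarray.int Bigarray.c_layout [| 4 |])
  in
  match Unix.fork () with
  | 0 ->
      for i = 0 to 3 do a.{i} <- 10 * i done;
      Unix._exit 0
  | pid ->
      (* once the child has exited, all of its writes are in place *)
      ignore (Unix.waitpid [] pid);
      Array.init 4 (fun i -> a.{i})

let () =
  Array.iter (Printf.printf "%d ") (child_writes_parent_reads ());
  print_newline ()
```

An ordinary OCaml array would not work here: after fork the child writes to its own copy-on-write pages, and the parent would still see zeros.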

Stephan



* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-29 16:24                         ` Gerd Stolpmann
@ 2010-11-29 16:33                           ` Oliver Bandel
  0 siblings, 0 replies; 16+ messages in thread
From: Oliver Bandel @ 2010-11-29 16:33 UTC (permalink / raw)
  To: caml-list

Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:

> On Monday, 29.11.2010, 17:12 +0100, Oliver Bandel wrote:
>> Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
>>
>> > On Sunday, 28.11.2010, 19:14 +0100,
>> > oliver@first.in-berlin.de wrote:
>> >> On Thu, Nov 25, 2010 at 11:50:58PM +0100, Fabrice Le Fessant wrote:
>> >> [...]
>> >> >  The main problem was that other languages have bigger standard
>> >> > libraries, whereas OCaml has a very small one (just what is needed
>> >> > to compile the compiler, actually). In many problems, you could
>> >> > benefit from using a very simple shared-memory library (in
>> >> > mandelbrot, the ocaml multicore solution has to copy the image in a
>> >> > socket between processes, whereas it could just be in a shared
>> >> > memory segment),
>> >>
>> >>
>> >> ...so you work on a shared-mem module?!
>> >
>> > Don't know what Fabrice is referring to, but at least I work on a
>> > multicore-enabling library:
>> >
>> > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netmulticore/
>> >
>> > This is work in progress and highly experimental. What's currently
>> > available:
>> >
>> > - managing processes and resources like files, shared memory objects
>> >   etc.
>> > - support for message passing via Netcamlbox (another library)
>> > - low-level only so far: shared memory, including copying Ocaml values
>> >   to and from shm
>> [...]
>>
>> You use shared mem(?), but you link only to *.ml files,
>> and I see no *.c there.
>
> cd ../netsys
>
> it's part of a larger package

ah, ok. :)


>
>>
>> How can this be done?
>>
>> At least not via the libs that are shipped with OCaml?!
>>
>> I would have expected some *.c for the shared mem part and
>> the creation of Caml-values....
>>
>>
>> Ciao,
>>     Oliver
>>
>> P.S.: OCaml also provides a thread library, which seems to use pthreads.
>>        Normally this should make it possible to run on multiple
>>        cores. What are the restrictions that prevent it from running that way?
>>        Somehow... when all values are handled by one GC, those threads
>>        are bound together, but on the other hand it works threaded,
>>        and consumer-worker pipes and such can be used.
>>        So... is the GC the point where the show stops?
>>        (Has anyone looked inside OCaml here in more detail?)
>
> Quite easy: there is a global lock, and when Ocaml code runs, this lock
> must be acquired. So only one of the pthreads can have this lock,
[...]

Aha, ok.
Thanks for the details.

Wouldn't it be possible for each thread to have its own GC,
and to use the global lock on the global GC only when
global variables are touched?
Then this could be added to the Threads module...

Ciao,
    Oliver



* Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-29 16:12                       ` Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?) Oliver Bandel
@ 2010-11-29 16:24                         ` Gerd Stolpmann
  2010-11-29 16:33                           ` Oliver Bandel
  0 siblings, 1 reply; 16+ messages in thread
From: Gerd Stolpmann @ 2010-11-29 16:24 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Monday, 29.11.2010, 17:12 +0100, Oliver Bandel wrote:
> Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:
> 
> > On Sunday, 28.11.2010, 19:14 +0100,
> > oliver@first.in-berlin.de wrote:
> >> On Thu, Nov 25, 2010 at 11:50:58PM +0100, Fabrice Le Fessant wrote:
> >> [...]
> >> >  The main problem was that other languages have bigger standard
> >> > libraries, whereas OCaml has a very small one (just what is needed
> >> > to compile the compiler, actually). In many problems, you could
> >> > benefit from using a very simple shared-memory library (in
> >> > mandelbrot, the ocaml multicore solution has to copy the image in a
> >> > socket between processes, whereas it could just be in a shared
> >> > memory segment),
> >>
> >>
> >> ...so you work on a shared-mem module?!
> >
> > Don't know what Fabrice is referring to, but at least I work on a
> > multicore-enabling library:
> >
> > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netmulticore/
> >
> > This is work in progress and highly experimental. What's currently
> > available:
> >
> > - managing processes and resources like files, shared memory objects
> >   etc.
> > - support for message passing via Netcamlbox (another library)
> > - low-level only so far: shared memory, including copying Ocaml values
> >   to and from shm
> [...]
> 
> You use shared mem(?), but you link only to *.ml files,
> and I see no *.c there.

cd ../netsys

it's part of a larger package

> 
> How can this be done?
> 
> At least not via the libs that are shipped with OCaml?!
> 
> I would have expected some *.c for the shared mem part and
> the creation of Caml-values....
> 
> 
> Ciao,
>     Oliver
> 
> P.S.: OCaml also provides a thread library, which seems to use pthreads.
>        Normally this should make it possible to run on multiple
>        cores. What are the restrictions that prevent it from running that way?
>        Somehow... when all values are handled by one GC, those threads
>        are bound together, but on the other hand it works threaded,
>        and consumer-worker pipes and such can be used.
>        So... is the GC the point where the show stops?
>        (Has anyone looked inside OCaml here in more detail?)

Quite easy: there is a global lock, and when Ocaml code runs, this lock
must be acquired. So only one of the pthreads can have this lock, and so
only one pthread can run Ocaml code.

The reason is that memory management is not thread-safe.
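A small sketch of one consequence, assuming the threads and unix libraries: a thread blocked in a system call such as Unix.sleepf has released the master lock, so several sleeping threads overlap in time, while pure OCaml computation in threads still runs one thread at a time. The helper name is hypothetical.

```ocaml
(* Sketch: blocking calls release the runtime lock.
   Needs the threads and unix libraries. *)
let elapsed_for_sleepers n seconds =
  let t0 = Unix.gettimeofday () in
  let threads = List.init n (fun _ -> Thread.create Unix.sleepf seconds) in
  List.iter Thread.join threads;
  Unix.gettimeofday () -. t0

let () =
  (* four 0.2 s sleeps: expect roughly 0.2 s in total, not 0.8 s *)
  Printf.printf "elapsed: %.2f s\n" (elapsed_for_sleepers 4 0.2)
```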

Gerd






* Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?)
  2010-11-29 14:19                     ` Gerd Stolpmann
@ 2010-11-29 16:12                       ` Oliver Bandel
  2010-11-29 16:24                         ` Gerd Stolpmann
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver Bandel @ 2010-11-29 16:12 UTC (permalink / raw)
  To: caml-list

Quoting "Gerd Stolpmann" <info@gerd-stolpmann.de>:

> On Sunday, 28.11.2010, 19:14 +0100,
> oliver@first.in-berlin.de wrote:
>> On Thu, Nov 25, 2010 at 11:50:58PM +0100, Fabrice Le Fessant wrote:
>> [...]
>> >  The main problem was that other languages have bigger standard
>> > libraries, whereas OCaml has a very small one (just what is needed
>> > to compile the compiler, actually). In many problems, you could
>> > benefit from using a very simple shared-memory library (in
>> > mandelbrot, the ocaml multicore solution has to copy the image in a
>> > socket between processes, whereas it could just be in a shared
>> > memory segment),
>>
>>
>> ...so you work on a shared-mem module?!
>
> Don't know what Fabrice is referring to, but at least I work on a
> multicore-enabling library:
>
> https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netmulticore/
>
> This is work in progress and highly experimental. What's currently
> available:
>
> - managing processes and resources like files, shared memory objects
>   etc.
> - support for message passing via Netcamlbox (another library)
> - low-level only so far: shared memory, including copying Ocaml values
>   to and from shm
[...]

You use shared mem(?), but you link only to *.ml files,
and I see no *.c there.

How can this be done?

At least not via the libs that are shipped with OCaml?!

I would have expected some *.c for the shared mem part and
the creation of Caml-values....


Ciao,
    Oliver

P.S.: OCaml also provides a thread library, which seems to use pthreads.
       Normally this should make it possible to run on multiple
       cores. What are the restrictions that prevent it from running that way?
       Somehow... when all values are handled by one GC, those threads
       are bound together, but on the other hand it works threaded,
       and consumer-worker pipes and such can be used.
       So... is the GC the point where the show stops?
       (Has anyone looked inside OCaml here in more detail?)




Thread overview: 16+ messages
     [not found] <fa.B9mcuN46iEGhXlge41VUCLz69+Y@ifi.uio.no>
     [not found] ` <fa.D3cDWzaD9Uu03+KvekpwpBGCx7o@ifi.uio.no>
     [not found]   ` <fa.xsCCCeDYPj8J16i9UrdqxoOIQ0Y@ifi.uio.no>
     [not found]     ` <fa.SW2Swldk88Bs5ujaNHT8Yh4bXkg@ifi.uio.no>
     [not found]       ` <fa.V+M6RbukE/w/Aftpwxkx2MvkxlU@ifi.uio.no>
     [not found]         ` <fa.+OkqNL3AB4+5LA8wOnQD9WS59QQ@ifi.uio.no>
2010-11-30 14:04           ` Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?) Stephan Houben
2010-11-30 14:22             ` Gerd Stolpmann
2010-11-30 14:29             ` oliver
2010-11-30 15:17               ` Eray Ozkural
     [not found] <fa.eehhGhbses+7RvDlflKpXQ8Uu34@ifi.uio.no>
     [not found] ` <fa.UVXWB7NnPNJbhh0Cf2OLmzYx/bQ@ifi.uio.no>
     [not found]   ` <fa.uksiRZ6fYFia4X1fXQaWa8z4Kio@ifi.uio.no>
     [not found]     ` <fa.Zv2Wkh0+DJAuXcOzq+qABiYFTP4@ifi.uio.no>
     [not found]       ` <fa.W5DnVSXs073N1X2rbtpyh7iGAcc@ifi.uio.no>
     [not found]         ` <fa.LGfjfIGKcYLW6PBxy7aMsEnvy/w@ifi.uio.no>
2010-11-30 15:30           ` Stephan Houben
2010-11-30 16:07             ` Gerd Stolpmann
2010-11-30 17:40               ` oliver
     [not found] <fa.sn187DUeFX1sJ62LL4s6SatUR/c@ifi.uio.no>
     [not found] ` <fa.PTndTGw0Otg08P5/YMoxmRptrPs@ifi.uio.no>
     [not found]   ` <fa.0ulojaV8bXHHiRN+1r6S98RGEsw@ifi.uio.no>
     [not found]     ` <fa.gQ7B1GYcdbBVupZowIyW2+1E/b4@ifi.uio.no>
     [not found]       ` <fa.ludbTMBmN7YGqnEwsRPwOGCpjrA@ifi.uio.no>
     [not found]         ` <fa.srfZThtnO8lApSpMeW3POD462Xg@ifi.uio.no>
2010-11-30  8:10           ` Stephan Houben
2010-11-30 12:55             ` oliver
2010-11-30 13:06               ` Eray Ozkural
2010-11-30 21:13                 ` Jon Harrop
2010-11-30 21:28                   ` Christophe Raffalli
2010-11-30 14:09               ` Gerd Stolpmann
2010-11-22 17:08 [Caml-list] Is OCaml fast? David Rajchenbach-Teller
2010-11-23  2:01 ` Isaac Gouy
2010-11-23 23:27   ` [Caml-list] " oliver
2010-11-24  0:23     ` Isaac Gouy
2010-11-24  1:36       ` [Caml-list] " Eray Ozkural
2010-11-24  2:13         ` Isaac Gouy
2010-11-24  4:39           ` [Caml-list] " Jeff Meister
2010-11-25 16:59             ` Stefan Monnier
     [not found]               ` <1534555381.33107.1290723160355.JavaMail.root@zmbs4.inria.fr>
2010-11-25 22:50                 ` [Caml-list] " Fabrice Le Fessant
2010-11-28 18:14                   ` oliver
2010-11-29 14:19                     ` Gerd Stolpmann
2010-11-29 16:12                       ` Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?) Oliver Bandel
2010-11-29 16:24                         ` Gerd Stolpmann
2010-11-29 16:33                           ` Oliver Bandel
