From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id CE0C7BC57 for ; Tue, 30 Nov 2010 15:29:43 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlQBAL+Y9EzAbSoIe2dsb2JhbACDUJ9BFQEBFiIEHrJlkSINgRSDM3ME X-IronPort-AV: E=Sophos;i="4.59,280,1288566000"; d="scan'208";a="80982511" Received: from einhorn.in-berlin.de ([192.109.42.8]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 30 Nov 2010 15:29:43 +0100 X-Envelope-From: oliver@first.in-berlin.de X-Envelope-To: Received: from first (e178036032.adsl.alicedsl.de [85.178.36.32]) (authenticated bits=0) by einhorn.in-berlin.de (8.13.6/8.13.6/Debian-1) with ESMTP id oAUETgtt032767 for ; Tue, 30 Nov 2010 15:29:42 +0100 Received: by first (Postfix, from userid 1000) id ECD70441BD4; Tue, 30 Nov 2010 15:29:41 +0100 (CET) Date: Tue, 30 Nov 2010 15:29:41 +0100 From: oliver@first.in-berlin.de To: caml-list@yquem.inria.fr Subject: Re: Threading and SharedMem (Re: [Caml-list] Re: Is OCaml fast?) Message-ID: <20101130142941.GG1637@siouxsie> References: <4CF50470.802@planet.nl> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <4CF50470.802@planet.nl> User-Agent: Mutt/1.5.20 (2009-06-14) X-Scanned-By: MIMEDefang_at_IN-Berlin_e.V. on 192.109.42.8 X-Spam: no; 0.00; in-berlin:01 threading:01 ocaml:01 0100,:01 in-berlin:01 bigarray:01 inherently:01 bigarray:01 ocaml:01 camlp:01 threading:01 clashes:01 show-stopper:01 compiler:01 ths:98 On Tue, Nov 30, 2010 at 03:04:32PM +0100, Stephan Houben wrote: > On 11/30/2010 12:55 PM, oliver@first.in-berlin.de wrote: > >There is one problem with this... when you have forked, then > >you obviously have separated processes and also in each process > >your own ocaml-program with it's own GC running... > > ...neatly sidestepping the problem that the GC needs to lock out all threads... > > >.with such a mem-mapping trick (never used Bigarray, so I'm astouned it uses > >mmap) you then have independent processes, working on shared mem without > >synchronisation. > > >This is a good possibility to get corrupted data, and therefore unreliable behaviour. > > Well, not more possibility than inherently in any code that updates a shared data structure > in parallel. It is certainly not the case that the independently executing GCs in both > processes are causing data corruption, since the GC only operates on unshared memory. > Note that the GC doesn't move the Bigarray data around. > > (I am not sure if this was in particular your worry or that it was just the lack of > synchronisation mechanisms which you bring up next. > Apologies if I am addressing some non-concern.) You addressed a non-concern... I just meant that there are no synchronisation mechanisms for that kind of shared menory. But your non-concern coined a question, that should be concerned... how does the GC's handle that shared mem? Will it handle only a reference to the shared mem? And if the GC frees the shared mem, will it only stop sharing that mem with the others? Will the released shared mem be alsu munmap'ed by OCaml? (Should be done by Bigarray...) > > >So, you have somehow to create a way of communicating of these processes. > > > >This already is easily done in the Threads-module, because synchronisation > >mechanisms are bound there to the OCaml API and can be used easily. > > > >In the Unix module there is not much of ths IPC stuff... > > In fact there is the Unix.pipe function which can be used for message passing > communication between processes. OK, yes, pipe is a way to go. But some other IPC stuff may also be very helpful. [...] > If you allow me a final observation/rant: I personally feel that the use of fork() and > pipes as a way to exploit multiple CPUs is underrated. I have no problem with fork. But if one want's to use fork() one has toalso use the IPC stuff, and regarding that, it seems pipe is the only available way for now in OCaml. And: if you want to use fork, and spread the work in that way, why not using Camlp3 or such things? I assume it also relies on fork, but makes such programming much easier. As we started with threads in mind, I just followed that way. > When appropriate (lots of computation > and not so much synchronisation/communication) it works really great Yes, for just spreading the work, fork()/exec() is fine. > and is very robust because > all data is process-private by default, as opposed to threading, where everything is shared > and you have to stand on your head to get a thread-local variable. Yes, but OCaml provides it's own kind of making things more robust. In C your argument would have more weight. In OCaml variable clashes are not a problem. But the threaded way - because of the global lock - would be a show-stopper regarding independent threads splitting of work is not possible. And here I see a thread-specific GC as a solution. It seems to me that this way was not thought about before, and people thought about changing the GC to be able to handle multiple threads. Instead I mean: each thread that is not the global thread, get's it's own thread-specific GC. Maybe that can be implemented much easier. But I've not looked into the Ocaml internals to say: yes, this can be done comparingly easy, or to say: oh no, that's more complex than changing the GC and make it handle all the threads from the main thread. I just would assume that seperate threads would be easier to handle. But maybe there are other restrictions in the language or the compiler that block this attempt. > Performance can also be better > since you don't run into cache coherency issues. > > I am not sure why it is not used more; possibly because it is not supported on Windows. ;-) Ciao, Oliver