Thanks Alain for this detailed description. I did not get why you do not use marshalling for computation functions: this should be safe given that the same code runs in the GUI process and in the calculation sub-processes, right? Compared to your setting, I'm afraid I cannot use the same trick for running the sub-processes, as the ocsigen server is in charge here. Maybe there are some hooks that can help? I also like the ability to send partial results; this would be a nice feature in my case. I'll have to think about how to achieve this...

Cheers,
ph.

2013/3/28 Alain Frisch

> On 03/28/2013 08:37 AM, Philippe Veber wrote:
>
>> Hi Martin,
>> nproc meets exactly my needs: a simple lwt-friendly interface to
>> dispatch function calls on a pool of processes that run on the same
>> machine. I have only one concern, which should probably be discussed
>> on the ocsigen list: I wonder whether it is okay to fork the process
>> running the ocsigen server. I seem to remember warnings about parent
>> and child processes sharing connections/channels, but it's really not
>> clear to me.
>
> FWIW, LexiFi uses an architecture quite close to this for our
> application. The main process manages the GUI and dispatches
> computation tasks to external processes. Some points to be noted:
>
> - Since this is a Windows application, we cannot rely on fork.
> Instead, we restart the application (Sys.argv.(0)) with a specific
> command-line flag, captured by the library in charge of managing
> computations. This is done by calling a special function in this
> library; the function does nothing in the main process, while in the
> sub-processes it starts the special mode and never returns. This
> gives the main application a chance to do some global initialization
> common to the main and sub processes (for instance, we dynlink
> external plugins in this initialization phase).
>
> - Computation functions are registered as global values.
> Registration returns an opaque handle which can be used to call such
> a function. We don't rely on marshaling closures.
>
> - The GUI process actually spawns a single sub-process (the
> Scheduler), which itself manages more worker sub-sub-processes (up to
> a maximal number of workers). Currently, we don't do very clever
> scheduling based on task priorities, but this could easily be added.
>
> - An external computation can spawn sub-computations (by applying a
> parallel "map" to a list) either synchronously (direct style) or
> asynchronously (by providing a continuation function, which will be
> applied to the list of results, possibly in a different process). In
> both cases, this is done by sending those tasks to the Scheduler.
> The Scheduler dispatches computation tasks to available workers. In
> the synchronous parallel map, the caller runs an inner event loop to
> communicate with the Scheduler (and it only accepts sub-tasks created
> by itself or one of its descendants).
>
> - Top-level external computations can be stopped by the main process
> (e.g. on user request). Concretely, this kills all workers currently
> working on that task or one of its sub-tasks.
>
> - In addition to sending back final results, computations can report
> progress to their caller and send intermediate results. This is
> useful to show a progress bar/status and partial results in the GUI
> before the end of the entire computation.
>
> - Communication between processes is done by exchanging marshaled
> "variants" (a tagged representation of OCaml values, generated
> automatically using our runtime types). Since we can attach special
> variantizers/devariantizers to specific types, this gives us a chance
> to customize how certain values are exchanged between processes (e.g.
> values relying on internal hash-consing are treated specially to
> recreate the maximal sharing in the sub-process).
> - Concretely, the communication between processes is done through
> queues of messages implemented with shared memory. (This component
> was developed by Fabrice Le Fessant and OCamlPro.) Large computation
> arguments or results (above a certain size) are stored on the file
> system, to avoid having to keep them in RAM for too long (if all
> workers are busy, the computation might wait for some time before
> being started).
>
> - The API makes it easy to distribute computation tasks to several
> machines. We have done some experiments with using our application's
> database to dispatch computations, but we don't use it in production.
>
> Alain
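For readers following along: the "restart Sys.argv.(0) in worker mode" pattern Alain describes (for platforms without fork, such as Windows) can be sketched roughly as below. All names here are illustrative assumptions, not LexiFi's actual API:

```ocaml
(* Sketch of the re-exec-self worker pattern. The flag name and both
   function names are my own invention for illustration. *)

let worker_flag = "--computation-worker"

(* To be called once, after the global initialization shared by the
   main process and the sub-processes (e.g. dynlinking plugins).
   In the main process it simply returns; in a process started with
   the worker flag it runs the worker loop and never returns. *)
let maybe_become_worker (run_worker_loop : unit -> unit) : unit =
  if Array.exists (fun arg -> arg = worker_flag) Sys.argv then begin
    run_worker_loop ();
    exit 0
  end

(* The main process spawns a worker by re-executing its own binary
   with the flag appended; create_process returns the child's pid. *)
let spawn_worker () : int =
  Unix.create_process Sys.executable_name
    [| Sys.executable_name; worker_flag |]
    Unix.stdin Unix.stdout Unix.stderr
```

Calling `maybe_become_worker` only after the shared initialization is what lets both the main process and the workers see the same dynlinked plugins.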
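Registering computation functions as global values, rather than marshaling closures, could look something like this minimal sketch. The `Obj`-based registry is a simplification of mine; a real implementation (including LexiFi's) will differ, but the key point is the same: only a stable identifier and a marshalable argument cross the process boundary, never a closure:

```ocaml
(* Phantom-typed opaque handle: the type parameters record the
   argument and result types but have no runtime representation. *)
type ('a, 'b) handle = { id : string }

(* Global registry, populated identically in every process because
   the same registration code runs at startup in each of them. *)
let registry : (string, Obj.t -> Obj.t) Hashtbl.t = Hashtbl.create 16

let register (id : string) (f : 'a -> 'b) : ('a, 'b) handle =
  Hashtbl.replace registry id (fun x -> Obj.repr (f (Obj.obj x)));
  { id }

(* In a worker, looking the id up in the local registry recovers the
   function; here we just call it in-process for illustration. *)
let call (h : ('a, 'b) handle) (x : 'a) : 'b =
  Obj.obj ((Hashtbl.find registry h.id) (Obj.repr x))

let square = register "square" (fun n -> n * n)
let () = assert (call square 7 = 49)
```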
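The "large payloads go to the file system" point can be illustrated with a size-threshold sketch. The threshold value, the type names, and the use of plain `Marshal` are all assumptions; the actual system described above uses shared-memory message queues and runtime-type-based variants:

```ocaml
(* Small marshaled values travel inline in the message; large ones
   are written to a temp file and only the path is sent, so busy
   queues don't pin big payloads in RAM. *)
type payload =
  | Inline of string    (* marshaled value, carried in the message *)
  | On_disk of string   (* path to a file holding the marshaled value *)

let threshold = 1 lsl 20  (* 1 MiB cutoff, purely illustrative *)

let send (v : 'a) : payload =
  let s = Marshal.to_string v [] in
  if String.length s <= threshold then Inline s
  else begin
    let path = Filename.temp_file "task" ".bin" in
    let oc = open_out_bin path in
    output_string oc s;
    close_out oc;
    On_disk path
  end

let receive (p : payload) : 'a =
  match p with
  | Inline s -> Marshal.from_string s 0
  | On_disk path ->
      let ic = open_in_bin path in
      let s = really_input_string ic (in_channel_length ic) in
      close_in ic;
      Sys.remove path;
      Marshal.from_string s 0
```

As with any use of `Marshal`, sender and receiver must agree on the type being exchanged; the runtime-typed variants mentioned in the email are one way to make that agreement explicit and customizable.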