Thanks Alain for this detailed description. I did not get why you do not use marshalling for computation functions: this should be safe given that the same code runs in the GUI process and in the calculation sub-processes, right? Compared to your setting, I'm afraid I cannot use the same trick for running the sub-processes, as the ocsigen server is in charge here. Maybe there are some hooks that can help? I also like the ability to send partial results; this would be a nice feature in my case. I'll have to think about how to achieve this...

Cheers,
ph.

2013/3/28 Alain Frisch

> On 03/28/2013 08:37 AM, Philippe Veber wrote:
>
>> Hi Martin,
>> nproc meets exactly my needs: a simple lwt-friendly interface to
>> dispatch function calls on a pool of processes that run on the same
>> machine. I have only one concern, which should probably be discussed
>> on the ocsigen list: I wonder whether it is okay to fork the process
>> running the ocsigen server. I seem to remember warnings about parent
>> and child processes sharing connections/channels, but it's really not
>> clear to me.
>
> FWIW, LexiFi uses an architecture quite close to this for our
> application. The main process manages the GUI and dispatches
> computation tasks to external processes. Some points to be noted:
>
> - Since this is a Windows application, we cannot rely on fork.
> Instead, we restart the application (Sys.argv.(0)) with a specific
> command-line flag, captured by the library in charge of managing
> computations. This is done by calling a special function in this
> library; the function does nothing in the main process, while in the
> sub-processes it starts the special mode and never returns. This
> gives the main application a chance to do some global initialization
> common to the main and sub processes (for instance, we dynlink
> external plugins in this initialization phase).
>
> - Computation functions are registered as global values.
> Registration returns an opaque handle which can be used to call such
> a function. We don't rely on marshaling closures.
>
> - The GUI process actually spawns a single sub-process (the
> Scheduler), which itself manages more worker sub-sub-processes (up to
> a maximal number of workers). Currently, we don't do very clever
> scheduling based on task priorities, but this could easily be added.
>
> - An external computation can spawn sub-computations (by applying a
> parallel "map" to a list) either synchronously (direct style) or
> asynchronously (by providing a continuation function, which will be
> applied to the list of results, possibly in a different process). In
> both cases, this is done by sending those tasks to the Scheduler.
> The Scheduler dispatches computation tasks to available workers. In
> the synchronous parallel map, the caller runs an inner event loop to
> communicate with the Scheduler (and it only accepts sub-tasks created
> by itself or one of its descendants).
>
> - Top-level external computations can be stopped by the main process
> (e.g. on user request). Concretely, this kills all workers currently
> working on that task or one of its sub-tasks.
>
> - In addition to sending back final results, computations can report
> progress to their caller and send intermediate results. This is
> useful to show a progress bar/status and partial results in the GUI
> before the end of the entire computation.
>
> - Communication between processes is done by exchanging marshaled
> "variants" (a tagged representation of OCaml values, generated
> automatically using our runtime types). Since we can attach special
> variantizers/devariantizers to specific types, this gives us a chance
> to customize how certain values are exchanged between processes (e.g.
> values relying on internal hash-consing are treated specially to
> recreate the maximal sharing in the sub-process).
> - Concretely, the communication between processes is done through
> queues of messages implemented with shared memory. (This component
> was developed by Fabrice Le Fessant and OCamlPro.) Large computation
> arguments or results (above a certain size) are stored on the file
> system, to avoid having to keep them in RAM for too long (if all
> workers are busy, the computation might wait for some time before
> being started).
>
> - The API makes it easy to distribute computation tasks to several
> machines. We have done some experiments with using our application's
> database to dispatch computations, but we don't use it in production.
>
> Alain
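For readers following along: the "restart Sys.argv.(0) in worker mode" pattern Alain describes (for platforms without fork, such as Windows) can be sketched roughly as below. All names here are illustrative assumptions, not LexiFi's actual API:

```ocaml
(* Sketch of the re-exec-self worker pattern. The flag name and both
   function names are my own invention for illustration. *)

let worker_flag = "--computation-worker"

(* To be called once, after the global initialization shared by the
   main process and the sub-processes (e.g. dynlinking plugins).
   In the main process it simply returns; in a process started with
   the worker flag it runs the worker loop and never returns. *)
let maybe_become_worker (run_worker_loop : unit -> unit) : unit =
  if Array.exists (fun arg -> arg = worker_flag) Sys.argv then begin
    run_worker_loop ();
    exit 0
  end

(* The main process spawns a worker by re-executing its own binary
   with the flag appended; create_process returns the child's pid. *)
let spawn_worker () : int =
  Unix.create_process Sys.executable_name
    [| Sys.executable_name; worker_flag |]
    Unix.stdin Unix.stdout Unix.stderr
```

Calling `maybe_become_worker` only after the shared initialization is what lets both the main process and the workers see the same dynlinked plugins.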
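Registering computation functions as global values, rather than marshaling closures, could look something like this minimal sketch. The `Obj`-based registry is a simplification of mine; a real implementation (including LexiFi's) will differ, but the key point is the same: only a stable identifier and a marshalable argument cross the process boundary, never a closure:

```ocaml
(* Phantom-typed opaque handle: the type parameters record the
   argument and result types but have no runtime representation. *)
type ('a, 'b) handle = { id : string }

(* Global registry, populated identically in every process because
   the same registration code runs at startup in each of them. *)
let registry : (string, Obj.t -> Obj.t) Hashtbl.t = Hashtbl.create 16

let register (id : string) (f : 'a -> 'b) : ('a, 'b) handle =
  Hashtbl.replace registry id (fun x -> Obj.repr (f (Obj.obj x)));
  { id }

(* In a worker, looking the id up in the local registry recovers the
   function; here we just call it in-process for illustration. *)
let call (h : ('a, 'b) handle) (x : 'a) : 'b =
  Obj.obj ((Hashtbl.find registry h.id) (Obj.repr x))

let square = register "square" (fun n -> n * n)
let () = assert (call square 7 = 49)
```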
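The "large payloads go to the file system" point can be illustrated with a size-threshold sketch. The threshold value, the type names, and the use of plain `Marshal` are all assumptions; the actual system described above uses shared-memory message queues and runtime-type-based variants:

```ocaml
(* Small marshaled values travel inline in the message; large ones
   are written to a temp file and only the path is sent, so busy
   queues don't pin big payloads in RAM. *)
type payload =
  | Inline of string    (* marshaled value, carried in the message *)
  | On_disk of string   (* path to a file holding the marshaled value *)

let threshold = 1 lsl 20  (* 1 MiB cutoff, purely illustrative *)

let send (v : 'a) : payload =
  let s = Marshal.to_string v [] in
  if String.length s <= threshold then Inline s
  else begin
    let path = Filename.temp_file "task" ".bin" in
    let oc = open_out_bin path in
    output_string oc s;
    close_out oc;
    On_disk path
  end

let receive (p : payload) : 'a =
  match p with
  | Inline s -> Marshal.from_string s 0
  | On_disk path ->
      let ic = open_in_bin path in
      let s = really_input_string ic (in_channel_length ic) in
      close_in ic;
      Sys.remove path;
      Marshal.from_string s 0
```

As with any use of `Marshal`, sender and receiver must agree on the type being exchanged; the runtime-typed variants mentioned in the email are one way to make that agreement explicit and customizable.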