From: Philippe Veber
Date: Thu, 28 Mar 2013 10:39:30 +0100
To: Alain Frisch
Cc: Martin Jambon, caml users
Subject: Re: [Caml-list] Master-slave architecture behind an ocsigen server.

Thanks, Alain, for this detailed description. I did not understand why you do not use marshalling for computation functions: this should be safe given that the same code runs in the GUI process and in the calculation sub-processes, right?

Compared to your setting, I'm afraid I cannot use the same trick for running the sub-processes, as the ocsigen server is in charge here. Maybe there are some hooks that could help?

I also like the ability to send partial results; this would be a nice feature in my case. I'll have to think about how to achieve this...

Cheers
  ph.
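For reference on the marshalling question: OCaml's standard Marshal module serializes a closure only when the Marshal.Closures flag is passed, and its documentation states that such data can only be read back by a process running exactly the same compiled program (checked at unmarshaling time with a digest of the code). A minimal same-process sketch, using a hypothetical make_adder function, might look like this:

```ocaml
(* Hypothetical example function: returns a closure that captures n. *)
let make_adder (n : int) : int -> int = fun x -> x + n

(* Serialize a closure and read it back. Only valid when the reader is
   the exact same executable as the writer. *)
let roundtrip (f : int -> int) : int -> int =
  let bytes = Marshal.to_string f [Marshal.Closures] in
  (Marshal.from_string bytes 0 : int -> int)

(* Without Marshal.Closures, marshaling a functional value raises
   Invalid_argument. *)
let fails_without_flag () : bool =
  match Marshal.to_string (make_adder 1) [] with
  | _ -> false
  | exception Invalid_argument _ -> true

let () =
  let add10 = roundtrip (make_adder 10) in
  Printf.printf "%d %b\n" (add10 32) (fails_without_flag ())
```

Because of the digest check, the marshaled closure is tied to one particular build, so a sender and receiver compiled separately cannot exchange functions this way.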
2013/3/28 Alain Frisch <alain@frisch.fr>
> On 03/28/2013 08:37 AM, Philippe Veber wrote:
>
>> Hi Martin,
>> nproc meets my needs exactly: a simple Lwt-friendly interface to dispatch function calls on a pool of processes running on the same machine. I have only one concern, which should probably be discussed on the ocsigen list: I wonder whether it is okay to fork the process running the ocsigen server. I think I remember warnings about parent and child processes sharing connections/channels, but it's really not clear to me.
>
> FWIW, LexiFi uses an architecture quite close to this for our application. The main process manages the GUI and dispatches computation tasks to external processes. Some points to note:
>
> - Since this is a Windows application, we cannot rely on fork. Instead, we restart the application (Sys.argv.(0)) with a specific command-line flag, which is captured by the library in charge of managing computations. This is done by calling a special function in this library: in the main process the function does nothing, while in the sub-processes it starts the special mode and never returns. This gives the main application a chance to do some global initialization common to the main and sub-processes (for instance, we dynlink external plugins in this initialization phase).
>
> - Computation functions are registered as global values. Registration returns an opaque handle which can be used to call such a function. We don't rely on marshaling closures.
>
> - The GUI process actually spawns a single sub-process (the Scheduler), which itself manages the worker sub-sub-processes (up to a maximum number of workers). Currently, we don't do very clever scheduling based on task priorities, but this could easily be added.
>
> - An external computation can spawn sub-computations (by applying a parallel "map" to a list) either synchronously (direct style) or asynchronously (by providing a continuation function, which will be applied to the list of results, possibly in a different process). In both cases, this is done by sending those tasks to the Scheduler, which dispatches computation tasks to available workers. In the synchronous parallel map, the caller runs an inner event loop to communicate with the Scheduler (and it only accepts sub-tasks created by itself or one of its descendants).
>
> - Top-level external computations can be stopped by the main process (e.g. on user request). Concretely, this kills all workers currently working on that task or one of its sub-tasks.
>
> - In addition to sending back the final results, computations can report progress and further intermediate results to their caller. This is useful for showing a progress bar/status and partial results in the GUI before the entire computation ends.
>
> - Communication between processes is done by exchanging marshaled "variants" (a tagged representation of OCaml values, generated automatically using our runtime types). Since we can attach special variantizers/devariantizers to specific types, this gives us a chance to customize how some values are exchanged between processes (e.g. values relying on internal hash-consing are treated specially to recreate maximal sharing in the sub-process).
>
> - Concretely, the communication between processes is done through message queues implemented with shared memory. (This component was developed by Fabrice Le Fessant and OCamlPro.) Large computation arguments or results (above a certain size) are stored on the file system, to avoid having to keep them in RAM for too long (if all workers are busy, a computation might wait some time before being started).
>
> - The API makes it easy to distribute computation tasks across several machines. We have done some experiments with using our application's database to dispatch computations, but we don't use it in production.
>
> Alain
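The registration scheme Alain describes (computations registered as global values and called through an opaque handle, instead of marshaling closures) can be sketched roughly as below. This is a single-process illustration under stated assumptions: register, request, serve and decode are hypothetical names, not LexiFi's API, and the string pair returned by request stands in for whatever message actually crosses the process boundary. It relies on every process performing the same registrations at start-up, which is what re-running the same executable provides.

```ocaml
(* Hypothetical handle: phantom parameters record argument and result
   types; in a real library this type would be abstract in the .mli. *)
type ('a, 'b) handle = { key : string }

(* Global table, filled identically at start-up in every process. *)
let table : (string, string -> string) Hashtbl.t = Hashtbl.create 16

let register (key : string) (f : 'a -> 'b) : ('a, 'b) handle =
  if Hashtbl.mem table key then invalid_arg ("duplicate computation: " ^ key);
  (* Store a monomorphic wrapper: marshaled argument in, marshaled result out. *)
  let run (payload : string) : string =
    let arg : 'a = Marshal.from_string payload 0 in
    Marshal.to_string (f arg) []
  in
  Hashtbl.replace table key run;
  { key }

(* Caller side: only the key and the marshaled argument are sent. *)
let request (h : ('a, 'b) handle) (x : 'a) : string * string =
  (h.key, Marshal.to_string x [])

(* Worker side: look the computation up by key and run it. *)
let serve ((key, payload) : string * string) : string =
  match Hashtbl.find_opt table key with
  | Some run -> run payload
  | None -> failwith ("unknown computation: " ^ key)

(* Caller side again: the handle's phantom type fixes the result type. *)
let decode (_ : ('a, 'b) handle) (reply : string) : 'b =
  Marshal.from_string reply 0

(* Registered at start-up, in every process. *)
let square : (int, int) handle = register "square" (fun n -> n * n)

let () =
  let reply = serve (request square 7) in
  Printf.printf "%d\n" (decode square reply)
```

The type safety here rests on the phantom parameters of the handle; Marshal itself is untyped, so a mismatch between registration and use would not be caught at run time.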
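On the partial-results question: one possible way to get the progress and intermediate-result reporting Alain mentions is to have each computation take a report callback and send a small variant type back to the caller. The sketch below is hypothetical throughout (the message type, sum_squares and run_and_collect are illustrative names), and the marshal/unmarshal pair inside report only stands in for writing to and reading from the real channel between worker and caller.

```ocaml
(* Hypothetical messages a worker sends back while a task runs. *)
type ('partial, 'final) message =
  | Progress of float      (* fraction of the work done, 0.0 to 1.0 *)
  | Partial of 'partial    (* an intermediate result *)
  | Final of 'final        (* the result; no further messages follow *)

(* A computation that reports progress after every step and a partial
   sum after every second step. *)
let sum_squares ~(report : (int, int) message -> unit) (n : int) : unit =
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + (i * i);
    report (Progress (float_of_int i /. float_of_int n));
    if i mod 2 = 0 then report (Partial !acc)
  done;
  report (Final !acc)

(* Run the computation, passing each message through Marshal to mimic
   the process boundary, and collect what the caller would receive. *)
let run_and_collect (n : int) : (int, int) message list =
  let received = ref [] in
  let report msg =
    let wire = Marshal.to_string msg [] in
    received := (Marshal.from_string wire 0 : (int, int) message) :: !received
  in
  sum_squares ~report n;
  List.rev !received

let () =
  List.iter
    (function
      | Progress p -> Printf.printf "progress %.0f%%\n" (100. *. p)
      | Partial s -> Printf.printf "partial sum %d\n" s
      | Final s -> Printf.printf "final result %d\n" s)
    (run_and_collect 4)
```

In an Lwt-based server, the caller side would presumably consume these messages as they arrive rather than collecting them into a list, so that partial results can be shown before Final is received.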