From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 66D2F7EE7A for ; Thu, 28 Mar 2013 12:02:50 +0100 (CET) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of anil@recoil.org) identity=pra; client-ip=89.16.177.154; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="anil@recoil.org"; x-sender="anil@recoil.org"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of anil@recoil.org) identity=mailfrom; client-ip=89.16.177.154; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="anil@recoil.org"; x-sender="anil@recoil.org"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@dark.recoil.org) identity=helo; client-ip=89.16.177.154; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="anil@recoil.org"; x-sender="postmaster@dark.recoil.org"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Al4DAJYiVFFZELGaZmdsb2JhbABDDoMsvmWBGQ4NCwkIFSiCHwEBAwEBeQULCxIGLkkBDQYBEogOCgi+aI5lMweCX2EDlmeBH5I0Pzw X-IPAS-Result: Al4DAJYiVFFZELGaZmdsb2JhbABDDoMsvmWBGQ4NCwkIFSiCHwEBAwEBeQULCxIGLkkBDQYBEogOCgi+aI5lMweCX2EDlmeBH5I0Pzw X-IronPort-AV: E=Sophos;i="4.84,925,1355094000"; d="scan'208";a="10834655" Received: from recoil.dh.bytemark.co.uk (HELO dark.recoil.org) ([89.16.177.154]) by mail2-smtp-roc.national.inria.fr with SMTP; 28 Mar 2013 12:02:49 +0100 Received: (qmail 25893 invoked by uid 634); 28 Mar 2013 11:02:49 -0000 X-Spam-Level: * X-Spam-Check-By: dark.recoil.org Received: from no-dns-yet.demon.co.uk (HELO [192.168.14.198]) (62.49.66.12) (smtp-auth username remote@recoil.org, mechanism cram-md5) by dark.recoil.org (qpsmtpd/0.84) with ESMTPA; Thu, 28 Mar 2013 11:02:48 +0000 Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\)) From: Anil Madhavapeddy In-Reply-To: <51540395.50202@frisch.fr> Date: Thu, 28 Mar 2013 11:02:46 +0000 Cc: Philippe Veber , Martin Jambon , caml users Content-Transfer-Encoding: quoted-printable Message-Id: References: <51520CAE.6020009@ens-lyon.org> <51540395.50202@frisch.fr> To: Alain Frisch , "cl-mirage@lists.cam.ac.uk List" X-Mailer: Apple Mail (2.1503) X-Virus-Checked: Checked by ClamAV on dark.recoil.org Subject: Re: [Caml-list] Master-slave architecture behind an ocsigen server. On 28 Mar 2013, at 08:47, Alain Frisch wrote: > On 03/28/2013 08:37 AM, Philippe Veber wrote: >> Hi Martin, >> nproc meets exactly my needs: a simple lwt-friendly interface to >> dispatch function calls on a pool of processes that run on the same >> machine. I have only one concern, that should probably be discussed on >> the ocsigen list, that is I wonder if it is okay to fork the process >> running the ocsigen server. I think I remember warnings on having parent >> and children processes sharing connections/channels but it's really not >> clear to me. >=20 > FWIW, LexiFi uses an architecture quite close to this for our application= . The main process manages the GUI and dispatches computations tasks to ex= ternal processes. Some points to be noted: >=20 > - Since this is a Windows application, we cannot rely on fork. Instead, = we restart the application (Sys.argv.(0)), with specific command-line flag,= captured by the library in charge of managing computations. This is done = by calling a special function in this library; the function does nothing in= the main process and in the sub-processes, it starts the special mode and = never returns. This gives a chance to the main application to do some glob= al initialization common to the main and sub processes (for instance, we dy= nlink external plugins in this initialization phase). >=20 > - Computation functions are registered as global values. Registration re= turns an opaque handle which can be used to call such a function. We don't= rely on marshaling closures. >=20 > - The GUI process actually spawns a single sub-process (the Scheduler), w= hich itself manages more worker sub-sub-processes (with a maximal number of= workers). Currently, we don't do very clever scheduling based on task pri= orities, but this could easily be added. >=20 > - An external computation can spawn sub-computations (by applying a paral= lel "map" to a list) either synchronously (direct style) or asynchronously = (by providing a continuation function, which will be applied to the list of= results, maybe in a different process). In both cases, this is done by s= ending those tasks to the Scheduler. The Scheduler dispatches computation = tasks to available workers. In the synchronous parallel map, the caller ru= ns an inner event loop to communicate with the Scheduler (and it only accep= ts sub-tasks created by itself or one of its descendants). >=20 > - Top-level external computations can be stopped by the main process (e.g= . on user request). Concretely, this kills all workers currently working o= n that task or one of its sub-tasks. >=20 > - In addition to sending back the final results, computations can report = progress to their caller and more intermediate results. This is useful to = show a progress bar/status and partial results in the GUI before the end of= the entire computation. >=20 > - Communication between processes is done by exchanging marshaled "varian= ts" (a tagged representation of OCaml values, generated automatically using= our runtime types). Since we can attach special variantizers/devariantize= rs to specific types, this gives a chance to customize how some values have= to be exchanged between processes (e.g. values relying on internal hash-co= nsing are treated specially to recreate the maximal sharing in the sub-proc= ess). >=20 > - Concretely, the communication between processes is done through queues = of messages implemented with shared memory. (This component was developed = by Fabrice Le Fessant and OCamlPro.) Large computation arguments or resul= ts (above a certain size) are stored on the file system, to avoid having to= keep them in RAM for too long (if all workers are busy, the computation mi= ght wait for some time being started). Are all of the messages through these queues persistent, or just the larger= ones that are too big to fit in the shared memory segment, and are they al= ways point-to-point streams? We've got a similar need in Xen/Mirage for shared memory communication and = queues, and have been breaking them out into standalone libs such as: https://github.com/djs55/shared-memory-ring ...which is ABI-compatible with the existing Xen shared memory interfaces, = and also an OCaml version of the transport-agnostic API sketched out in: http://anil.recoil.org/papers/2012-resolve-fable.pdf The missing link currently is the persistent queuing service, but we're inv= estigating the options here (ocamlmq looks rather nice). -anil