From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p2TMucOE022369 for ; Wed, 30 Mar 2011 00:56:38 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AsgBAN9ikk3RVdg2kGdsb2JhbAClSQgUAQEBAQkJDQcUBCGhUIpVgiGFOy+IXAEBAwWFZQSNAoNUc4REOg X-IronPort-AV: E=Sophos;i="4.63,265,1299452400"; d="scan'208";a="103995472" Received: from mail-qw0-f54.google.com ([209.85.216.54]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 30 Mar 2011 00:56:32 +0200 Received: by qwc9 with SMTP id 9so777566qwc.27 for ; Tue, 29 Mar 2011 15:56:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:content-type:content-transfer-encoding :subject:date:message-id:to:mime-version:x-mailer; bh=RCz/124BsXfm4AWYlCNmigXZ0B24o1Vj7qYYrDI/7oU=; b=DmReNcVFBPS2xp25rjMxP/zyErP+c21aL3VPgE/IWOTvW/2SyVM07S+wkKT3qGJpaM vI4KG66tV+PuFpWZS1dQkdDp9tzlNA/kPwpBivQeVFBGgWt+g1eZRtP9Rq770ZxuIPTU C0YI/lZOz9PZiMD92yjShkNEhj/s9aUfumO54= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:content-type:content-transfer-encoding:subject:date:message-id :to:mime-version:x-mailer; b=Abxp+P7NbDqHV0s0RQyFCXDBFs1OrcT6E3P0AOvmKnxNG2nm60jhvkDNCvmpZuornH i1sqx47NvE3t9vZb5FgSCkXTRPKQYGZdru1ldXMNSc6FBDXyK+RQcp9m7N5PiXXJ/E66 g/npfU33CSyz1vTbQPZ8c4uksnD9LIN6mJ5/k= Received: by 10.224.201.137 with SMTP id fa9mr448193qab.145.1301439391646; Tue, 29 Mar 2011 15:56:31 -0700 (PDT) Received: from [10.81.11.224] (pat160.dartmouth-secure.border2-cfw.Dartmouth.EDU [129.170.241.160]) by mx.google.com with ESMTPS id c27sm3764181qck.22.2011.03.29.15.56.29 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 29 Mar 2011 15:56:30 -0700 (PDT) From: Alexy Khrabrov Content-Type: text/plain; charset=us-ascii Date: Tue, 29 Mar 2011 18:56:28 -0400 Message-Id: <3CE6E368-B103-472C-B622-672616E7CAB8@gmail.com> To: caml-list Mime-Version: 1.0 (Apple Message framework v1082) X-Mailer: Apple Mail (2.1082) Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id p2TMucOE022369 Subject: [Caml-list] walking a graph in parallel I have a giant graph of Twitter data which takes several gigabytes in RAM, as a Hashtbl. I need to walk it, collecting various statistics, and building equally huge data structures under each node. Currently I do it all in a single OCaml program, which uses up to 60 GB of RAM and works fine. However, out of the 8 powerful CPUs the box has, only 1 is used. Having seen Joel's tasty bites of ZeroMQ and Thrift and Piqi, I'm thinking of exploring 0MQ as a parallel MPI/Erlang-like way to walk the graph. I'd move the graph into a server, and walkers would be separate processes. I only need inter-process communication, IPC, for the box. I could do threads and inter-thread in 0MQ if OCaml would allow real parallel threads. How would you manage 7 identical worker processes and 1 server, so that in the end, the results of the workers are all reduced together? What's the best way to set up the server? Some ideas: -- hold the graph in MongoDB, it allows for parallel queries -- keep the graph in an OCaml process, it allows for custom queries; but will 0MQ try to fork and copy it when replying to several workers? Copying is impossible, too big Or, is it possible to use a huge chunk of shared memory, to place the read-only graph there and query it somehow separately from each worker, then use 0MQ for the reduce communication phase? -- Alexy