caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Alexy Khrabrov <deliverable@gmail.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] walking a graph in parallel
Date: Wed, 30 Mar 2011 02:35:14 +0200
Message-ID: <1301445314.1754.139.camel@gps-desktop>
In-Reply-To: <3CE6E368-B103-472C-B622-672616E7CAB8@gmail.com>

On Tuesday, 29 Mar 2011, at 18:56 -0400, Alexy Khrabrov wrote:
> I have a giant graph of Twitter data which takes several gigabytes in RAM, as a Hashtbl. I need to walk it, collecting various statistics, and building equally huge data structures under each node. Currently I do it all in a single OCaml program, which uses up to 60 GB of RAM and works fine. However, out of the 8 powerful CPUs the box has, only 1 is used.
> 
> Having seen Joel's tasty bites of ZeroMQ and Thrift and Piqi, I'm thinking of exploring 0MQ as a parallel MPI/Erlang-like way to walk the graph. I'd move the graph into a server, and walkers would be separate processes. I only need inter-process communication, IPC, for the box. I could do threads and inter-thread in 0MQ if OCaml would allow real parallel threads.
> 
> How would you manage 7 identical worker processes and 1 server, so that in the end, the results of the workers are all reduced together?  What's the best way to set up the server?  Some ideas:

Ocamlnet contains a manager for worker processes called Netplex. Here is
an example of how to parallelize a matrix multiplication with it:

http://blog.camlcity.org/blog/parallelmm.html

Communication between the processes is done via SunRPC there (well, I'm
not a fan of hyped new protocols like Thrift).
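
To illustrate the worker/reduce pattern itself, here is a minimal sketch
that does not use Netplex or SunRPC at all. It uses only fork, pipes and
Marshal from OCaml's standard Unix library (OCaml 4.06 or later is
assumed for List.init). The graph type and the names node_stat,
worker_sum and parallel_sum are hypothetical, and the per-node statistic
is just a stand-in for the real one.

```ocaml
(* Sketch: fork [n_workers] processes, let each compute a partial result
   over its share of the node ids, and reduce the results in the parent.
   All names here are hypothetical; link with the unix library. *)

(* Stand-in for the real per-node statistic: the out-degree of [node]. *)
let node_stat (graph : (int, int list) Hashtbl.t) (node : int) : int =
  match Hashtbl.find_opt graph node with
  | Some succs -> List.length succs
  | None -> 0

(* Worker [w] handles the nodes whose id is congruent to w modulo n_workers. *)
let worker_sum graph n_nodes n_workers w =
  let total = ref 0 in
  for node = 0 to n_nodes - 1 do
    if node mod n_workers = w then total := !total + node_stat graph node
  done;
  !total

let parallel_sum graph n_nodes n_workers =
  (* Start all workers first so that they run concurrently. *)
  let children =
    List.init n_workers (fun w ->
      let (rd, wr) = Unix.pipe () in
      match Unix.fork () with
      | 0 ->
          (* Child: compute, send the result to the parent, and exit. *)
          Unix.close rd;
          let oc = Unix.out_channel_of_descr wr in
          Marshal.to_channel oc (worker_sum graph n_nodes n_workers w) [];
          close_out oc;
          exit 0
      | pid ->
          (* Parent: keep only the read end of the pipe. *)
          Unix.close wr;
          (pid, Unix.in_channel_of_descr rd))
  in
  (* Reduce step: read one partial result per worker and add them up. *)
  List.fold_left
    (fun acc (pid, ic) ->
      let (partial : int) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      acc + partial)
    0 children
```

The children inherit the graph through fork, so nothing is copied up
front. Be aware, though, that the OCaml garbage collector may write to
heap blocks while marking, so pages that start out shared can still end
up duplicated per process when the graph lives in the normal OCaml heap.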

> -- hold the graph in MongoDB, it allows for parallel queries
> -- keep the graph in an OCaml process, it allows for custom queries; but will 0MQ try to fork and copy it when replying to several workers?  Copying is impossible, too big

If the graph is highly interconnected, you can practically only store it
in RAM (that's what all social network sites do). For read-only access,
I suggest you simply allocate a big block of shared memory and move the
graph into it. Ocamlnet also contains helpers for this:

http://projects.camlcity.org/projects/dl/ocamlnet-3.2.1/doc/html-main/Netsys_mem.html

Look for init_value. Shared memory can be allocated using shm_open, in

http://projects.camlcity.org/projects/dl/ocamlnet-3.2.1/doc/html-main/Netsys_posix.html
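
As a rough illustration of the idea (not of the Ocamlnet API), the
following sketch keeps a read-only graph outside the OCaml heap in
memory-mapped Bigarrays. It swaps init_value and shm_open for
Unix.map_file from the standard library (OCaml 4.06 or later), and it
assumes that nodes are numbered 0 .. n-1 and that the graph can be
flattened into two int arrays. The file names and the names
shared_graph, map_ints, build, out_degree and successors are all
hypothetical.

```ocaml
(* Sketch: a read-only graph in compressed adjacency form, stored in
   memory-mapped Bigarrays that live outside the OCaml heap.
   All names are hypothetical; link with the unix library. *)

type shared_graph = {
  (* offsets.{v} .. offsets.{v+1} - 1 is the slice of [targets] for node v *)
  offsets : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t;
  (* all successor lists, concatenated *)
  targets : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t;
}

(* Map [len] ints of the file [path] as shared memory. The file is
   created if necessary and grown to the required size. *)
let map_ints path len =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
  let ga = Unix.map_file fd Bigarray.int Bigarray.c_layout true [| len |] in
  Unix.close fd;  (* the mapping stays valid after closing the descriptor *)
  Bigarray.array1_of_genarray ga

(* Copy an adjacency-list representation into the shared arrays. *)
let build path_prefix (adj : int list array) =
  let n = Array.length adj in
  let m = Array.fold_left (fun acc l -> acc + List.length l) 0 adj in
  let offsets = map_ints (path_prefix ^ ".offsets") (n + 1) in
  let targets = map_ints (path_prefix ^ ".targets") (max m 1) in
  let pos = ref 0 in
  Array.iteri
    (fun node succs ->
      offsets.{node} <- !pos;
      List.iter (fun t -> targets.{!pos} <- t; incr pos) succs)
    adj;
  offsets.{n} <- !pos;
  { offsets; targets }

let out_degree g node = g.offsets.{node + 1} - g.offsets.{node}

let successors g node =
  List.init (out_degree g node) (fun i -> g.targets.{g.offsets.{node} + i})
```

Workers forked after build, or separate processes that map the same two
files, read the same physical pages. Since the data is not in the OCaml
heap, the garbage collector never scans or modifies it.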

I'm also working on extending Ocamlnet with more shared memory
functionality, but so far this is only partially available (in the
Netmulticore library).

> Or, is it possible to use a huge chunk of shared memory, to place the read-only graph there and query it somehow separately from each worker, then use 0MQ for the reduce communication phase?

If you need help with this, I can also offer commercial support for
Ocamlnet. (Provided your company is not a search engine, or a social
network, for contractual reasons.)

Gerd

> 
> -- Alexy
> 


