caml-list - the Caml user's mailing list
* [Caml-list] walking a graph in parallel
@ 2011-03-29 22:56 Alexy Khrabrov
  2011-03-30  0:35 ` Gerd Stolpmann
  2011-03-30 19:36 ` Richard W.M. Jones
  0 siblings, 2 replies; 3+ messages in thread
From: Alexy Khrabrov @ 2011-03-29 22:56 UTC (permalink / raw)
  To: caml-list

I have a giant graph of Twitter data, held as a Hashtbl, which takes several gigabytes of RAM. I need to walk it, collecting various statistics and building equally huge data structures under each node. Currently I do it all in a single OCaml program, which uses up to 60 GB of RAM and works fine. However, of the 8 powerful CPUs the box has, only 1 is used.

Having seen Joel's tasty bites of ZeroMQ, Thrift, and Piqi, I'm thinking of exploring 0MQ as a parallel, MPI/Erlang-like way to walk the graph. I'd move the graph into a server, and the walkers would be separate processes. I only need inter-process communication (IPC) within the one box. I could use threads and 0MQ's inter-thread transport instead, if OCaml allowed truly parallel threads.

How would you manage 7 identical worker processes and 1 server, so that in the end the results of the workers are all reduced together? What's the best way to set up the server? Some ideas:

-- hold the graph in MongoDB, which allows parallel queries
-- keep the graph in an OCaml process, which allows custom queries; but will 0MQ try to fork and copy it when replying to several workers? Copying is impossible: the graph is too big

Or is it possible to place the read-only graph in a huge chunk of shared memory, query it somehow from each worker separately, and then use 0MQ for the reduce communication phase?

-- Alexy
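
For illustration, the "identical workers plus a final reduce over a read-only graph" setup asked about above could be sketched without 0MQ by using plain Unix.fork: each child inherits the parent's Hashtbl through copy-on-write pages, walks its own slice, and marshals a partial result back over a pipe. This is only a minimal sketch under stated assumptions, not a recommendation from the thread. The graph type, walk_slice, and parallel_walk are hypothetical names, the statistic (an edge count) is a stand-in, and the OCaml major GC may write to heap pages in a child and cause some of them to be copied, so memory behaviour at 60 GB would need to be measured.

```ocaml
(* Sketch only: fork-based map/reduce over a read-only Hashtbl.
   Needs the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg walk.ml *)

(* Hypothetical graph type: user id -> ids of the users it points to. *)
type graph = (int, int list) Hashtbl.t

(* Hypothetical per-worker statistic: count the edges whose source node
   falls in this worker's slice. Hashtbl.hash is non-negative, so the
   modulus always lands in 0 .. nworkers - 1. *)
let walk_slice (g : graph) ~worker ~nworkers : int =
  Hashtbl.fold
    (fun node succs acc ->
      if Hashtbl.hash node mod nworkers = worker
      then acc + List.length succs
      else acc)
    g 0

let parallel_walk (g : graph) ~nworkers : int =
  (* Map: fork one child per worker. The graph is not sent anywhere;
     each child sees the parent's memory as of the fork. *)
  let children =
    List.init nworkers (fun worker ->
        let rd, wr = Unix.pipe () in
        match Unix.fork () with
        | 0 ->
            (* Child: compute a partial result and marshal it back. *)
            Unix.close rd;
            let oc = Unix.out_channel_of_descr wr in
            Marshal.to_channel oc (walk_slice g ~worker ~nworkers) [];
            close_out oc;
            exit 0
        | pid ->
            (* Parent: keep only the read end of this worker's pipe. *)
            Unix.close wr;
            (pid, Unix.in_channel_of_descr rd))
  in
  (* Reduce: read every partial result and combine them (here, a sum). *)
  List.fold_left
    (fun total (pid, ic) ->
      let (partial : int) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      total + partial)
    0 children
```

With this shape, something like parallel_walk g ~nworkers:7 would run seven walkers while the parent acts as the reducer. Larger per-node results could be marshalled the same way, since the parent drains each pipe in turn.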

