caml-list - the Caml user's mailing list
* [Caml-list] walking a graph in parallel
@ 2011-03-29 22:56 Alexy Khrabrov
  2011-03-30  0:35 ` Gerd Stolpmann
  2011-03-30 19:36 ` Richard W.M. Jones
From: Alexy Khrabrov @ 2011-03-29 22:56 UTC
  To: caml-list

I have a giant graph of Twitter data, stored as a Hashtbl, that takes several gigabytes of RAM. I need to walk it, collecting various statistics and building equally huge data structures under each node. Currently I do it all in a single OCaml program, which uses up to 60 GB of RAM and works fine. However, only 1 of the box's 8 powerful CPUs is used.

Having seen Joel's tasty bites of ZeroMQ, Thrift and Piqi, I'm thinking of exploring 0MQ as a parallel, MPI/Erlang-like way to walk the graph. I'd move the graph into a server, and the walkers would be separate processes. I only need inter-process communication (IPC) within the box. I could use threads and 0MQ's inter-thread transport if OCaml allowed truly parallel threads.

How would you manage 7 identical worker processes and 1 server so that, in the end, the workers' results are all reduced together? What's the best way to set up the server? Some ideas:

-- hold the graph in MongoDB, which allows parallel queries
-- keep the graph in an OCaml process, which allows custom queries; but will 0MQ try to fork and copy it when replying to several workers? Copying is impossible; it's too big.

Or is it possible to place the read-only graph in a huge chunk of shared memory, query it separately from each worker, and then use 0MQ for the reduce (communication) phase?

-- Alexy

* Re: [Caml-list] walking a graph in parallel
From: Gerd Stolpmann @ 2011-03-30  0:35 UTC
  To: Alexy Khrabrov; +Cc: caml-list

Am Dienstag, den 29.03.2011, 18:56 -0400 schrieb Alexy Khrabrov:
> I have a giant graph of Twitter data, stored as a Hashtbl, that takes several gigabytes of RAM. I need to walk it, collecting various statistics and building equally huge data structures under each node. Currently I do it all in a single OCaml program, which uses up to 60 GB of RAM and works fine. However, only 1 of the box's 8 powerful CPUs is used.
>
> Having seen Joel's tasty bites of ZeroMQ, Thrift and Piqi, I'm thinking of exploring 0MQ as a parallel, MPI/Erlang-like way to walk the graph. I'd move the graph into a server, and the walkers would be separate processes. I only need inter-process communication (IPC) within the box. I could use threads and 0MQ's inter-thread transport if OCaml allowed truly parallel threads.
>
> How would you manage 7 identical worker processes and 1 server so that, in the end, the workers' results are all reduced together? What's the best way to set up the server? Some ideas:

Ocamlnet contains a manager for worker processes called Netplex. Here is
an example of how to parallelize a matrix multiplication:

http://blog.camlcity.org/blog/parallelmm.html

Here, communication between processes is done via SunRPC (well, I'm not
a fan of these hyped new protocols like Thrift).
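For comparison, here is a minimal sketch of the same master/worker idea using only the standard Unix and Marshal modules instead of Netplex and SunRPC (no Netplex API is shown). It assumes the graph is already loaded in the parent as a Hashtbl from node id to a list of neighbour ids; the statistic (in-degree counts) and the names partial_in_degrees, merge_into and parallel_in_degrees are hypothetical. Each forked child walks the nodes whose hash falls into its partition and sends its partial table back over a pipe; the parent reduces by summing the tables. The children see the parent's graph through fork's copy-on-write, so nothing is copied up front, but their garbage collectors may still touch, and thus duplicate, pages holding the graph.

```ocaml
(* Sketch: fork workers over an in-memory graph and reduce their results.
   Plain Unix + Marshal, not Netplex. Needs the unix library and
   OCaml >= 4.12 for Unix._exit. *)

type graph = (int, int list) Hashtbl.t  (* node -> neighbours (assumed) *)
type counts = (int, int) Hashtbl.t      (* node -> in-degree *)

let bump (tbl : counts) k by =
  let c = try Hashtbl.find tbl k with Not_found -> 0 in
  Hashtbl.replace tbl k (c + by)

(* Map step: in-degree contributions of the source nodes that hash
   into partition [worker] out of [n_workers]. *)
let partial_in_degrees (g : graph) ~n_workers ~worker : counts =
  let acc = Hashtbl.create 1024 in
  Hashtbl.iter
    (fun src dsts ->
      if Hashtbl.hash src mod n_workers = worker then
        List.iter (fun dst -> bump acc dst 1) dsts)
    g;
  acc

(* Reduce step: add one partial table into the running total. *)
let merge_into (total : counts) (part : counts) =
  Hashtbl.iter (fun k v -> bump total k v) part

let parallel_in_degrees (g : graph) ~n_workers : counts =
  let children =
    List.init n_workers (fun worker ->
        let rd, wr = Unix.pipe () in
        match Unix.fork () with
        | 0 ->
            (* Child: reads [g] via copy-on-write, marshals its result. *)
            Unix.close rd;
            let code =
              try
                let oc = Unix.out_channel_of_descr wr in
                Marshal.to_channel oc (partial_in_degrees g ~n_workers ~worker) [];
                close_out oc;
                0
              with _ -> 1
            in
            (* Skip at_exit handlers so the parent's buffers are not flushed twice. *)
            Unix._exit code
        | pid ->
            Unix.close wr;
            (pid, Unix.in_channel_of_descr rd))
  in
  let total = Hashtbl.create 1024 in
  List.iter
    (fun (pid, ic) ->
      let part : counts = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      merge_into total part)
    children;
  total
```

The parent reads each child's message completely before waiting for it, so a child whose result exceeds the pipe buffer simply blocks until its turn comes. Netplex adds what this sketch lacks: supervision, restarts and a proper RPC layer.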

> -- hold the graph in MongoDB, which allows parallel queries
> -- keep the graph in an OCaml process, which allows custom queries; but will 0MQ try to fork and copy it when replying to several workers? Copying is impossible; it's too big.

If the graph is highly interconnected, you can practically only store it
in RAM (that's what all social network sites do). For read-only access,
I suggest you simply allocate a big block of shared memory and move the
graph into it. Ocamlnet also contains helpers for this:

http://projects.camlcity.org/projects/dl/ocamlnet-3.2.1/doc/html-main/Netsys_mem.html

Look for init_value, which copies an OCaml value into such a memory
block. Shared memory itself can be allocated with shm_open, found in

http://projects.camlcity.org/projects/dl/ocamlnet-3.2.1/doc/html-main/Netsys_posix.html
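A different technique from the init_value approach, sketched here only with the standard unix library, is to flatten the read-only graph into plain integer arrays in compressed sparse row (CSR) form and map them with Unix.map_file. The buffer layout, the file path and the names create_shared, open_shared and iter_neighbours are assumptions of this sketch, not Ocamlnet API. Because Bigarray data lives outside the OCaml heap, the garbage collector neither scans nor marks it, so any number of worker processes can map the same file and share its physical pages.

```ocaml
(* Sketch: a read-only graph in CSR form inside a memory-mapped file.
   Needs the unix library; Unix.map_file exists since OCaml 4.06. *)
open Bigarray

type shared_graph = (int, int_elt, c_layout) Array1.t

(* Hypothetical layout chosen for this sketch:
   g.{0}               number of nodes n
   g.{1} .. g.{n+1}    offsets: neighbours of node v are targets
                       offsets(v) .. offsets(v+1) - 1
   g.{n+2} ..          targets, one int per edge *)

(* Writer: flatten an adjacency array into the file at [path]. *)
let create_shared path (adj : int list array) : shared_graph =
  let n = Array.length adj in
  let m = Array.fold_left (fun acc l -> acc + List.length l) 0 adj in
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC] 0o600 in
  (* shared = true: writes reach the file, which is grown to fit *)
  let g = array1_of_genarray
      (Unix.map_file fd int c_layout true [| 1 + (n + 1) + m |]) in
  Unix.close fd;  (* the mapping stays valid after closing the descriptor *)
  g.{0} <- n;
  let pos = ref 0 in
  Array.iteri
    (fun v l ->
      g.{1 + v} <- !pos;
      List.iter (fun t -> g.{n + 2 + !pos} <- t; incr pos) l)
    adj;
  g.{1 + n} <- !pos;
  g

(* Reader, e.g. in each worker: map the existing file read-only.
   shared = false gives a private mapping, so a read-only descriptor is
   enough; pages that are never written stay shared with other mappers.
   The dimension -1 means "infer the length from the file size". *)
let open_shared path : shared_graph =
  let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
  let g = array1_of_genarray (Unix.map_file fd int c_layout false [| -1 |]) in
  Unix.close fd;
  g

let num_nodes (g : shared_graph) = g.{0}

(* Call [f] on every neighbour of node [v]. *)
let iter_neighbours (g : shared_graph) v f =
  let n = g.{0} in
  for i = g.{1 + v} to g.{2 + v} - 1 do
    f g.{n + 2 + i}
  done
```

A worker forked after create_shared inherits the mapping directly; independently started workers can call open_shared on the same path. On Linux, a path under /dev/shm (usually a tmpfs) keeps the data in RAM, close in spirit to shm_open.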

I'm also working on extending Ocamlnet with more shared memory
functionality, but so far this is only partially available (the
Netmulticore library).

> Or is it possible to place the read-only graph in a huge chunk of shared memory, query it separately from each worker, and then use 0MQ for the reduce (communication) phase?

If you need help with this, I can also offer commercial support for
Ocamlnet (provided, for contractual reasons, that your company is not a
search engine or a social network).

Gerd

> 
> -- Alexy
> 



* Re: [Caml-list] walking a graph in parallel
From: Richard W.M. Jones @ 2011-03-30 19:36 UTC
  To: Alexy Khrabrov; +Cc: caml-list

On Tue, Mar 29, 2011 at 06:56:28PM -0400, Alexy Khrabrov wrote:
> Or is it possible to place the read-only graph in a huge chunk of
> shared memory, query it separately from each worker, and then use
> 0MQ for the reduce (communication) phase?

As it's read-only, this sounds like an ideal application for ancient:

http://git.annexia.org/?p=ocaml-ancient.git;a=summary

Rich.

-- 
Richard Jones
Red Hat
