caml-list - the Caml user's mailing list
From: Yoann Padioleau <pad@fb.com>
To: Caml List <caml-list@yquem.inria.fr>
Subject: [Caml-list] strategies to deal with huge in-memory "object" graphs?
Date: Fri, 8 Aug 2014 22:20:59 +0000
Message-ID: <7F92DE4E-875E-4020-AF4F-5BC19080225A@fb.com>

Hi list,

I have an application that gradually builds a graph (using ocamlgraph), and
it uses around 3 or 4 GB of memory (my machine has 74 GB of RAM). The graph
has lots of nodes and edges. The problem is that building it takes a huge
amount of time, and the build gets slower and slower as it progresses. My
guess is that the “object” graph is getting really huge, so the GC has more
and more to traverse on each collection. I’ve tried things like

  (* see www.elehack.net/michael/blog/2010/06/ocaml-memory-tuning *)
  Gc.set { (Gc.get ()) with
    Gc.minor_heap_size = 4_000_000;       (* in words *)
    (* goes from 5300s to 3000s for building db for www *)
    Gc.major_heap_increment = 8_000_000;  (* in words *)
    Gc.space_overhead = 300;              (* in percent *)
  };


but it does not really help. It is still really slow.
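
One way to check the guess that collection work grows with the graph is to log
GC counters around each build phase. The following is only a minimal sketch:
`with_gc_report` is a hypothetical helper (it is not part of the application or
of ocamlgraph), and the list built in the usage part is a stand-in for a real
graph-building step. `Gc.quick_stat` and `Sys.time` are from the standard
library.

```ocaml
(* Hypothetical helper: run [f], then print the CPU time it took and how
   many minor and major collections happened while it ran. *)
let with_gc_report label f =
  let before = Gc.quick_stat () in
  let t0 = Sys.time () in
  let result = f () in
  let t1 = Sys.time () in
  let after = Gc.quick_stat () in
  Printf.printf
    "%s: %.2fs cpu, %d minor, %d major collections, heap now %d words\n%!"
    label (t1 -. t0)
    (after.Gc.minor_collections - before.Gc.minor_collections)
    (after.Gc.major_collections - before.Gc.major_collections)
    after.Gc.heap_words;
  result

(* Usage: the list here is only a stand-in for one phase of the build. *)
let () =
  let xs =
    with_gc_report "phase 1" (fun () ->
        List.init 100_000 (fun i -> (i, i + 1)))
  in
  ignore xs
```

If the number of major collections per phase stays roughly constant while the
time per phase keeps growing, that would be consistent with each collection
having more live data to traverse.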

In the past I have sometimes used the Marshal module to reduce the number of
“objects”, but it forces me to rewrite quite a lot of the code.
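
To illustrate the Marshal approach, here is a minimal sketch. The `chunk` type
and the `freeze`/`thaw` names are hypothetical and only stand in for a finished
part of the graph; `Marshal.to_string` and `Marshal.from_string` are the
standard library calls. A marshalled value is one flat string, and the GC does
not scan the contents of strings, so many small “objects” become a single
block.

```ocaml
(* Hypothetical type standing in for a finished part of the graph. *)
type chunk = { nodes : string array; edges : (int * int) list }

(* Serialize the value into one flat string. The string holds no pointers,
   so the GC does not traverse its contents. *)
let freeze (c : chunk) : string = Marshal.to_string c []

(* Rebuild the value when it is needed again. Marshal is not type-safe:
   the annotation is trusted, not checked. *)
let thaw (s : string) : chunk = (Marshal.from_string s 0 : chunk)

let () =
  let c = { nodes = [| "a"; "b"; "c" |]; edges = [ (0, 1); (1, 2) ] } in
  let frozen = freeze c in
  let c' = thaw frozen in
  assert (c' = c);
  Printf.printf "frozen into %d bytes\n" (String.length frozen)
```

The cost is visible even in this small case: every access to the frozen data
has to go through `thaw`, which is why this approach forces changes to the
surrounding code.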

Is there a way to partition the heap so that, for instance, in my case all the
graph-related things are put in a different area that the GC does not have to
explore each time? I’d like a minor heap, a major heap, and then a
do_not_gc_this_heap_it_is_only_growing_there_is_no_garbage_here_to_collect heap.



