caml-list - the Caml user's mailing list
* [Caml-list] strategies to deal with huge in-memory "object" graphs?
@ 2014-08-08 22:20 Yoann Padioleau
  2014-08-09 19:57 ` Gabriel Kerneis
  2014-08-10  6:51 ` Gerd Stolpmann
  0 siblings, 2 replies; 3+ messages in thread
From: Yoann Padioleau @ 2014-08-08 22:20 UTC (permalink / raw)
  To: Caml List

Hi list,

I have an application that gradually builds a graph (using ocamlgraph), and
its memory use is around 3 or 4 GB (my machine has 74 GB of RAM). The graph
has lots of nodes and edges. The problem is that building it takes a huge
amount of time, and the build gets slower and slower as it progresses. My
guess is that the “object” graph is getting really huge, so the GC has more
to explore on each cycle. I’ve tried things like

  (* see www.elehack.net/michael/blog/2010/06/ocaml-memory-tuning *)
  Gc.set { (Gc.get()) with Gc.minor_heap_size = 4_000_000 };
  (* goes from 5300s to 3000s for building db for www *)
  Gc.set { (Gc.get()) with Gc.major_heap_increment = 8_000_000 };
  Gc.set { (Gc.get()) with Gc.space_overhead = 300 };


but it does not really help. It is still really slow.
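
A minimal sketch for checking the guess above, that is, whether collections really grow with the size of the graph. It samples the standard Gc.quick_stat counters around each phase of a build. The helper name with_gc_report and the Hashtbl workload are assumptions for illustration; they stand in for the real ocamlgraph build, which is not shown in this thread.

```ocaml
(* Hypothetical helper: run [f], then print elapsed CPU time and how many
   collections happened while it ran. *)
let with_gc_report (label : string) (f : unit -> 'a) : 'a =
  let before = Gc.quick_stat () in
  let t0 = Sys.time () in
  let result = f () in
  let after = Gc.quick_stat () in
  Printf.printf
    "%s: %.2fs cpu, minor=+%d major=+%d compactions=+%d heap_words=%d\n%!"
    label
    (Sys.time () -. t0)
    (after.Gc.minor_collections - before.Gc.minor_collections)
    (after.Gc.major_collections - before.Gc.major_collections)
    (after.Gc.compactions - before.Gc.compactions)
    after.Gc.heap_words;
  result

(* Stand-in workload: a growing edge table, not the real graph build. *)
let () =
  let edges = Hashtbl.create 1024 in
  for batch = 1 to 3 do
    with_gc_report (Printf.sprintf "batch %d" batch) (fun () ->
      for i = 1 to 100_000 do
        Hashtbl.add edges (batch, i) (i + 1)
      done)
  done
```

If the time per batch climbs while the number of major collections per batch stays roughly flat, the cost is more likely in the marking work per collection (or in the application itself) than in the collection frequency.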

In the past I have sometimes used the Marshal module to reduce the number of “objects”,
but that forces me to rewrite quite a lot of the code.
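
A rough sketch of the Marshal trick mentioned above, assuming the data contains no closures or other unmarshallable values. A finished piece of the graph is serialised into a single string, which the GC treats as one block without pointers to follow; it is deserialised again only when needed. The frozen type and the freeze/thaw names are hypothetical, not part of any library.

```ocaml
(* Hypothetical wrapper: a value kept as its marshalled bytes. The phantom
   type parameter only records what was stored; Marshal itself is not
   type-safe. *)
type 'a frozen = Frozen of string

(* Serialise the value; sharing inside it is preserved by default. *)
let freeze (v : 'a) : 'a frozen = Frozen (Marshal.to_string v [])

(* Rebuild a fresh copy of the value from the bytes. *)
let thaw (Frozen s : 'a frozen) : 'a = Marshal.from_string s 0

let () =
  (* Stand-in for a finished subgraph: successors of nodes 0, 1 and 2. *)
  let adjacency : int list array = [| [1; 2]; [2]; [] |] in
  let cold = freeze adjacency in
  let back = thaw cold in
  assert (back = adjacency);
  Printf.printf "node 0 has %d successors\n" (List.length back.(0))
```

The rewrite cost mentioned above shows up here: every access has to go through thaw, which copies the whole value back onto the heap.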

Is there a way to partition the heap so that, in my case, all the graph-related
data is put in a separate area that the GC does not have to explore each time?
I’d like a minor heap, a major heap, and then a do_not_gc_this_heap_it_is_only_growing_there_is_no_garbage_here_to_collect heap.




* Re: [Caml-list] strategies to deal with huge in-memory "object" graphs?
From: Gabriel Kerneis @ 2014-08-09 19:57 UTC (permalink / raw)
  To: Yoann Padioleau; +Cc: Caml List

On 2014-08-09 00:20, Yoann Padioleau wrote:
> Is there a way to partition the heap so that for instance in my case
> all the graph related things are put in a different area that the Gc
> does not have to explore each time.

Isn't that what Ancient is (was) for? It was designed at a time when the
concern was swap rather than inconceivably large amounts of RAM, but the
same idea should work in both situations, I guess.

http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=538

Best,
-- 
Gabriel


* Re: [Caml-list] strategies to deal with huge in-memory "object" graphs?
From: Gerd Stolpmann @ 2014-08-10  6:51 UTC (permalink / raw)
  To: Yoann Padioleau; +Cc: Caml List

On Friday, 2014-08-08 at 22:20 +0000, Yoann Padioleau wrote:
> Is there a way to partition the heap so that, in my case, all the graph-related
> data is put in a separate area that the GC does not have to explore each time?
> I’d like a minor heap, a major heap, and then a do_not_gc_this_heap_it_is_only_growing_there_is_no_garbage_here_to_collect heap.

The latter can be accomplished by setting space_overhead to a large
value (maybe 1_000_000). Also set max_overhead to 1_000_000 to avoid
compactions.
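
As a sketch, the two settings can be applied only for the duration of the build and restored afterwards. The wrapper name with_relaxed_gc and the toy workload are assumptions for illustration, and the effect of these fields is assumed to be as documented for the standard Gc module of the runtime in use; other runtime versions may treat max_overhead differently.

```ocaml
(* Hypothetical wrapper: run [build] with a very lazy major GC, then put the
   previous settings back, even if [build] raises. Needs OCaml >= 4.08 for
   Fun.protect. *)
let with_relaxed_gc (build : unit -> 'a) : 'a =
  let saved = Gc.get () in
  Gc.set { saved with
           Gc.space_overhead = 1_000_000;  (* tolerate lots of "wasted" space *)
           Gc.max_overhead = 1_000_000 };  (* per the Gc docs: no automatic compaction *)
  Fun.protect ~finally:(fun () -> Gc.set saved) build

let () =
  (* Stand-in for the real graph construction. *)
  let nodes = with_relaxed_gc (fun () -> List.init 1000 (fun i -> i)) in
  Printf.printf "built %d nodes\n" (List.length nodes)
```

The trade-off is memory: with such a large space_overhead the major heap is allowed to grow far beyond the live data, which is only reasonable when, as here, almost nothing becomes garbage and RAM is plentiful.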

Once you have built the graph, you can move the whole beast (provided it
is read-only) to a non-GC-managed area with either Ancient (simpler to
use, fewer features), or Ocamlnet's Netmulticore. But you really can
only move the whole graph, with everything that is reachable from it.
Any mutation will kill the program.

Gerd


-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------

