caml-list - the Caml user's mailing list
* [Caml-list] A question about GC.
@ 2011-06-13  0:35 Yoonseok Ko
  2011-06-13  9:40 ` Guillaume Yziquel
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Yoonseok Ko @ 2011-06-13  0:35 UTC (permalink / raw)
  To: caml-list

Hello everyone.
I'm a graduate student majoring in program analysis.

I'm using Muddy, which is a BDD library interfacing Buddy.
The problem is that when I construct BDDs, memory blows up in some cases
because the GC does not seem to run.
(The problem is not specific to Muddy. We already tried our own
Buddy interface.)

If I call Gc.compact () explicitly on every cycle of BDD construction,
memory consumption is reasonable.
Gc.major () also works well, but Gc.minor () does not help.
I watched the GC log messages and found that the GC almost always tries
to grow the heap and very rarely starts a new major GC cycle.
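
These observations can be reproduced with the standard Gc module alone.
The following is a minimal sketch, not code from Muddy or from my program:
the helper names major_collections and with_gc_log are made up for
illustration, and the verbose bits are the ones documented for the
OCaml 3.x/4.x runtime (their meaning may differ on other versions).

```ocaml
(* Number of major collections completed so far, from the cheap counters. *)
let major_collections () = (Gc.quick_stat ()).Gc.major_collections

(* Run [f] with GC logging enabled, then restore the previous settings.
   0x01: start of a major GC cycle
   0x04: growing and shrinking of the heap
   0x08: resizing of stacks and memory manager tables *)
let with_gc_log f =
  let saved = Gc.get () in
  Gc.set { saved with Gc.verbose = 0x01 lor 0x04 lor 0x08 };
  Fun.protect ~finally:(fun () -> Gc.set saved) f

let () =
  let before = major_collections () in
  with_gc_log Gc.full_major;
  Printf.printf "major collections: %d -> %d\n"
    before (major_collections ())
```

Wrapping one BDD construction cycle in with_gc_log and comparing the
counter before and after shows whether a major cycle started during
that work. The log lines themselves go to standard error.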

In a small example, memory blows up only when the BDD is constructed in a
non-tail-recursive function.
In the real code, the tail-recursive form does not help. Memory still blows up.
So far, the only solution is to call Gc.major () explicitly.

I'm using the GC with its default settings.
There is no memory leak on the Buddy side.
I checked memory consumption both from outside the process and from inside the GC.


I have two questions.

1. Is there any solution? Explicit garbage collection is too slow, and it is
hard to collect garbage at the right time.
     I want to know what is happening in the GC, and why it does not run.

2. Sometimes the GC log has this message: "Growing gray_vals to 32768k bytes."
     What does that mean?


Best Regards,

Yoonseok Ko



* Re: [Caml-list] A question about GC.
  2011-06-13  0:35 [Caml-list] A question about GC Yoonseok Ko
@ 2011-06-13  9:40 ` Guillaume Yziquel
  2011-06-13 10:37 ` Philippe Wang
  2011-06-13 12:21 ` Gerd Stolpmann
  2 siblings, 0 replies; 8+ messages in thread
From: Guillaume Yziquel @ 2011-06-13  9:40 UTC (permalink / raw)
  To: Yoonseok Ko; +Cc: caml-list

On Monday 13 Jun 2011 at 09:35:41 (+0900), Yoonseok Ko wrote:
> Hello everyone.
> 
> I have two questions.
> 
> 1. Is there any solution? Explicit garbage collection is too slow
> and hard to collect garbage on time.
>     I want to know what happened in GC, and why GC won't work.

First you'll have to know what kind of values accumulate in the heap. If
they are custom blocks, then you'll have a pointer in the block pointing
to a structure that identifies the kind of values (as a general rule).
If it's a problem in the binding, then looking closely at custom blocks
is likely a good idea.

I do not know Muddy, but it may be a good idea to tweak the parameters
passed to alloc_custom in order to force garbage collection more often.
If the problem goes away, I'd recommend reading closely about alloc_custom,
looking at how it is implemented, and modifying muddy.c accordingly.

Anyhow, for such GC problems, I'd recommend using ocamlviz.

> 2. Sometimes GC log has this message: "Growing gray_vals to 32768k bytes."
>     What does that means?

Good read on the topic:

http://caml.inria.fr/pub/docs/oreilly-book/pdf/chap9.pdf

In the marking phase of Mark & Sweep, gray cells are marked cells whose
descendants are not yet marked.
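
To make the gray set concrete, here is a toy model of the marking phase.
It is only a sketch and not the runtime's code: mark, children and
max_gray are names invented for the example. As far as I understand,
gray_vals is the runtime's stack of gray values, so the log message
would indicate that this stack had to be enlarged because many values
were waiting to have their children scanned.

```ocaml
type color = White | Gray | Black

(* [children.(i)] lists the nodes that node [i] points to.
   Returns the final colors and the largest size the gray stack reached. *)
let mark (children : int list array) (roots : int list) =
  let color = Array.make (Array.length children) White in
  let gray = Stack.create () in
  let max_gray = ref 0 in
  (* Turn a white node gray and remember that its children need scanning. *)
  let shade i =
    if color.(i) = White then begin
      color.(i) <- Gray;
      Stack.push i gray;
      max_gray := max !max_gray (Stack.length gray)
    end
  in
  List.iter shade roots;
  while not (Stack.is_empty gray) do
    let i = Stack.pop gray in
    List.iter shade children.(i);
    (* All children are at least gray now, so the node becomes black. *)
    color.(i) <- Black
  done;
  (color, !max_gray)
```

In this model a node with many direct children pushes all of them onto
the gray stack at once, which is the kind of shape that makes the stack
grow. Nodes still white at the end are the garbage that the sweep phase
would reclaim.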

> Best Regards,
> 
> Yoonseok Ko

-- 
     Guillaume Yziquel


* Re: [Caml-list] A question about GC.
  2011-06-13  0:35 [Caml-list] A question about GC Yoonseok Ko
  2011-06-13  9:40 ` Guillaume Yziquel
@ 2011-06-13 10:37 ` Philippe Wang
  2011-06-13 11:34   ` ygrek
  2011-06-13 12:21 ` Gerd Stolpmann
  2 siblings, 1 reply; 8+ messages in thread
From: Philippe Wang @ 2011-06-13 10:37 UTC (permalink / raw)
  To: Yoonseok Ko; +Cc: caml-list

As far as I know, when OCaml's GC is not working as expected, it's
(almost) always because there are blocks allocated outside OCaml's
heap (a.k.a. custom blocks).

How big are the custom blocks? The bigger they are, the worse the
behavior tends to be.

How much do you allocate in OCaml's heap (i.e., "normal OCaml
values")? The more small values you allocate, the more efficient your
program becomes (well, don't allocate too many useless values
either).
For instance, what is your program's behavior if you replace
   Gc.compact() (or Gc.whatever_triggers_a_notTooMinor_collection...)
by something like
 (Array.init 10000 string_of_int)
?
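
The experiment above could be set up as follows. This is only a sketch:
allocation_pressure and run_cycles are hypothetical helpers, and step
stands for whatever builds one BDD in the real program.

```ocaml
(* Allocate 10000 short-lived strings on the OCaml heap.
   Sys.opaque_identity keeps the optimizer from discarding the array. *)
let allocation_pressure () =
  ignore (Sys.opaque_identity (Array.init 10_000 string_of_int))

(* Call [step] once per cycle, followed by ordinary OCaml allocations
   instead of an explicit Gc.compact () or Gc.major (). *)
let run_cycles ~cycles ~(step : int -> unit) =
  for i = 1 to cycles do
    step i;
    allocation_pressure ()
  done
```

If memory stays bounded with this variant, that suggests the GC simply
does not see enough allocation activity from the custom blocks to
schedule major collection work on its own.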

-- 
Philippe Wang
   mail@philippewang.info



* Re: [Caml-list] A question about GC.
  2011-06-13 10:37 ` Philippe Wang
@ 2011-06-13 11:34   ` ygrek
  2011-06-13 12:19     ` Guillaume Yziquel
  0 siblings, 1 reply; 8+ messages in thread
From: ygrek @ 2011-06-13 11:34 UTC (permalink / raw)
  To: caml-list

On Mon, 13 Jun 2011 12:37:46 +0200
Philippe Wang <mail@philippewang.info> wrote:

> As far as I know, when OCaml's GC is not working as expected, it's
> (almost) always because there are blocks allocated outside OCaml's
> heap (a.k.a. custom blocks).

Custom blocks are allocated on the OCaml heap.
 
> How big are custom blocks? The bigger they are, the worst the behavior
> tends to be.

Why?

-- 
 ygrek
 http://ygrek.org.ua


* Re: [Caml-list] A question about GC.
  2011-06-13 11:34   ` ygrek
@ 2011-06-13 12:19     ` Guillaume Yziquel
  2011-06-13 13:26       ` Philippe Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Guillaume Yziquel @ 2011-06-13 12:19 UTC (permalink / raw)
  To: ygrek; +Cc: caml-list

On Monday 13 Jun 2011 at 14:34:07 (+0300), ygrek wrote:
> On Mon, 13 Jun 2011 12:37:46 +0200
> Philippe Wang <mail@philippewang.info> wrote:
> 
> > As far as I know, when OCaml's GC is not working as expected, it's
> > (almost) always because there are blocks allocated outside OCaml's
> > heap (a.k.a. custom blocks).
> 
> Custom blocks are allocated on ocaml heap.

No, not always. Nothing stops you from putting custom blocks outside of
the ocaml heap. It might even sometimes make sense. Though usually
you're fine allocating custom blocks on the ocaml heap with a pointer to
out of heap data.

> > How big are custom blocks? The bigger they are, the worst the behavior
> > tends to be.
> 
> Why?

Because the GC relies on the two last arguments of caml_alloc_custom to
properly evaluate the memory impact of allocating a custom block
together with the data it refers to. It's easy to get it wrong for large
data, and thus miscalibrate the GC on these blocks.

However, the fact that gray values seem to accumulate doesn't seem to
fit with such a simple explanation. But maybe Philippe is thinking
about something else when talking about "big custom blocks allocated
outside of the ocaml heap".

-- 
     Guillaume Yziquel


* Re: [Caml-list] A question about GC.
  2011-06-13  0:35 [Caml-list] A question about GC Yoonseok Ko
  2011-06-13  9:40 ` Guillaume Yziquel
  2011-06-13 10:37 ` Philippe Wang
@ 2011-06-13 12:21 ` Gerd Stolpmann
  2011-06-13 12:26   ` Guillaume Yziquel
  2 siblings, 1 reply; 8+ messages in thread
From: Gerd Stolpmann @ 2011-06-13 12:21 UTC (permalink / raw)
  To: Yoonseok Ko; +Cc: caml-list

On Monday, 13.06.2011, at 09:35 +0900, Yoonseok Ko wrote:
> Hello everyone.
> I'm a graduate student majoring program analysis.
> 
> I'm using Muddy which is BDD library interfacing Buddy.
> The problem is that when I construct BDD, its memory blows up in some cases
> because GC won't work.
> (It was not only for Muddy problem. We already tried to use our own 
> buddy interface.)
> 
> If I call Gc.compact () explicitly every cycle of constructing BDD, then 
> memory consumption is reasonable.
> Gc.major () also works well, but Gc.minor () doesn't work.
> I watched log messages of GC and figured out that they always try to 
> grow heap and very very rarely start new major GC cycle.
> 
> In a small example, if I construct BDD only in non-tail-recursive form 
> function, memory blows up.
> In a real code, tail-recursive form doesn't work. Just memory blows up.
> So far, only the solution is just call Gc.major () explicitly.
> 
> I'm using GC with default setting.
> There was no memory leakage on buddy side.
> I check the memory consumption both outside of the process and inside of GC.
> 
> 
> I have two questions.
> 
> 1. Is there any solution? Explicit garbage collection is too slow and 
> hard to collect garbage on time.
>      I want to know what happened in GC, and why GC won't work.

Your problem sounds a bit as if the custom blocks were allocated
with the wrong parameters. Remember that caml_alloc_custom has four
parameters:

caml_alloc_custom(ops, size, used, max)

Often one sets used=0 and max=1 here, but this may be totally wrong if
you allocate larger blocks. It is better to set used=1, with max being
the number of custom block allocations until the major GC is triggered.
(Imagine there is a water level variable w, and for every allocation
w:=w+used/max, and if w exceeds 1.0 the water tank is full, and another
major GC is triggered.)
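
The water-tank picture can be written down as a small simulation. This
is a sketch of the model described above, not of the runtime itself:
allocations_until_trigger is a made-up name, and integer arithmetic
replaces the fraction used/max so that no rounding is involved.

```ocaml
(* Count how many custom-block allocations it takes for the water level
   w := w + used/max to exceed 1.0. The comparison "level > max" is the
   same test as "w > 1.0" with both sides multiplied by [max].
   Returns None when the level can never rise (used <= 0). *)
let allocations_until_trigger ~used ~max =
  if used <= 0 || max <= 0 then None
  else begin
    let level = ref 0 and count = ref 0 in
    while !level <= max do
      level := !level + used;
      incr count
    done;
    Some !count
  end

let () =
  let show used max =
    match allocations_until_trigger ~used ~max with
    | None -> Printf.printf "used=%d max=%d: never triggers\n" used max
    | Some n -> Printf.printf "used=%d max=%d: after %d allocations\n" used max n
  in
  show 0 1;
  show 1 1000
```

With used=0 the level never moves, so in this model custom allocations
alone never ask for a major GC, however much memory hangs off each
block. That matches the symptom of a heap that only grows.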

Maybe this helps.

Gerd


* Re: [Caml-list] A question about GC.
  2011-06-13 12:21 ` Gerd Stolpmann
@ 2011-06-13 12:26   ` Guillaume Yziquel
  0 siblings, 0 replies; 8+ messages in thread
From: Guillaume Yziquel @ 2011-06-13 12:26 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Yoonseok Ko, caml-list

On Monday 13 Jun 2011 at 14:21:00 (+0200), Gerd Stolpmann wrote:
> 
> Your problem sounds a bit like as if the custom blocks were allocated
> with the wrong parameters. Remember caml_alloc_custom has four
> parameters:
> 
> caml_alloc_custom(ops, size, used, max)
> 
> Often one sets here used=0 and max=1, but this may be totally wrong if
> you allocate larger blocks.

muddy.c does exactly that. However, I have no idea about the size of the
blocks or data pointed to by the block.

> Better set used=1, and max is the number of
> custom block allocations until the major GC is triggered. (Imagine there
> is a water level variable w, and for every allocation w:=w+used/max, and
> if w exceeds 1.0 the water tank is full, and another major GC is
> triggered.)
> 
> Maybe this helps.
> 
> Gerd

-- 
     Guillaume Yziquel


* Re: [Caml-list] A question about GC.
  2011-06-13 12:19     ` Guillaume Yziquel
@ 2011-06-13 13:26       ` Philippe Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Philippe Wang @ 2011-06-13 13:26 UTC (permalink / raw)
  To: Guillaume Yziquel; +Cc: ygrek, caml-list

On Mon, Jun 13, 2011 at 2:19 PM, Guillaume Yziquel
<guillaume.yziquel@gmx.ch> wrote:
> On Monday 13 Jun 2011 at 14:34:07 (+0300), ygrek wrote:
>> On Mon, 13 Jun 2011 12:37:46 +0200
>> Philippe Wang <mail@philippewang.info> wrote:
>>
>> > As far as I know, when OCaml's GC is not working as expected, it's
>> > (almost) always because there are blocks allocated outside OCaml's
>> > heap (a.k.a. custom blocks).
>>
>> Custom blocks are allocated on ocaml heap.
>
> No, not always. Nothing stops you from putting custom blocks outside of
> the ocaml heap. It might even sometimes make sense. Though usually
> you're fine allocating custom blocks on the ocaml heap with a pointer to
> out of heap data.
>
>> > How big are custom blocks? The bigger they are, the worst the behavior
>> > tends to be.
>>
>> Why?
>
> Because the GC relies on the two last arguments of caml_alloc_custom to
> properly evaluate the memory impact of allocating a custom block
> together with the data it refers to. It's easy to get it wrong for large
> data, and thus miscalibrate the GC on these blocks.
>
> However, the fact that grey values seem to accumulate doesn't seem to
> fit with such an simple explanation. But maybe Philippe is thinking
> about something else when talking about "big custom blocks allocated
> outside of the ocaml heap".

Let's consider this simple scenario:

 let tmp = new_very_large_data_outside_ocaml_heap () in
 let result = some_stuff(tmp) in
   result

=> if tmp is no longer needed, you will probably have to wait a long
time before it is fully collected by the GC.
That's because finalize functions are not called very often (as far
as I remember, so tell me if I'm wrong), notably not during a minor
collection, and finalize functions are what actually frees the
large data allocated outside OCaml's heap. So even if one or several
minor collections are triggered, that memory is not reclaimed
until a major collection.
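
The timing of finalisation can be observed from pure OCaml with
Gc.finalise, used here as a stand-in for the finalize function of a
custom block. This is a sketch under that assumption, and
finalised_after is a hypothetical helper name.

```ocaml
(* Allocate a block, attach a finaliser, drop the block, run [collect],
   and report whether the finaliser has been called by then. *)
let finalised_after (collect : unit -> unit) : bool =
  let ran = ref false in
  let register () =
    (* Stands in for a value that owns a large out-of-heap resource. *)
    let big = Bytes.create 4096 in
    Gc.finalise (fun _ -> ran := true) big
  in
  register ();
  collect ();
  !ran

let () =
  Printf.printf "finalised after Gc.minor: %b\n"
    (finalised_after Gc.minor);
  Printf.printf "finalised after Gc.full_major: %b\n"
    (finalised_after Gc.full_major)
```

Comparing the two lines on a given runtime shows how long an unreachable
value can wait before its finaliser frees the external data.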

This should explain why in some scenarios, even if the C interface is
written "perfectly", there is a kind of "memory leak" (not an actual
memory leak, just a delay in deallocation, which might still cause the
program to fail).

The only way I know to deallocate custom data more quickly is to trigger
major collections more often. The only way to do that without calling
Gc.something () is to write the program differently, with a lot of
small allocations ("a lot" means "enough" here). There are two ways:
allocate dummy data (not really good) or allocate useful data (better).
Allocating more data can happen when programming with threads! When
some computation is done outside OCaml's heap, it's particularly
relevant to use OCaml threads. But this means at least one thread must
be computing "normal OCaml".

N.B. Well, maybe I'm not addressing the problem at all. Actually, what
I'm saying here is mostly based on my experience with mlgmp (an
interface to use GMP in OCaml) about 5 years ago.

-- 
Philippe Wang
   mail@philippewang.info

