caml-list - the Caml user's mailing list
* [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: Jim Pryor @ 2011-01-06 13:05 UTC
  To: caml-list

Starting up the toplevel fresh, with no .ocamlinit file, I see this:

    $ /usr/bin/ocaml
            Objective Caml version 3.12.0

    # Gc.(compact();counters());;                         
    - : float * float * float = (79217., 32932., 82668.)
    # Gc.(compact();counters());;
    - : float * float * float = (93530., 32932., 85816.)
    # Gc.(compact();counters());;
    - : float * float * float = (107843., 32932., 88964.)

Why do both the minor_words and major_words keep climbing? Do these
indicate the number of words _ever_ allocated, and so the overhead of
doing the garbage collection and compacting has pushed them up? Or do
they indicate the number of words _now_ allocated, so the fact that they
continue to climb indicates something is leaking memory?

Omitting the collection, I still see persisting climbs:

    $ /usr/bin/ocaml
            Objective Caml version 3.12.0

    # Gc.(counters());;
    - : float * float * float = (227355., 69894., 150533.)
    # Gc.(counters());;
    - : float * float * float = (240969., 73385., 154024.)
    # Gc.(counters());;
    - : float * float * float = (254583., 73385., 154024.)


Two questions:

1. Is there anything here to be worried about? A memory leak that needs
to be tracked down? (If so, won't it be in the toplevel or some other
piece of the core system? I haven't loaded any other libraries.) Or is
there nothing to worry about?

2. What bit of the Gc should I poll to find out how much memory I just
allocated? I was trying to write a function like this:

    let sizeof maker =
        let baseline = get_baseline_from_gc () in
        let res = maker () in
        let final = get_final_from_gc () in
        (final -. baseline, res);;
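One way the two placeholder functions above might be filled in is sketched below. It assumes OCaml 4.04 or later, where `Gc.minor_words : unit -> float` is available (3.12 does not have it), and it takes an argument for `maker`, as later messages in this thread do. Instead of hard-coding the cost of the measurement, it reads the counter twice up front and subtracts that. Only minor-heap allocation is counted, so large blocks that the runtime places directly in the major heap would be missed, and the figures are most dependable in bytecode (such as the toplevel used here). `measure_overhead` and this `sizeof` are illustrative names, not library functions.

```ocaml
(* Cost, in words, of one reading of the allocation counter itself.
   Assumes OCaml >= 4.04 (Gc.minor_words). In bytecode the call boxes
   its float result; in native code it may allocate nothing. *)
let measure_overhead () =
  let first = Gc.minor_words () in
  let second = Gc.minor_words () in
  int_of_float (second -. first)

let overhead = measure_overhead ()

(* Minor-heap words allocated while evaluating [maker arg], paired with
   the result. Same shape as the get_baseline/get_final sketch. *)
let sizeof maker arg =
  let baseline = Gc.minor_words () in
  let res = maker arg in
  let final = Gc.minor_words () in
  (int_of_float (final -. baseline) - overhead, res)
```

With this sketch, `sizeof (fun x -> [x]) 1` should report 3 words (one header plus two fields) and `sizeof (fun x -> x) 10` should report 0.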

This post at the Jane Street blog
<http://ocaml.janestreet.com/?q=node/30> suggests (among other things)
that:

    # (Gc.stat()).Gc.minor_words;;

is the right way to poll the Gc. However:

    # (Gc.stat()).Gc.minor_words;;
    - : float = 226623.
    # (Gc.stat()).Gc.minor_words;;
    - : float = 229632.
    # (Gc.stat()).Gc.minor_words;;
    - : float = 232641.
    # let a = Some 1;;
    val a : int option = Some 1
    # (Gc.stat()).Gc.minor_words;;
    - : float = 242131.
    # (Gc.stat()).Gc.minor_words;;
    - : float = 245140.

says there are 3009 words allocated just for each poll of the Gc, and
then 9490 (!) words allocated around the `Some 1`. Something is amiss.


-- 
Jim Pryor
profjim@jimpryor.net


* RE: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: David Allsopp @ 2011-01-06 14:20 UTC
  To: Jim Pryor, caml-list

Jim Pryor wrote:
> Starting up the toplevel fresh, with no .ocamlinit file, I see this:
> 
>     $ /usr/bin/ocaml
>             Objective Caml version 3.12.0
> 
>     # Gc.(compact();counters());;
>     - : float * float * float = (79217., 32932., 82668.)
>     # Gc.(compact();counters());;
>     - : float * float * float = (93530., 32932., 85816.)
>     # Gc.(compact();counters());;
>     - : float * float * float = (107843., 32932., 88964.)
> 
> Why do both the minor_words and major_words keep climbing? Do these
> indicate the number of words _ever_ allocated,

The docs clearly state that it is "ever" allocated - see type Gc.stat in the reference manual.
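A quick way to see that the counters are cumulative is to read `Gc.counters` before and after a compaction, as in the sketch below. Compaction frees and moves memory but never lowers the minor-words figure, so a rising counter is expected and is not by itself evidence of a leak. `counters_across_compaction` is an illustrative name.

```ocaml
(* Gc.counters returns (minor_words, promoted_words, major_words),
   each counted since the program started. *)
let counters_across_compaction () =
  let (minor_before, _, _) = Gc.counters () in
  Gc.compact ();                       (* full collection + compaction *)
  let (minor_after, _, _) = Gc.counters () in
  (minor_before, minor_after)

let () =
  let (before, after) = counters_across_compaction () in
  Printf.printf "minor words: %.0f -> %.0f\n" before after
```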

> and so the overhead of
> doing the garbage collection and compacting has pushed them up? Or do they
> indicate the number of words _now_ allocated, so the fact that they
> continue to climb indicates something is leaking memory?

There are all kinds of things going on when you do this: even Gc.counters itself causes an allocation (it has to create a tuple to return the numbers). The toploop uses the Parsing module to parse each expression, and that will result in some allocations. Printf, which is responsible for displaying the output in the toploop, probably allocates as well.

If you restructure the test to make the three calls in a row, binding all nine results, and only print them at the end, you'll see a rise which looks pretty consistent with just the allocation of those nine (boxed float) results and the tuples coming from Gc.counters.
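The suggestion above can be sketched as follows: take all the readings inside one expression, so the toplevel's parsing and printing cannot run between them, and print only at the end. What remains between consecutive minor-word readings is essentially the result `Gc.counters` returns (a tuple of three boxed floats); the exact count depends on word size and compiler, so no figure is claimed here. `three_readings` is an illustrative name.

```ocaml
(* Three readings taken back to back: nothing runs between them except
   the tuple and boxed floats that Gc.counters itself allocates. *)
let three_readings () =
  let r1 = Gc.counters () in
  let r2 = Gc.counters () in
  let r3 = Gc.counters () in
  (r1, r2, r3)

let () =
  let (((m1, _, _) as r1), ((m2, _, _) as r2), ((m3, _, _) as r3)) =
    three_readings () in
  (* Printing happens only after all readings have been taken. *)
  let show (minor, promoted, major) =
    Printf.printf "minor %.0f  promoted %.0f  major %.0f\n" minor promoted major
  in
  show r1; show r2; show r3;
  Printf.printf "minor words between readings: %.0f, %.0f\n"
    (m2 -. m1) (m3 -. m2)
```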


David



* Re: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: Jim Pryor @ 2011-01-06 14:44 UTC
  To: caml-list

Thanks David. So does this look like a reasonable implementation of
sizeof:

    # let sizeof maker arg =
        let first = Gc.((stat()).minor_words) in
        let res = maker arg in
        let second = Gc.((stat()).minor_words) in
        int_of_float (second -. first) - 23 (* overhead *), res;;

    # sizeof (fun x -> [x]) 1;;
    - : int * int list = (3, [1])

    # sizeof (fun x -> Some x) 1;;
    - : int * int option = (2, Some 1)

It does give the right answers. ([1] is a block containing two non-blocks; Some 1 is a block containing one non-block.)


-- 
Jim Pryor
profjim@jimpryor.net


* Re: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: Jim Pryor @ 2011-01-06 14:59 UTC
  To: caml-list

On Thu, Jan 06, 2011 at 09:44:06AM -0500, Jim Pryor wrote:
> Thanks David. So does this look like a reasonable implementation of
> sizeof:
> 
>     # let sizeof maker arg =
>         let first = Gc.((stat()).minor_words) in
>         let res = maker arg in
>         let second = Gc.((stat()).minor_words) in
>         int_of_float (second -. first) - 23 (* overhead *), res;;
> 
>     # sizeof (fun x -> [x]) 1;;
>     - : int * int list = (3, [1])
> 
>     # sizeof (fun x -> Some x) 1;;
>     - : int * int option = (2, Some 1)
> 
> It does give the right answers. ([1] is a block containing two non-blocks; Some 1 is a block containing one non-block.)

Actually I think the overhead should be 22. Making it 22 gives these
sensible results:

    # sizeof (fun x -> x) 10;;
    - : int * int = (1, 10)

That is, one word for the int.

    # sizeof (fun x -> (x,()) ) 10;;
    - : int * (int * unit) = (4, (10, ()))

One word for the block pointer, one word for the block
header, one word for each of the two elements of the block.

    # sizeof (fun x -> (x,(x,())) ) 10;;
    - : int * (int * (int * unit)) = (7, (10, (10, ())))

Four words as above, one of them pointing to a new block, which contains
a header and two elements.

-- 
Jim Pryor
profjim@jimpryor.net


* Re: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: Pascal Zimmer @ 2011-01-06 15:05 UTC
  To: Jim Pryor; +Cc: caml-list

Because the overhead parameter can vary based on which version of OCaml
you are running, what the Gc module does, etc., I usually implement this
kind of measurement in this way:

# let sizeof maker arg =
    let first = Gc.((stat()).minor_words) in
    for i = 1 to 1000000 do
      ignore (maker arg)
    done;
    let second = Gc.((stat()).minor_words) in
    (second -. first) /. 1e6;;

# sizeof (fun x -> [x]) 1;;
- : float = 3.000022

The overhead gets amortized enough that it becomes negligible (you only
have to rely on the fact that the loop does not do any allocation).
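A variant of the same amortized loop is sketched below, assuming OCaml 4.04 or later for `Gc.minor_words` and `Sys.opaque_identity`. Passing the argument and the result through `Sys.opaque_identity` is meant to keep an optimizing compiler from turning the value into a static constant or discarding the allocation being measured; the iteration count becomes an optional parameter. `sizeof_amortized` is an illustrative name.

```ocaml
(* Average minor-heap words allocated per call of [maker arg] over [n]
   calls. Assumes OCaml >= 4.04 (Gc.minor_words, Sys.opaque_identity).
   The loop itself does not allocate, and the cost of reading the
   counter is spread across all iterations. *)
let sizeof_amortized ?(n = 1_000_000) maker arg =
  let first = Gc.minor_words () in
  for _i = 1 to n do
    (* opaque_identity on the argument stops the result being folded
       into a constant; on the result, stops it being dropped *)
    ignore (Sys.opaque_identity (maker (Sys.opaque_identity arg)))
  done;
  let second = Gc.minor_words () in
  (second -. first) /. float_of_int n
```

For example, `sizeof_amortized (fun x -> [x]) 1` should come out very close to 3.0.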

Pascal



* Re: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: Damien Doligez @ 2011-01-07 12:25 UTC
  To: caml users


On 2011-01-06, at 15:59, Jim Pryor wrote:

> Actually I think the overhead should be 22. Making it 22 gives these
> sensible results:

The overhead is exactly the heap space used by the value returned by
Gc.stat.  In your case it is 23 words, not 22.

>    # sizeof (fun x -> x) 10;;
>    - : int * int = (1, 10)
> 
> That is, one word for the int.

The int is not allocated in the heap, so the result should be 0.
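This can be cross-checked without the GC counters by walking the value's representation with the `Obj` module: each heap block costs one header word plus one word per field, and immediate values such as ints and `()` cost nothing. The sketch below carries loud caveats: `Obj` is unsafe, the walker does not detect sharing (a shared sub-value is counted twice), and it must not be applied to closures, objects or lazy values. `heap_words` and `words_of` are illustrative names; later OCaml releases also provide `Obj.reachable_words`, which does handle sharing.

```ocaml
(* Words a plain data value occupies in the heap: header + fields for
   each block, 0 for immediates. Assumes no closures, objects or lazy
   values, and no sharing. *)
let rec heap_words (v : Obj.t) : int =
  if Obj.is_int v then 0                 (* immediate: not in the heap *)
  else begin
    let size = Obj.size v in
    if Obj.tag v >= Obj.no_scan_tag then
      size + 1                           (* strings, floats: no pointers inside *)
    else begin
      let total = ref (size + 1) in      (* header word + fields *)
      for i = 0 to size - 1 do
        total := !total + heap_words (Obj.field v i)
      done;
      !total
    end
  end

let words_of x = heap_words (Obj.repr x)
```

With this sketch, `words_of 10` is 0, `words_of (10, ())` is 3, `words_of (10, (10, ()))` is 6 and `words_of [1]` is 3, matching the 0/3/6 that an overhead of 23 rather than 22 would give.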

It is best to let ocaml compute the overhead, because it will be different
on 32- and 64-bit machines:

  let overhead =
    let first = Gc.((stat()).minor_words) in
    let second = Gc.((stat()).minor_words) in
    int_of_float (second -. first)
  ;;

-- Damien



* Re: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: Jim Pryor @ 2011-01-07 14:14 UTC
  To: caml users

On Thu, Jan 06, 2011 at 10:05:32AM -0500, Pascal Zimmer wrote:
> Because the overhead parameter can vary based on which version of OCaml
> you are running, what the Gc module does, etc, I usually implement this
> kind of measurements in this way:
> 
> # let sizeof maker arg =
>     let first = Gc.((stat()).minor_words) in
>     for i = 1 to 1000000 do
>       ignore (maker arg)
>     done;
>     let second = Gc.((stat()).minor_words) in
>     (second -. first) /. 1e6;;
> 
> # sizeof (fun x -> [x]) 1;;
> - : float = 3.000022
> 
> The overhead gets amortized enough that is becomes negligible (you only
> have to rely on the fact that the loop does not do any allocation).

Thanks, that's a nice and sensible shortcut.


On Fri, Jan 07, 2011 at 01:25:42PM +0100, Damien Doligez wrote:
> The overhead is exactly the heap space used by the value returned by
> Gc.stat.  In your case it is 23 words, not 22.

Yes, that's right: Gc.stat on my machine uses 23 words of heap. And of course we should let OCaml calculate the overhead for us, because, as you say, it will
differ on different architectures.

> 
> >    # sizeof (fun x -> x) 10;;
> >    - : int * int = (1, 10)
> > 
> > That is, one word for the int.
> 
> The int is not allocated in the heap, so the result should be 0.

There's a design choice here: we have to decide whether sizeof counts only the allocated heap space or also the pointer to it. Initially I also found it more natural to count only the allocated heap space, as you're proposing. But then when testing on embedded blocks I talked myself into thinking the other choice worked more smoothly. Looking at it again, though, I think I had just confused myself and you're right, counting only the heap space is best.

Thanks for the feedback, both of you.

-- 
Jim Pryor
profjim@jimpryor.net


* Re: [Caml-list] memory leak in toplevel? or, how to implement sizeof
From: dmitry grebeniuk @ 2011-01-07 14:22 UTC
  To: caml users

Hello.

> # let sizeof maker arg =

  Note that what you are counting is not the size of the value (the
count of words the value occupies on the heap), but the count of words
allocated during the value's construction.
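A small illustration of the distinction, as a sketch assuming OCaml 4.04 or later (for `Gc.minor_words`, `Sys.opaque_identity` and `Obj.reachable_words`): building `[x; x]` and then reversing it allocates two two-element lists, but only the reversed copy survives, so the run should report 12 words allocated and only 6 words occupied by the result. `allocated_during` is an illustrative name.

```ocaml
(* Minor-heap words allocated while running [f ()], paired with its
   result. The cost of reading the counter is measured first and
   subtracted. Assumes OCaml >= 4.04. *)
let allocated_during f =
  let a = Gc.minor_words () in
  let b = Gc.minor_words () in
  let overhead = b -. a in
  let before = Gc.minor_words () in
  let v = f () in
  let after = Gc.minor_words () in
  (int_of_float (after -. before -. overhead), v)

let () =
  let x = Sys.opaque_identity 1 in
  (* [x; x] becomes garbage; only the reversed copy is returned *)
  let (allocated, v) = allocated_during (fun () -> List.rev [x; x]) in
  Printf.printf "allocated during construction: %d words\n" allocated;
  Printf.printf "size of the resulting value:   %d words\n"
    (Obj.reachable_words (Obj.repr v))
```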


Thread overview: 8+ messages
2011-01-06 13:05 [Caml-list] memory leak in toplevel? or, how to implement sizeof Jim Pryor
2011-01-06 14:20 ` David Allsopp
2011-01-06 14:44   ` Jim Pryor
2011-01-06 14:59     ` Jim Pryor
2011-01-07 12:25       ` Damien Doligez
2011-01-07 14:14         ` Jim Pryor
2011-01-07 14:22           ` dmitry grebeniuk
2011-01-06 15:05     ` Pascal Zimmer
