caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Questions on replacing finalizers and memory footprints
@ 2007-12-06 11:12 Thomas Fischbacher
  2007-12-06 12:51 ` [Caml-list] " dmitry grebeniuk
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Thomas Fischbacher @ 2007-12-06 11:12 UTC (permalink / raw)
  To: caml-list


Hello everybody,

is there a simple and straightforward way to replace a finalizer
function attached to some entity by a different one, or do I have
to work around this problem by modifying my finalizers appropriately?

Also, is there a simple way to implement a function (perhaps using
Obj.magic) which will walk a (possibly circular) network of tuples,
arrays, variadic entities and lists, and return the total number of
bytes used up by that structure? I see that this should be possible
in principle with the present implementation of the runtime if one
could get some basic information about the internal type of an array.

Actually, the Marshal module will have to do something very similar
(but not exactly the same thing). Is there a way to get hold of
the underlying functions?

-- 
best regards,
Thomas Fischbacher
tf@functionality.de


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-06 11:12 Questions on replacing finalizers and memory footprints Thomas Fischbacher
@ 2007-12-06 12:51 ` dmitry grebeniuk
  2007-12-06 14:26 ` Richard Jones
  2007-12-07  8:52 ` Xavier Leroy
  2 siblings, 0 replies; 17+ messages in thread
From: dmitry grebeniuk @ 2007-12-06 12:51 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: caml-list

Hello, Thomas.

TF> Also, is there a simple way to implement a function
TF> (perhaps using Obj.magic) which will walk a (possibly
TF> circular) network of tuples, arrays, variadic entities
TF> and lists, and return the total number of bytes used up
TF> by that structure? I see that this should be possible in
TF> principle with the present implementation of the runtime
TF> if one could get some basic information about the
TF> internal type of an array.

 When I faced this problem, I wrote a function to
get the size of value.  This is a C-world function
which walks though values and marks visited
values by changing (and then restoring) their "color".
Colors are stored using RLE-like compression to
decrease memory usage.

  But it's possible to calculate the size of the value
using pure ocaml code, using Obj to examine values and
Hashtbl to store visited values.  It will work well
with small or medium-sized values.

-- 
WBR,
 dmitry             mailto:gds-mlsts@moldavcable.com


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-06 11:12 Questions on replacing finalizers and memory footprints Thomas Fischbacher
  2007-12-06 12:51 ` [Caml-list] " dmitry grebeniuk
@ 2007-12-06 14:26 ` Richard Jones
  2007-12-06 14:57   ` Thomas Fischbacher
  2007-12-07  8:52 ` Xavier Leroy
  2 siblings, 1 reply; 17+ messages in thread
From: Richard Jones @ 2007-12-06 14:26 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: caml-list

On Thu, Dec 06, 2007 at 11:12:04AM +0000, Thomas Fischbacher wrote:
> Also, is there a simple way to implement a function (perhaps using
> Obj.magic) which will walk a (possibly circular) network of tuples,
> arrays, variadic entities and lists, and return the total number of
> bytes used up by that structure? I see that this should be possible
> in principle with the present implementation of the runtime if one
> could get some basic information about the internal type of an array.

You might get some ideas by having a look at the ancient module
(http://merjis.com/developers/ancient), specifically at how the C
function _mark in ancient_c.c is implemented.  Also have a look at the
implementation of the Marshal module in the OCaml sources which takes
a slightly different approach.

If you want to do this in pure OCaml, probably your best bet would be
to just Marshal the structure and count how big it is.  It'll be slow
of course.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-06 14:26 ` Richard Jones
@ 2007-12-06 14:57   ` Thomas Fischbacher
  2007-12-06 16:50     ` Jon Harrop
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Fischbacher @ 2007-12-06 14:57 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list


Richard Jones wrote:

> If you want to do this in pure OCaml, probably your best bet would be
> to just Marshal the structure and count how big it is.  It'll be slow
> of course.

Actually, the situation that brought up this question is that I have a
complicated internal data structure which will free 300 MB of RAM if I
delete it, while serializing it produces a file of 94 MB only...
So, I would like to have more clarity what is going on here, and which
part of this data structure eats how much space.

-- 
Best regards,
Thomas Fischbacher
tf@functionality.de


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-06 14:57   ` Thomas Fischbacher
@ 2007-12-06 16:50     ` Jon Harrop
  2007-12-06 21:33       ` forum
  0 siblings, 1 reply; 17+ messages in thread
From: Jon Harrop @ 2007-12-06 16:50 UTC (permalink / raw)
  To: caml-list

On Thursday 06 December 2007 14:57, Thomas Fischbacher wrote:
> Richard Jones wrote:
> > If you want to do this in pure OCaml, probably your best bet would be
> > to just Marshal the structure and count how big it is.  It'll be slow
> > of course.
>
> Actually, the situation that brought up this question is that I have a
> complicated internal data structure which will free 300 MB of RAM if I
> delete it, while serializing it produces a file of 94 MB only...
> So, I would like to have more clarity what is going on here, and which
> part of this data structure eats how much space.

I had never though of measuring the size of a marshalled data structure. Turns 
out its representation of ints can be more concise than the code 
representation though:

# String.length (Marshal.to_string (Array.make 1000000 0) []);;
- : int = 1000025
# String.length (Marshal.to_string (Array.make 1000000 123456789) []);;
- : int = 5000025
# String.length (Marshal.to_string (Array.make 1000000 max_int) []);;
- : int = 9000025

which is probably what you're observing.

Marshalling also handles sharing but that seems to refer to DAGs in memory 
rather than hash consing:

# String.length (Marshal.to_string (Array.make 1000000 0., Array.make 1000000 
0.) []);;
- : int = 16000031

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-06 16:50     ` Jon Harrop
@ 2007-12-06 21:33       ` forum
  0 siblings, 0 replies; 17+ messages in thread
From: forum @ 2007-12-06 21:33 UTC (permalink / raw)
  To: caml-list


Le 6 déc. 07 à 17:50, Jon Harrop a écrit :

> On Thursday 06 December 2007 14:57, Thomas Fischbacher wrote:
>> Richard Jones wrote:
>>> If you want to do this in pure OCaml, probably your best bet  
>>> would be
>>> to just Marshal the structure and count how big it is.  It'll be  
>>> slow
>>> of course.
>>
>> Actually, the situation that brought up this question is that I  
>> have a
>> complicated internal data structure which will free 300 MB of RAM  
>> if I
>> delete it, while serializing it produces a file of 94 MB only...
>> So, I would like to have more clarity what is going on here, and  
>> which
>> part of this data structure eats how much space.
>
> I had never though of measuring the size of a marshalled data  
> structure. Turns
> out its representation of ints can be more concise than the code
> representation though:

(...)

Yes, the marshalled format although not compressed is quite optimized.
For example, if I am not wrong, an "int" value in 0..63 will be coded  
on a
single byte. Such a value may take up to 8 times more space in memory
if you are using a 64-bit architecture. The same is of course also  
true for
all data types that are represented at runtime by values in 0..63
(e.g. variant with no embedded data).
The same kind of encoding is used for "small" blocks, whose headers are
stored more compactly in a marshalled stream than in memory.

As a consequence, it is not very surprising that you observe a 3:1 ratio
between in-memory and marshalled representations.


Xavier Clerc

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-06 11:12 Questions on replacing finalizers and memory footprints Thomas Fischbacher
  2007-12-06 12:51 ` [Caml-list] " dmitry grebeniuk
  2007-12-06 14:26 ` Richard Jones
@ 2007-12-07  8:52 ` Xavier Leroy
  2007-12-07 10:44   ` Jean-Christophe Filliâtre
  2007-12-07 11:31   ` Berke Durak
  2 siblings, 2 replies; 17+ messages in thread
From: Xavier Leroy @ 2007-12-07  8:52 UTC (permalink / raw)
  To: Thomas Fischbacher; +Cc: caml-list

> Also, is there a simple way to implement a function (perhaps using
> Obj.magic) which will walk a (possibly circular) network of tuples,
> arrays, variadic entities and lists, and return the total number of
> bytes used up by that structure? I see that this should be possible
> in principle with the present implementation of the runtime if one
> could get some basic information about the internal type of an
> array.

Jean-Christophe Filliâtre's "size" library does exactly this:

       http://www.lri.fr/~filliatr/software.en.html

- Xavier Leroy


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 10:44   ` Jean-Christophe Filliâtre
@ 2007-12-07 10:35     ` Jon Harrop
  2007-12-07 11:18     ` forum
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Jon Harrop @ 2007-12-07 10:35 UTC (permalink / raw)
  To: caml-list

On Friday 07 December 2007 10:44, Jean-Christophe Filliâtre wrote:
> Xavier Leroy wrote:
> >> Also, is there a simple way to implement a function (perhaps using
> >> Obj.magic) which will walk a (possibly circular) network of tuples,
> >> arrays, variadic entities and lists, and return the total number of
> >> bytes used up by that structure? I see that this should be possible
> >> in principle with the present implementation of the runtime if one
> >> could get some basic information about the internal type of an
> >> array.
> >
> > Jean-Christophe Filliâtre's "size" library does exactly this:
> >
> >        http://www.lri.fr/~filliatr/software.en.html
>
> Indeed. However, note that it uses internally a hash table to store
> blocks already considered (in order to correctly account for sharing),
> and thus it is potentially incorrect if the GC moves some blocks during
> the count, for instance during a resizing of the hash table (which
> triggers the GC). I don't know how to avoid this issue; any help is
> welcome.

Yes. I wanted this functionality as well and tried your "size" library but it 
only segfaults on my machine. Presumably this must be written in C to avoid 
using the GC?

Perhaps this would be a good product... :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07  8:52 ` Xavier Leroy
@ 2007-12-07 10:44   ` Jean-Christophe Filliâtre
  2007-12-07 10:35     ` Jon Harrop
                       ` (3 more replies)
  2007-12-07 11:31   ` Berke Durak
  1 sibling, 4 replies; 17+ messages in thread
From: Jean-Christophe Filliâtre @ 2007-12-07 10:44 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Thomas Fischbacher, caml-list

Xavier Leroy wrote:
>> Also, is there a simple way to implement a function (perhaps using
>> Obj.magic) which will walk a (possibly circular) network of tuples,
>> arrays, variadic entities and lists, and return the total number of
>> bytes used up by that structure? I see that this should be possible
>> in principle with the present implementation of the runtime if one
>> could get some basic information about the internal type of an
>> array.
> 
> Jean-Christophe Filliâtre's "size" library does exactly this:
> 
>        http://www.lri.fr/~filliatr/software.en.html

Indeed. However, note that it uses internally a hash table to store
blocks already considered (in order to correctly account for sharing),
and thus it is potentially incorrect if the GC moves some blocks during
the count, for instance during a resizing of the hash table (which
triggers the GC). I don't know how to avoid this issue; any help is welcome.

-- 
Jean-Christophe Filliâtre
http://www.lri.fr/~filliatr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 10:44   ` Jean-Christophe Filliâtre
  2007-12-07 10:35     ` Jon Harrop
@ 2007-12-07 11:18     ` forum
  2007-12-07 19:54       ` Jean-Christophe Filliâtre
  2007-12-07 20:31     ` Christophe Raffalli
  2008-01-23 12:08     ` Hendrik Tews
  3 siblings, 1 reply; 17+ messages in thread
From: forum @ 2007-12-07 11:18 UTC (permalink / raw)
  To: caml-list

Selon Jean-Christophe Filliâtre <Jean-Christophe.Filliatre@lri.fr>:

> Xavier Leroy wrote:
> >> Also, is there a simple way to implement a function (perhaps using
> >> Obj.magic) which will walk a (possibly circular) network of tuples,
> >> arrays, variadic entities and lists, and return the total number of
> >> bytes used up by that structure? I see that this should be possible
> >> in principle with the present implementation of the runtime if one
> >> could get some basic information about the internal type of an
> >> array.
> >
> > Jean-Christophe Filliâtre's "size" library does exactly this:
> >
> >        http://www.lri.fr/~filliatr/software.en.html
>
> Indeed. However, note that it uses internally a hash table to store
> blocks already considered (in order to correctly account for sharing),
> and thus it is potentially incorrect if the GC moves some blocks during
> the count, for instance during a resizing of the hash table (which
> triggers the GC). I don't know how to avoid this issue; any help is welcome.

Sorry for the noise but doesn't this mean that the "size" function may not
terminate ?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07  8:52 ` Xavier Leroy
  2007-12-07 10:44   ` Jean-Christophe Filliâtre
@ 2007-12-07 11:31   ` Berke Durak
  1 sibling, 0 replies; 17+ messages in thread
From: Berke Durak @ 2007-12-07 11:31 UTC (permalink / raw)
  To: caml-list

Xavier Leroy a écrit :
>> Also, is there a simple way to implement a function (perhaps using
>> Obj.magic) which will walk a (possibly circular) network of tuples,
>> arrays, variadic entities and lists, and return the total number of
>> bytes used up by that structure? I see that this should be possible
>> in principle with the present implementation of the runtime if one
>> could get some basic information about the internal type of an
>> array.
> 
> Jean-Christophe Filliâtre's "size" library does exactly this:
> 
>        http://www.lri.fr/~filliatr/software.en.html
> 
> - Xavier Leroy
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

Hello,

I once had more or less the same problem in the EDOS project with huge 
data structures.

Filliâtre's size library was also growing too large a hashtable,
so I decided to use something else.

My solution was to marshal the data structure and then analyze the 
marshalled data by recomputing its type.

You first write a TYPER module that represents the types in your
marshalled data structure, then use the Analyzer module on your
output.

I was more interested in the size strings were taking in memory,
but it should be easy to modify.  All the ugly parts (parsing the
marshal format, and the combinators for describing the types)
have been defined.
 
https://protactinium.pps.jussieu.fr:12345/svn/edos/users/berke/dvhfz/analyze.ml
-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 11:18     ` forum
@ 2007-12-07 19:54       ` Jean-Christophe Filliâtre
  2007-12-07 21:01         ` forum
  0 siblings, 1 reply; 17+ messages in thread
From: Jean-Christophe Filliâtre @ 2007-12-07 19:54 UTC (permalink / raw)
  To: forum; +Cc: caml-list

forum@x9c.fr a écrit :
>> Indeed. However, note that it uses internally a hash table to store
>> blocks already considered (in order to correctly account for sharing),
>> and thus it is potentially incorrect if the GC moves some blocks during
>> the count, for instance during a resizing of the hash table (which
>> triggers the GC). I don't know how to avoid this issue; any help is welcome.
> 
> Sorry for the noise but doesn't this mean that the "size" function may not
> terminate ?

No, simply that it may count a same block several times, because it was
moved by the GC during a resizing of the hash table, between the first
time it was seen and the next time it is reached from another block.

So you may only overestimate the size of the data. It should not fail,
and should terminate, even on cyclic values.

Jon is reporting segfaults, though, so there must be a bug somewhere in
my code (which uses Obj for obvious reasons). I'll have a look at it if
Jon is able to send some faulty application.

-- 
Jean-Christophe Filliâtre
http://www.lri.fr/~filliatr/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 10:44   ` Jean-Christophe Filliâtre
  2007-12-07 10:35     ` Jon Harrop
  2007-12-07 11:18     ` forum
@ 2007-12-07 20:31     ` Christophe Raffalli
  2008-01-23 12:08     ` Hendrik Tews
  3 siblings, 0 replies; 17+ messages in thread
From: Christophe Raffalli @ 2007-12-07 20:31 UTC (permalink / raw)
  To: Jean-Christophe Filliâtre
  Cc: Xavier Leroy, Thomas Fischbacher, caml-list

Jean-Christophe Filliâtre a écrit :
> Xavier Leroy wrote:
>   
>>> Also, is there a simple way to implement a function (perhaps using
>>> Obj.magic) which will walk a (possibly circular) network of tuples,
>>> arrays, variadic entities and lists, and return the total number of
>>> bytes used up by that structure? I see that this should be possible
>>> in principle with the present implementation of the runtime if one
>>> could get some basic information about the internal type of an
>>> array.
>>>       
>> Jean-Christophe Filliâtre's "size" library does exactly this:
>>
>>        http://www.lri.fr/~filliatr/software.en.html
>>     
>
> Indeed. However, note that it uses internally a hash table to store
> blocks already considered (in order to correctly account for sharing),
> and thus it is potentially incorrect if the GC moves some blocks during
> the count, for instance during a resizing of the hash table (which
> triggers the GC). I don't know how to avoid this issue; any help is welcome.
>
>   

Write size in C, marshaling is written with a hashtbl in C probably for 
that same reason ...

Christophe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 19:54       ` Jean-Christophe Filliâtre
@ 2007-12-07 21:01         ` forum
  2007-12-08  9:57           ` Alexandre Pilkiewicz
  0 siblings, 1 reply; 17+ messages in thread
From: forum @ 2007-12-07 21:01 UTC (permalink / raw)
  To: caml-list


Le 7 déc. 07 à 20:54, Jean-Christophe Filliâtre a écrit :

> forum@x9c.fr a écrit :
>>> Indeed. However, note that it uses internally a hash table to store
>>> blocks already considered (in order to correctly account for  
>>> sharing),
>>> and thus it is potentially incorrect if the GC moves some blocks  
>>> during
>>> the count, for instance during a resizing of the hash table (which
>>> triggers the GC). I don't know how to avoid this issue; any help  
>>> is welcome.
>>
>> Sorry for the noise but doesn't this mean that the "size" function  
>> may not
>> terminate ?
>
> No, simply that it may count a same block several times, because it  
> was
> moved by the GC during a resizing of the hash table, between the first
> time it was seen and the next time it is reached from another block.
>
> So you may only overestimate the size of the data. It should not fail,
> and should terminate, even on cyclic values.

My mistake, I did not take into account the fact that if a block is  
moved by the
garbage collector, its reference is updated in the hashtable *too*.
Hence the termination guarantee.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 21:01         ` forum
@ 2007-12-08  9:57           ` Alexandre Pilkiewicz
  2007-12-08 14:20             ` Benjamin Canou
  0 siblings, 1 reply; 17+ messages in thread
From: Alexandre Pilkiewicz @ 2007-12-08  9:57 UTC (permalink / raw)
  To: caml-list

Le Friday 07 December 2007 22:01:23 forum@x9c.fr, vous avez écrit :
> Le 7 déc. 07 à 20:54, Jean-Christophe Filliâtre a écrit :
> My mistake, I did not take into account the fact that if a block is
> moved by the
> garbage collector, its reference is updated in the hashtable *too*.
> Hence the termination guarantee.

I think the problem is with the hash function :

	let hash o = Hashtbl.hash (magic o : int)

If you put an object in the hash table, it is stored under a key that depends 
on it's address a1. Once it's moved by the GC to the address a2, its 
reference is changed to a2, but not its key which is still the hash of a1. So 
when you check after that if your object is allready in the hash table, you 
look under the key hash(a2) if it's allready there, but it's not ! And if you 
are very unlucky (and have very few memory), it might append several time.

One solution could be to store the objects in a normal list and to look at the 
entire list every time. It would be *much* slower on huge structures, but 
probably more "correct" (so if used only for debug purpose, why not..)

let node_list = ref []

let in_list o = List.memq o !node_list

let add_in_list o = node_list := o::!node_list

let reset_list () = node_list := []

-- 
Alexandre Pilkiewicz


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-08  9:57           ` Alexandre Pilkiewicz
@ 2007-12-08 14:20             ` Benjamin Canou
  0 siblings, 0 replies; 17+ messages in thread
From: Benjamin Canou @ 2007-12-08 14:20 UTC (permalink / raw)
  To: caml-list

  Hi,

Another solution (not safe if the value is used at the same time in
another thread or signal handler however) is to modify the value while
exploring it.

Here is the principle :
When you explore a block node b, you replace its first field with a
special block saying "I've not restored this node, the original first
field is v" (a tuple (false, v) with a special tag) and you call the
recursive size function on v and remaining fields of b; when you explore
this block again, you don't call the recursive function since the first
field is a special block.
When the calculation is done, you explore the value again, and when you
find a block b with a special block (false, v) as first field, you set
its status to "I'm restoring this block", you call the recursive
reconstruction on v and replace the first field of b with v in the end;
otherwise, you do nothing.

Here is the code, but it does only handle simple blocks (not strings,
etc.) Also since one cannot mark blocks easily, I use a high block tag
which should not occur in most programs, but the code in totally unsafe
and can destroy the value if it is the case. A safer way could be to use
ad hoc custom blocks. As said in caps lock at the beginning of many
source files : "use at your own risk, no warranty"...

open Obj

(* the special tag is 240 , not safe if used in v *)
let calc_size v =
  let rec calc_size v =
    if is_int v then 1 else (
      if size v >= 1 then (
	let v0 = field v 0 in
	  if is_int v0 || tag v0 <> 240 then (
	    let s = ref (size v + 1 (* header *)) in
	    let d = new_block 240 2 in
	      set_field v 0 d ;
	      set_field d 0 (repr false) ;
	      set_field d 1 v0 ;
	      if is_block v0 then
		s := !s + calc_size v0;
	      for i = 1 to size v - 1 do
		if is_block (field v i) then
		  s := !s + calc_size (field v i)
	      done ;
	      !s
	  ) else 0
      ) else 0 (* atoms are preallocated *)
    ) in
  let rec restore v =
    if is_block v then (
      if size v >= 1 then (
	let v0 = field v 0 in
	  if is_block v0 && tag v0 == 240 then (
	    if field v0 0 = repr false then (
	      set_field v0 0 (repr true) ;
	      restore (field v0 1) ;
	      for i = 1 to size v - 1 do
		restore (field v i) ;
	      done ;
	      set_field v 0 (field v0 1)
	    )
	  )
      )
    ) in
  let size = calc_size v in restore v ; size

Benjamin.

PS: My code is not much tested, if you find bugs and/or enhance it with
strings, doubles, etc., I'm interested.

Le samedi 08 décembre 2007 à 10:57 +0100, Alexandre Pilkiewicz a écrit :
> Le Friday 07 December 2007 22:01:23 forum@x9c.fr, vous avez écrit :
> > Le 7 déc. 07 à 20:54, Jean-Christophe Filliâtre a écrit :
> > My mistake, I did not take into account the fact that if a block is
> > moved by the
> > garbage collector, its reference is updated in the hashtable *too*.
> > Hence the termination guarantee.
> 
> I think the problem is with the hash function :
> 
> 	let hash o = Hashtbl.hash (magic o : int)
> 
> If you put an object in the hash table, it is stored under a key that depends 
> on it's address a1. Once it's moved by the GC to the address a2, its 
> reference is changed to a2, but not its key which is still the hash of a1. So 
> when you check after that if your object is allready in the hash table, you 
> look under the key hash(a2) if it's allready there, but it's not ! And if you 
> are very unlucky (and have very few memory), it might append several time.
> 
> One solution could be to store the objects in a normal list and to look at the 
> entire list every time. It would be *much* slower on huge structures, but 
> probably more "correct" (so if used only for debug purpose, why not..)
> 
> let node_list = ref []
> 
> let in_list o = List.memq o !node_list
> 
> let add_in_list o = node_list := o::!node_list
> 
> let reset_list () = node_list := []
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Questions on replacing finalizers and memory footprints
  2007-12-07 10:44   ` Jean-Christophe Filliâtre
                       ` (2 preceding siblings ...)
  2007-12-07 20:31     ` Christophe Raffalli
@ 2008-01-23 12:08     ` Hendrik Tews
  3 siblings, 0 replies; 17+ messages in thread
From: Hendrik Tews @ 2008-01-23 12:08 UTC (permalink / raw)
  To: caml-list

Jean-Christophe Filliâtre <Jean-Christophe.Filliatre@lri.fr>
writes:

   > Jean-Christophe Filliâtre's "size" library does exactly this:
   > 
   >        http://www.lri.fr/~filliatr/software.en.html

   Indeed. However, note that it uses internally a hash table to store
   blocks already considered (in order to correctly account for sharing),
   and thus it is potentially incorrect if the GC moves some blocks during
   the count, for instance during a resizing of the hash table (which
   triggers the GC). I don't know how to avoid this issue; any help is welcome.

Use the normal hash function (ie. the hash depends on the block
and not on its address) and use the blocks themselves as keys!
Then the GC will not change the hash and will update the keys in
the hash table. Of course all in a hash table with physical
equality. I use this solution in memcheck
(http://www.cs.ru.nl/~tews/memcheck/).

The problem is that your runtime will be quadratic in the number
of blocks that are equal (=) but different (not ==). For my
applications of memcheck this is problematic, because the data
contains lots of (ref None) blocks, which cannot be identified
for obvious reasons.

The (apparently abondoned) ocaml-ty branch
(http://www.pps.jussieu.fr/~henry/marshal/) uses a list instead
of a hash table and is therefore quadratic in _all_ blocks. The
performence is only sufficient for toy applications (4 hours for
checking 4MB).

For applications like size or memcheck it would be nice if the GC
could notify the program when it moves blocks.

Bye,

Hendrik

(Better answering late than never...)


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-01-23 12:08 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-06 11:12 Questions on replacing finalizers and memory footprints Thomas Fischbacher
2007-12-06 12:51 ` [Caml-list] " dmitry grebeniuk
2007-12-06 14:26 ` Richard Jones
2007-12-06 14:57   ` Thomas Fischbacher
2007-12-06 16:50     ` Jon Harrop
2007-12-06 21:33       ` forum
2007-12-07  8:52 ` Xavier Leroy
2007-12-07 10:44   ` Jean-Christophe Filliâtre
2007-12-07 10:35     ` Jon Harrop
2007-12-07 11:18     ` forum
2007-12-07 19:54       ` Jean-Christophe Filliâtre
2007-12-07 21:01         ` forum
2007-12-08  9:57           ` Alexandre Pilkiewicz
2007-12-08 14:20             ` Benjamin Canou
2007-12-07 20:31     ` Christophe Raffalli
2008-01-23 12:08     ` Hendrik Tews
2007-12-07 11:31   ` Berke Durak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).