caml-list - the Caml user's mailing list
From: Michael Ekstrand <michael@elehack.net>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] How does OCaml update references when values are moved by the GC?
Date: Sat, 30 Oct 2010 12:38:10 -0500
Message-ID: <4CCC5802.4080909@elehack.net>
In-Reply-To: <018c01cb7856$1e228350$5a6789f0$@com>

On 10/30/2010 12:15 PM, Jon Harrop wrote:
> I was hoping for a little more detail, of course. :-)
> 
> How is the mapping from old to new pointers stored? Does the GC rewrite all
> of the thread-local stacks in series before allowing any of them to
> continue?

I imagine so.

> Does the write barrier record pointers written into the major heap
> so only those specific locations are rewritten at minor heap collections but
> the entire major heap is rewritten upon compaction?

Yes.  The runtime maintains a 'remembered set', a list of pointers from
the major heap back to the minor heap.  Maintaining this set is why
mutable data can be expensive in OCaml: any time you store a pointer
into a mutable field, the runtime must check whether the new link goes
from the major heap to the minor heap and, if so, add it to the
remembered set.
Richard WM Jones has details here:

http://rwmj.wordpress.com/2009/08/08/ocaml-internals-part-5-garbage-collection/
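
To make the remembered set concrete, here is a minimal sketch (the names
cell, store_young_value, counter and bump are made up for illustration).
It stores a freshly allocated list into a cell that has already left the
minor heap, then forces a minor collection; the list is still reachable
afterwards only because the store was recorded.  This is a demonstration
under the assumption of a stock runtime, not a description of the
runtime's internals.

```ocaml
(* Hypothetical top-level cell.  After the full major collection below it
   no longer lives in the minor heap. *)
let cell : int list ref = ref []

let store_young_value () =
  Gc.full_major ();                        (* empties the minor heap *)
  let young = List.init 3 (fun i -> i) in  (* fresh allocation in the minor heap *)
  cell := young;                           (* old -> young pointer: goes through the write barrier *)
  Gc.minor ();                             (* the list is promoted and the cell's field is updated *)
  !cell

(* For contrast: an int is an immediate value, not a pointer, so there is
   nothing for the GC to track.  As far as I know the native-code compiler
   emits a plain store here. *)
let counter : int ref = ref 0
let bump () = incr counter; !counter

let () =
  assert (store_young_value () = [0; 1; 2]);
  print_endline "young list survived the minor collection"
```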

> Can the GC distinguish
> between an array of ints and an array of pointers at run-time in order to
> avoid traversing all of the ints when trying to rewrite pointers?

Not that I know of.  The tag in the block header does not have a
documented reserved value to indicate that: there are values to indicate
an unboxed float array, a string, and a block of opaque values, but not
an integer array (unless the opaque tag is set for integer arrays).
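
The tags can be inspected from OCaml with the Obj module.  The sketch
below (helper name tag_of is made up) prints the tag of an int array, an
array of strings, a float array and a string.  It assumes the default
configuration in which float arrays are stored unboxed; Obj is an unsafe
module meant for this kind of poking around only.

```ocaml
(* Hypothetical helper: the header tag of any heap-allocated value. *)
let tag_of (x : 'a) : int = Obj.tag (Obj.repr x)

let int_array_tag   = tag_of (Array.make 4 0)      (* array of immediates *)
let ptr_array_tag   = tag_of (Array.make 4 "s")    (* array of pointers *)
let float_array_tag = tag_of (Array.make 4 0.0)    (* unboxed floats by default *)
let string_tag      = tag_of (String.make 4 'x')

let () =
  (* Both ordinary arrays carry the same tag, so the header alone does not
     tell an int array from a pointer array. *)
  Printf.printf "int array tag: %d, string array tag: %d\n"
    int_array_tag ptr_array_tag;
  Printf.printf "float array has Double_array_tag: %b\n"
    (float_array_tag = Obj.double_array_tag);
  Printf.printf "string has String_tag: %b\n"
    (string_tag = Obj.string_tag);
  (* Individual fields can still be told apart: ints are tagged immediates. *)
  Printf.printf "42 is immediate: %b, \"s\" is a block: %b\n"
    (Obj.is_int (Obj.repr 42)) (Obj.is_block (Obj.repr "s"))
```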

> Also, any idea what the maximum proportion of the running time of a program
> is spent doing this rewriting? For example, if you fill a float->float hash
> table with values does updating the pointers in the major heap account for a
> significant proportion of the total running time of the program?

In my data analysis jobs (which wind up allocating quite large heaps),
the compactor almost never (detectably) runs.  Minor cycles and major
slices are a much larger concern in my experience.  I work around that
by increasing the minor heap size to decrease minor heap thrashing (my
general rule of thumb is that I want each "work unit", whatever that may
be, to fit in the minor heap).
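
A sketch of that tuning, with made-up names and sizes: it raises the
minor heap size through the standard Gc module and prints the collection
counters before and after some work, which is one way to check whether
compactions happen at all.  The 1M-word figure and the work_unit function
are placeholders, not a recommendation.

```ocaml
(* Illustrative size only: minor_heap_size is measured in words, so this
   is 8 MB on a 64-bit machine. *)
let words_requested = 1_048_576

let enlarge_minor_heap words =
  Gc.set { (Gc.get ()) with Gc.minor_heap_size = words }

(* Print the counters that matter for this discussion. *)
let report label =
  let s = Gc.quick_stat () in
  Printf.printf "%s: minor=%d major=%d compactions=%d\n"
    label s.Gc.minor_collections s.Gc.major_collections s.Gc.compactions

(* Hypothetical "work unit": builds a short-lived list of boxed floats. *)
let work_unit n =
  List.fold_left (+.) 0.0 (List.init n float_of_int)

let () =
  enlarge_minor_heap words_requested;
  report "before";
  for _i = 1 to 100 do ignore (work_unit 1_000) done;
  report "after"
```

The same setting can be made without recompiling via the s parameter of
OCAMLRUNPARAM, for example OCAMLRUNPARAM=s=1M.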

It could well be that other applications will have different
characteristics that trigger more compactions.  I cannot speak for those
applications.  Further, when I have huge floating-point data structures,
I'm usually using bigarrays (not because I choose them over arrays,
typically, but because such code in my work frequently has to interact
with BLAS or SPARSKIT at some point).
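
For reference, a minimal bigarray sketch (the names n, v and sum are
illustrative).  The element data of a bigarray is allocated outside the
OCaml heap, so the GC neither scans nor moves it.  The choice of c_layout
here is an assumption; Fortran-oriented libraries often expect
fortran_layout instead.

```ocaml
let n = 1_000

(* One-dimensional array of 64-bit floats, stored outside the OCaml heap. *)
let v = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout n

let () =
  for i = 0 to n - 1 do
    v.{i} <- float_of_int i
  done

(* Sum the elements with an ordinary loop over the bigarray. *)
let sum () =
  let acc = ref 0.0 in
  for i = 0 to Bigarray.Array1.dim v - 1 do
    acc := !acc +. v.{i}
  done;
  !acc

let () = Printf.printf "sum = %.1f\n" (sum ())
```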

- Michael

