caml-list - the Caml user's mailing list
From: "Jon Harrop" <jon@ffconsultancy.com>
To: <caml-list@yquem.inria.fr>
Subject: [Caml-list] Mark-region GC
Date: Sun, 9 Jan 2011 14:14:39 -0000	[thread overview]
Message-ID: <01b901cbb007$8e525090$aaf6f1b0$@com> (raw)

Thought you guys might appreciate my recent foray into GC design:

 
http://flyingfrogblog.blogspot.com/2011/01/importance-of-locality-and-sparsity-in.html
http://flyingfrogblog.blogspot.com/2011/01/sweeping-700-faster.html

In summary, I think I have found a way to make a simple GC for HLVM that is
almost as fast as OCaml's on serial functional code and significantly faster
on imperative code, and that can easily support both parallelism among
mutators and parallel+concurrent collection.

Specifically, I noted that the advantage of OCaml's generational GC is
largely its ability to sweep large numbers of dead infants from the nursery
in constant time but the disadvantage is the overhead of evacuating
survivors. I was able to mimic the advantage without the disadvantage by
allocating regions and storing bitvectors representing the blocks within the
region that are allocated and marked. Hundreds of heap-allocated blocks in a
region can then be swept simply by reading those two bitvectors, computing
their bitwise AND and writing the result back as the updated allocated
bitvector (with every unmarked block zeroed out). Moreover, 512-bit
bitvectors fill a cache line and offer near-optimal granularity for the rest
of the system. Thus, the new collector sweeps heap blocks at a rate of
40 GB per second (!), not only for the "nursery" (thread-local region) but
potentially also for the entire "major heap" (global queues of full and
non-full regions).

I was also able to mimic the advantage of Appel's semi-generational
collector with the added benefit of reducing synchronization by sweeping the
thread-local region when it fills. Many local sweeps are typically performed
before the local region fills so contention for obtaining new regions from
the global queue is greatly reduced.

However, my design does degrade the performance of allocation. Rather than
using a bump allocator, I search for the next non-255 byte in the allocated
bitvector, then use a 255-element lookup table to find the next zero bit,
set it, and compute and return the corresponding pointer into the region. I
haven't profiled the overhead of this allocator but my C++ prototype gets
within 10% of OCaml's performance on the list-based n-queens solver. I am
*extremely* happy with that result! :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com



Thread overview: [no followups]
