caml-list - the Caml user's mailing list
From: Brian Hurt <bhurt@janestcapital.com>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] OCaml runtime using too much memory in 64-bit Linux
Date: Wed, 14 Nov 2007 08:45:31 -0500
Message-ID: <473AFBFB.1090802@janestcapital.com>
In-Reply-To: <473AF04C.7030107@inria.fr>

Xavier Leroy wrote:

>
>If the problem is confirmed, there are several ways to go about it.
>One is to implement the page table with a sparse data structure,
>e.g. a hash table.  However, the major GC and some primitives like
>polymorphic equality perform *lots* of page table lookups, so a
>performance hit is to be expected.  
>

I've been contemplating doing this on my own (on my own time, not at 
work), just to see how much of a hit it is.  If no one else steps up to 
the plate, I will.
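
To make the idea concrete, here is a minimal sketch of a sparse page 
table as an open-addressing hash table.  Everything in it is an 
assumption for illustration: the names (pt_entry, page_table, pt_init, 
pt_add, pt_lookup), the entry layout, the hash function, and the lack of 
resizing or deletion.  It is not the runtime's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_LOG 12   /* 4K pages, matching the current map's granularity */

/* Hypothetical entry layout: page number plus a one-byte tag. */
typedef struct {
    uintptr_t page;       /* address >> PAGE_LOG; meaningful only if used */
    unsigned char kind;   /* e.g. 1 = major heap; 0 means "not in table" */
    unsigned char used;   /* slot occupied? */
} pt_entry;

typedef struct {
    pt_entry *slots;
    size_t mask;          /* capacity - 1; capacity is a power of two */
    size_t count;
} page_table;

/* Multiplicative hash of the page number (an arbitrary choice). */
static size_t pt_hash(uintptr_t page, size_t mask) {
    return (size_t)(((uint64_t)page * 0x9E3779B97F4A7C15ULL) >> 32) & mask;
}

/* Returns 0 on success, -1 if allocation fails. */
int pt_init(page_table *t, size_t capacity_pow2) {
    assert(capacity_pow2 != 0 && (capacity_pow2 & (capacity_pow2 - 1)) == 0);
    t->slots = calloc(capacity_pow2, sizeof(pt_entry));
    t->mask = capacity_pow2 - 1;
    t->count = 0;
    return t->slots != NULL ? 0 : -1;
}

/* Record the kind of the page containing addr.
   Returns -1 when the table is full (this sketch never resizes). */
int pt_add(page_table *t, uintptr_t addr, unsigned char kind) {
    uintptr_t page = addr >> PAGE_LOG;
    size_t i = pt_hash(page, t->mask);
    for (size_t n = 0; n <= t->mask; n++, i = (i + 1) & t->mask) {
        if (!t->slots[i].used) {
            t->slots[i].page = page;
            t->slots[i].kind = kind;
            t->slots[i].used = 1;
            t->count++;
            return 0;
        }
        if (t->slots[i].page == page) {
            t->slots[i].kind = kind;   /* page already present: update */
            return 0;
        }
    }
    return -1;
}

/* Linear probing: stop at the first empty slot or a matching page. */
unsigned char pt_lookup(const page_table *t, uintptr_t addr) {
    uintptr_t page = addr >> PAGE_LOG;
    size_t i = pt_hash(page, t->mask);
    for (size_t n = 0; n <= t->mask; n++, i = (i + 1) & t->mask) {
        if (!t->slots[i].used) return 0;
        if (t->slots[i].page == page) return t->slots[i].kind;
    }
    return 0;
}
```

Each lookup costs a multiply, a mask, and at least one probe, compared 
with a single indexed load in the flat byte map.  That extra work is 
where the expected performance hit would come from.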

One important point: in moving to a hash table, the page size has to 
increase.  The current implementation uses 1 byte per (4K) page for the 
map; a hash table would use, I project, about 4 words (16 or 32 bytes) 
per page.  To keep the memory utilization equal, we'd need to go for a 
larger page size: 64K for 32-bit systems.  I'd be inclined to go larger 
than that.  The 4K page size was standardized back when the average 
system (using memory protection) had 1MB, and 16MB was a huge amount of 
memory.  Memory sizes have increased 1024-fold since then, meaning I 
could make an argument for 4M pages.  I'm not sure I would (I think 
you'd run into a fragmentation problem on 32-bit), but it makes the page 
sizes I'd probably settle on (256K for 32-bit, 1M for 64-bit) a lot more 
palatable. :-)
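
The break-even arithmetic can be sketched as follows.  The function name 
break_even_page_size is hypothetical; the 4K page and 1 byte per page 
are the figures quoted above for the current map, and the entry sizes 
are my projected 4 words.

```c
#include <assert.h>
#include <stddef.h>

/* Page size at which a map costing entry_bytes per page uses the same
   memory as the current map (1 byte per 4K page). */
size_t break_even_page_size(size_t entry_bytes) {
    const size_t base_page = 4096;   /* current page size */
    const size_t base_entry = 1;     /* current cost: 1 byte per page */
    assert(entry_bytes >= base_entry);
    return base_page * (entry_bytes / base_entry);
}
```

With 16-byte entries (4 words on 32-bit) this gives 64K.  By the same 
arithmetic, 32-byte entries (4 words on 64-bit) would break even at 
128K, so the 256K and 1M page sizes both come out ahead of the current 
map.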

An advantage large pages would have is that they'd make the table a lot 
smaller, and thus a lot more cache friendly.  I mean, think about it: 
1GB of 4K pages at 1 byte per page is a 256K table, while 1GB of 256K 
pages at 16 bytes per page is 64K, 1/4 the size.  1GB of 1M pages at 32 
bytes per page is 32K, smaller yet.  The smaller table is more likely to 
fit into cache, and cheaper to load into cache.  While I wouldn't want 
to guarantee anything, I could easily see a smaller table that fits into 
cache actually giving a performance boost.  I've certainly seen weirder 
things happen.
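
The table sizes above can be checked with a few lines of C.  The 
function name table_bytes is hypothetical, and it counts entries only; a 
real hash table would also carry some empty slots, which this ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of map needed to cover heap_bytes of memory, given the page
   size and the per-page entry cost.  Entries only, no hash-table slack. */
uint64_t table_bytes(uint64_t heap_bytes, uint64_t page_bytes,
                     uint64_t entry_bytes) {
    assert(page_bytes != 0);
    uint64_t pages = (heap_bytes + page_bytes - 1) / page_bytes; /* round up */
    return pages * entry_bytes;
}
```

For a 1GB heap, table_bytes(1 << 30, 4096, 1) is 262144 (256K), 
table_bytes(1 << 30, 256 * 1024, 16) is 65536 (64K), and 
table_bytes(1 << 30, 1024 * 1024, 32) is 32768 (32K).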

>The other is to revise OCaml's
>data representations so that the GC and polymorphic primitives no
>longer need to know which pointers fall in the major heap.  This seems
>possible in principle, but will take quite a bit of work and break a
>lot of badly written C/OCaml interface code.  You've been warned :-)
>
>  
>
Of the two, I think I like the hashtable idea better.

If it isn't clear, this whole email is just speaking for myself.  I'm 
not allowed to speak for Jane Street (heck, if I'm lucky, I'm allowed to 
speak *to* Jane Street :-).

Brian


