caml-list - the Caml user's mailing list
From: Richard Jones <rich@annexia.org>
To: John Caml <camljohn42@gmail.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] large hash tables
Date: Fri, 22 Feb 2008 00:33:15 +0000
Message-ID: <20080222003315.GA5326@annexia.org>
In-Reply-To: <33d2b3f70802211445q7781d296ka7dd94114b8033b1@mail.gmail.com>

My version is a bit longer than yours, but hopefully more idiomatic
and easier to understand.

Program - http://www.annexia.org/tmp/movies.ml
Create the test file - http://www.annexia.org/tmp/make_movies.ml

It's best to read the program like this:

(1) Start with the _interface_ ('signature') of the new ExtArray1
module & type.  _Ignore_ the implementation of this module for now.

(2) Then look at the main part of the program (from where we allocate
the result array down through the loop which reads the data).

(3) Then look at the implementation of the module.  The main
complexity is that you can't extend a Bigarray in place; instead you
have to keep reallocating it (in large chunks, for efficiency) and
copying the old contents into the new array.
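To illustrate point (3), here is a minimal sketch of an extensible
one-dimensional Bigarray.  It is NOT the ExtArray1 code from
movies.ml: the module name Growable_array1, its functions, the choice
of int32 elements and the capacity-doubling growth policy are all
assumptions made for this example.  It assumes OCaml 4.07 or later,
where Bigarray is part of the standard library.

```ocaml
(* Hypothetical sketch, not the ExtArray1 module from movies.ml. *)
open Bigarray

module Growable_array1 : sig
  type t
  val create : int -> t          (* initial capacity, must be > 0 *)
  val length : t -> int          (* number of elements in use *)
  val get : t -> int -> int32
  val push : t -> int32 -> unit  (* append, reallocating if full *)
  val to_array1 : t -> (int32, int32_elt, c_layout) Array1.t
end = struct
  type t = {
    mutable data : (int32, int32_elt, c_layout) Array1.t;
    mutable len : int;  (* used prefix of data *)
  }

  let create cap =
    if cap <= 0 then invalid_arg "Growable_array1.create";
    { data = Array1.create int32 c_layout cap; len = 0 }

  let length t = t.len

  let get t i =
    if i < 0 || i >= t.len then invalid_arg "Growable_array1.get";
    t.data.{i}

  (* A Bigarray cannot be resized in place, so when it is full we
     allocate one twice as large and copy the old contents across. *)
  let grow t =
    let old_cap = Array1.dim t.data in
    let data = Array1.create int32 c_layout (2 * old_cap) in
    Array1.blit t.data (Array1.sub data 0 old_cap);
    t.data <- data

  let push t x =
    if t.len = Array1.dim t.data then grow t;
    t.data.{t.len} <- x;
    t.len <- t.len + 1

  (* A view of just the used prefix; it shares memory with t. *)
  let to_array1 t = Array1.sub t.data 0 t.len
end

let () =
  let a = Growable_array1.create 4 in
  for i = 1 to 10 do Growable_array1.push a (Int32.of_int (i * i)) done;
  (* prints "10 100" *)
  Printf.printf "%d %ld\n" (Growable_array1.length a) (Growable_array1.get a 9)
```

Doubling is only one way of growing "in large chunks": it keeps the
total copying cost proportional to the final length, at the price of
up to half the capacity being unused just after a resize.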

I measured it at about 230 MB for a 10 million line data file, but
that doesn't necessarily mean it'll take ten times as much (over 2 GB)
for 100 million lines, because part of that is space overhead which
will shrink as a proportion of the total memory used.
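The reasoning can be made concrete with a simple linear model,
total = fixed + per_line * lines.  This is only an illustration: the
model, the function name project and the 30 MB fixed overhead below
are hypothetical assumptions, not measurements of movies.ml.

```ocaml
(* Hypothetical linear memory model: only the part above the fixed
   overhead is assumed to scale with the number of lines. *)
let project ~fixed_mb ~measured_mb ~measured_lines lines =
  let scale = float_of_int lines /. float_of_int measured_lines in
  fixed_mb +. (measured_mb -. fixed_mb) *. scale

let () =
  (* Assumed, not measured: 30 MB of the 230 MB is fixed overhead. *)
  let projected =
    project ~fixed_mb:30. ~measured_mb:230.
      ~measured_lines:10_000_000 100_000_000 in
  (* prints "naive 2300 MB, with fixed overhead 2030 MB" *)
  Printf.printf "naive %.0f MB, with fixed overhead %.0f MB\n"
    (230. *. 10.) projected
```

The larger the fixed part, the further below a straight tenfold
extrapolation the real figure will be.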

Rich.

-- 
Richard Jones
Red Hat



Thread overview: 12+ messages
2008-02-19 23:01 John Caml
2008-02-19 23:34 ` [Caml-list] " Gabriel Kerneis
2008-02-19 23:36 ` Gerd Stolpmann
2008-02-19 23:51 ` Francois Rouaix
2008-02-20  9:37   ` Berke Durak
2008-02-20  9:56     ` Berke Durak
2008-02-20 12:48 ` Richard Jones
2008-02-20 15:54 ` Oliver Bandel
2008-02-21 22:45   ` John Caml
2008-02-22  0:33     ` Richard Jones [this message]
2008-02-24  5:39       ` John Caml
2008-02-22 14:19     ` Brian Hurt
