caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Francois Rouaix" <francois.rouaix@gmail.com>
To: "John Caml" <camljohn42@gmail.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] large hash tables
Date: Tue, 19 Feb 2008 15:51:08 -0800	[thread overview]
Message-ID: <f2d6cc680802191551r2385d6b4p89af2bbd1db4ff15@mail.gmail.com> (raw)
In-Reply-To: <33d2b3f70802191501v39346a56x4b1852b84d4067b4@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

In the resizing code there is a non-tailrec function (insert_bucket). This
is most likely causing the stack overflow, as I can't see any other non tail
recursive function at first glance. Looks like it's not tail rec in order to
maintain an invariant on the order of elements. If that invariant is not
useful to you, you might want to write a slightly different version of the
Hashtbl module, where insert_bucket would be tail rec.
Also, during resizing, memory usage will be twice the memory required for
the table (roughly), since the bucket array remains available until the
resize is completed, so all the bucket contents exist in two versions (old
and new). You might want to stick to a large initial size and do not attempt
resizing.
--f


2008/2/19 John Caml <camljohn42@gmail.com>:

> I need to load roughly 100 million items into a memory based hash table,
> where each key is an integer and each value is an integer and a float. Under
> C++ stdlib's map library, this requires 800 MB of memory. Under my naive
> porting of my C++ program to Ocaml, only 12 million rows get loaded before I
> get the program crashes with the message "Fatal error: exception
> Stack_overflow". At the time of the crash, nearly 2 GB of memory are in use.
> (My machine -- AMD64 architecture -- has 8 GB of memory, so insufficient
> memory is not the issue.)
>
> Two questions:
>
> 1. How can I overcome the Stack_overflow error? (I'm already compiling
> with ocamlopt rather than ocamlc, which seems to have increased the stack
> size from 500 MB to 2 GB.) I'd like to be able to use the full 8 GB on my
> machine.
> 2. How can I implment a more efficient hash table? My Ocaml program seems
> to require 10x more memory than under C++.
>
> My code appears below. Thank you very much.
>
> --John
>
>
> --------------
> exception SplitError
>
> let read_whole_chan chan =
>     let movieMajor = Hashtbl.create 777777 in
>
>     let rec loadLines count =
>         let line = input_line chan in
>         let murList = Pcre.split line in
>         match murList with
>             | m::u::r::[] ->
>                 let rFloat = float_of_string r in
>                 Hashtbl.add movieMajor m (u, rFloat);
>                 if (count mod 10000) == 0 then Printf.printf "count: %d,
> m: %s, u: %s, r: %f \n" count m u rFloat;
>                 loadLines (count + 1)
>             | _ -> raise SplitError
>   in
>
>     try
>         loadLines 0
>     with
>         End_of_file -> ()
>     ;;
>
> let read_whole_file filename =
>     let chan = open_in filename in
>     read_whole_chan chan
>     ;;
>
> let filename = Sys.argv.(1);;
>
> let str = read_whole_file filename;;
>
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

[-- Attachment #2: Type: text/html, Size: 4256 bytes --]

  parent reply	other threads:[~2008-02-19 23:51 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-02-19 23:01 John Caml
2008-02-19 23:34 ` [Caml-list] " Gabriel Kerneis
2008-02-19 23:36 ` Gerd Stolpmann
2008-02-19 23:51 ` Francois Rouaix [this message]
2008-02-20  9:37   ` Berke Durak
2008-02-20  9:56     ` Berke Durak
2008-02-20 12:48 ` Richard Jones
2008-02-20 15:54 ` Oliver Bandel
2008-02-21 22:45   ` John Caml
2008-02-22  0:33     ` Richard Jones
2008-02-24  5:39       ` John Caml
2008-02-22 14:19     ` Brian Hurt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f2d6cc680802191551r2385d6b4p89af2bbd1db4ff15@mail.gmail.com \
    --to=francois.rouaix@gmail.com \
    --cc=caml-list@yquem.inria.fr \
    --cc=camljohn42@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).