Re: large hash tables

From: John Caml @ 2008-02-20  5:18 UTC
  To: caml-list

Thank you all for the assistance.

I've resolved the Stack_overflow problem by using an Array instead of
a Hashtbl; my keys were just consecutive integers, so the latter
approach is clearly preferable.
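A minimal sketch of why an array fits here, assuming the keys are exactly the consecutive integers 0 to n-1. The names `n`, `by_table` and `by_array` are hypothetical. Both versions group a list of values under each key; the array version uses the key directly as an index, so there is no hashing and no bucket structure.

```ocaml
(* Hypothetical comparison: store a list of values under each of the
   consecutive integer keys 0 .. n-1, first in a Hashtbl, then in an array. *)
let n = 5

(* Hashtbl version: each key is hashed to locate its bucket. *)
let by_table : (int, string list) Hashtbl.t = Hashtbl.create n

let () =
  for k = 0 to n - 1 do
    let old = try Hashtbl.find by_table k with Not_found -> [] in
    Hashtbl.replace by_table k (string_of_int k :: old)
  done

(* Array version: the key is the index, so no hashing or buckets are needed. *)
let by_array : string list array = Array.make n []

let () =
  for k = 0 to n - 1 do
    by_array.(k) <- string_of_int k :: by_array.(k)
  done

let () =
  (* Both structures end up holding the same lists. *)
  for k = 0 to n - 1 do
    assert (Hashtbl.find by_table k = by_array.(k))
  done;
  print_endline "same contents"
```

This only works when the key range is dense and known in advance; sparse or unbounded keys still call for a Hashtbl.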

However, the memory usage is still pretty bad: the program takes nearly an
order of magnitude more memory than the equivalent C++ program. While
the C++ program required 800 MB, my OCaml program requires roughly 6
GB. Am I doing something very inefficient? My revised code appears
below.
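A rough way to see where the memory goes, as a sketch. Each element of an `(int * float) list` is three heap blocks: a cons cell (header plus two fields, 3 words), the tuple (3 words), and the boxed float (2 words on a 64-bit system), so about 8 words, or 64 bytes, per rating. A C++ program storing, say, a 4-byte int and a 4-byte float per rating would use 8 bytes; that layout is an assumption, since the C++ code is not shown. The helper `words_per_entry` is hypothetical and relies on `Obj.reachable_words` (OCaml 4.04 or later) and `List.init` (4.06 or later).

```ocaml
(* Rough measurement of heap words per element of an (int * float) list,
   the per-movie representation used in the program below. *)
let words_per_entry n =
  (* float_of_int allocates a fresh boxed float for every element,
     just as float_of_string does when parsing the input. *)
  let sample = List.init n (fun i -> (i, float_of_int i)) in
  float_of_int (Obj.reachable_words (Obj.repr sample)) /. float_of_int n

let () =
  Printf.printf "words per entry: %.1f (word size: %d bits)\n"
    (words_per_entry 100_000) Sys.word_size
```

On a 64-bit system this should report about 8 words per entry.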

Also, if you have any other coding suggestions, I'd appreciate hearing
them. I'm a long-time coder but new to OCaml and eager to learn.

--------------

exception SplitError

(* Read lines of the form "movie user rating" and group the
   (user, rating) pairs by movie id, which indexes the array. *)
let loadWholeFile filename =
    let infile = open_in filename
    and movieMajor = Array.make 17770 [] in

    let rec loadLines count =
        let line = input_line infile in
        (* With no pattern given, Pcre.split splits on whitespace. *)
        match Pcre.split line with
            | [m; u; r] ->
                let mInt = int_of_string m
                and uInt = int_of_string u
                and rFloat = float_of_string r in

                (* Prepend the new (user, rating) pair to this movie's list. *)
                movieMajor.(mInt) <- (uInt, rFloat) :: movieMajor.(mInt);

                (* Use structural equality (=) rather than physical (==). *)
                if count mod 1000000 = 0 then begin
                    Printf.printf "count: %d\n" count;
                    flush stdout
                end;

                (* Tail call, so the recursion runs in constant stack space. *)
                loadLines (count + 1)

            | _ -> raise SplitError
    in

    try
        loadLines 0
    with
        End_of_file -> close_in infile;
        movieMajor
;;

let filename = Sys.argv.(1);;
let str = loadWholeFile filename;;
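One possible direction, sketched under assumptions: after loading, convert each movie's `(user, rating)` list into two parallel arrays. A `float array` stores its elements unboxed, so each rating then costs one word for the user id and one for the rating instead of a cons cell, a tuple and a boxed float. The type `movie_ratings` and the functions `compact_movie` and `compact` are hypothetical. Note that peak memory during loading still includes the lists; avoiding that would need a different loading strategy (for example, counting entries first), which is not shown here.

```ocaml
(* Hypothetical compact layout: parallel arrays per movie. *)
type movie_ratings = {
  users : int array;     (* user ids *)
  ratings : float array; (* ratings, in the same order as users *)
}

(* Copy one movie's list into arrays. List.length and List.iteri are
   tail-recursive, so long lists do not overflow the stack.
   Array.make n 0.0 creates an unboxed float array. *)
let compact_movie (pairs : (int * float) list) : movie_ratings =
  let n = List.length pairs in
  let users = Array.make n 0 in
  let ratings = Array.make n 0.0 in
  List.iteri (fun i (u, r) -> users.(i) <- u; ratings.(i) <- r) pairs;
  { users; ratings }

(* Convert every movie, clearing each list afterwards so the GC
   can reclaim it before the whole conversion finishes. *)
let compact (movieMajor : (int * float) list array) : movie_ratings array =
  Array.mapi
    (fun i pairs ->
       let m = compact_movie pairs in
       movieMajor.(i) <- [];
       m)
    movieMajor
```

For example, `compact str` would turn the loaded table into the array-based form.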
