Well, you could use resizable arrays instead of lists for each bucket.
If you have 100M items, each "cons" becomes fairly expensive: a pointer
per item is already 400MB (at 4 bytes per pointer)...

I'm a bit surprised that the C++ program only required 800MB. That would
be exactly 8 bytes per item; if each item is an int (4 bytes) and a
double (8 bytes), it doesn't add up. Or are you using single-precision
floats and arrays everywhere (no objects or structs of any kind)?

The most memory-efficient representation in OCaml would probably be a
couple of arrays, one of ints and one of floats. For an item indexed by
j, the int value is ints.(j) and the float value is floats.(j).

ints.(j)   == [| i0; ... |]
floats.(j) == [| f0; ... |]

--f

On Feb 19, 2008 9:18 PM, John Caml wrote:
> Thank you all for the assistance.
>
> I've resolved the Stack_overflow problem by using an Array instead of
> a Hashtbl; my keys were just consecutive integers, so this later
> approach is clearly preferable.
>
> However, the memory usage is still pretty bad...it takes nearly an
> order of magnitude more memory than the equivalent C++ program. While
> the C++ program required 800 MB, my ocaml program requires roughly 6
> GB. Am I doing something very inefficiently? My revised code appears
> below.
>
> Also, if you have any other coding suggestions I'd appreciate hearing
> them. I'm a long-time coder but new to Ocaml and eager to learn.
>
>
> --------------
>
> exception SplitError
>
>
> let loadWholeFile filename =
>     let infile = open_in filename
>     and movieMajor = Array.make 17770 [] in
>
>     let rec loadLines count =
>         let line = input_line infile in
>         let murList = Pcre.split line in
>
>         match murList with
>             | m::u::r::[] ->
>                 let rFloat = float_of_string r
>                 and mInt = int_of_string m
>                 and uInt = int_of_string u in
>
>                 let newElement = (uInt, rFloat)
>                 and oldList = movieMajor.(mInt) in
>                 let newList = List.rev_append [newElement] oldList in
>                 Array.set movieMajor mInt newList;
>
>                 if (count mod 1000000) == 0 then begin
>                     Printf.printf "count: %d\n" count;
>                     flush stdout;
>                 end;
>
>                 loadLines (count + 1)
>
>             | _ -> raise SplitError
>     in
>
>     try
>         loadLines 0
>     with
>         End_of_file -> close_in infile;
>         movieMajor
> ;;
>
>
> let filename = Sys.argv.(1);;
> let str = loadWholeFile filename;;
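
To make the two-array layout suggested at the top of this message a bit
more concrete, here is a minimal sketch of one growable pair of parallel
arrays. The type `ratings` and the functions `create`, `push` and `get`
are hypothetical names invented for this illustration; they are not from
the original program or from any library. As a rough estimate, assuming a
64-bit system, a list of (int, float) tuples costs about 8 words (64
bytes) per item (a cons cell, a tuple, and a boxed float, each with a
header), while the layout below costs about 16 bytes per item, plus some
slack from the doubling.

```ocaml
(* Hypothetical helper type for illustration: a growable pair of parallel
   arrays holding (user, rating) items without any per-item allocation. *)
type ratings = {
  mutable users : int array;     (* one machine word per item *)
  mutable scores : float array;  (* float arrays store unboxed doubles *)
  mutable len : int;             (* number of slots currently in use *)
}

let create () = { users = [||]; scores = [||]; len = 0 }

(* Append one item, doubling the capacity when the arrays are full. *)
let push r user score =
  let cap = Array.length r.users in
  if r.len = cap then begin
    let cap' = if cap = 0 then 16 else 2 * cap in
    let users' = Array.make cap' 0 in
    let scores' = Array.make cap' 0.0 in
    Array.blit r.users 0 users' 0 r.len;
    Array.blit r.scores 0 scores' 0 r.len;
    r.users <- users';
    r.scores <- scores'
  end;
  r.users.(r.len) <- user;
  r.scores.(r.len) <- score;
  r.len <- r.len + 1

(* Item j is the pair (users.(j), scores.(j)). *)
let get r j =
  if j < 0 || j >= r.len then invalid_arg "get";
  (r.users.(j), r.scores.(j))

let () =
  let r = create () in
  push r 42 3.5;
  push r 7 4.0;
  let (u, s) = get r 1 in
  Printf.printf "user %d rated %.1f\n" u s
```

One such record per movie could replace each list in movieMajor. If the
number of ratings per movie is known up front (for example from a first
pass over the file), allocating the arrays at their exact size would
avoid the slack left by doubling.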