From: "John Caml"
To: caml-list@yquem.inria.fr
Date: Tue, 19 Feb 2008 21:18:32 -0800
Subject: Re: large hash tables
Message-ID: <33d2b3f70802192118p4d887212mcf76e34447c54e52@mail.gmail.com>
In-Reply-To: <55e81f00-5ef7-4946-9272-05595299e114@41g2000hsc.googlegroups.com>

Thank you all for the assistance. I've resolved the Stack_overflow problem by using an Array instead of a Hashtbl; my keys were just consecutive integers, so the latter approach is clearly preferable.

However, the memory usage is still pretty bad: it takes nearly an order of magnitude more memory than the equivalent C++ program. While the C++ program required 800 MB, my OCaml program requires roughly 6 GB. Am I doing something very inefficiently? My revised code appears below.

Also, if you have any other coding suggestions I'd appreciate hearing them. I'm a long-time coder but new to OCaml and eager to learn.

--------------

exception SplitError

let loadWholeFile filename =
  let infile = open_in filename
  and movieMajor = Array.make 17770 [] in

  let rec loadLines count =
    let line = input_line infile in
    let murList = Pcre.split line in

    match murList with
    | m::u::r::[] ->
        let rFloat = float_of_string r
        and mInt = int_of_string m
        and uInt = int_of_string u in

        let newElement = (uInt, rFloat)
        and oldList = movieMajor.(mInt) in

        let newList = List.rev_append [newElement] oldList in
        Array.set movieMajor mInt newList;

        if (count mod 1000000) == 0 then begin
          Printf.printf "count: %d\n" count;
          flush stdout;
        end;

        loadLines (count + 1)

    | _ -> raise SplitError
  in

  try
    loadLines 0
  with
    End_of_file -> close_in infile; movieMajor
;;

let filename = Sys.argv.(1);;
let str = loadWholeFile filename;;
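
--------------

A rough accounting of where the memory might go, offered as a sketch rather than a diagnosis. Assuming a 64-bit system, each (int * float) element stored in a list costs a cons cell, a tuple block, and a separately boxed float. The total depends on the number of ratings, which is not stated above, and the garbage collector's heap can be larger than the live data. The record type `ratings` and the functions `create` and `add` are hypothetical names introduced for illustration; they show one possible flat layout that relies on OCaml storing float arrays unboxed.

```ocaml
(* Per-rating cost of the list-of-tuples layout, in machine words.
   Assumption: 64-bit OCaml, where a boxed float is header + 1 word. *)
let cons_cell_words = 3    (* header + head field + tail field *)
let tuple_words = 3        (* header + int field + pointer to the float *)
let boxed_float_words = 2  (* header + 8-byte payload *)
let words_per_rating = cons_cell_words + tuple_words + boxed_float_words
let bytes_per_rating = words_per_rating * 8

(* Hypothetical flat layout: one growable pair of arrays per movie.
   A float array holds its elements unboxed, so each rating costs one
   int slot plus one float slot (16 bytes on 64-bit), plus spare capacity. *)
type ratings = {
  mutable users : int array;
  mutable scores : float array;
  mutable len : int;            (* number of slots actually in use *)
}

let create () = { users = [||]; scores = [||]; len = 0 }

(* Append one (user, score) pair, doubling the capacity when full. *)
let add r user score =
  if r.len = Array.length r.users then begin
    let cap = max 4 (2 * r.len) in
    let users = Array.make cap 0
    and scores = Array.make cap 0.0 in
    Array.blit r.users 0 users 0 r.len;
    Array.blit r.scores 0 scores 0 r.len;
    r.users <- users;
    r.scores <- scores
  end;
  r.users.(r.len) <- user;
  r.scores.(r.len) <- score;
  r.len <- r.len + 1
```

In the loader this would replace the list update with something like `add movieMajor.(mInt) uInt rFloat`, where movieMajor is built with `Array.init 17770 (fun _ -> create ())` so that each movie gets its own record (`Array.make` would share a single record across every slot). Doubling can leave up to half of each buffer unused; if the ids and ratings fit in 32 bits, Bigarray with int32 and float32 elements is another option to consider.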