Date: Fri, 22 Feb 2008 09:19:34 -0500
From: Brian Hurt
To: John Caml
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] large hash tables

John Caml wrote:

> The equivalent C++ program uses 874 MB of memory in total. Each of the
> 1 million records is stored in a vector using 1 single-precision float
> and 1 int.
> Indeed, my machine is AMD64 so Ocaml int's are presumably 8
> bytes.

C ints on AMD64 are still 4 bytes; longs are 8 bytes. You can prove this
by compiling a quick program:

    #include <stdio.h>

    int main(void) {
        /* sizeof yields a size_t; cast it to match the %lu format. */
        printf("Ints are %lu bytes long.\n", (unsigned long) sizeof(int));
        return 0;
    }

> I've rewritten my Ocaml program again, this time using Bigarray. Its
> memory usage is now the same as under C++, so that's good news.
> However, my program is quite ugly now, and it's actually more than
> twice as long as my C++ program. Any suggestions for simplifying this
> program? The way I initialize the "movieMajor" Array seems especially
> wonky, but I couldn't figure out a better way.

It's generally a good idea to back off and think about what problem
you're trying to solve. Where Ocaml generally wins on memory utilization
is in using immutable data structures and sharing data instead of
copying it. A lot of the decisions Ocaml made about how to represent
things suddenly make sense if you think in terms of data sharing. And in
lots of complicated "real" code, the memory saved by sharing is huge
compared to what is lost to the bulkier representation.

Brian
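
P.S. Here is a minimal sketch of what sharing means in practice. It
assumes only the standard library (List.init needs OCaml 4.06 or later);
the names base, a, and b are made up for illustration and have nothing
to do with your program.

```ocaml
(* One large immutable list, built once at run time. *)
let base = List.init 1_000_000 (fun i -> i)

(* Each of these allocates a single new cons cell whose tail points at
   base. The million existing cells are reused, not copied. *)
let a = 1 :: base
let b = 2 :: base

let () =
  (* (==) is physical equality: it holds only when both sides are the
     very same heap block, so these checks confirm the tails are shared. *)
  assert (List.tl a == base);
  assert (List.tl b == base);
  Printf.printf "tails shared: %b\n" (List.tl a == List.tl b)
```

Because the data is immutable, nobody can observe that a and b share a
tail, so the sharing is safe. With mutable vectors you would usually
have to copy to get the same guarantee.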