From: "John Caml"
To: "Richard Jones"
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] large hash tables
Date: Sat, 23 Feb 2008 21:39:53 -0800
Message-ID: <33d2b3f70802232139x6c4cb118ycdccd42c90d8644e@mail.gmail.com>
In-Reply-To: <20080222003315.GA5326@annexia.org>

Richard,

Thank you so much for revising my program. I learned a lot from reading over
your changes, and the program works very nicely now. It uses 1.2 GB for all
100 million items, which is efficient enough for all practical purposes.

Thanks again.
John

On Thu, Feb 21, 2008 at 4:33 PM, Richard Jones wrote:
> My version's a bit longer than your version, but hopefully more
> idiomatic and easier to understand.
>
> Program - http://www.annexia.org/tmp/movies.ml
> Create the test file - http://www.annexia.org/tmp/make_movies.ml
>
> It's best to read the program like this:
>
> (1) Start with the _interface_ ('signature') of the new ExtArray1
> module & type. _Ignore_ the implementation of this module for now.
>
> (2) Then look at the main part of the program (from where we allocate
> the result array down through the loop which reads the data).
>
> (3) Then look at the implementation of the module. The main
> complexity is that you can't just extend a Bigarray, but you have to
> keep reallocating it (in large chunks for efficiency).
>
> I measured it as taking some 230 MB for a 10 million line data file,
> but that doesn't necessarily mean it'll take 2 GB for 100 million
> lines because there's some space overhead which will decline as a
> proportion of the total memory used.
>
> Rich.
>
> --
> Richard Jones
> Red Hat
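
Point (3) in the quoted message says a Bigarray cannot be extended in place and has to be reallocated in large chunks. The following is only a minimal sketch of that idea, not the ExtArray1 module from movies.ml: the module name Growable, its functions, and the chunk parameter are all hypothetical and chosen for illustration.

```ocaml
(* Hypothetical sketch of a growable 1-D Bigarray of ints.
   Names (Growable, create, append, get, length, chunk) are illustrative
   assumptions, not the API of ExtArray1 in movies.ml. *)
module Growable : sig
  type t
  val create : chunk:int -> t
  val append : t -> int -> unit
  val get : t -> int -> int
  val length : t -> int
end = struct
  type t = {
    mutable data : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t;
    mutable len : int;   (* number of slots actually in use *)
    chunk : int;         (* how many slots to add on each reallocation *)
  }

  let alloc n = Bigarray.Array1.create Bigarray.int Bigarray.c_layout n

  let create ~chunk =
    if chunk <= 0 then invalid_arg "Growable.create";
    { data = alloc chunk; len = 0; chunk }

  let append t x =
    if t.len = Bigarray.Array1.dim t.data then begin
      (* Full: allocate a larger array and copy the old contents across. *)
      let bigger = alloc (t.len + t.chunk) in
      Bigarray.Array1.blit t.data (Bigarray.Array1.sub bigger 0 t.len);
      t.data <- bigger
    end;
    t.data.{t.len} <- x;
    t.len <- t.len + 1

  let get t i =
    if i < 0 || i >= t.len then invalid_arg "Growable.get";
    t.data.{i}

  let length t = t.len
end

let () =
  let a = Growable.create ~chunk:4 in
  for i = 0 to 9 do Growable.append a (i * i) done;
  Printf.printf "%d items, last = %d\n" (Growable.length a) (Growable.get a 9)
```

A larger chunk means fewer reallocations and copies at the cost of more unused slots at the end, which is one source of the space overhead mentioned above.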