From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id AF9BDBC88 for ; Sun, 6 Feb 2005 15:59:59 +0100 (CET) Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j16Exvxx004182 for ; Sun, 6 Feb 2005 15:59:58 +0100 Received: from [192.168.1.200] (ppp212-197.lns2.syd3.internode.on.net [203.122.212.197]) by smtp1.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id j16Exhh5097187; Mon, 7 Feb 2005 01:29:44 +1030 (CST) Subject: Re: [Caml-list] The boon of static type checking From: skaller Reply-To: skaller@users.sourceforge.net To: Gerd Stolpmann Cc: Radu Grigore , Jon Harrop , caml-list@yquem.inria.fr In-Reply-To: <1107692201.5887.18.camel@localhost.localdomain> References: <891bd33905020213315a2ebb18@mail.gmail.com> <877e9a170502031856175260c8@mail.gmail.com> <877e9a17050203185674680413@mail.gmail.com> <200502041026.56107.jon@jdh30.plus.com> <7f8e92aa0502060222383aac60@mail.gmail.com> <1107692201.5887.18.camel@localhost.localdomain> Content-Type: text/plain Organization: Message-Id: <1107701982.6363.252.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 07 Feb 2005 01:59:42 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 420630ED.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 sourceforge:01 gerd:01 stolpmann:01 wrote:01 hash:01 indexing:01 hashing:01 glebe:01 061:98 nsw:01 strings:01 strings:01 avoids:01 checking:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: On Sun, 2005-02-06 at 23:16, Gerd Stolpmann wrote: > For your problem, I would guess that a hash table (in phase 1) plus > sorting (in phase 2) is the relatively best solution (from your > description). That avoids a lot of overhead compared to Map, especially > memory management consumes much less time. If you know something of the distribution of your keys, which are strings, you can also make this much faster by indexing using some suitable monotonic function on the string prefix, and only sorting equivalent strings. EG, 64K word array of lists, indexed by first two bytes of the string, then 64K sorts. If your keys are not well distributed on the first two bytes, use a monotonic hashing function. -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net