From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id D8B83BC0A for ; Tue, 5 Jun 2007 04:24:16 +0200 (CEST) Received: from ipmail03.adl2.internode.on.net (ipmail03.adl2.internode.on.net [203.16.214.135]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l552OEjO004440 for ; Tue, 5 Jun 2007 04:24:15 +0200 X-IronPort-AV: E=Sophos;i="4.16,383,1175437800"; d="scan'208";a="99017459" Received: from ppp9-14.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.9.14]) by ipmail03.adl2.internode.on.net with ESMTP; 05 Jun 2007 11:54:09 +0930 Subject: Re: Fwd: [Caml-list] Comparison of OCaml and MLton for numerics From: skaller To: Till Varoquaux Cc: OCaml In-Reply-To: <9d3ec8300706041519y57915f66s9506096a0cb9f9d6@mail.gmail.com> References: <5195a210705302250u6a9e5adey4ed857480f9e5cd8@mail.gmail.com> <200706011813.48515.jon@ffconsultancy.com> <46641BC1.9090305@cs.umd.edu> <200706041539.03286.jon@ffconsultancy.com> <1180980488.6220.18.camel@rosella.wigram> <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com> <9d3ec8300706041519y57915f66s9506096a0cb9f9d6@mail.gmail.com> Content-Type: text/plain Date: Tue, 05 Jun 2007 12:24:07 +1000 Message-Id: <1181010247.7344.11.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Miltered: at discorde with ID 4664C94E.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; ocaml:01 0200,:01 translated:01 inserting:01 integers:01 integers:01 nodes:01 hashtable:01 iterator:01 caching:01 arrays:01 mutable:01 trie:98 sourceforge:01 garbage:01 On Tue, 2007-06-05 at 00:19 +0200, Till Varoquaux wrote: > Forgot the reply-to-all.... > At a first glance these could be translated to persistent data > structures, although with a cost (they are "upside down": inserting a > new element modifies one of the leafs... but this is also the case > with red/black trees). It doesn't seem to me that most operations > would be O(1) (sounds too good to be true anyways*), I'd rather guess > O(ln(n)) (where log is base 256 and n is length of the element we are > storing/querying etc...). Integers are fixed size ... it's an array remember, integers are the keys, so it's O(1) because ln 64 ~= 1 :) The main thing is it isn't O(ln N) where N is the number of elements in the array. > I do wonder how well these would scale when > used with big elements. they might prove an interesting back-end for > an implementation of maps (values would be stored in the nodes). JudySL accepts C style null terminated strings. It sorts them faster than the GNU sort command. JudyHS tries to work for strings with length count, and combines a Trie with a Hashtable, but it doesn't supply an iterator at the moment so it's fairly useless. I make no claims for non-fixed sized keys. > Most of the optimizations seem to reside more on the low level details > of the implementation rather than on the algorithmic side (which seems > fairly classical from a distance). Correct .. Tries are already theoretically optimal .. they're just not very fast in practice on modern machines because size matters due to caching .. Judy arrays solve that problem, which isn't really a theoretical one. However the implementation is mutable .. in a functional version if you write: newA = OldA + element and OldA isn't reachable now, there's also a load on the garbage collector which is part of the real cost, and that cost might be proportional to the number of insertions.. so maybe this data structure isn't very good in functional form? -- John Skaller Felix, successor to C++: http://felix.sf.net