From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38])
	by yquem.inria.fr (Postfix) with ESMTP id D8B83BC0A
	for <caml-list@yquem.inria.fr>; Tue,  5 Jun 2007 04:24:16 +0200 (CEST)
Received: from ipmail03.adl2.internode.on.net (ipmail03.adl2.internode.on.net [203.16.214.135])
	by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l552OEjO004440
	for <caml-list@yquem.inria.fr>; Tue, 5 Jun 2007 04:24:15 +0200
X-IronPort-AV: E=Sophos;i="4.16,383,1175437800"; 
   d="scan'208";a="99017459"
Received: from ppp9-14.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.9.14])
  by ipmail03.adl2.internode.on.net with ESMTP; 05 Jun 2007 11:54:09 +0930
Subject: Re: Fwd: [Caml-list] Comparison of OCaml and MLton for numerics
From: skaller <skaller@users.sourceforge.net>
To: Till Varoquaux <till.varoquaux@gmail.com>
Cc: OCaml <caml-list@yquem.inria.fr>
In-Reply-To: <9d3ec8300706041519y57915f66s9506096a0cb9f9d6@mail.gmail.com>
References: <5195a210705302250u6a9e5adey4ed857480f9e5cd8@mail.gmail.com>
	 <200706011813.48515.jon@ffconsultancy.com> <46641BC1.9090305@cs.umd.edu>
	 <200706041539.03286.jon@ffconsultancy.com>
	 <1180980488.6220.18.camel@rosella.wigram>
	 <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com>
	 <9d3ec8300706041519y57915f66s9506096a0cb9f9d6@mail.gmail.com>
Content-Type: text/plain
Date: Tue, 05 Jun 2007 12:24:07 +1000
Message-Id: <1181010247.7344.11.camel@rosella.wigram>
Mime-Version: 1.0
X-Mailer: Evolution 2.10.1 
Content-Transfer-Encoding: 7bit
X-Miltered: at discorde with ID 4664C94E.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; ocaml:01 0200,:01 translated:01 inserting:01 integers:01 integers:01 nodes:01 hashtable:01 iterator:01 caching:01 arrays:01 mutable:01 trie:98 sourceforge:01 garbage:01 

On Tue, 2007-06-05 at 00:19 +0200, Till Varoquaux wrote:
> Forgot the reply-to-all....

> At a first glance these could be translated to persistent data
> structures, although with a cost  (they are "upside down": inserting a
> new element modifies one of the leafs... but this is also the case
> with red/black trees). It doesn't seem to me that most operations
> would be O(1) (sounds too good to be true anyways*), I'd rather guess
> O(ln(n)) (where log is base 256 and n is length of the element we are
> storing/querying etc...). 

Integers are fixed size ... it's an array remember, integers
are the keys, so it's O(1) because ln 64 ~= 1 :) The main thing
is it isn't O(ln N) where N is the number of elements in the array.

> I do wonder how well these would scale when
> used with big elements. they might prove an interesting back-end for
> an implementation of maps (values would be stored in the nodes).

JudySL accepts C style null terminated strings. It sorts them faster
than the GNU sort command. JudyHS tries to work for strings with length
count, and combines a Trie with a Hashtable, but it doesn't supply
an iterator at the moment so it's fairly useless.

I make no claims for non-fixed sized keys.

> Most of the optimizations seem to reside more on the low level details
> of the implementation rather than on the algorithmic side (which seems
> fairly classical from a distance).

Correct .. Tries are already theoretically optimal .. they're just
not very fast in practice on modern machines because size matters
due to caching .. Judy arrays solve that problem, which isn't really
a theoretical one.

However the implementation is mutable .. in a functional version
if you write:

	newA = OldA + element

and OldA isn't reachable now, there's also a load on the garbage
collector which is part of the real cost, and that cost might
be proportional to the number of insertions.. so maybe this
data structure isn't very good in functional form?


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net