From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id C681BBC88 for ; Mon, 7 Feb 2005 03:22:35 +0100 (CET) Received: from ptb-relay02.plus.net (ptb-relay02.plus.net [212.159.14.213]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j172MZP3020855 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 7 Feb 2005 03:22:35 +0100 Received: from [80.229.56.224] (helo=chetara) by ptb-relay02.plus.net with esmtp (Exim) id 1CxyXq-0000dT-Vy for caml-list@yquem.inria.fr; Mon, 07 Feb 2005 02:22:35 +0000 From: Jon Harrop Organization: University of Cambridge Subject: Fwd: Re: [Caml-list] The boon of static type checking Date: Mon, 7 Feb 2005 02:24:23 +0000 User-Agent: KMail/1.7.1 To: caml-list@yquem.inria.fr MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200502070224.23304.jon@jdh30.plus.com> X-Miltered: at concorde with ID 4206D0EB.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 wrote:01 stack:01 recursive:01 unbalanced:01 corresponds:01 ocaml:01 recursive:01 unreasonable:01 binary:01 stl:01 stl:01 ocaml's:01 bindings:01 hashtbl:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: On Sunday 06 February 2005 22:26, Radu Grigore wrote: > By "fork lineages" you mean keeping "old" versions of the map around? Exactly, "persistence". > > I would like to think that your stack will be big enough to fit 40 > > recursive calls if you expect your computer to handle trillion-element > > maps! > > In the worst case (most unbalanced) 40 corresponds to a few million > keys, not trillions. If I haven't made a mistake the minimum number of > keys in a (OCaml) Map with height h is given by the recurrence: > T(h)=1+T(h-1)+T(h-3). I think you are right. Regardless, your computer is unlikely to struggle with 80 recursive function calls. > > > Yet another problem is the missing cardinal function. > > > > An arbitrary number of functions are "missing". You can't expect INRIA to > > implement all infinity of them for you! :-) > > Considering the fact that the code in set.ml and map.ml looks like it > is copy&pasted I guess asking for some consistency in the interface is > not unreasonable. In any case copy&pasting the cardinal function > wouldn't provide better performance (I think). Yes, Set's cardinal looks O(n) to me. So you're asking for an equally inefficient cardinal function in Map? ;-) Personally, I'd like to see a core balanced binary tree implementation with various different set and map (and other) data structures built from it. > > The STL will probably take <<1ms because the STL's size() is probably > > O(1) whereas Map's fold is O(n). In contrast, forking lineages is > > probably O(n) with the STL (requiring a copy) and O(1) with ocaml's Map. > > I haven't actually tested with STL but the implementation of size() is > indeed a simple return. Do you have an example where forking lineages > is useful? I often exploit persistence when writing interpreters in a functional style, to handle bindings in an inner scope. Note that more intelligent people use an imperative style and the seemingly-quirky Hashtbl for this... > As I said, I don't doubt there are situations where it is > useful; just that right now I'm having trouble coming up with a > realistic example. I can't think of any other places I've used persistence... > I am exploring possibilities right now. My only "complaint" is that > switching from Hashtbl to Map for a dictionary of about 200000 keys > decreases the performance a LOT! I would have expected a factor of > 20-30 but it seems to be a lot more: probably due to memory management > (as Gerd Stolpmann notices) due to the functional nature of Map. Now that I've had a little more time to think about it, I can think of three possible causes of this poor performance. The most likely is the O(ln n) (unnecessary) string comparisons it will be doing (particularly if your strings are similar "lexicographically"), compared to a single hash. Next is the time spent in allocate and GC, as someone else said. Third is cache incoherence, as I said before. You could profile your code to work out which but you might as well skip this and just use a more appropriate data structure. Or use Hashtbl and sort, as Gerd says. If you really want to stick with Map then you could also store a hash along with each string, so the comparison acts on the hash before it acts on the string. This should greatly speed up the comparisons (oh, and write your own comparison function, using String.compare and not Pervasives.compare) and, I think, the whole program. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd.