From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id AC9E9BC88 for ; Sun, 6 Feb 2005 18:28:37 +0100 (CET) Received: from ppsw-2.csi.cam.ac.uk (ppsw-2.csi.cam.ac.uk [131.111.8.132]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j16HSb8w022917 for ; Sun, 6 Feb 2005 18:28:37 +0100 Received: from hermes-1.csi.cam.ac.uk ([131.111.8.51]:45793) by ppsw-2.csi.cam.ac.uk (smtp.hermes.cam.ac.uk [131.111.8.152]:25) with esmtpa (EXTERNAL:jdh30) id 1CxqD3-00030y-7w (Exim 4.44) for caml-list@yquem.inria.fr (return-path ); Sun, 06 Feb 2005 17:28:33 +0000 Received: from prayer by hermes-1.csi.cam.ac.uk (hermes.cam.ac.uk) with local (PRAYER:jdh30) id 1CxqD3-0007xw-Dv (Exim 4.43) for caml-list@yquem.inria.fr (return-path ); Sun, 06 Feb 2005 17:28:33 +0000 From: Jon To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] The boon of static type checking Date: 06 Feb 2005 17:28:33 +0000 X-Mailer: Prayer v1.0.13 X-Originating-IP: [212.219.72.136] In-Reply-To: <7f8e92aa0502060222383aac60@mail.gmail.com> Message-ID: References: <891bd33905020213315a2ebb18@mail.gmail.com> <7f8e92aa0502060222383aac60@mail.gmail.com> Mime-Version: 1.0 Content-Type: text/plain; format=flowed; charset=ISO-8859-1 Sender: jdh30@hermes.cam.ac.uk X-Cam-ScannerInfo: http://www.cam.ac.uk/cs/email/scanner/ X-Cam-AntiVirus: No virus found X-Cam-SpamDetails: Not scanned X-Miltered: at concorde with ID 420653C5.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 wrote:01 wrote:01 stl:01 stl:01 imho:01 ocaml:01 ocaml:01 recursive:01 recursion:01 stack:01 iirc:01 stack:01 recursive:01 mappings:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.5 required=5.0 tests=FROM_ENDS_IN_NUMS autolearn=disabled version=3.0.2 X-Spam-Level: On Feb 6 2005, Radu Grigore wrote: > On Fri, 4 Feb 2005 10:26:55 +0000, Jon Harrop wrote: > > The code to implement a simple tree is unwieldy in C++ (e.g. 1975 LOC > > in stl_tree.h and stl_map.h vs 198 LOC in map.ml). > > The implementation in stl_tree is not a "simple" tree. The difference > in size is also because the STL implementation is more efficient. This is a much smaller effect compared to the verbosity imposed by inherent inefficiencies of the C++ language though. > For one thing RB trees are better balanced than HB(2) trees. The implementation details should be considered secondarily to the functionality provided, IMHO. Thus, I would consider the ability of the (functional) ocaml Set and Map to fork lineages to be more important. There might be an imperative RB-tree ocaml implementation hanging around somewhere if you want to make a more accurate comparison... > Another observation is that fold on maps is not tail recursive. But the depth of recursion on the stack is O(ln_2 n), IIRC. > True, the > stack will probably never go above 40 calls but it is a problem if you > want to accumulate large structures. I would like to think that your stack will be big enough to fit 40 recursive calls if you expect your computer to handle trillion-element maps! > Yet another problem is the missing cardinal function. An arbitrary number of functions are "missing". You can't expect INRIA to implement all infinity of them for you! :-) As someone else has posted, you can easily supplement the current Map with code to keep track of the number of mappings with a good complexity and reasonable efficiency. A reason not to implement the cardinal function is the overhead of having counters in the nodes of the tree or the cost of testing for membership with every add/remove. > I have a StringMap with about 200000: it > takes about 3 minutes (!!!) to get its length using a fold. I guess this poor performance is due to cache incoherence but, of course, this is the wrong data structure for the job. > That would > take something like 1 millisecond in C++. The STL will probably take <<1ms because the STL's size() is probably O(1) whereas Map's fold is O(n). In contrast, forking lineages is probably O(n) with the STL (requiring a copy) and O(1) with ocaml's Map. > I'm a bit annoyed right now with OCaml's Map because I'm writting a > tool where I need to construct big associative tables from strings to > other data. An implementation with Hashtbl makes the program work in 1 > minute. The problem is that the next execution phase needs to fold > over the tables in increasing order of keys. Simple, no? Switch from > Hashtbl to Map. Right. Now the same program doesn't finish executing > in 3 hours. I would have expected a running time of about 20-30 > minutes. > Maybe a hand-written lexical-tree will be good enough. You seem to be complaining about poor performance when you know you are using the wrong data structure. Talk about unreasonable expectations. :-) IIRC, the exact data structure you want is already on the hump. HTH. Cheers, Jon.