From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 6D784BC88 for ; Sun, 6 Feb 2005 13:16:45 +0100 (CET) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.177]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j16CGjfJ018223 for ; Sun, 6 Feb 2005 13:16:45 +0100 Received: from [212.227.126.160] (helo=mrelayng.kundenserver.de) by moutng.kundenserver.de with esmtp (Exim 3.35 #1) id 1CxlLG-00041Z-00; Sun, 06 Feb 2005 13:16:42 +0100 Received: from [80.129.108.153] (helo=gate.lan.gerd-stolpmann.de) by mrelayng.kundenserver.de with asmtp (Exim 3.35 #1) id 1CxlLG-00059l-00; Sun, 06 Feb 2005 13:16:42 +0100 Received: from flakew.lan.gerd-stolpmann.de (flakew.lan.gerd-stolpmann.de [192.168.0.32]) by gate.lan.gerd-stolpmann.de (Postfix) with ESMTP id 1EAC1C04E; Sun, 6 Feb 2005 13:16:42 +0100 (CET) Subject: Re: [Caml-list] The boon of static type checking From: Gerd Stolpmann To: Radu Grigore Cc: Jon Harrop , caml-list@yquem.inria.fr In-Reply-To: <7f8e92aa0502060222383aac60@mail.gmail.com> References: <891bd33905020213315a2ebb18@mail.gmail.com> <877e9a170502031856175260c8@mail.gmail.com> <877e9a17050203185674680413@mail.gmail.com> <200502041026.56107.jon@jdh30.plus.com> <7f8e92aa0502060222383aac60@mail.gmail.com> Content-Type: text/plain Date: Sun, 06 Feb 2005 13:16:41 +0100 Message-Id: <1107692201.5887.18.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.0.2 Content-Transfer-Encoding: 7bit X-Provags-ID: kundenserver.de abuse@kundenserver.de auth:a6865a839c0178d9aa0ce41878507ea2 X-Miltered: at concorde with ID 42060AAD.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 gerd:01 stolpmann:01 gerd:01 wrote:01 stl:01 stl:01 recursive:01 stack:01 ocaml's:01 associative:01 hashtbl:01 hashtbl:01 hand-written:01 ocaml:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO autolearn=disabled version=3.0.2 X-Spam-Level: Am Sonntag, den 06.02.2005, 12:22 +0200 schrieb Radu Grigore: > On Fri, 4 Feb 2005 10:26:55 +0000, Jon Harrop wrote: > > > The code to implement a simple tree is unwieldy in C++ (e.g. 1975 LOC > > in stl_tree.h and stl_map.h vs 198 LOC in map.ml). > > The implementation in stl_tree is not a "simple" tree. The difference > in size is also because the STL implementation is more efficient. For > one thing RB trees are better balanced than HB(2) trees. Another > observation is that fold on maps is not tail recursive. True, the > stack will probably never go above 40 calls but it is a problem if you > want to accumulate large structures. Yet another problem is the > missing cardinal function. I have a StringMap with about 200000: it > takes about 3 minutes (!!!) to get its length using a fold. That would > take something like 1 millisecond in C++. > > I'm a bit annoyed right now with OCaml's Map because I'm writting a > tool where I need to construct big associative tables from strings to > other data. An implementation with Hashtbl makes the program work in 1 > minute. The problem is that the next execution phase needs to fold > over the tables in increasing order of keys. Simple, no? Switch from > Hashtbl to Map. Right. Now the same program doesn't finish executing > in 3 hours. I would have expected a running time of about 20-30 > minutes. > > Maybe a hand-written lexical-tree will be good enough. I think you are a bit unfair, and it seems you did not analyse your problem well enough to get a good solution. The philosophy of the ocaml standard library is to provide light-weight data structures that can be easily extended to one's needs. If you come to the conclusion that a Map with cardinality is your optimal structure, simply construct it: module MakeMapWithCard (Ord : Map.OrderedType) = struct module M = Map.Make(Ord) type key = M.key type t = int * M.t (* cardinality, map *) let empty = (0, M.empty) let add key x (card,m) = if M.mem x m then (card,m) else (card+1, M.add x m) ... end You need not to program another balanced tree, just wrap it around the available implementation. (The best is that the resulting module is still compatible with the Map module of the standard library.) The other point is that the hash tables, balanced trees etc. in ocaml are just implementations of the common algorithms. There is no optimal container that works best for all needs. Not in ocaml, not in C++, not in any other language. For your problem, I would guess that a hash table (in phase 1) plus sorting (in phase 2) is the relatively best solution (from your description). That avoids a lot of overhead compared to Map, especially memory management consumes much less time. Gerd -- ------------------------------------------------------------ Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de ------------------------------------------------------------