From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 4C52ABC88 for ; Fri, 4 Feb 2005 13:51:39 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j14CpcCh028700 for ; Fri, 4 Feb 2005 13:51:38 +0100 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id NAA21946 for ; Fri, 4 Feb 2005 13:51:38 +0100 (MET) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.187]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j14Cpc8O008816 for ; Fri, 4 Feb 2005 13:51:38 +0100 Received: from [212.227.126.160] (helo=mrelayng.kundenserver.de) by moutng.kundenserver.de with esmtp (Exim 3.35 #1) id 1Cx2vx-0006a4-00; Fri, 04 Feb 2005 13:51:37 +0100 Received: from [80.129.107.185] (helo=gate.lan.gerd-stolpmann.de) by mrelayng.kundenserver.de with asmtp (Exim 3.35 #1) id 1Cx2vx-0005yd-00; Fri, 04 Feb 2005 13:51:37 +0100 Received: from mydomain.com (localhost [127.0.0.1]) by gate.lan.gerd-stolpmann.de (Postfix) with SMTP id 61689C053; Fri, 4 Feb 2005 13:51:36 +0100 (CET) Received: from 192.168.0.1 (proxying for 193.194.7.92) (SquirrelMail authenticated user gerd) by gps.dynxs.de with HTTP; Fri, 4 Feb 2005 13:51:36 +0100 (CET) Message-ID: <23368.192.168.0.1.1107521496.squirrel@gps.dynxs.de> Date: Fri, 4 Feb 2005 13:51:36 +0100 (CET) Subject: Re: [Caml-list] Estimating the size of the ocaml community From: "Gerd Stolpmann" To: In-Reply-To: <20050204121505.GA31752@furbychan.cocan.org> References: <20050204.111102.71086746.oandrieu@nerim.net> <005501c50aaa$b35d9560$0100a8c0@mshome.net> <20050204121505.GA31752@furbychan.cocan.org> Cc: X-Mailer: SquirrelMail (version 1.2.6) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Provags-ID: kundenserver.de abuse@kundenserver.de auth:a6865a839c0178d9aa0ce41878507ea2 X-Miltered: at concorde with ID 42036FDA.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 42036FDA.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 ocaml:01 gerd:01 stolpmann:01 arrays:01 constructors:01 arrays:01 constructors:01 hash:01 variants:01 gerd:01 stolpmann:01 viktoriastr:01 64293:01 darmstadt:01 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.1 required=5.0 tests=FORGED_RCVD_HELO, RCVD_FAKE_HELO_DOTCOM autolearn=disabled version=3.0.2 X-Spam-Level: Richard Jones said: > I've been thinking about this a bit more, and I'm not sure I > understand why the tag needs to be so large. If you look at the "tag > space" now, it's something like this: > > 0 used for tuples, arrays, records > 1-251 used for constructors (eg. Some, None) > 252 marks strings > 253 marks floats > 254 marks float arrays > 255 marks structures with custom ops (lots of stuff, like Int32.t) > > It's not clear to me why so much "tag space" is used for constructed > values, at the same time limiting you to around 250 different > constructors in a type definition. Couldn't the constructor number be > encoded in the first field in the value (obviously shifting all the > subsequent fields along one, and making constructed values 4 bytes > larger)? Then the tag could be reduced to a few bits, making strings a > few orders larger. Sure, this is an option, but I think the price is quite high. A number of frequently used data structures would need a lot more memory, and would become slower. For example lists. Currently, every list element needs three words (header + contents + reference to next element). If the variant tag is not part of the header, another word is needed =3D 33 % more space. There are similar effects for many basic data structures, including options, trees and hash tables. >>From a practical point of view, the non-polymorphic variants are still part of the language because they are more efficient than the polymorphic counterparts (which could fully replace the former, and for which there is no limit for the number of constructors). If one wanted really to improve the situation, it would be better to chan= ge the representation of strings, and to encode the string length in the first word after the header. This wastes much less memory (except in the special case when you have many small strings), and is probably neutral for the execution speed. Gerd ------------------------------------------------------------ Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany gerd@gerd-stolpmann.de http://www.gerd-stolpmann.de ------------------------------------------------------------