From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id CEEE8BC88 for ; Fri, 4 Feb 2005 13:15:08 +0100 (CET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j14CF8cl023548 for ; Fri, 4 Feb 2005 13:15:08 +0100 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id NAA20047 for ; Fri, 4 Feb 2005 13:15:08 +0100 (MET) Received: from furbychan.cocan.org (furbychan.cocan.org [80.68.91.176]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j14CF6Wk023542 (version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO) for ; Fri, 4 Feb 2005 13:15:07 +0100 Received: from rich by furbychan.cocan.org with local (Exim 3.35 #1 (Debian)) id 1Cx2Mc-0008SM-00 for ; Fri, 04 Feb 2005 12:15:06 +0000 Date: Fri, 4 Feb 2005 12:15:06 +0000 Cc: caml-list@inria.fr Subject: Re: [Caml-list] Estimating the size of the ocaml community Message-ID: <20050204121505.GA31752@furbychan.cocan.org> References: <20050204.111102.71086746.oandrieu@nerim.net> <005501c50aaa$b35d9560$0100a8c0@mshome.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <005501c50aaa$b35d9560$0100a8c0@mshome.net> User-Agent: Mutt/1.3.28i From: Richard Jones X-Miltered: at concorde with ID 4203674C.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 4203674A.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 ocaml:01 gava:01 wrote:01 arises:01 ocaml:01 byte:01 byte:01 serialize:01 arrays:01 constructors:01 arrays:01 constructors:01 notepad:01 ric:98 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.2 X-Spam-Level: On Fri, Feb 04, 2005 at 12:14:31PM +0100, Fr?d?ric Gava wrote: > >It's a problem of the implementation, but one which looks hard to > >change. The reason it arises is because each OCaml value has a 4 > >byte[*] header, divided into a 1 byte tag, 2 bits for the GC, and the > >remaining 22 bits to store the size of the value in words. Since > >strings are stored directly in the value (which is very efficient), > >they are limited to 4 * (2^22 - 1) bytes ~ 16 MB in size. > Sure it is a hard problem. But I thinks that in the futur (now my bench are > not bench of true applications but bench of the limits of the language, > values that exceed 16Mo are rare), true scientific applications will need to > serialize big values...(big graphs for example) I've been thinking about this a bit more, and I'm not sure I understand why the tag needs to be so large. If you look at the "tag space" now, it's something like this: 0 used for tuples, arrays, records 1-251 used for constructors (eg. Some, None) 252 marks strings 253 marks floats 254 marks float arrays 255 marks structures with custom ops (lots of stuff, like Int32.t) It's not clear to me why so much "tag space" is used for constructed values, at the same time limiting you to around 250 different constructors in a type definition. Couldn't the constructor number be encoded in the first field in the value (obviously shifting all the subsequent fields along one, and making constructed values 4 bytes larger)? Then the tag could be reduced to a few bits, making strings a few orders larger. Rich. -- Richard Jones, CTO Merjis Ltd. Merjis - web marketing and technology - http://merjis.com Team Notepad - intranets and extranets for business - http://team-notepad.com