From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 53B5EBB83 for ; Thu, 25 May 2006 21:54:34 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k4PJsXkP011413 for ; Thu, 25 May 2006 21:54:34 +0200 Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA16437 for ; Thu, 25 May 2006 21:54:33 +0200 (MET DST) Received: from out3.smtp.messagingengine.com (out3.smtp.messagingengine.com [66.111.4.27]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k4PJsWFm011407 for ; Thu, 25 May 2006 21:54:32 +0200 Received: from frontend3.internal (frontend3.internal [10.202.2.152]) by frontend1.messagingengine.com (Postfix) with ESMTP id C60FCD67B08 for ; Thu, 25 May 2006 15:54:30 -0400 (EDT) Received: from heartbeat2.messagingengine.com ([10.202.2.161]) by frontend3.internal (MEProxy); Thu, 25 May 2006 15:54:31 -0400 X-Sasl-enc: y0BOkA8wN48GVRTmOK9xRflHDd7YDSif4s5abujaBbwI 1148586871 Received: from munge (burnham.ljcrf.edu [192.231.106.2]) by mail.messagingengine.com (Postfix) with ESMTP id 2212A364F; Thu, 25 May 2006 15:54:30 -0400 (EDT) Date: Thu, 25 May 2006 12:54:17 -0700 (PDT) From: Martin Jambon X-X-Sender: martin@munge To: Caml List Cc: Martin Jambon Subject: Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) In-Reply-To: <4475E9E0.2030009@cs.caltech.edu> Message-ID: References: <20060515141230.ajyupn2z28k0484s@horde.akalin.cx> <446D5E4A.8060005@akalin.cx> <20060519162844.GA32550@osiris.uid0.sk> <200605192226.34917.jon@ffconsultancy.com> <20060520211117.GA2670@first.in-berlin.de> <4475E9E0.2030009@cs.caltech.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Miltered: at concorde with ID 44760B79.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at concorde with ID 44760B78.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; mutable:01 byte:01 arrays:01 ocaml:01 byte:01 arrays:01 ocaml:01 tokens:01 compiler:01 mutable:01 syntax:01 camlp:01 buffer:01 trivial:01 suboptimal:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On Thu, 25 May 2006, Aleksey Nogin wrote: > On 24.05.2006 22:56, Martin Jambon wrote: > >>> I think it's OK to have (mutable) byte arrays, but strings should simply >>> always be immutable. >> OCaml strings are compact byte arrays which serve their purpose well. > > Yes, however immutable strings are also very useful and that functionality is > simply missing in OCaml. The usage I am very interested in is essentially > using strings as "printable tokens". In other words, a data type that is easy > to compare and has an obvious I/O representation. > >> Having a whole different type for immutable strings is in my opinion a >> waste of energy. The problem is that freezing or unfreezing a string safely >> involves a copy of the whole string. And obviously it's not possible to >> handle only immutable strings since somehow you have to create them, and >> unlike record fields, they won't be set in one operation but in n >> operations, n being the length of the string. > > This is not true. All I want is having a purely functional interface with: > - Constants (a compiler flag for turning "..." constants into immutable > strings instead of mutable ones). > - Inputing from a channel > - Concatenation > - Things like string_of_int for immutable string. Isn't it a bit limited? What if I want other functions? But if it satisfies you, you can do the syntax part with an unsafe_freeze function and a bit of camlp4. The rest is just plain old OCaml. > Of course, it might be the case that the standard library might have to use > some sort of "unsafe" operations that would "inappropriately" mutate the > newly created immutable string buffer, but this is IMHO no different than how > the unsafe operations are already used in standard library for arrays and > strings. I disagree: has it ever happened to you to mutate a string by accident? I never met this situation and this is mostly why I don't see the point of your suggestions. This strongly constrasts with mistakes in array/string indices which happen all the time. >> So I'd really love to see actual examples where using immutable strings >> would be such an improvement over mutable strings. >> If the problem is just to ensure that string data won't be changed by the >> user of a library, then it is trivial using module signatures and >> String.copy for the conversions. > > Such a copy operation can be extremely prohibitive in a setting that assumes > that a data structure is immutable and tries really hard to preserve sharing > (including using functions like a sharing-preserving version of map (*), > etc). In such a setting, these extra copies can potentially have a > devastating effect on memory usage, cache performance, etc. And this > situation is exactly what we have in our MetaPRL project - there we have > resorted to simply using strings and pretending they are immutable, but this > is clearly suboptimal. Yes, so how do you avoid copies without using the "unsafe" conversions all over the place? > ---- > (*) > let rec smap f = function > [] -> [] > | (hd :: tl) as l -> > let hd' = f hd in > let tl' = smap f tl in > if hd == hd' && tl == tl' then l else hd' :: tl' In order to maximize sharing, I'd rather use a global weak hash table. In your context, it seems that you could afford String.copy, as long as it doesn't break sharing: let freeze s = let s' = make_constant s (* using a copy! *) in if s' is in the table then return the element from the table else add s' and return s' Martin -- Martin Jambon, PhD http://martin.jambon.free.fr Edit http://wikiomics.org, bioinformatics wiki