From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id CC01EBB83 for ; Thu, 25 May 2006 19:31:17 +0200 (CEST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k4PHVHkY030051 for ; Thu, 25 May 2006 19:31:17 +0200 Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA14118 for ; Thu, 25 May 2006 19:31:16 +0200 (MET DST) Received: from blizzard.cs.caltech.edu (blizzard.cs.caltech.edu [131.215.44.2]) by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id k4PHVFKl001692 for ; Thu, 25 May 2006 19:31:15 +0200 Received: from localhost (flood.cs.caltech.edu [131.215.44.31]) by blizzard.cs.caltech.edu (Postfix) with ESMTP id 9E84E402B9C; Thu, 25 May 2006 10:31:13 -0700 (PDT) Received: from blizzard.cs.caltech.edu ([131.215.44.2]) by localhost (flood.cs.caltech.edu [131.215.44.31]) (amavisd-new, port 10024) with ESMTP id 26952-08; Thu, 25 May 2006 10:31:13 -0700 (PDT) Received: from [192.168.1.100] (charter-242-015.caltech.edu [131.215.242.15]) by blizzard.cs.caltech.edu (Postfix) with ESMTP id 530EE402BA0; Thu, 25 May 2006 10:31:13 -0700 (PDT) Message-ID: <4475E9E0.2030009@cs.caltech.edu> Date: Thu, 25 May 2006 10:31:12 -0700 From: Aleksey Nogin Reply-To: Caml List Organization: California Institute of Technology, Computer Science Department User-Agent: Mozilla Thunderbird 1.0.7-1.1.fc3 (X11/20050929) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Caml List Cc: Martin Jambon Subject: Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit) References: <20060515141230.ajyupn2z28k0484s@horde.akalin.cx> <446D5E4A.8060005@akalin.cx> <20060519162844.GA32550@osiris.uid0.sk> <200605192226.34917.jon@ffconsultancy.com> <20060520211117.GA2670@first.in-berlin.de> In-Reply-To: X-Enigmail-Version: 0.89.5.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 4475E9E5.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at nez-perce with ID 4475E9E3.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; mutable:01 byte:01 arrays:01 ocaml:01 byte:01 arrays:01 ocaml:01 tokens:01 compiler:01 mutable:01 buffer:01 trivial:01 suboptimal:01 rec:01 wrote:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 On 24.05.2006 22:56, Martin Jambon wrote: >> I think it's OK to have (mutable) byte arrays, but strings should simply >> always be immutable. > > OCaml strings are compact byte arrays which serve their purpose well. Yes, however immutable strings are also very useful and that functionality is simply missing in OCaml. The usage I am very interested in is essentially using strings as "printable tokens". In other words, a data type that is easy to compare and has an obvious I/O representation. > Having a whole different type for immutable strings is in my opinion a > waste of energy. The problem is that freezing or unfreezing a string > safely involves a copy of the whole string. And obviously it's not > possible to handle only immutable strings since somehow you have to > create them, and unlike record fields, they won't be set in one > operation but in n operations, n being the length of the string. This is not true. All I want is having a purely functional interface with: - Constants (a compiler flag for turning "..." constants into immutable strings instead of mutable ones). - Inputing from a channel - Concatenation - Things like string_of_int for immutable string. Of course, it might be the case that the standard library might have to use some sort of "unsafe" operations that would "inappropriately" mutate the newly created immutable string buffer, but this is IMHO no different than how the unsafe operations are already used in standard library for arrays and strings. > So I'd really love to see actual examples where using immutable strings > would be such an improvement over mutable strings. > If the problem is just to ensure that string data won't be changed by > the user of a library, then it is trivial using module signatures and > String.copy for the conversions. Such a copy operation can be extremely prohibitive in a setting that assumes that a data structure is immutable and tries really hard to preserve sharing (including using functions like a sharing-preserving version of map (*), etc). In such a setting, these extra copies can potentially have a devastating effect on memory usage, cache performance, etc. And this situation is exactly what we have in our MetaPRL project - there we have resorted to simply using strings and pretending they are immutable, but this is clearly suboptimal. ---- (*) let rec smap f = function [] -> [] | (hd :: tl) as l -> let hd' = f hd in let tl' = smap f tl in if hd == hd' && tl == tl' then l else hd' :: tl' -- Aleksey Nogin Home Page: http://nogin.org/ E-Mail: nogin@cs.caltech.edu (office), aleksey@nogin.org (personal) Office: Moore 04, tel: (626) 395-2200