caml-list - the Caml user's mailing list
From: Martin Jambon <martin_jambon@emailuser.net>
To: Caml List <caml-list@inria.fr>
Cc: Martin Jambon <martin_jambon@emailuser.net>
Subject: Re: [Caml-list] Re: immutable strings (Re: Array 4 MB size limit)
Date: Thu, 25 May 2006 12:54:17 -0700 (PDT)
Message-ID: <Pine.LNX.4.63.0605251207150.7706@munge>
In-Reply-To: <4475E9E0.2030009@cs.caltech.edu>

On Thu, 25 May 2006, Aleksey Nogin wrote:

> On 24.05.2006 22:56, Martin Jambon wrote:
>
>>> I think it's OK to have (mutable) byte arrays, but strings should simply
>>> always be immutable.
>>  OCaml strings are compact byte arrays which serve their purpose well.
>
> Yes, however immutable strings are also very useful and that functionality is 
> simply missing in OCaml. The usage I am very interested in is essentially 
> using strings as "printable tokens". In other words, a data type that is easy 
> to compare and has an obvious I/O representation.
>
>> Having a whole different type for immutable strings is in my opinion a 
>> waste of energy. The problem is that freezing or unfreezing a string safely 
>> involves a copy of the whole string. And obviously it's not possible to 
>> handle only immutable strings since somehow you have to create them, and 
>> unlike record fields, they won't be set in one operation but in n 
>> operations, n being the length of the string.
>
> This is not true. All I want is having a purely functional interface with:
> - Constants (a compiler flag for turning "..." constants into immutable 
> strings instead of mutable ones).
> - Inputting from a channel
> - Concatenation
> - Things like string_of_int for immutable string.

Isn't it a bit limited? What if I want other functions?

But if it satisfies you, you can do the syntax part with an unsafe_freeze 
function and a bit of camlp4. The rest is just plain old OCaml.
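
A minimal sketch of that "plain old OCaml" part, under assumptions: it targets a current OCaml, where the mutable byte array is Bytes.t and string is its frozen form, and uses those as stand-ins for the mutable strings discussed here. The module name Istring and all of its functions are hypothetical, and the camlp4 rewriting of "..." constants is not shown.

```ocaml
(* Hypothetical module: an abstract immutable string type. *)
module Istring : sig
  type t
  val freeze : Bytes.t -> t         (* safe: copies the buffer *)
  val unsafe_freeze : Bytes.t -> t  (* no copy: the caller must never
                                       mutate the buffer afterwards *)
  val to_string : t -> string
  val concat : t -> t -> t
  val of_int : int -> t
  val input_line : in_channel -> t
end = struct
  type t = string
  let freeze b = Bytes.to_string b
  let unsafe_freeze b = Bytes.unsafe_to_string b
  let to_string s = s
  let concat a b = a ^ b
  let of_int = string_of_int
  let input_line = Stdlib.input_line
end

let () =
  let buf = Bytes.of_string "abc" in
  let frozen = Istring.freeze buf in
  (* Mutating the buffer after a safe freeze ... *)
  Bytes.set buf 0 'X';
  (* ... does not affect the frozen copy. *)
  assert (Istring.to_string frozen = "abc")
```

The signature is what enforces immutability: outside the module, nothing can write into a value of type Istring.t. Only unsafe_freeze relies on the caller's discipline.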

> Of course, it might be the case that the standard library might have to use 
> some sort of "unsafe" operations that would "inappropriately" mutate the 
> newly created immutable string buffer, but this is IMHO no different than how 
> the unsafe operations are already used in standard library for arrays and 
> strings.

I disagree: have you ever mutated a string by accident?
I have never run into this situation, which is mostly why I don't see the 
point of your suggestions. This contrasts strongly with mistakes in 
array/string indices, which happen all the time.


>> So I'd really love to see actual examples where using immutable strings 
>> would be such an improvement over mutable strings.
>> If the problem is just to ensure that string data won't be changed by the 
>> user of a library, then it is trivial using module signatures and 
>> String.copy for the conversions.
>
> Such a copy operation can be extremely prohibitive in a setting that assumes 
> that a data structure is immutable and tries really hard to preserve sharing 
> (including using functions like a sharing-preserving version of map (*), 
> etc). In such a setting, these extra copies can potentially have a 
> devastating effect on memory usage, cache performance, etc. And this 
> situation is exactly what we have in our MetaPRL project - there we have 
> resorted to simply using strings and pretending they are immutable, but this 
> is clearly suboptimal.

Yes, so how do you avoid copies without using the "unsafe" conversions all 
over the place?


> ----
> (*)
> let rec smap f = function
>   [] -> []
> | (hd :: tl) as l ->
>      let hd' = f hd in
>      let tl' = smap f tl in
>         if hd == hd' && tl == tl' then l else hd' :: tl'
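
The sharing behaviour of the quoted smap can be checked with physical equality (==). A small sketch; the list contents and the mapped functions are made up for illustration.

```ocaml
(* smap as quoted above: returns the original list cell whenever
   neither the head nor the tail was changed by f. *)
let rec smap f = function
  | [] -> []
  | (hd :: tl) as l ->
      let hd' = f hd in
      let tl' = smap f tl in
      if hd == hd' && tl == tl' then l else hd' :: tl'

let () =
  let l = ["a"; "b"; "c"] in
  (* The identity changes nothing, so the very same list comes back. *)
  assert (smap (fun s -> s) l == l);
  (* Changing only the head allocates one new cell and shares the tail. *)
  let l' = smap (fun s -> if s = "a" then "A" else s) l in
  assert (l' = ["A"; "b"; "c"]);
  assert (List.tl l' == List.tl l)
```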

In order to maximize sharing, I'd rather use a global weak hash table.
In your context, it seems that you could afford String.copy, as long as it 
doesn't break sharing:

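let freeze s =
   let s' = make_constant s (* using a copy! *) in
   (* table is the global weak hash table; mem, find and add are its
      assumed operations *)
   if mem table s' then find table s'  (* return the shared element *)
   else (add table s'; s')             (* first occurrence: add it *)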
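
A self-contained sketch of this freeze, assuming a current OCaml where Bytes.t plays the role of the mutable string and Bytes.to_string is the copying conversion. The names W and table are illustrative. The standard library's Weak.Make functor provides the weak hash table, and its merge function does the lookup-or-add in one step.

```ocaml
(* Weak set of strings: an entry can be reclaimed by the GC once
   nothing else references it. *)
module W = Weak.Make (struct
  type t = string
  let equal = String.equal
  let hash = Hashtbl.hash
end)

let table = W.create 1024

(* Copy the mutable buffer once, then return the shared instance:
   W.merge returns the equal element already in the table if there is
   one, otherwise it adds its argument and returns it. *)
let freeze (b : Bytes.t) : string =
  W.merge table (Bytes.to_string b)

let () =
  let a = freeze (Bytes.of_string "token") in
  let b = freeze (Bytes.of_string "token") in
  (* Two equal inputs yield one physically shared string. *)
  assert (a == b)
```

The copy made for the second call becomes garbage immediately, so the cost is one transient allocation per freeze rather than a lasting duplicate.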



Martin

--
Martin Jambon, PhD
http://martin.jambon.free.fr

Edit http://wikiomics.org, bioinformatics wiki

