caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Strings
@ 2009-04-03 11:56 Jon Harrop
  2009-04-03 12:25 ` [Caml-list] Strings Paolo Donadeo
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-03 11:56 UTC (permalink / raw)
  To: caml-list


I read that batteries included provides first-class rope-based strings and I 
was just reading up on some horror stories about immutable strings on 
StackOverflow. This made me wonder what people's thoughts are about mutable 
vs immutable strings?

I had never thought about it until I started porting my OCaml code to F# where 
strings are immutable and I found immutable strings to be a PITA. Immutable 
array-based strings seem pointless to me but I'd still appreciate an 
immutable rope-based string...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 11:56 Strings Jon Harrop
@ 2009-04-03 12:25 ` Paolo Donadeo
  2009-04-03 14:18 ` Ashish Agarwal
  2009-04-04  9:14 ` David Rajchenbach-Teller
  2 siblings, 0 replies; 47+ messages in thread
From: Paolo Donadeo @ 2009-04-03 12:25 UTC (permalink / raw)
  To: caml-list

On Fri, Apr 3, 2009 at 13:56, Jon Harrop wrote:
> was just reading up on some horror stories about immutable strings on
> StackOverflow.

Can you post any link?


-- 
Paolo
~
~
:wq


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 11:56 Strings Jon Harrop
  2009-04-03 12:25 ` [Caml-list] Strings Paolo Donadeo
@ 2009-04-03 14:18 ` Ashish Agarwal
  2009-04-03 14:46   ` Jon Harrop
  2009-04-04  9:14 ` David Rajchenbach-Teller
  2 siblings, 1 reply; 47+ messages in thread
From: Ashish Agarwal @ 2009-04-03 14:18 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1053 bytes --]

> I found immutable strings to be a PITAin what way?


On Fri, Apr 3, 2009 at 7:56 AM, Jon Harrop <jon@ffconsultancy.com> wrote:

>
> I read that batteries included provides first-class rope-based strings and
> I
> was just reading up on some horror stories about immutable strings on
> StackOverflow. This made me wonder what people's thoughts are about mutable
> vs immutable strings?
>
> I had never thought about it until I started porting my OCaml code to F#
> where
> strings are immutable and I found immutable strings to be a PITA. Immutable
> array-based strings seem pointless to me but I'd still appreciate an
> immutable rope-based string...
>
> --
> Dr Jon Harrop, Flying Frog Consultancy Ltd.
> http://www.ffconsultancy.com/?e
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 1730 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 14:18 ` Ashish Agarwal
@ 2009-04-03 14:46   ` Jon Harrop
  2009-04-03 15:03     ` Daniel Bünzli
  0 siblings, 1 reply; 47+ messages in thread
From: Jon Harrop @ 2009-04-03 14:46 UTC (permalink / raw)
  To: caml-list

On Friday 03 April 2009 15:18:51 Ashish Agarwal wrote:
> > I found immutable strings to be a PITA
>
> in what way? 

Just because my OCaml programs were mutating strings and translating that into 
F# is non-trivial if the string is shared or big. In essence, I've always 
used OCaml's strings as a more efficient byte array. In fact, the best 
translation to F# is often to use byte arrays as a replacement for strings.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 14:46   ` Jon Harrop
@ 2009-04-03 15:03     ` Daniel Bünzli
  2009-04-03 16:52       ` Martin Jambon
  2009-04-04 10:20       ` Jon Harrop
  0 siblings, 2 replies; 47+ messages in thread
From: Daniel Bünzli @ 2009-04-03 15:03 UTC (permalink / raw)
  To: OCaml List


Le 3 avr. 09 à 16:46, Jon Harrop a écrit :

> Just because my OCaml programs were mutating strings and translating  
> that into
> F# is non-trivial if the string is shared or big. In essence, I've  
> always
> used OCaml's strings as a more efficient byte array. In fact, the best
> translation to F# is often to use byte arrays as a replacement for  
> strings.

So immutable strings are not a "PITA" you are just using them for  
something they should not be taken for (mutable byte arrays).

Daniel



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 15:03     ` Daniel Bünzli
@ 2009-04-03 16:52       ` Martin Jambon
  2009-04-03 17:50         ` Daniel Bünzli
                           ` (2 more replies)
  2009-04-04 10:20       ` Jon Harrop
  1 sibling, 3 replies; 47+ messages in thread
From: Martin Jambon @ 2009-04-03 16:52 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: OCaml List

Daniel Bünzli wrote:
> 
> Le 3 avr. 09 à 16:46, Jon Harrop a écrit :
> 
>> Just because my OCaml programs were mutating strings and translating
>> that into
>> F# is non-trivial if the string is shared or big. In essence, I've always
>> used OCaml's strings as a more efficient byte array. In fact, the best
>> translation to F# is often to use byte arrays as a replacement for
>> strings.
> 
> So immutable strings are not a "PITA" you are just using them for
> something they should not be taken for (mutable byte arrays).

I love this recurrent discussion!

Here is my firm point of view which hasn't changed over the years and hundreds
of millions documents processed:

- I see absolutely no practical advantage of having an immutable "character
string" type.

- There is nothing to change in OCaml's string type because it is an "array of
bytes", with type char representing single bytes.




Martin

-- 
http://mjambon.com/


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 16:52       ` Martin Jambon
@ 2009-04-03 17:50         ` Daniel Bünzli
  2009-04-03 19:46           ` Paolo Donadeo
                             ` (3 more replies)
  2009-04-03 18:24         ` Florian Hars
  2009-04-03 20:34         ` Arnaud Spiwack
  2 siblings, 4 replies; 47+ messages in thread
From: Daniel Bünzli @ 2009-04-03 17:50 UTC (permalink / raw)
  To: OCaml List


Le 3 avr. 09 à 18:52, Martin Jambon a écrit :

> I love this recurrent discussion!

I love your carefully argumented response !

> - I see absolutely no practical advantage of having an immutable  
> "character
> string" type.

In fact I find the result of the following sequence of operations very  
disappointing for a functional programming language :

         Objective Caml version 3.11.0

# Sys.os_type;;
- : string = "Unix"
# let s = Sys.os_type;;
val s : string = "Unix"
# s.[0] <- 'a';;
- : unit = ()
# Sys.os_type;;
- : string = "anix"

I think it is a design error to conflate strings and byte arrays. You  
clearly want both, but each with its own type and strings as  
immutable. Individual character mutability is rarely needed in text  
processing and having immutable strings avoids the kind of quirks as  
seen above.

You'll think that's a marginal example, but that actually happens in  
practice. For example in xmlm when I return a signal for a start tag I  
do not String.copy the tag name to avoid allocating too much. Thus in  
the documentation there's the following ugly advice :

"The module assumes strings are immutable, thus strings the client  
gives or receives during the input and output process must not be  
modified."

And if you don't follow the advice and mutate the tag's name before  
the end tag was parsed (or output) you'll get a tag mismatch error  
even though the document (or the output) is perfectly valid.

Having immutable strings would not rely on the client for correctness  
of operation and that's always an advantage. Of course you'll tell me  
just use String.copy inside xmlm et voilà, but then you traded  
correctness for performance in a case where you could have both with  
immutable strings.

> - There is nothing to change in OCaml's string type because it is an  
> "array of
> bytes", with type char representing single bytes.


Oh no, there's nothing to change at all, that's a perfect  
implementation of byte arrays. You just want another type for  
immutable strings.

Best,

Daniel


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 16:52       ` Martin Jambon
  2009-04-03 17:50         ` Daniel Bünzli
@ 2009-04-03 18:24         ` Florian Hars
  2009-04-03 20:34         ` Arnaud Spiwack
  2 siblings, 0 replies; 47+ messages in thread
From: Florian Hars @ 2009-04-03 18:24 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Daniel Bünzli, OCaml List

Martin Jambon wrote:
> - There is nothing to change in OCaml's string type because it is an "array of
> bytes", with type char representing single bytes.

$ ocaml
         Objective Caml version 3.10.2

# let c = 'ö';;
Characters 8-9:
   let c = 'ö';;
           ^
Syntax error
#

Meanwhile, in another terminal window:

$ ocaml
         Objective Caml version 3.10.2

# let c  = 'ö';;
val c : char = '\246'
#

I prefer my strings to be composed of chars that represent characters.

- Florian.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 17:50         ` Daniel Bünzli
@ 2009-04-03 19:46           ` Paolo Donadeo
  2009-04-03 20:41             ` Harrison, John R
  2009-04-04 10:13             ` Jon Harrop
  2009-04-03 21:44           ` Goswin von Brederlow
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 47+ messages in thread
From: Paolo Donadeo @ 2009-04-03 19:46 UTC (permalink / raw)
  To: OCaml mailing list

> You clearly want both, but each with its own type and strings as immutable.
> Individual character mutability is rarely needed in text processing

I can agree with you on this argument, but a question still remains:
why should you ever do things like:

> # s.[0] <- 'a';;


Regards,


-- 
Paolo
~
~
:wq


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 16:52       ` Martin Jambon
  2009-04-03 17:50         ` Daniel Bünzli
  2009-04-03 18:24         ` Florian Hars
@ 2009-04-03 20:34         ` Arnaud Spiwack
  2 siblings, 0 replies; 47+ messages in thread
From: Arnaud Spiwack @ 2009-04-03 20:34 UTC (permalink / raw)
  To: Martin Jambon; +Cc: Daniel Bünzli, OCaml List

If my experience is worth anything, I've had hardly any use of mutable 
character strings. Many of immutable ones. The practical advantage of 
having immutable character strings is the same as that of having 
immutable integer data : when you do not need to mutate data of a type, 
you'd better keep this type immutable (also internal representation of 
the type might very well vary seriously depending on it being mutable or 
not).

Notice that with pa_do, I guess I can now define easily this type while 
using a string syntax for them (which is good). Thus


Arnaud Spiwack

Martin Jambon a écrit :
> I love this recurrent discussion!
>
> Here is my firm point of view which hasn't changed over the years and hundreds
> of millions documents processed:
>
> - I see absolutely no practical advantage of having an immutable "character
> string" type.
>
> - There is nothing to change in OCaml's string type because it is an "array of
> bytes", with type char representing single bytes.
>
>
>
>
> Martin
>
>   


^ permalink raw reply	[flat|nested] 47+ messages in thread

* RE: [Caml-list] Strings
  2009-04-03 19:46           ` Paolo Donadeo
@ 2009-04-03 20:41             ` Harrison, John R
  2009-04-04 10:11               ` Jon Harrop
  2009-04-04 10:13             ` Jon Harrop
  1 sibling, 1 reply; 47+ messages in thread
From: Harrison, John R @ 2009-04-03 20:41 UTC (permalink / raw)
  To: Paolo Donadeo; +Cc: OCaml

| I can agree with you on this argument, but a question still remains:
| why should you ever do things like:
|
| > # s.[0] <- 'a';;

The point is that it might not be your own code that does it, but a
function written by someone else to which you innocently pass a string
argument. You may think you're writing purely functional code, but
unless you've checked all the other functions you're using that accept
or return strings, who knows?

Even if you aren't a doctrinaire functional programmer, the use of
mutable strings as the default weakens the abstraction boundary
provided by functions and procedures, and surely everyone agrees on
the importance of that.

John.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 17:50         ` Daniel Bünzli
  2009-04-03 19:46           ` Paolo Donadeo
@ 2009-04-03 21:44           ` Goswin von Brederlow
  2009-04-04  9:10             ` David Rajchenbach-Teller
  2009-04-04 17:11           ` [Caml-list] Strings Kuba Ober
  2009-04-05 20:54           ` Richard Jones
  3 siblings, 1 reply; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-03 21:44 UTC (permalink / raw)
  To: Daniel Buenzli; +Cc: OCaml List

Daniel Bünzli <daniel.buenzli@erratique.ch> writes:

> Le 3 avr. 09 à 18:52, Martin Jambon a écrit :
>
>> I love this recurrent discussion!
>
> I love your carefully argumented response !
>
>> - I see absolutely no practical advantage of having an immutable
>> "character
>> string" type.
>
> In fact I find the result of the following sequence of operations very
> disappointing for a functional programming language :
>
>         Objective Caml version 3.11.0
>
> # Sys.os_type;;
> - : string = "Unix"
> # let s = Sys.os_type;;
> val s : string = "Unix"
> # s.[0] <- 'a';;
> - : unit = ()
> # Sys.os_type;;
> - : string = "anix"
>
> I think it is a design error to conflate strings and byte arrays. You
> clearly want both, but each with its own type and strings as
> immutable. Individual character mutability is rarely needed in text
> processing and having immutable strings avoids the kind of quirks as
> seen above.

I think that is a design flaw in Sys. Strings are mutable. The os_type
is a constant. It should not hand out mutable access to a constant.

With the current string module a better way would be to return a copy
of os_type on each invocation. Drawback there is that then

Sys.os_type () != Sys.os_type ()

> You'll think that's a marginal example, but that actually happens in
> practice. For example in xmlm when I return a signal for a start tag I
> do not String.copy the tag name to avoid allocating too much. Thus in
> the documentation there's the following ugly advice :
>
> "The module assumes strings are immutable, thus strings the client
> gives or receives during the input and output process must not be
> modified."
>
> And if you don't follow the advice and mutate the tag's name before
> the end tag was parsed (or output) you'll get a tag mismatch error
> even though the document (or the output) is perfectly valid.
>
> Having immutable strings would not rely on the client for correctness
> of operation and that's always an advantage. Of course you'll tell me
> just use String.copy inside xmlm et voilà, but then you traded
> correctness for performance in a case where you could have both with
> immutable strings.

This is not just a problem for strings. Any data type can suffer the same.

>> - There is nothing to change in OCaml's string type because it is an
>> "array of
>> bytes", with type char representing single bytes.
>
>
> Oh no, there's nothing to change at all, that's a perfect
> implementation of byte arrays. You just want another type for
> immutable strings.
>
> Best,
>
> Daniel

It wouldn't be too hard to change the string module to allow for both
mutable and immutable strings:

module S :
sig
  type const
  type mutabl
  type 'a t
  val make : string -> mutabl t
  val set : mutabl t -> int -> char -> unit
  val get : 'a t -> int -> char
  val const : 'a t -> const t
  val print : 'a t -> unit
end = struct
  type const
  type mutabl
  type 'a t = string
  let make s = s
  let set = String.set
  let get = String.get
  let const s = s
  let print = print_string
end

let str = S.make "hallo" in
  S.set str 0 'H'; S.print str
let str = S.const (S.make "hallo") in
  S.set str 0 'H'; S.print str
        ^^^
Error: This expression has type S.const S.t but is here used with type
         S.mutabl S.t

By adding a phantom type the type system can keep track of where a
string is mutable and where not. The only restriction is that "const"
does not mean the string will not change. It only means that that
reference to the string can not change it:

# let str = S.make "hallo" in
  let str2 = S.const str in
    S.set str 0 'H'; S.print str2;;
Hallo- : unit = ()

If you let a mutable reference to the string escape and then assume it
remains const that is your problem. Easily avoidable in a library or
module.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 21:44           ` Goswin von Brederlow
@ 2009-04-04  9:10             ` David Rajchenbach-Teller
  2009-04-05 10:06               ` Strings Zheng Li
  0 siblings, 1 reply; 47+ messages in thread
From: David Rajchenbach-Teller @ 2009-04-04  9:10 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Daniel Buenzli, OCaml List


On Fri, 2009-04-03 at 23:44 +0200, Goswin von Brederlow wrote:
> It wouldn't be too hard to change the string module to allow for both
> mutable and immutable strings:

[...]

Done in Batteries.

# "foo";;   (*OCaml base string*)
- : string = "foo"
# ro"foo";;(*Read-only string*)
- : [ `Read ] Batteries.String.Cap.t = ro"foo"

Cheers,
 David


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 11:56 Strings Jon Harrop
  2009-04-03 12:25 ` [Caml-list] Strings Paolo Donadeo
  2009-04-03 14:18 ` Ashish Agarwal
@ 2009-04-04  9:14 ` David Rajchenbach-Teller
  2009-04-04  9:26   ` Alp Mestan
                     ` (3 more replies)
  2 siblings, 4 replies; 47+ messages in thread
From: David Rajchenbach-Teller @ 2009-04-04  9:14 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


On Fri, 2009-04-03 at 12:56 +0100, Jon Harrop wrote:
> I read that batteries included provides first-class rope-based strings and I 
> was just reading up on some horror stories about immutable strings on 
> StackOverflow. This made me wonder what people's thoughts are about mutable 
> vs immutable strings?

Note that Batteries provides
* regular OCaml strings
* strings with capabilities (i.e. strings which, depending on their
type, can be read-only/write-only/read-write) -- sometimes faster than
regular strings, never slower
* immutable Unicode ropes.

I personally can't remember the last time I've needed mutable strings in
OCaml. On the other hand, I can remember a handful of times where, to
return a constant string, I had to make a function that would rebuild
the string at every call. Which is both needlessly slow and awkward for
what looks like a constant.

Cheers,
 David



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04  9:14 ` David Rajchenbach-Teller
@ 2009-04-04  9:26   ` Alp Mestan
  2009-04-04 10:55     ` blue storm
  2009-04-04 21:51     ` Goswin von Brederlow
  2009-04-04 10:11   ` Jon Harrop
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 47+ messages in thread
From: Alp Mestan @ 2009-04-04  9:26 UTC (permalink / raw)
  To: David Rajchenbach-Teller, caml-list

[-- Attachment #1: Type: text/plain, Size: 1273 bytes --]

On Sat, Apr 4, 2009 at 11:14 AM, David Rajchenbach-Teller <
David.Teller@ens-lyon.org> wrote:

> I personally can't remember the last time I've needed mutable strings in
> OCaml.


Neither do I.


> On the other hand, I can remember a handful of times where, to
> return a constant string, I had to make a function that would rebuild
> the string at every call. Which is both needlessly slow and awkward for
> what looks like a constant.
>

I think providing both capabilities is the best solution.
However, let's study Haskell's strings.
They simply are a list of characters. This let the ability to use heavily
list-related functions (take, takeWhile, drop, dropWhile, map, etc.). On the
other hand, OCaml's standard library lacks of many functions for strings ! I
think this is too much imperative oriented. Maybe we could try to implement
(for Batteries or in a separate project) string lists and then use the power
of Batteries' list module with many (really many... that's a real pleasure
to read its documentation) functions to work with. I think with a bit of
internal laziness, we could get a great immutable string type with many
functions to manipulate it.

But I guess there are many cons to do so, otherwise it would have been
"standardized".


-- 
Alp Mestan

[-- Attachment #2: Type: text/html, Size: 1785 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 20:41             ` Harrison, John R
@ 2009-04-04 10:11               ` Jon Harrop
  2009-04-04 11:12                 ` David Teller
  0 siblings, 1 reply; 47+ messages in thread
From: Jon Harrop @ 2009-04-04 10:11 UTC (permalink / raw)
  To: caml-list

On Friday 03 April 2009 21:41:26 Harrison, John R wrote:
> | I can agree with you on this argument, but a question still remains:
> |
> | why should you ever do things like:
> | > # s.[0] <- 'a';;
>
> The point is that it might not be your own code that does it, but a
> function written by someone else to which you innocently pass a string
> argument. You may think you're writing purely functional code...

Why? This data structure is a mutable array of bytes and should be treated as 
such.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04  9:14 ` David Rajchenbach-Teller
  2009-04-04  9:26   ` Alp Mestan
@ 2009-04-04 10:11   ` Jon Harrop
  2009-04-04 21:39   ` Goswin von Brederlow
  2009-04-05  7:14   ` Romain Beauxis
  3 siblings, 0 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-04 10:11 UTC (permalink / raw)
  To: caml-list

On Saturday 04 April 2009 10:14:34 David Rajchenbach-Teller wrote:
> Note that Batteries provides
> * regular OCaml strings
> * strings with capabilities (i.e. strings which, depending on their
> type, can be read-only/write-only/read-write) -- sometimes faster than
> regular strings, never slower
> * immutable Unicode ropes.

Wow!

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 19:46           ` Paolo Donadeo
  2009-04-03 20:41             ` Harrison, John R
@ 2009-04-04 10:13             ` Jon Harrop
  1 sibling, 0 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-04 10:13 UTC (permalink / raw)
  To: caml-list

On Friday 03 April 2009 20:46:26 Paolo Donadeo wrote:
> > You clearly want both, but each with its own type and strings as
> > immutable. Individual character mutability is rarely needed in text
> > processing
>
> I can agree with you on this argument, but a question still remains:
>
> why should you ever do things like:
> > # s.[0] <- 'a';;

Exactly. I find that to be useful and not a source of errors.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 15:03     ` Daniel Bünzli
  2009-04-03 16:52       ` Martin Jambon
@ 2009-04-04 10:20       ` Jon Harrop
  1 sibling, 0 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-04 10:20 UTC (permalink / raw)
  To: caml-list

On Friday 03 April 2009 16:03:25 Daniel Bünzli wrote:
> Le 3 avr. 09 à 16:46, Jon Harrop a écrit :
> > Just because my OCaml programs were mutating strings and translating
> > that into
> > F# is non-trivial if the string is shared or big. In essence, I've
> > always
> > used OCaml's strings as a more efficient byte array. In fact, the best
> > translation to F# is often to use byte arrays as a replacement for
> > strings.
>
> So immutable strings are not a "PITA" you are just using them for
> something they should not be taken for (mutable byte arrays).

Except that mutable byte arrays are not treated as strings, e.g. they are 
pretty printed as a garbled pile of bytes. Moreover, immutable strings 
implemented on top of arrays have awful performance characteristics. If 
you're going to use immutable strings then implement it as a rope...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04  9:26   ` Alp Mestan
@ 2009-04-04 10:55     ` blue storm
  2009-04-04 21:51     ` Goswin von Brederlow
  1 sibling, 0 replies; 47+ messages in thread
From: blue storm @ 2009-04-04 10:55 UTC (permalink / raw)
  To: caml-list

On Sat, Apr 4, 2009 at 11:26 AM, Alp Mestan <alp@mestan.fr> wrote:
> However, let's study Haskell's strings.
> They simply are a list of characters. This let the ability to use heavily
> list-related functions (take, takeWhile, drop, dropWhile, map, etc.). On the
> other hand, OCaml's standard library lacks of many functions for strings ! I
> think this is too much imperative oriented. Maybe we could try to implement
> (for Batteries or in a separate project) string lists and then use the power
> of Batteries' list module with many (really many... that's a real pleasure
> to read its documentation) functions to work with. I think with a bit of
> internal laziness, we could get a great immutable string type with many
> functions to manipulate it.
>
> But I guess there are many cons to do so, otherwise it would have been
> "standardized".

This is actually a bad idea : strings and lists are used for different
purpose. Some operations are common (takeWhile is probably a good
example) but the concepts are differents (does "tail" as a primitive
on a string really match your usage pattern ?) and using lists entails
bad performance caracteristics (hence the need for byteString, etc.,
the lack of standardization among the different Haskell libraries
using strings, and the general confusion resulting). In the (rare
imho) cases where you exactly want a list of characters, you can use
(string -> char list) (or string -> char Enum.t) conversion functions.

In the immutable string land, we have much better representations
(that is, implementations that match the common use case of strings
better), such as ropes. Ropes provide (amortized ?) constant-time
concatenation, wich I think is the killer feature. I very rarely use
mutable strings, whereas in my experience string concatenation is a
very common operation (more than random access). I expect performance
benefits for most real programs.

In some cases, byte arrays are still useful (as John Harrop noted),
but I think they are anecdotical and I'm ready to explicitely convert
my strings to a mutable format before performing any mutation.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 10:11               ` Jon Harrop
@ 2009-04-04 11:12                 ` David Teller
  2009-04-04 11:40                   ` Jon Harrop
  2009-04-18 12:31                   ` Arkady Andrukonis
  0 siblings, 2 replies; 47+ messages in thread
From: David Teller @ 2009-04-04 11:12 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

The bad thing is that, whenever you have to return text in an otherwise
functional program, you need to enter "mutable array of bytes" land. You
can't just assume that the user isn't going to modify that string,
because, they can, possibly by accident, and any invariant relying on
the fact that your strings can't change are going to be broken. In
particular, any StringSet, any StringMap, etc.

Cheers,
 David

On Sat, 2009-04-04 at 11:11 +0100, Jon Harrop wrote:
> Why? This data structure is a mutable array of bytes and should be treated as 
> such.
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 11:12                 ` David Teller
@ 2009-04-04 11:40                   ` Jon Harrop
  2009-04-04 12:34                     ` David Rajchenbach-Teller
  2009-04-18 12:31                   ` Arkady Andrukonis
  1 sibling, 1 reply; 47+ messages in thread
From: Jon Harrop @ 2009-04-04 11:40 UTC (permalink / raw)
  To: caml-list

On Saturday 04 April 2009 12:12:52 David Teller wrote:
> The bad thing is that, whenever you have to return text in an otherwise
> functional program, you need to enter "mutable array of bytes" land. You
> can't just assume that the user isn't going to modify that string,
> because, they can, possibly by accident, and any invariant relying on
> the fact that your strings can't change are going to be broken. In
> particular, any StringSet, any StringMap, etc.

Sure but that is no different to arrays and an ArraySet, ArrayMap etc.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 11:40                   ` Jon Harrop
@ 2009-04-04 12:34                     ` David Rajchenbach-Teller
  0 siblings, 0 replies; 47+ messages in thread
From: David Rajchenbach-Teller @ 2009-04-04 12:34 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


On Sat, 2009-04-04 at 12:40 +0100, Jon Harrop wrote:
> On Saturday 04 April 2009 12:12:52 David Teller wrote:
> > The bad thing is that, whenever you have to return text in an otherwise
> > functional program, you need to enter "mutable array of bytes" land. You
> > can't just assume that the user isn't going to modify that string,
> > because, they can, possibly by accident, and any invariant relying on
> > the fact that your strings can't change are going to be broken. In
> > particular, any StringSet, any StringMap, etc.
> 
> Sure but that is no different to arrays and an ArraySet, ArrayMap etc.

Of course. The thing is that when I'm using arrays, I'm expecting to
enter mutable-land. When I'm manipulating text, most of the time, I
don't.

Cheers,
 David


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 17:50         ` Daniel Bünzli
  2009-04-03 19:46           ` Paolo Donadeo
  2009-04-03 21:44           ` Goswin von Brederlow
@ 2009-04-04 17:11           ` Kuba Ober
  2009-04-04 17:26             ` Jon Harrop
  2009-04-05 20:54           ` Richard Jones
  3 siblings, 1 reply; 47+ messages in thread
From: Kuba Ober @ 2009-04-04 17:11 UTC (permalink / raw)
  To: caml-list


On Apr 3, 2009, at 1:50 PM, Daniel Bünzli wrote:
> Le 3 avr. 09 à 18:52, Martin Jambon a écrit :
>> - I see absolutely no practical advantage of having an immutable  
>> "character
>> string" type.
>
> In fact I find the result of the following sequence of operations  
> very disappointing for a functional programming language :
>
>        Objective Caml version 3.11.0
>
> # Sys.os_type;;
> - : string = "Unix"
> # let s = Sys.os_type;;
> val s : string = "Unix"
> # s.[0] <- 'a';;
> - : unit = ()
> # Sys.os_type;;
> - : string = "anix"

Perhaps mutable data of any type should be markable as const, with the  
marking being
permanent.

Or maybe OCaml should have an immutable string wrapper that can be  
created from
a mutable string? Such immutable strings would be returned where the  
string is not supposed
to be further modified.

That's where I like C++/Qt copy-on-write strings a lot. They make life  
easy.

Cheers, Kuba

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 17:11           ` [Caml-list] Strings Kuba Ober
@ 2009-04-04 17:26             ` Jon Harrop
  0 siblings, 0 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-04 17:26 UTC (permalink / raw)
  To: caml-list

On Saturday 04 April 2009 18:11:20 Kuba Ober wrote:
> On Apr 3, 2009, at 1:50 PM, Daniel Bünzli wrote:
> > In fact I find the result of the following sequence of operations
> > very disappointing for a functional programming language :
> >
> >        Objective Caml version 3.11.0
> >
> > # Sys.os_type;;
> > - : string = "Unix"
> > # let s = Sys.os_type;;
> > val s : string = "Unix"
> > # s.[0] <- 'a';;
> > - : unit = ()
> > # Sys.os_type;;
> > - : string = "anix"
>
> Perhaps mutable data of any type should be markable as const, with the
> marking being permanent.

Would a phantom type parameter be sufficient:

  [`RO|`RW] string

> That's where I like C++/Qt copy-on-write strings a lot. They make life
> easy.

Just as long as they don't copy the entire string when I mutate a single char!

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04  9:14 ` David Rajchenbach-Teller
  2009-04-04  9:26   ` Alp Mestan
  2009-04-04 10:11   ` Jon Harrop
@ 2009-04-04 21:39   ` Goswin von Brederlow
  2009-04-05  7:14   ` Romain Beauxis
  3 siblings, 0 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-04 21:39 UTC (permalink / raw)
  To: David Rajchenbach-Teller; +Cc: Jon Harrop, caml-list

David Rajchenbach-Teller <David.Teller@ens-lyon.org> writes:

> I personally can't remember the last time I've needed mutable strings in
> OCaml.

Only for byte arrays or a buffering module where I String.blit to the
string itself instead of into a new string. Saves a mutable.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04  9:26   ` Alp Mestan
  2009-04-04 10:55     ` blue storm
@ 2009-04-04 21:51     ` Goswin von Brederlow
  2009-04-04 23:35       ` Yaron Minsky
  2009-04-05  2:55       ` Jon Harrop
  1 sibling, 2 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-04 21:51 UTC (permalink / raw)
  To: Alp Mestan; +Cc: David Rajchenbach-Teller, caml-list

Alp Mestan <alp@mestan.fr> writes:

> I think providing both capabilities is the best solution.

Phantom types solvethis beautifully with not truntime penalty.

GO BATTERIES!

> However, let's study Haskell's strings.
> They simply are a list of characters. This let the ability to use heavily
> list-related functions (take, takeWhile, drop, dropWhile, map, etc.). On the

The beauty of ocaml strings is that they are really compact. An ocaml
string on 32bit is 5-8 bytes longer than the contained string and 9-16
bytes on 64bit.

On the other hand a list of characters is 8/16 times as large. Not to
mention that you have no random access.

> other hand, OCaml's standard library lacks of many functions for strings ! I
> think this is too much imperative oriented. Maybe we could try to implement
> (for Batteries or in a separate project) string lists and then use the power
> of Batteries' list module with many (really many... that's a real pleasure to
> read its documentation) functions to work with. I think with a bit of internal
> laziness, we could get a great immutable string type with many functions to
> manipulate it.
> But I guess there are many cons to do so, otherwise it would have been
> "standardized".

Mutable/Immutable can really nicely done with phantom types and is
independent of the data structure used. It works for strings, lists,
arrays, sets, trees, ... and I think all standard modules should have
it. The official standard lib is rather lacking there but that is why
there is Batteries. The more I hear/see of it the more I like it.

As for char lists for strings you can always convert strings to char
lists when that is a better representation. I've done so myself on a
number of ocasions. But just as often I used strings as char
arrays. Doing e.g. String.blit with char lists would be ugly.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 21:51     ` Goswin von Brederlow
@ 2009-04-04 23:35       ` Yaron Minsky
  2009-04-05  9:36         ` David Rajchenbach-Teller
  2009-04-05  2:55       ` Jon Harrop
  1 sibling, 1 reply; 47+ messages in thread
From: Yaron Minsky @ 2009-04-04 23:35 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 2140 bytes --]

On Sat, Apr 4, 2009 at 5:51 PM, Goswin von Brederlow <goswin-v-b@web.de>wrote:

> Mutable/Immutable can really nicely done with phantom types and is
> independent of the data structure used. It works for strings, lists,
> arrays, sets, trees, ... and I think all standard modules should have
> it. The official standard lib is rather lacking there but that is why
> there is Batteries. The more I hear/see of it the more I like it.


On this note, there's a small variation on this idea that we've experimented
with at Jane Street that I think is worth mentioning.  When people do this
kind of thing, they usually have two phantom tags, "immutable" and
"mutable", but, there is another natural one to add: "readonly".

A mutable value is, as one would expect, a value that can be modified; and
an immutable value is one that can not be modified.  A readonly value is one
that can not be modified, but that might change nonetheless because
somewhere else in the program there is a mutable handle to the same
underlying value.  You implement this by making the interface for immutable
and readonly values identical, except that a readonly value can be created
from a mutable value without copying.  Mutable and immutable are typically
the most important types of access control to implement, but readonly can be
useful in special cases, where you have a value that you really want to
mutate, but that you want to control where in the program the mutation can
happen.

There's one extra tradeoff worth mentioning with using phantom types for
controlling mutability, which is that values made immutable in this way are
not quite as good as values that are immutable at the lowest level.  The
reason is that ordinary immutable values are understood by the type system
as such, and this goes into the compiler's understanding of the value
restriction and variance.  This means there are cases where, say, a
phantom-immutable list won't work, but an ordinary immutable list will.

(For those interested in a short and elementary tutorial on how to implement
access control with phantom types, look here:
http://ocaml.janestreet.com/?q=node/11)

y

[-- Attachment #2: Type: text/html, Size: 2552 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 21:51     ` Goswin von Brederlow
  2009-04-04 23:35       ` Yaron Minsky
@ 2009-04-05  2:55       ` Jon Harrop
  2009-04-05  4:22         ` Edgar Friendly
  2009-04-05  6:57         ` Goswin von Brederlow
  1 sibling, 2 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-05  2:55 UTC (permalink / raw)
  To: caml-list

On Saturday 04 April 2009 22:51:50 Goswin von Brederlow wrote:
> The beauty of ocaml strings is that they are really compact. An ocaml
> string on 32bit is 5-8 bytes longer than the contained string and 9-16
> bytes on 64bit.

The ugliness is that 16Mb limit. I assume those limits have been removed in 
batteries?!

> As for char lists for strings you can always convert strings to char
> lists when that is a better representation.

I would rather use a view pattern to create substrings efficiently.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  2:55       ` Jon Harrop
@ 2009-04-05  4:22         ` Edgar Friendly
  2009-04-05  7:03           ` Goswin von Brederlow
  2009-04-05  6:57         ` Goswin von Brederlow
  1 sibling, 1 reply; 47+ messages in thread
From: Edgar Friendly @ 2009-04-05  4:22 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
> On Saturday 04 April 2009 22:51:50 Goswin von Brederlow wrote:
>> The beauty of ocaml strings is that they are really compact. An ocaml
>> string on 32bit is 5-8 bytes longer than the contained string and 9-16
>> bytes on 64bit.
> 
> The ugliness is that 16Mb limit. I assume those limits have been removed in 
> batteries?!
>
Our implementation of ropes (well, really Mauricio Fernandez's) has a
limit of ~700MB in 32-bit and 220GB in 64-bit.  Users that need more
than that will have to recompile batteries with a larger
'Rope.max-height' constant.

E.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  2:55       ` Jon Harrop
  2009-04-05  4:22         ` Edgar Friendly
@ 2009-04-05  6:57         ` Goswin von Brederlow
  2009-04-05  7:11           ` Jon Harrop
  1 sibling, 1 reply; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-05  6:57 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop <jon@ffconsultancy.com> writes:

> On Saturday 04 April 2009 22:51:50 Goswin von Brederlow wrote:
>> The beauty of ocaml strings is that they are really compact. An ocaml
>> string on 32bit is 5-8 bytes longer than the contained string and 9-16
>> bytes on 64bit.
>
> The ugliness is that 16Mb limit. I assume those limits have been removed in 
> batteries?!

As that is a limitation of the GC memory structure of a string there
is nothing batteries can do there. Not for the basic type string. It
is also just 32bit that is so severly limited.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  4:22         ` Edgar Friendly
@ 2009-04-05  7:03           ` Goswin von Brederlow
  0 siblings, 0 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-05  7:03 UTC (permalink / raw)
  To: Edgar Friendly; +Cc: Jon Harrop, caml-list

Edgar Friendly <thelema314@gmail.com> writes:

> Jon Harrop wrote:
>> On Saturday 04 April 2009 22:51:50 Goswin von Brederlow wrote:
>>> The beauty of ocaml strings is that they are really compact. An ocaml
>>> string on 32bit is 5-8 bytes longer than the contained string and 9-16
>>> bytes on 64bit.
>> 
>> The ugliness is that 16Mb limit. I assume those limits have been removed in 
>> batteries?!
>>
> Our implementation of ropes (well, really Mauricio Fernandez's) has a
> limit of ~700MB in 32-bit and 220GB in 64-bit.  Users that need more
> than that will have to recompile batteries with a larger
> 'Rope.max-height' constant.
>
> E.

# Sys.max_string_length;;
- : int = 144115188075855863

Why is the limit for rRope 220GB in 64bit?

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  6:57         ` Goswin von Brederlow
@ 2009-04-05  7:11           ` Jon Harrop
  0 siblings, 0 replies; 47+ messages in thread
From: Jon Harrop @ 2009-04-05  7:11 UTC (permalink / raw)
  To: caml-list

On Sunday 05 April 2009 07:57:56 Goswin von Brederlow wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> > On Saturday 04 April 2009 22:51:50 Goswin von Brederlow wrote:
> >> The beauty of ocaml strings is that they are really compact. An ocaml
> >> string on 32bit is 5-8 bytes longer than the contained string and 9-16
> >> bytes on 64bit.
> >
> > The ugliness is that 16Mb limit. I assume those limits have been removed
> > in batteries?!
>
> As that is a limitation of the GC memory structure of a string there
> is nothing batteries can do there. Not for the basic type string. It
> is also just 32bit that is so severly limited.

Batteries could replace all uses of the built-in string type with another 
type, such as a byte big array, just as they have done to implement ropes.

I don't think they could address the array limitation though, because that is 
polymorphic.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04  9:14 ` David Rajchenbach-Teller
                     ` (2 preceding siblings ...)
  2009-04-04 21:39   ` Goswin von Brederlow
@ 2009-04-05  7:14   ` Romain Beauxis
  2009-04-05  9:34     ` David Rajchenbach-Teller
  2009-04-05 21:37     ` Goswin von Brederlow
  3 siblings, 2 replies; 47+ messages in thread
From: Romain Beauxis @ 2009-04-05  7:14 UTC (permalink / raw)
  To: caml-list

Le Saturday 04 April 2009 11:14:34 David Rajchenbach-Teller, vous avez écrit :
> Note that Batteries provides
> * regular OCaml strings
> * strings with capabilities (i.e. strings which, depending on their
> type, can be read-only/write-only/read-write) -- sometimes faster than
> regular strings, never slower
> * immutable Unicode ropes.

Impressive !

I am taking this opportunity to ask wether there exists a String-compatible 
API that would in fact manipulate lazy strings, i.e. strings which content is 
read on-the-fly ?


Romain


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  7:14   ` Romain Beauxis
@ 2009-04-05  9:34     ` David Rajchenbach-Teller
  2009-04-05 21:37     ` Goswin von Brederlow
  1 sibling, 0 replies; 47+ messages in thread
From: David Rajchenbach-Teller @ 2009-04-05  9:34 UTC (permalink / raw)
  To: Romain Beauxis; +Cc: caml-list

Er... not yet :)


On Sun, 2009-04-05 at 09:14 +0200, Romain Beauxis wrote:
> Impressive !
> 
> I am taking this opportunity to ask wether there exists a String-compatible 
> API that would in fact manipulate lazy strings, i.e. strings which content is 
> read on-the-fly ?
> 
> 
> Romain



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 23:35       ` Yaron Minsky
@ 2009-04-05  9:36         ` David Rajchenbach-Teller
  2009-04-05 10:08           ` Alp Mestan
  2009-04-05 21:40           ` Goswin von Brederlow
  0 siblings, 2 replies; 47+ messages in thread
From: David Rajchenbach-Teller @ 2009-04-05  9:36 UTC (permalink / raw)
  To: yminsky; +Cc: caml-list


On Sat, 2009-04-04 at 19:35 -0400, Yaron Minsky wrote:
> On Sat, Apr 4, 2009 at 5:51 PM, Goswin von Brederlow
> <goswin-v-b@web.de> wrote:
>         Mutable/Immutable can really nicely done with phantom types
>         and is
>         independent of the data structure used. It works for strings,
>         lists,
>         arrays, sets, trees, ... and I think all standard modules
>         should have
>         it. The official standard lib is rather lacking there but that
>         is why
>         there is Batteries. The more I hear/see of it the more I like
>         it.
> 
> On this note, there's a small variation on this idea that we've
> experimented with at Jane Street that I think is worth mentioning.
> When people do this kind of thing, they usually have two phantom tags,
> "immutable" and "mutable", but, there is another natural one to add:
> "readonly".

Actually, thats exactly what we have for strings and arrays in
Batteries.

Cheers,
 David

> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Strings
  2009-04-04  9:10             ` David Rajchenbach-Teller
@ 2009-04-05 10:06               ` Zheng Li
  2009-04-06  9:20                 ` Strings David Rajchenbach-Teller
  0 siblings, 1 reply; 47+ messages in thread
From: Zheng Li @ 2009-04-05 10:06 UTC (permalink / raw)
  To: David Rajchenbach-Teller; +Cc: Goswin von Brederlow, Daniel Buenzli, OCaml List

On 4/4/2009 11:10 AM, David Rajchenbach-Teller wrote:
>
> On Fri, 2009-04-03 at 23:44 +0200, Goswin von Brederlow wrote:
>> It wouldn't be too hard to change the string module to allow for both
>> mutable and immutable strings:
>
> [...]
>
> Done in Batteries.
>
> # "foo";;   (*OCaml base string*)
> - : string = "foo"
> # ro"foo";;(*Read-only string*)
> - : [ `Read ] Batteries.String.Cap.t = ro"foo"

With phantom type alone, the abstraction can still leak.

----
# open Batteries.String.Cap;;
# let s = "asdf";;
val s : string = "asdf"
# let ro_s = read_only (of_string s);;
val ro_s : [ `Read ] Batteries.String.Cap.t = ro"asdf"
# s.[3] <- 'z';;
# ro_s;;
- : [ `Read ] Batteries.String.Cap.t = ro"asdz"
----

Well, this example is artificial. A more intuitive example would be 
(Look how similar it is with the previous os_type string example !):

----
# let ro_s = ro"asdf";;
val ro_s : [ `Read ] Batteries.String.Cap.t = ro"asdf"
# (to_string ro_s).[3] <- 'z';;
....
----

however, with no covariants defined in batteries' String.Cap.t type 
(why?), the second example won't compile. The compiler simply doesn't 
allow me to print out a read-only string, nor does it allow many 
reasonable things like <<join ro"asdf" ro"jkl">> etc.

I understand that the phantom trick can't guarantee the immutability in 
a whole since the underlying data type (if exposed) can still be 
mutable. However, String.Cap.t is not dealing with arbitrary polymorphic 
types here! it deals with just the specific primitive string type, and 
controlling capacity over standard string is the only thing it intend to do.

String.copy on the border might solve this problem, even though there 
could still be other issues left.

All the best
--
Zheng


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  9:36         ` David Rajchenbach-Teller
@ 2009-04-05 10:08           ` Alp Mestan
  2009-04-05 21:41             ` Goswin von Brederlow
  2009-04-05 21:40           ` Goswin von Brederlow
  1 sibling, 1 reply; 47+ messages in thread
From: Alp Mestan @ 2009-04-05 10:08 UTC (permalink / raw)
  To: David Rajchenbach-Teller, caml-list

[-- Attachment #1: Type: text/plain, Size: 73 bytes --]

By the way, what would be the use case of lazy strings ?

-- 
Alp Mestan

[-- Attachment #2: Type: text/html, Size: 98 bytes --]

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-03 17:50         ` Daniel Bünzli
                             ` (2 preceding siblings ...)
  2009-04-04 17:11           ` [Caml-list] Strings Kuba Ober
@ 2009-04-05 20:54           ` Richard Jones
  2009-04-05 23:40             ` Daniel Bünzli
  3 siblings, 1 reply; 47+ messages in thread
From: Richard Jones @ 2009-04-05 20:54 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: OCaml List

On Fri, Apr 03, 2009 at 07:50:48PM +0200, Daniel Bünzli wrote:
> Having immutable strings would not rely on the client for correctness  
> of operation and that's always an advantage. Of course you'll tell me  
> just use String.copy inside xmlm et voilà, but then you traded  
> correctness for performance in a case where you could have both with  
> immutable strings.

Could you not just define a new module which is String, but with a
restricted interface (removing all the functions that can modify
strings)?  This is essentially what Batteries does, along with a bit
of camlp4 [I think] to provide a rope construction syntax.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  7:14   ` Romain Beauxis
  2009-04-05  9:34     ` David Rajchenbach-Teller
@ 2009-04-05 21:37     ` Goswin von Brederlow
  1 sibling, 0 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-05 21:37 UTC (permalink / raw)
  To: Romain Beauxis; +Cc: caml-list

Romain Beauxis <toots@rastageeks.org> writes:

> Le Saturday 04 April 2009 11:14:34 David Rajchenbach-Teller, vous avez écrit :
>> Note that Batteries provides
>> * regular OCaml strings
>> * strings with capabilities (i.e. strings which, depending on their
>> type, can be read-only/write-only/read-write) -- sometimes faster than
>> regular strings, never slower
>> * immutable Unicode ropes.
>
> Impressive !
>
> I am taking this opportunity to ask wether there exists a String-compatible 
> API that would in fact manipulate lazy strings, i.e. strings which content is 
> read on-the-fly ?
>
>
> Romain

mmap?

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05  9:36         ` David Rajchenbach-Teller
  2009-04-05 10:08           ` Alp Mestan
@ 2009-04-05 21:40           ` Goswin von Brederlow
  1 sibling, 0 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-05 21:40 UTC (permalink / raw)
  To: David Rajchenbach-Teller; +Cc: yminsky, caml-list

David Rajchenbach-Teller <David.Teller@ens-lyon.org> writes:

> On Sat, 2009-04-04 at 19:35 -0400, Yaron Minsky wrote:
>> On Sat, Apr 4, 2009 at 5:51 PM, Goswin von Brederlow
>> <goswin-v-b@web.de> wrote:
>>         Mutable/Immutable can really nicely done with phantom types
>>         and is
>>         independent of the data structure used. It works for strings,
>>         lists,
>>         arrays, sets, trees, ... and I think all standard modules
>>         should have
>>         it. The official standard lib is rather lacking there but that
>>         is why
>>         there is Batteries. The more I hear/see of it the more I like
>>         it.
>> 
>> On this note, there's a small variation on this idea that we've
>> experimented with at Jane Street that I think is worth mentioning.
>> When people do this kind of thing, they usually have two phantom tags,
>> "immutable" and "mutable", but, there is another natural one to add:
>> "readonly".
>
> Actually, thats exactly what we have for strings and arrays in
> Batteries.
>
> Cheers,
>  David

Where do you have an immutable string? One where you are sure that
nobody can hold a [> `Write] reference to the string. I only saw the
read and write phantom flags.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05 10:08           ` Alp Mestan
@ 2009-04-05 21:41             ` Goswin von Brederlow
  0 siblings, 0 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-05 21:41 UTC (permalink / raw)
  To: Alp Mestan; +Cc: David Rajchenbach-Teller, caml-list

Alp Mestan <alp@mestan.fr> writes:

> By the way, what would be the use case of lazy strings ?

Strings larger than available memory or where only some small part of
the string are used at any one execution.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-05 20:54           ` Richard Jones
@ 2009-04-05 23:40             ` Daniel Bünzli
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Bünzli @ 2009-04-05 23:40 UTC (permalink / raw)
  To: Richard Jones; +Cc: OCaml List


Le 5 avr. 09 à 22:54, Richard Jones a écrit :

> Could you not just define a new module which is String, but with a
> restricted interface (removing all the functions that can modify
> strings)?

Yes. But I think that at a certain point the client will certainly  
want a regular string and to convert from this restricted  
representation to a string the module will have to String.copy.

Best,

Daniel

P.S. If you have an immutable string implementation you can use the  
functor Xmlm.Make to use them.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Strings
  2009-04-05 10:06               ` Strings Zheng Li
@ 2009-04-06  9:20                 ` David Rajchenbach-Teller
  2009-04-06 10:07                   ` Strings Goswin von Brederlow
  2009-04-06 11:03                   ` Strings Zheng Li
  0 siblings, 2 replies; 47+ messages in thread
From: David Rajchenbach-Teller @ 2009-04-06  9:20 UTC (permalink / raw)
  To: Zheng Li; +Cc: Goswin von Brederlow, Daniel Buenzli, OCaml List


On Sun, 2009-04-05 at 12:06 +0200, Zheng Li wrote: 
> With phantom type alone, the abstraction can still leak.

Ok, it's actually not *immutable* strings but *read-only* strings. The
objective is to be able to distribute a text and make sure that the
client won't be able to modify it. The original owner can still decide
to enter "mutable-land" if they need it.

> 
> ----
> # let ro_s = ro"asdf";;
> val ro_s : [ `Read ] Batteries.String.Cap.t = ro"asdf"
> # (to_string ro_s).[3] <- 'z';;
> ....
> ----
> 
> however, with no covariants defined in batteries' String.Cap.t type 
> (why?), the second example won't compile. 

That's by design: [to_string] is the identity operation and it only
applies to strings for which you have both read and write capabilities.
If you wish to do what I believe you have in mind, you need to first
[copy] the string.

> The compiler simply doesn't 
> allow me to print out a read-only string, nor does it allow many 
> reasonable things like <<join ro"asdf" ro"jkl">> etc.

Er, how can you "not print out a read-only string"?

And for your [join] problem, well, works for me (as soon as you remember
that it takes as second argument a *list* of readable strings).

# print stdout (join ro";" [ro"1"; ro"2"; ro"3"]);;
1;2;3- : unit

Cheers,
 David



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Strings
  2009-04-06  9:20                 ` Strings David Rajchenbach-Teller
@ 2009-04-06 10:07                   ` Goswin von Brederlow
  2009-04-06 11:03                   ` Strings Zheng Li
  1 sibling, 0 replies; 47+ messages in thread
From: Goswin von Brederlow @ 2009-04-06 10:07 UTC (permalink / raw)
  To: David Rajchenbach-Teller
  Cc: Zheng Li, Goswin von Brederlow, Daniel Buenzli, OCaml List

David Rajchenbach-Teller <David.Teller@ens-lyon.org> writes:

> On Sun, 2009-04-05 at 12:06 +0200, Zheng Li wrote: 
>> With phantom type alone, the abstraction can still leak.
>
> Ok, it's actually not *immutable* strings but *read-only* strings. The
> objective is to be able to distribute a text and make sure that the
> client won't be able to modify it. The original owner can still decide
> to enter "mutable-land" if they need it.

Maybe there should be another capability `Const that can only be
gained when creating/copying a string.

let make_const : 'a string -> [`Const; `Read] string = String.copy

Those would be truely immutable. The use would be for strings passed
to libraries that the library wants to ensure are immutable so it can
avoid copying them.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Strings
  2009-04-06  9:20                 ` Strings David Rajchenbach-Teller
  2009-04-06 10:07                   ` Strings Goswin von Brederlow
@ 2009-04-06 11:03                   ` Zheng Li
  1 sibling, 0 replies; 47+ messages in thread
From: Zheng Li @ 2009-04-06 11:03 UTC (permalink / raw)
  To: David Rajchenbach-Teller; +Cc: Daniel Buenzli, OCaml List

On 4/6/2009 11:20 AM, David Rajchenbach-Teller wrote:
> Ok, it's actually not *immutable* strings but *read-only* strings. The
> objective is to be able to distribute a text and make sure that the
> client won't be able to modify it.The original owner can still decide
> to enter "mutable-land" if they need it.

Fair enough. But then users of the library still have to hard copy a 
batteries string to ensure it won't change in the future (which is the 
starting topic of the this thread).

> That's by design: [to_string] is the identity operation and it only
> applies to strings for which you have both read and write capabilities.
> If you wish to do what I believe you have in mind, you need to first
> [copy] the string.

OK.

>> The compiler simply doesn't
>> allow me to print out a read-only string, nor does it allow many
>> reasonable things like<<join ro"asdf" ro"jkl">>  etc.
>
> Er, how can you "not print out a read-only string"?
>
> And for your [join] problem, well, works for me (as soon as you remember
> that it takes as second argument a *list* of readable strings).
>
> # print stdout (join ro";" [ro"1"; ro"2"; ro"3"]);;
> 1;2;3- : unit

My mistake.

Regards
--
Zheng


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [Caml-list] Strings
  2009-04-04 11:12                 ` David Teller
  2009-04-04 11:40                   ` Jon Harrop
@ 2009-04-18 12:31                   ` Arkady Andrukonis
  1 sibling, 0 replies; 47+ messages in thread
From: Arkady Andrukonis @ 2009-04-18 12:31 UTC (permalink / raw)
  To: Caml-list


Hi,

It appears that Java sidesteps this issue by making string classes final, so that whenever a string object gets modified a new copy of it is generated, giving the appearance of mutable strings, but not really so. It seems Caml is on the right track with immutable strings, we have work-around methods for those occasions where relegating a state change into a confined area is a workable solution. Otherwise the benefit of functional programming would be lost.

kadee

--- On Sat, 4/4/09, David Teller <David.Teller@univ-orleans.fr> wrote:

> From: David Teller <David.Teller@univ-orleans.fr>
> Subject: Re: [Caml-list] Strings
> To: "Jon Harrop" <jon@ffconsultancy.com>
> Cc: caml-list@yquem.inria.fr
> Date: Saturday, April 4, 2009, 7:12 AM
> The bad thing is that, whenever you have to return text in
> an otherwise
> functional program, you need to enter "mutable array
> of bytes" land. You
> can't just assume that the user isn't going to
> modify that string,
> because, they can, possibly by accident, and any invariant
> relying on
> the fact that your strings can't change are going to be
> broken. In
> particular, any StringSet, any StringMap, etc.
> 
> Cheers,
>  David
> 
> On Sat, 2009-04-04 at 11:11 +0100, Jon Harrop wrote:
> > Why? This data structure is a mutable array of bytes
> and should be treated as 
> > such.
> > 
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list:
> http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs


      


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2009-04-18 12:31 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-03 11:56 Strings Jon Harrop
2009-04-03 12:25 ` [Caml-list] Strings Paolo Donadeo
2009-04-03 14:18 ` Ashish Agarwal
2009-04-03 14:46   ` Jon Harrop
2009-04-03 15:03     ` Daniel Bünzli
2009-04-03 16:52       ` Martin Jambon
2009-04-03 17:50         ` Daniel Bünzli
2009-04-03 19:46           ` Paolo Donadeo
2009-04-03 20:41             ` Harrison, John R
2009-04-04 10:11               ` Jon Harrop
2009-04-04 11:12                 ` David Teller
2009-04-04 11:40                   ` Jon Harrop
2009-04-04 12:34                     ` David Rajchenbach-Teller
2009-04-18 12:31                   ` Arkady Andrukonis
2009-04-04 10:13             ` Jon Harrop
2009-04-03 21:44           ` Goswin von Brederlow
2009-04-04  9:10             ` David Rajchenbach-Teller
2009-04-05 10:06               ` Strings Zheng Li
2009-04-06  9:20                 ` Strings David Rajchenbach-Teller
2009-04-06 10:07                   ` Strings Goswin von Brederlow
2009-04-06 11:03                   ` Strings Zheng Li
2009-04-04 17:11           ` [Caml-list] Strings Kuba Ober
2009-04-04 17:26             ` Jon Harrop
2009-04-05 20:54           ` Richard Jones
2009-04-05 23:40             ` Daniel Bünzli
2009-04-03 18:24         ` Florian Hars
2009-04-03 20:34         ` Arnaud Spiwack
2009-04-04 10:20       ` Jon Harrop
2009-04-04  9:14 ` David Rajchenbach-Teller
2009-04-04  9:26   ` Alp Mestan
2009-04-04 10:55     ` blue storm
2009-04-04 21:51     ` Goswin von Brederlow
2009-04-04 23:35       ` Yaron Minsky
2009-04-05  9:36         ` David Rajchenbach-Teller
2009-04-05 10:08           ` Alp Mestan
2009-04-05 21:41             ` Goswin von Brederlow
2009-04-05 21:40           ` Goswin von Brederlow
2009-04-05  2:55       ` Jon Harrop
2009-04-05  4:22         ` Edgar Friendly
2009-04-05  7:03           ` Goswin von Brederlow
2009-04-05  6:57         ` Goswin von Brederlow
2009-04-05  7:11           ` Jon Harrop
2009-04-04 10:11   ` Jon Harrop
2009-04-04 21:39   ` Goswin von Brederlow
2009-04-05  7:14   ` Romain Beauxis
2009-04-05  9:34     ` David Rajchenbach-Teller
2009-04-05 21:37     ` Goswin von Brederlow

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).