caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Strings as arrays or lists...
@ 2003-02-27 22:31 Oliver Bandel
  2003-02-28  1:03 ` brogoff
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Bandel @ 2003-02-27 22:31 UTC (permalink / raw)
  To: caml-list

Hello,


in Haskell, strings are lists of chars.

In Ocaml, strings seem to be array-like,
but it's not possible to apply Array-functions
on Strings, but nevertheless, the char's of a
string are used in a very similar way, as arrays.

The indexing is done with .(idx) in the one case
and with .[idx] in the other case.


Well, when strings would be lists of chars,
List-commands like List.map and List.filter
could be used.
This would be more FPL-like. (And more convenient
and IMHO more consistently.)


But even if only the imperative Array-access would
be possible (instead of the FPL-like Lists), it would
be more convenient then to have such a string-type,
which can't be used with the many libraries.


Any suggestions on that topic?

Ciao,
   Oliver
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-02-27 22:31 [Caml-list] Strings as arrays or lists Oliver Bandel
@ 2003-02-28  1:03 ` brogoff
  2003-03-02 18:34   ` Xavier Leroy
  0 siblings, 1 reply; 9+ messages in thread
From: brogoff @ 2003-02-28  1:03 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On Thu, 27 Feb 2003, Oliver Bandel wrote:
> Hello,
> 
> 
> in Haskell, strings are lists of chars.

And what a horrible design decision that is! Even the other 
lazy language (Clean) doesn't screw this one up, and last I looked 
at it had strings as unboxed character arrays. 

> In Ocaml, strings seem to be array-like,
> but it's not possible to apply Array-functions
> on Strings, but nevertheless, the char's of a
> string are used in a very similar way, as arrays.

Think of them as packed character arrays. 

> The indexing is done with .(idx) in the one case
> and with .[idx] in the other case.

OK, so some of us will be much happier if (when?) OCaml implements 
extensional polymorphism and we can treat all array like thingies uniformly. 

> Well, when strings would be lists of chars,
> List-commands like List.map and List.filter
> could be used.

Oh really? And what do we do when we want to use List.map on a long string? 
On my machine, Sys.max_string_length is 16_777_211. List.map will crap out 
on a much smaller list. 

> This would be more FPL-like. (And more convenient
> and IMHO more consistently.)

It would also be way slower, and restrict strings to much smaller lengths.
The length restriction can (and should!) be fixed by extending the set of 
functions which are tail recursive, as has been discussed on this list in 
the last month. However, I think array like representations for strings make 
more sense, and if "FPL" means "handles arrays poorly" then I think the flaw 
is in FPLs. 

> But even if only the imperative Array-access would
> be possible (instead of the FPL-like Lists), it would
> be more convenient then to have such a string-type,
> which can't be used with the many libraries.

I can't quite parse this, but if you mean that you'd like to treat strings 
like arrays notationally, I agree. Agitate for extensional polymorphism. 
Wow, that sounds even better than "eschew obfuscation"!

> Any suggestions on that topic?

There is no one string representation which will satisfy everyone. I prefer 
the built in one to be array based, like in OCaml. You can easily write a 
function like this 

let explode s =
  let rec exap n l =
    if n < 0 then l else
    exap (n - 1) ((String.get s n)::l) in
  exap (String.length s - 1) []

to get the char list you want. You couldn't get the packed char array from 
a char list if it wasn't built in. 

-- Brian


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-02-28  1:03 ` brogoff
@ 2003-03-02 18:34   ` Xavier Leroy
  2003-03-02 19:03     ` Alain.Frisch
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Xavier Leroy @ 2003-03-02 18:34 UTC (permalink / raw)
  To: brogoff; +Cc: Oliver Bandel, caml-list

> > in Haskell, strings are lists of chars.
> 
> And what a horrible design decision that is!

Agreed.  Well, it's a great way to multiply the memory requirements
for your strings by a factor of 12 (on 32-bit platforms) or 24 (on
64-bit platforms), while at the same time losing constant-time
indexing :-)  

Actually, the list representation of strings is so repugnant that I
don't even want to include "explode" and "implode" coercions between
string and char list in the standard library.  A standard library
should steer users away from algorithmically-inefficient code.  By not
having implode and explode in the library, I hope OCaml programmers
will come to the realization that the proper way to operate on strings
is either via block operations (the String module, regexps, etc), or
by recursion on integer indices.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-03-02 18:34   ` Xavier Leroy
@ 2003-03-02 19:03     ` Alain.Frisch
  2003-03-03  8:50     ` Luc Maranget
  2003-03-04  2:49     ` Eric C. Cooper
  2 siblings, 0 replies; 9+ messages in thread
From: Alain.Frisch @ 2003-03-02 19:03 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: brogoff, Oliver Bandel, caml-list

On Sun, 2 Mar 2003, Xavier Leroy wrote:

> > > in Haskell, strings are lists of chars.
> >
> > And what a horrible design decision that is!
>
> Agreed.  Well, it's a great way to multiply the memory requirements
> for your strings by a factor of 12 (on 32-bit platforms) or 24 (on
> 64-bit platforms), while at the same time losing constant-time
> indexing :-)

Not necessarily. Considering strings as lists of chars is a design
decision about the semantics of the language, not its implementation. For
instance in CDuce, we choosed to define the type String as [Char*], that
is, homogeneous sequences of (Unicode) characters. The idea is that we
want to have characters and elements at the same level inside XML elements
(the other solution is to have elements and strings; but when you
concatenate two sequences of elements-or-strings, you want to perform
implicit concatenation of strings at the middle to avoid two consecutive
strings). For the implementation, we introduce a compact representation of
strings at runtime, namely a pointer to a segment of a buffer (OCaml
string) + a "continuation" (which can be a representation of the same
kind, or a real list cell). When a pattern tries to deconstruct such a
representation to see it as list cell (head,tail), the runtime system
extracts the first character from the buffer and compute a pointer to the
next character (depending on the internal Unicode encoding of the buffer).
When a pattern extracts a consecutive subsequence (substrings)  - patterns
in CDuce are roughly regular expressions - it actually returns a pointer
to a subsegment of the buffer. Concatenation of sequences is computed
lazily, to avoid quadractic behavior when appending elements one by one at
the end. Basically, in many situations, we avoid space overhead and keep a
reasonable time overhead.

I'm not advocating this kind of techniques for a fully general
language such as OCaml; I just wanted to defend the "horrible" design for
specific applications...


-- Alain

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-03-02 18:34   ` Xavier Leroy
  2003-03-02 19:03     ` Alain.Frisch
@ 2003-03-03  8:50     ` Luc Maranget
  2003-03-03 17:12       ` brogoff
  2003-03-04  2:49     ` Eric C. Cooper
  2 siblings, 1 reply; 9+ messages in thread
From: Luc Maranget @ 2003-03-03  8:50 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: brogoff, Oliver Bandel, caml-list

> 
> > > in Haskell, strings are lists of chars.
> > 
> > And what a horrible design decision that is!
> 
> Agreed.  Well, it's a great way to multiply the memory requirements
> for your strings by a factor of 12 (on 32-bit platforms) or 24 (on
> 64-bit platforms), while at the same time losing constant-time
> indexing :-)  
> 
> Actually, the list representation of strings is so repugnant that I
> don't even want to include "explode" and "implode" coercions between
> string and char list in the standard library.  A standard library
> should steer users away from algorithmically-inefficient code.  By not
> having implode and explode in the library, I hope OCaml programmers
> will come to the realization that the proper way to operate on strings
> is either via block operations (the String module, regexps, etc), or
> by recursion on integer indices.
> 
> - Xavier Leroy
> 


Xavier is right, of course.
However, in a lazy context, seeing strings as list of chars has some
advantages. This is not relevant to Caml anyway.

--Luc

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-03-03  8:50     ` Luc Maranget
@ 2003-03-03 17:12       ` brogoff
  2003-03-03 17:40         ` Diego Olivier Fernandez Pons
  0 siblings, 1 reply; 9+ messages in thread
From: brogoff @ 2003-03-03 17:12 UTC (permalink / raw)
  To: Luc Maranget; +Cc: Xavier Leroy, Oliver Bandel, caml-list

Yes, Xavier is right here, and I apologize for posting an "explode" function 
on this list. I'll appeal for leniency, mentioning that I didn't post 
"implode", and that I only posted the former function to demonstrate that 
it could be done, and to point out that one of the many reasons strings as 
char lists is wrong as a basic type is that you get an abstraction inversion. 

As far as char lists being somewhat advantageous in a lazy language, well, I 
won't start a flamewar as to whether laziness as the default is a good design 
decision (oh hell, I'll admit, I think it isn't) but I'll repeat my 
observation that in the Clean language, also lazy by default like Haskell, 
strings are unboxed character arrays. 

Back to the topic which interests us, OCaml. One thing I'd like to see in an 
extended library is a fairly rich substring library, perhaps like the one in 
the SML Basis. I have the beginnings of one, and I also ported the SML one 
from Moscow ML to OCaml. It can certainly be made richer, the first improvement 
on my list being an integration with a regexp package. 

I'm sure there are numerous ideas just for string libraries, and that we could 
fill an entire mailing list just with those. Ropes (a binary tree representation 
for applicative "big strings") and extended character sets (I guess Camomile is 
doing that now?) are my favorites. 

-- Brian


On Mon, 3 Mar 2003, Luc Maranget wrote:
> > > > in Haskell, strings are lists of chars.
> > > 
> > > And what a horrible design decision that is!
> > 
> > Agreed.  Well, it's a great way to multiply the memory requirements
> > for your strings by a factor of 12 (on 32-bit platforms) or 24 (on
> > 64-bit platforms), while at the same time losing constant-time
> > indexing :-)  
> > 
> > Actually, the list representation of strings is so repugnant that I
> > don't even want to include "explode" and "implode" coercions between
> > string and char list in the standard library.  A standard library
> > should steer users away from algorithmically-inefficient code.  By not
> > having implode and explode in the library, I hope OCaml programmers
> > will come to the realization that the proper way to operate on strings
> > is either via block operations (the String module, regexps, etc), or
> > by recursion on integer indices.
> > 
> > - Xavier Leroy
> > 
> 
> 
> Xavier is right, of course.
> However, in a lazy context, seeing strings as list of chars has some
> advantages. This is not relevant to Caml anyway.
> 
> --Luc
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> 

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-03-03 17:12       ` brogoff
@ 2003-03-03 17:40         ` Diego Olivier Fernandez Pons
  0 siblings, 0 replies; 9+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-03-03 17:40 UTC (permalink / raw)
  To: caml-list

    Bonjour,

Brian Rogoff wrote :

> As far as char lists being somewhat advantageous in a lazy language,
> well, I won't start a flamewar as to whether laziness as the default
> is a good design decision (oh hell, I'll admit, I think it isn't)
> but I'll repeat my observation that in the Clean language, also lazy
> by default like Haskell, strings are unboxed character arrays.

Caml is a 'most of the time strict but lazy when you need' language
whereas Haskell is a 'most of the time lazy but strict when you need'
one.

And Caml already provides support for lazy lists by means of the
Stream module, which is pretty good and convenient.

> I'm sure there are numerous ideas just for string libraries, and
> that we could fill an entire mailing list just with those. Ropes (a
> binary tree representation for applicative "big strings") and
> extended character sets (I guess Camomile is doing that now?) are my
> favorites.

Hum... Fast catenable arrays are the subject of Ralf Hinze last paper
'bootstrapping one-side flexible arrays'. He uses weight balanced
trees of imperative arrays.

It also makes a good representation for (precomputed) streams.


        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-03-02 18:34   ` Xavier Leroy
  2003-03-02 19:03     ` Alain.Frisch
  2003-03-03  8:50     ` Luc Maranget
@ 2003-03-04  2:49     ` Eric C. Cooper
  2003-03-04  8:29       ` Fabrice Le Fessant
  2 siblings, 1 reply; 9+ messages in thread
From: Eric C. Cooper @ 2003-03-04  2:49 UTC (permalink / raw)
  To: caml-list

On Sun, Mar 02, 2003 at 07:34:37PM +0100, Xavier Leroy wrote:
> Actually, the list representation of strings is so repugnant that I
> don't even want to include "explode" and "implode" coercions between
> string and char list in the standard library.  A standard library
> should steer users away from algorithmically-inefficient code.  By not
> having implode and explode in the library, I hope OCaml programmers
> will come to the realization that the proper way to operate on strings
> is either via block operations (the String module, regexps, etc), or
> by recursion on integer indices.

I recently wrote some code that made use of char lists, and explode
and implode came in handy.  I needed to marshal a recursive datatype
into a packet to be sent over a communication channel according to a
protocol that imposed a specific format, including a length byte at
the beginning and a checksum byte at the end.

I could have made one pass over the data to compute the packet length,
then a second pass marshaling it into a buffer.  But it was very
natural to just build up a list of bytes in a single traversal of the
datatype.  Then the length and checksum could easily be added to the
beginning and the end, and the result written out.

I used explode when I encountered string values at the leaves of the
datatype.  I didn't really need implode (I could just iterate
output_byte over the final list), but it came in handy for dumping
packets for debugging.

I wasn't really using char lists as a representation of strings, but
rather as a buffer-like data structure that just happened to need
conversion to and from strings at certain points.

As another poster pointed out, explode and implode are analogous to
Array.to_list and Array.of_list, which don't seem to entice OCaml
programmers down the path of algorithmic ineffiency.

-- 
Eric C. Cooper          e c c @ c m u . e d u

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Strings as arrays or lists...
  2003-03-04  2:49     ` Eric C. Cooper
@ 2003-03-04  8:29       ` Fabrice Le Fessant
  0 siblings, 0 replies; 9+ messages in thread
From: Fabrice Le Fessant @ 2003-03-04  8:29 UTC (permalink / raw)
  To: Eric C. Cooper; +Cc: caml-list


>  I recently wrote some code that made use of char lists, and explode
>  and implode came in handy.  I needed to marshal a recursive datatype
>  into a packet to be sent over a communication channel according to a
>  protocol that imposed a specific format, including a length byte at
>  the beginning and a checksum byte at the end.
>  
>  I could have made one pass over the data to compute the packet length,
>  then a second pass marshaling it into a buffer.  But it was very
>  natural to just build up a list of bytes in a single traversal of the
>  datatype.  Then the length and checksum could easily be added to the
>  beginning and the end, and the result written out.

I had exactly the same code to write one year ago, and I simply built
the packet inside a buffer, with a 0 length-field, changed it to a
string, and filled the length-field just before sending, by changing
the chars in place in the string. It only allocates the buffer once,
and then the final string at the end, I cannot believe that using a
list of chars can be more efficient, but sometimes strange things
happen... 

- Fabrice

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-03-04  9:20 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-02-27 22:31 [Caml-list] Strings as arrays or lists Oliver Bandel
2003-02-28  1:03 ` brogoff
2003-03-02 18:34   ` Xavier Leroy
2003-03-02 19:03     ` Alain.Frisch
2003-03-03  8:50     ` Luc Maranget
2003-03-03 17:12       ` brogoff
2003-03-03 17:40         ` Diego Olivier Fernandez Pons
2003-03-04  2:49     ` Eric C. Cooper
2003-03-04  8:29       ` Fabrice Le Fessant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).