caml-list - the Caml user's mailing list
From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Immutable strings
Date: Mon, 28 Jul 2014 13:14:52 +0200
Message-ID: <20140728111452.GA26816@frosties>
In-Reply-To: <CAP_800rUwwmYYs7fSKt-i2SnFxPGLo+9fczijR=U0Z7GaQhicA@mail.gmail.com>

On Fri, Jul 04, 2014 at 05:01:18PM -0400, Markus Mottl wrote:
> I agree that the new concept has some noteworthy downsides as
> demonstrated in the Lexing-example.  Your proposed solution 2
> (stringlike) would probably solve these issues from a safety point of
> view.  The downside is that the complexity of string-handling would
> increase even more, because then we would have three types to deal
> with.  I personally prefer safety over convenience, but other people's
> (especially beginner's) mileage may vary.
> 
> The Bigarray-approach doesn't seem appealing to me.  Strings are much
> more lightweight, since they can be allocated cheaply on the
> OCaml-heap.  E.g. String.create is about 10x-100x faster than
> Bigarray.create.  That seems too big to ignore.
> 
> Regards,
> Markus

Why is that? A bigarray allocates a small block on the OCaml heap and
the buffer outside the OCaml heap. Is a plain malloc() call really
that much slower? Or are there other factors involved?

On the other hand, if your app is I/O heavy then you should allocate a
few buffers and reuse them. In that case the allocation overhead is
constant and the time saved by not copying data during I/O will more
than make up for it.

Or read/mmap the file into a huge bigarray and then slice it into
smaller chunks.
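
A minimal sketch of the slicing idea, assuming a buffer created with
Bigarray.Array1.create stands in for the mapped file (the names big and
chunk and the sizes are made up for illustration):

```ocaml
open Bigarray

(* One large buffer outside the OCaml heap. In a real program this could
   come from mapping a file instead of Array1.create. *)
let big : (char, int8_unsigned_elt, c_layout) Array1.t =
  Array1.create char c_layout 4096

let () = Array1.fill big 'a'

(* Array1.sub returns a view that shares storage with [big];
   no data is copied. *)
let chunk = Array1.sub big 1024 512

(* A write through the slice is visible in the parent buffer. *)
let () = chunk.{0} <- 'z'
let () = assert (big.{1024} = 'z')
```

Since the slice aliases the big buffer, handing out slices costs only a
small heap block each, not a copy of the data.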

> On Fri, Jul 4, 2014 at 3:18 PM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
> > Hi list,
> >
> > I've just posted a blog article where I criticize the new concept of
> > immutable strings that will be available in OCaml 4.02 (as option):
> >
> > http://blog.camlcity.org/blog/bytes1.html
> >
> > In short my point is that it the new concept is not far reaching enough,
> > and will even have negative impact on the code quality when it is not
> > improved. I also present three ideas how to improve it.
> >
> > Gerd

You have a few more points:

1) there are 3 kinds of strings:

- string literals / constant strings [which never change, ever]
- read-only strings [which YOU are not allowed to change but which
  might still be changed by someone else]
- mutable strings [which you are allowed to change]

There is one other thing you didn't mention here. While it is nice to
pass a mutable string to the lexer (or similar), one has to realize
that this is not thread-safe. Another thread might be mutating the
string while it is being used.

So I would suggest there is a 4th kind of string:

- frozen strings [which are mutable but won't be changed anymore]

That is basically like read-only strings but with the added promise
that they won't be changed. Nothing in the type system guarantees that;
it is just a promise from the programmer.

2) there are lots of functions that just need any kind of string and
   should accept all 3

This more or less asks for type classes. There should be a
read-from-string type class that all 3 string types would fit. Then one
could have one function accepting a read-from-string type class and all
3 string types could be passed. But unfortunately OCaml doesn't have
type classes.

The next best thing would be enumerations (not in stdlib). Make
enumerations accept all 3 string types and then have everything else
accept enumerations. This would also mean you could pass a char list
or rope or any other type that gives you an enumeration of chars.
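
A rough sketch of the enumeration idea, using the Seq module that later
versions of the stdlib provide (it did not exist when this was written,
so treat it as a stand-in for whatever enumeration library you prefer).
The function name count_char is made up for illustration:

```ocaml
(* Count occurrences of [c] in any sequence of characters. *)
let count_char (c : char) (chars : char Seq.t) : int =
  Seq.fold_left (fun n x -> if x = c then n + 1 else n) 0 chars

(* The same function works for an immutable string, mutable bytes
   and a char list. *)
let from_string = count_char 'a' (String.to_seq "banana")
let from_bytes  = count_char 'a' (Bytes.to_seq (Bytes.of_string "banana"))
let from_list   = count_char 'a' (List.to_seq ['b'; 'a'; 'n'; 'a'; 'n'; 'a'])
```

Note that the sequence over bytes is lazy, so it behaves like a read-only
view: a concurrent mutation can still be observed while iterating.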

3) I/O code

That the stdlib uses strings for I/O and needs to copy the data around
all the time has been bugging me for years. There certainly should be
read/write functions dealing with bigarrays.

There also should be a function to create a bigarray with special
alignment (e.g. PAGESIZE) to get the best I/O performance (or, in the
case of Linux async I/O, to make it work at all).

As for mutable/immutable strings, there should be a read function
returning an immutable string, which it creates internally. An immutable
string can't be passed in as the buffer argument, so creating a fresh
one is the only way.
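
Something like the following is what I have in mind, sketched on top of
the Bytes module and Stdlib.input. The name read_string is hypothetical,
not an existing stdlib function:

```ocaml
(* Read up to [len] bytes from [ic] and return them as a fresh,
   immutable string. The mutable buffer never escapes this function. *)
let read_string (ic : in_channel) (len : int) : string =
  let buf = Bytes.create len in
  let n = input ic buf 0 len in
  (* Copy only the bytes actually read into an immutable string. *)
  Bytes.sub_string buf 0 n
```

The final copy could be avoided with Bytes.unsafe_to_string when exactly
len bytes were read, since nobody else holds a reference to the buffer,
but that is an optimisation on top of the sketch.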


Here is a completely new point:

4) What is good for strings is also good for bigarrays

The same arguments concerning strings apply to bigarrays. Say you
pass a bigarray to the lexer. Can it just use it as is for its lexbuf
or does it need to copy it because it might mutate? An immutable
bigarray could be used safely as is.


And this doesn't really stop at bigarrays. Even references could be
read-only, in the sense of "this might change but YOU aren't allowed
to change it". And I think the only way to handle the
const/immutable/mutable/frozen subtypes in a manner that is applicable
to more than just strings is to use phantom types.
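
To make the phantom type idea concrete, here is a minimal sketch for the
mutable/read-only case only. The module name Pstring and all of its
functions are made up for illustration; the same pattern would wrap a
bigarray or a reference just as well:

```ocaml
module Pstring : sig
  type mutable_              (* phantom tags, never inhabited *)
  type read_only
  type 'perm t               (* 'perm only exists at the type level *)
  val create : int -> char -> mutable_ t
  val get : 'perm t -> int -> char              (* any permission may read *)
  val set : mutable_ t -> int -> char -> unit   (* only mutable may write *)
  val read_only : mutable_ t -> read_only t     (* same data, fewer rights *)
end = struct
  type mutable_
  type read_only
  type 'perm t = Bytes.t
  let create = Bytes.make
  let get = Bytes.get
  let set = Bytes.set
  let read_only b = b        (* no copy, just a change of type *)
end

let s = Pstring.create 3 'x'
let view = Pstring.read_only s

(* The owner may still mutate, and the read-only view sees the change. *)
let () = Pstring.set s 0 'y'
let () = assert (Pstring.get view 0 = 'y')
(* Pstring.set view 0 'z' would be rejected by the type checker. *)
```

A frozen tag would look the same at the type level; as said above, the
promise that nobody mutates the data anymore stays with the programmer.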

MfG
	Goswin
