caml-list - the Caml user's mailing list
From: Yaron Minsky <yminsky@janestreet.com>
To: Markus Mottl <markus.mottl@gmail.com>
Cc: Goswin von Brederlow <goswin-v-b@web.de>,
	"caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Immutable strings
Date: Mon, 28 Jul 2014 22:54:36 -0400
Message-ID: <CACLX4jQx=06KPK14PbMvQDOd8qams1w_kCYKd=6KVkDvJu6HjA@mail.gmail.com>
In-Reply-To: <CAP_800oiMeXnquwyHsN+XWf7ewGqTBX1ZKyJmDvEtxPZceFFPg@mail.gmail.com>

This isn't my idea, but it seems worth repeating: perhaps it would
make sense to have an immovable byte-array type with the same memory
representation as Bytes.t, plus the extra guarantee that the
collector won't move it.

You could imagine representing this as a private type:

module Immovable_bytes : sig
   type t = private Bytes.t
   val create : int -> t
end

with a special creation function for creating these immovable byte
arrays.  This would avoid some of the current need to write what is
effectively the same code twice, once for bigarrays and once for
Bytes.t values.  In particular, you could modify an Immovable_bytes.t
by first up-casting it to a Bytes.t.  But you could only create one by
going through the special creation function.
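
To make the up-cast concrete, here is a rough sketch.  It is only an
illustration of the typing side: create below is plain Bytes.create
standing in for the special allocator, so it gives no immovability
guarantee at all, and fill_zero is a made-up helper.

```ocaml
(* Sketch only: [create] is an ordinary Bytes.create, used as a stand-in.
   A real version would need runtime support to pin the block. *)
module Immovable_bytes : sig
  type t = private Bytes.t
  val create : int -> t
end = struct
  type t = Bytes.t
  let create n = Bytes.create n
end

(* Hypothetical helper, written once against plain Bytes.t. *)
let fill_zero (b : Bytes.t) = Bytes.fill b 0 (Bytes.length b) '\000'

let buf = Immovable_bytes.create 16

(* Up-cast to Bytes.t to reuse ordinary Bytes code, including mutation. *)
let () = fill_zero (buf :> Bytes.t)
let () = Bytes.set (buf :> Bytes.t) 0 'x'

(* The reverse direction is rejected by the type checker:
   let bad : Immovable_bytes.t = Bytes.create 16 *)
```

The point is that code written for Bytes.t accepts the value after a
coercion, while the private type keeps anyone from forging one.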

I'm not sure if the runtime details could be made to work out, but if
they could, I think it would be a bit nicer than the current world.

y

On Mon, Jul 28, 2014 at 11:51 AM, Markus Mottl <markus.mottl@gmail.com> wrote:
> On Mon, Jul 28, 2014 at 7:14 AM, Goswin von Brederlow <goswin-v-b@web.de> wrote:
>> Why is that? A bigarray allocates a small block on the ocaml heap and
>> the buffer outside the ocaml heap. Is that normal malloc() call just
>> so much slower? Or are there other factors involved?
>
> If you look at the runtime code, you'll see that there is quite a lot
> going on to create a bigarray value.  Allocating small OCaml-strings
> on the minor heap only costs a handful of cheap instructions, which is
> obviously way more efficient.  There is some threshold at which malloc
> will perform more expensive system calls to obtain memory whereas
> OCaml may still be able to get some larger chunks from the major heap.
> Unless Bigarrays become really large, standard OCaml strings can be
> obtained much more cheaply.
>
>> On the other hand if your app is IO heavy then you should allocate a
>> few buffers and reuse them. In that case the allocation overhead is
>> constant and the time saved for not copying in the I/O will more than
>> make up for it.
>
> Exactly.  Bigarrays are my buffer of choice for I/O.
>
>> Or read/mmap the file into a huge bigarray and then slice it into
>> smaller chunks.
>
> This can improve performance for certain operations, but beware of
> page faults when accessing ranges that only reside on disk.  Unless
> this access is done outside of the OCaml-lock, your application could
> freeze longer than allowed for realtime applications.
>
> Regards,
> Markus
>
>>> On Fri, Jul 4, 2014 at 3:18 PM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
>>> > Hi list,
>>> >
>>> > I've just posted a blog article where I criticize the new concept of
>>> > immutable strings that will be available in OCaml 4.02 (as option):
>>> >
>>> > http://blog.camlcity.org/blog/bytes1.html
>>> >
>>> > In short, my point is that the new concept is not far-reaching enough,
>>> > and will even have a negative impact on code quality unless it is
>>> > improved. I also present three ideas for how to improve it.
>>> >
>>> > Gerd
>>
>> You have a few more points:
>>
>> 1) there are 3 kinds of strings:
>>
>> - string literal / constant strings [which never change, ever]
>> - read-only strings [which YOU are not allowed to change but which might still change]
>> - mutable strings [which you are allowed to change]
>>
>> There is one other thing you didn't mention here. While it is nice to
>> pass a mutable string to the lexer (or similar), one has to realize
>> that this is not thread-safe. Another thread might be mutating the
>> string while it is being used.
>>
>> So I would suggest there is a 4th kind of string:
>>
>> - frozen strings [which are mutable but won't be changed anymore]
>>
>> That is basically like read-only strings but with the added promise
>> that they won't be changed. Nothing in the type system guarantees that;
>> it is just a promise from the programmer.
>>
>> 2) there are lots of functions that just need any kind of string and
>>    should accept all 3
>>
>> This kind of asks for type classes. There should be a read-from-string
>> type class that all 3 string types would fit. Then one could have one
>> function accepting a read-from-string type class and all 3 string
>> types could be passed. But unfortunately ocaml doesn't have type
>> classes.
>>
>> The next best thing would be enumerations (not in stdlib). Make
>> enumerations accept all 3 string types and then have everything else
>> accept enumerations. This would also mean you could pass a char list
>> or rope or any other type that gives you an enumeration of chars.
>>
>> 3) I/O code
>>
>> That the stdlib uses strings for I/O and needs to copy the data around
>> all the time has been nagging me for years. There certainly should be
>> read/write functions dealing with bigarrays.
>>
>> There also should be a function to create a bigarray with special
>> alignment (e.g. PAGESIZE) to get the best I/O performance (or in case
>> of linux async IO make it work at all).
>>
>> As for mutable/immutable strings there should be a read function
>> returning an immutable string, which it creates internally. The string
>> can't be passed as argument so creating a fresh one is the only way.
>>
>>
>> Here is a completely new point:
>>
>> 4) What is good for strings is also good for bigarray
>>
>> The same arguments concerning strings apply to bigarrays. Say you
>> pass a bigarray to the lexer. Can it just use it as is for its lexbuf
>> or does it need to copy it because it might mutate? An immutable
>> bigarray could be used safely as is.
>>
>>
>> And this doesn't really stop at bigarrays. Even references could be
>> read-only, in the sense of "this might change but YOU aren't allowed
>> to change it". And I think the only way to solve the
>> const/immutable/mutable/frozen sub-types in a way that is applicable
>> to more than just strings is to use phantom types.
>>
>> MfG
>>         Goswin
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>
> --
> Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com
