From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 429A97FA4D for ; Tue, 29 Jul 2014 04:54:42 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of yminsky@janestreet.com) identity=pra; client-ip=38.105.200.229; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yminsky@janestreet.com"; x-sender="yminsky@janestreet.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of yminsky@janestreet.com designates 38.105.200.229 as permitted sender) identity=mailfrom; client-ip=38.105.200.229; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yminsky@janestreet.com"; x-sender="yminsky@janestreet.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mxout3.mail.janestreet.com) identity=helo; client-ip=38.105.200.229; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yminsky@janestreet.com"; x-sender="postmaster@mxout3.mail.janestreet.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjcBAKwL11MmacjlnGdsb2JhbABZg2BXBIJ0sDCYZIdLAYEMCBYQAQEBAQEIFAk9hAQBAQQSEQQZAQETGQsBDwsLDQICCR0CAiEBEgEFAQoSBhMSEIgMAxEDCpsUaooyd4UCAQWLLgMKFYZ+EQYKgSKLcwqBN2wHgnmBUYR4BZRPA4ICgVKFTIcLhDEYKYUWUAGBCw X-IPAS-Result: AjcBAKwL11MmacjlnGdsb2JhbABZg2BXBIJ0sDCYZIdLAYEMCBYQAQEBAQEIFAk9hAQBAQQSEQQZAQETGQsBDwsLDQICCR0CAiEBEgEFAQoSBhMSEIgMAxEDCpsUaooyd4UCAQWLLgMKFYZ+EQYKgSKLcwqBN2wHgnmBUYR4BZRPA4ICgVKFTIcLhDEYKYUWUAGBCw X-IronPort-AV: E=Sophos;i="5.01,753,1400018400"; d="scan'208";a="87349172" Received: from mxout3.mail.janestreet.com ([38.105.200.229]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 29 Jul 2014 04:54:40 +0200 Received: from tot-oib-smtp1 ([172.27.22.15] helo=tot-smtp) by mxout3.mail.janestreet.com with esmtps (UNKNOWN:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.82) (envelope-from ) id 1XBxYk-0007Xj-Jk for caml-list@inria.fr; Mon, 28 Jul 2014 22:54:38 -0400 Received: from [172.27.229.230] (helo=mxgoog1.mail.janestreet.com) by tot-smtp with esmtps (UNKNOWN:AES256-GCM-SHA384:256) (Exim 4.72) (envelope-from ) id 1XBxYk-0006mi-Ge for caml-list@inria.fr; Mon, 28 Jul 2014 22:54:38 -0400 Received: from mail-lb0-f179.google.com ([209.85.217.179]) by mxgoog1.mail.janestreet.com with esmtps (TLSv1:RC4-SHA:128) (Exim 4.72) (envelope-from ) id 1XBxYk-0001W9-7C for caml-list@inria.fr; Mon, 28 Jul 2014 22:54:38 -0400 Received: by mail-lb0-f179.google.com with SMTP id v6so6352776lbi.24 for ; Mon, 28 Jul 2014 19:54:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=+z/5ILPC6qQiCGAX+A3na7tx/gN3Aux4OnAMutnjVzY=; b=zHUGYveVbRakFeSLncxaCq8g8dS2MXm6P6BwC3M2QL8g+Dxp7J5s5SUVLfd88E1yRj tW9H0DTRnhru+cYmm5VwyLBMM97Q6PB6PyomxqwNF3zHXCIRvWVOpvFGh0ePAf5ShYGH SlIRJQPDl2Y5QyhT4/E8Znl7EE5Ta6hZ1Uh8A= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=+z/5ILPC6qQiCGAX+A3na7tx/gN3Aux4OnAMutnjVzY=; b=GsL/jeXP0NRa1d0VfxWnowgQcWAu1zb6baD8JrAFXc59+kP0uW5X5zvQfcYjPlgBsQ u6U7hJ2iJE/0Zhw1vVjMgTS3Vl22k3LPVNP5xvkilPJPaR0BcLHNPGNta+mF7Xs+FvlR anjba8uDk/aYM1QtXo6iuEx6sH6fnIFakuNBQYNLRVdShaT8nJVHVVhqYT+cO3KmGA+f KZQNJaFOhl72z7f9LsfMYmsDjQ2sNN5RlBrst/dPeLw4C0E9W5HVVQyqvg/20TRIxeE0 Bht6NEeUPDHT8oaLrKwvbOBG9YQd8GUk1mg3nMuugzihGVG/6jsr2K0WFAEfO72RKcvJ WCgA== X-Gm-Message-State: ALoCoQnlqnN7EacMEqkGx1XqK9SJNlYkooIBIst21h5U0WAHJDvOzO/T+QJ1rpWM/HjmxlzEKq8zXuia9AjEkrZxuIPmhEyeNVZHRTVieTj1i47cxxbEfgLce205+b75nIQRHcXlZSmu X-Received: by 10.153.5.44 with SMTP id cj12mr40396286lad.36.1406602476989; Mon, 28 Jul 2014 19:54:36 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.153.5.44 with SMTP id cj12mr40396276lad.36.1406602476822; Mon, 28 Jul 2014 19:54:36 -0700 (PDT) Received: by 10.112.65.228 with HTTP; Mon, 28 Jul 2014 19:54:36 -0700 (PDT) In-Reply-To: References: <1404501528.4384.4.camel@e130> <20140728111452.GA26816@frosties> Date: Mon, 28 Jul 2014 22:54:36 -0400 Message-ID: From: Yaron Minsky To: Markus Mottl Cc: Goswin von Brederlow , "caml-list@inria.fr" Content-Type: text/plain; charset=UTF-8 Subject: Re: [Caml-list] Immutable strings This isn't my idea, but it seems worth repeating: perhaps it would make sense to have an unmovable byte-array type that had the same memory representation as Bytes.t, with the extra guarantee that the collector wouldn't move it. You could imagine representing this as a private type: module Immovable_bytes : sig type t = private Bytes.t val create : int -> t end with a special creation function for creating these immovable strings. This would avoid some of the current need to write what is effectively the same code twice, once for bigarrays and once for Bytes.t's. In particular, you could modify an Immovable_bytes by first up-casting it to a Bytes.t. But you could only actually create one by going through the special creation function. I'm not sure if the runtime details could be made to work out, but if they could, I think it would be a bit nicer than the current world. y On Mon, Jul 28, 2014 at 11:51 AM, Markus Mottl wrote: > On Mon, Jul 28, 2014 at 7:14 AM, Goswin von Brederlow wrote: >> Why is that? A bigarray allocates a small block on the ocaml heap and >> the buffer outside the ocaml heap. Is that normal malloc() call just >> so much slower? Or are there other factors involved? > > If you look at the runtime code, you'll see that there is quite a lot > going on to create a bigarray value. Allocating small OCaml-strings > on the minor heap only costs a handful of cheap instructions, which is > obviously way more efficient. There is some threshold at which malloc > will perform more expensive system calls to obtain memory whereas > OCaml may still be able to get some larger chunks from the major heap. > Unless Bigarrays become really large, standard OCaml strings can be > obtained much more cheaply. > >> On the other hand if your app is IO heavy then you should allocate a >> few buffers and reuse them. In that case the allocation overhead is >> constant and the time saved for not copying in the I/O will more than >> make up for it. > > Exactly. Bigarrays are my buffer of choice for I/O. > >> Or read/mmap the file into a huge bigarray and the slice it into >> smaller chunks. > > This can improve performance for certain operations, but beware of > page faults when accessing ranges that only reside on disk. Unless > this access is done outside of the OCaml-lock, your application could > freeze longer than allowed for realtime applications. > > Regards, > Markus > >>> On Fri, Jul 4, 2014 at 3:18 PM, Gerd Stolpmann wrote: >>> > Hi list, >>> > >>> > I've just posted a blog article where I criticize the new concept of >>> > immutable strings that will be available in OCaml 4.02 (as option): >>> > >>> > http://blog.camlcity.org/blog/bytes1.html >>> > >>> > In short my point is that it the new concept is not far reaching enough, >>> > and will even have negative impact on the code quality when it is not >>> > improved. I also present three ideas how to improve it. >>> > >>> > Gerd >> >> You have a few more points: >> >> 1) there are 3 kinds of strings: >> >> - string literal / constant strings [which never change ever] >> - read-only strings [which YOU are not allowed to change but might change] >> - mutable strings [which you are allowed to changed] >> >> There is one other thing you didn't mention here. While it is nice to >> pass a mutable string to the lexer (or similar) one has to realize >> that that is not thread save. Another thread might be mutating the >> string while it is being used. >> >> So I would suggest there is a 4th kind of string: >> >> - frozen strings [which are mutable but won't be changed anymore] >> >> That is basically like read-only strings but with the addes promise >> that they won't be changed. Nothing in the type system garanties that, >> it is just a promise from the programmer. >> >> 2) there are lots of functions that just need any kind of string and >> should accept all 3 >> >> This kind of asks for type classes. There should be a read-from-string >> type class that all 3 string types would fit. Then one could have one >> function accepting a read-from-string type class and all 3 string >> types could be passed. But unfortunately ocaml doesn't have type >> classes. >> >> The next best thing would be enumerations (not in stdlib). Make >> enumerations accept all 3 string types and then have everything else >> accept enumerations. This would also mean you could pass a char list >> or rope or any other type that gives you an enumeration of chars. >> >> 3) I/O code >> >> That the stdlib uses strings for I/O and needs to copy the data around >> all the time has been nagging me for years. There certainly should be >> read/write functions dealing with bigarrays. >> >> There also should be a function to create a bigarray with special >> alignment (e.g. PAGESIZE) to get the best I/O performance (or in case >> of linux async IO make it work at all). >> >> As for mutable/immutable strings there should be a read function >> returning an immutable string, which it creates internally. The string >> can't be passed as argument so creating a fresh one is the only way. >> >> >> Here is a completly new point: >> >> 4) What is good for strings is also good for bigarray >> >> The same arguments concerning strings applies to bigarrays. Say you >> pass a bigarray to the lexer. Can it just use it as is for its lexbuf >> or does it need to copy it because it might mutate? An immutable >> bigarray could be used savely as is. >> >> >> And this doesn't realy stop at bigarray. Even references could be >> read-only, in the sense of "this might change but YOU aren't allowed >> to change it". And I think the only way to solve the >> const/immutable/mutable/frozen sub-types that will be applicable to >> more than just string is to use phantom types. >> >> MfG >> Goswin >> >> -- >> Caml-list mailing list. Subscription management and archives: >> https://sympa.inria.fr/sympa/arc/caml-list >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs > > > > -- > Markus Mottl http://www.ocaml.info markus.mottl@gmail.com > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs