From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 83D397FA4D for ; Mon, 28 Jul 2014 13:14:56 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=pra; client-ip=212.227.17.12; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=mailfrom; client-ip=212.227.17.12; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.web.de) identity=helo; client-ip=212.227.17.12; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="postmaster@mout.web.de"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsBAJQv1lPU4xEMnGdsb2JhbABZg2BXy3aBXYVoAYEUFhABAQEBAQYNCQkUKYQDAQEEATpECwsYCSUPBSg0G4gSAQwMCbU/H4dJF40pgiqDL4EbBZlKggGBU4U0E5B+agE X-IPAS-Result: AvsBAJQv1lPU4xEMnGdsb2JhbABZg2BXy3aBXYVoAYEUFhABAQEBAQYNCQkUKYQDAQEEATpECwsYCSUPBSg0G4gSAQwMCbU/H4dJF40pgiqDL4EbBZlKggGBU4U0E5B+agE X-IronPort-AV: E=Sophos;i="5.01,748,1400018400"; d="scan'208";a="87267635" Received: from mout.web.de ([212.227.17.12]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 28 Jul 2014 13:14:56 +0200 Received: from frosties.localnet ([78.43.112.61]) by smtp.web.de (mrweb103) with ESMTPSA (Nemesis) id 0MgfBj-1WoPPt2dS3-00NwUw for ; Mon, 28 Jul 2014 13:14:54 +0200 Received: from mrvn by frosties.localnet with local (Exim 4.82) (envelope-from ) id 1XBitI-0007Jj-AZ for caml-list@inria.fr; Mon, 28 Jul 2014 13:14:52 +0200 Date: Mon, 28 Jul 2014 13:14:52 +0200 From: Goswin von Brederlow To: caml-list@inria.fr Message-ID: <20140728111452.GA26816@frosties> References: <1404501528.4384.4.camel@e130> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Provags-ID: V03:K0:GiZbyOU6VQufMcHneN62fWyfsLhMWdN2G6Q5JLmdQucTsV3HTL+ hI3a6/4Q56H/kLxdpKSL2LyepRKnA+bSxSQTBU7tP0HkPAns6TbCKrF1zC19Ip8uo02FgMA ujSUQxUHAxVqbvwJH+HploVl56DB89lcF9/75uyORX6XAy6jbJ4pk8RgEHC5tGP5Y0d/Yd0 +xS0EmHzGMcMEVdXPQqQg== Subject: Re: [Caml-list] Immutable strings On Fri, Jul 04, 2014 at 05:01:18PM -0400, Markus Mottl wrote: > I agree that the new concept has some noteworthy downsides as > demonstrated in the Lexing-example. Your proposed solution 2 > (stringlike) would probably solve these issues from a safety point of > view. The downside is that the complexity of string-handling would > increase even more, because then we would have three types to deal > with. I personally prefer safety over convenience, but other people's > (especially beginner's) mileage may vary. > > The Bigarray-approach doesn't seem appealing to me. Strings are much > more lightweight, since they can be allocated cheaply on the > OCaml-heap. E.g. String.create is about 10x-100x faster than > Bigarray.create. That seems too big to ignore. > > Regards, > Markus Why is that? A bigarray allocates a small block on the ocaml heap and the buffer outside the ocaml heap. Is that normal malloc() call just so much slower? Or are there other factors involved? On the other hand if your app is IO heavy then you should allocate a few buffers and reuse them. In that case the allocation overhead is constant and the time saved for not copying in the I/O will more than make up for it. Or read/mmap the file into a huge bigarray and the slice it into smaller chunks. > On Fri, Jul 4, 2014 at 3:18 PM, Gerd Stolpmann wrote: > > Hi list, > > > > I've just posted a blog article where I criticize the new concept of > > immutable strings that will be available in OCaml 4.02 (as option): > > > > http://blog.camlcity.org/blog/bytes1.html > > > > In short my point is that it the new concept is not far reaching enough, > > and will even have negative impact on the code quality when it is not > > improved. I also present three ideas how to improve it. > > > > Gerd You have a few more points: 1) there are 3 kinds of strings: - string literal / constant strings [which never change ever] - read-only strings [which YOU are not allowed to change but might change] - mutable strings [which you are allowed to changed] There is one other thing you didn't mention here. While it is nice to pass a mutable string to the lexer (or similar) one has to realize that that is not thread save. Another thread might be mutating the string while it is being used. So I would suggest there is a 4th kind of string: - frozen strings [which are mutable but won't be changed anymore] That is basically like read-only strings but with the addes promise that they won't be changed. Nothing in the type system garanties that, it is just a promise from the programmer. 2) there are lots of functions that just need any kind of string and should accept all 3 This kind of asks for type classes. There should be a read-from-string type class that all 3 string types would fit. Then one could have one function accepting a read-from-string type class and all 3 string types could be passed. But unfortunately ocaml doesn't have type classes. The next best thing would be enumerations (not in stdlib). Make enumerations accept all 3 string types and then have everything else accept enumerations. This would also mean you could pass a char list or rope or any other type that gives you an enumeration of chars. 3) I/O code That the stdlib uses strings for I/O and needs to copy the data around all the time has been nagging me for years. There certainly should be read/write functions dealing with bigarrays. There also should be a function to create a bigarray with special alignment (e.g. PAGESIZE) to get the best I/O performance (or in case of linux async IO make it work at all). As for mutable/immutable strings there should be a read function returning an immutable string, which it creates internally. The string can't be passed as argument so creating a fresh one is the only way. Here is a completly new point: 4) What is good for strings is also good for bigarray The same arguments concerning strings applies to bigarrays. Say you pass a bigarray to the lexer. Can it just use it as is for its lexbuf or does it need to copy it because it might mutate? An immutable bigarray could be used savely as is. And this doesn't realy stop at bigarray. Even references could be read-only, in the sense of "this might change but YOU aren't allowed to change it". And I think the only way to solve the const/immutable/mutable/frozen sub-types that will be applicable to more than just string is to use phantom types. MfG Goswin