From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 1C3EE7F02D for ; Mon, 6 Oct 2014 04:11:20 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of garrigue@math.nagoya-u.ac.jp) identity=pra; client-ip=133.6.130.5; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="garrigue@math.nagoya-u.ac.jp"; x-sender="garrigue@math.nagoya-u.ac.jp"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of garrigue@math.nagoya-u.ac.jp) identity=mailfrom; client-ip=133.6.130.5; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="garrigue@math.nagoya-u.ac.jp"; x-sender="garrigue@math.nagoya-u.ac.jp"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mailhost.math.nagoya-u.ac.jp) identity=helo; client-ip=133.6.130.5; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="garrigue@math.nagoya-u.ac.jp"; x-sender="postmaster@mailhost.math.nagoya-u.ac.jp"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AkcBAIn5MVSFBoIFfGdsb2JhbABfg2FYgwLIXQqGeVQCgRgBEQEBCxYFQoQEAQEEIwQZAwE1AQEOCxgCAhgOAgJXBi6IIg6pU3iFCAKJBIZBARABBoEsjmYzB4I2QRIkgR6LXopXhw6BLTuQGYVpIIFoXQGCSQEBAQ X-IPAS-Result: AkcBAIn5MVSFBoIFfGdsb2JhbABfg2FYgwLIXQqGeVQCgRgBEQEBCxYFQoQEAQEEIwQZAwE1AQEOCxgCAhgOAgJXBi6IIg6pU3iFCAKJBIZBARABBoEsjmYzB4I2QRIkgR6LXopXhw6BLTuQGYVpIIFoXQGCSQEBAQ X-IronPort-AV: E=Sophos;i="5.04,661,1406584800"; d="scan'208";a="99509279" Received: from rabbit.math.nagoya-u.ac.jp (HELO mailhost.math.nagoya-u.ac.jp) ([133.6.130.5]) by mail2-smtp-roc.national.inria.fr with ESMTP; 06 Oct 2014 04:11:17 +0200 Received: from mailhost.math.nagoya-u.ac.jp (localhost [127.0.0.1]) by mailhost.math.nagoya-u.ac.jp (Postfix) with ESMTP id EB5AB63C6; Mon, 6 Oct 2014 11:11:15 +0900 (JST) Received: from mailhost.math.nagoya-u.ac.jp (localhost [127.0.0.1]) by mailhost.math.nagoya-u.ac.jp (Postfix) with ESMTP id 771A54123; Mon, 6 Oct 2014 11:11:15 +0900 (JST) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=math.nagoya-u.ac.jp; h= content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; s=alpha; bh=UOEuP+ZQ35JKKfPqU+K7K3lnxk4=; b=L3WZbSyfpvcDnU5HMhHCMepgAAda o9YiAh7Aa3Ql0ojdd6CygumwmevtFbJX/8RkjV4adbbz7E3ZN1oYcD9WAOU9MG+c GNmjv/BRO4fhuYrLDBFxSBx8esU5ks86kjQ5t5bLPhdICTiCIzDomgA2Spbl8NSY 5oJaAXfTBijDFhQ= DomainKey-Signature: a=rsa-sha1; h=Received:Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc:Content-Transfer-Encoding:Message-Id:References:To:X-Mailer; b=nJIQOaTef09hyp0eMP2EbVqdZaplebbBPWb8iil7Oc8khKTQtxwVLcOweZnisvyC/nrqjFTrrEW0SqAvQzvc7G+5DU4VA6X77pwFTs90xG6QYIX9OHHWzo8i7pPOgP/JvKzttcp7rrKwK1/3uNx+aI+Nzct/hvUayfCAdKIjibE=; c=nofws; d=math.nagoya-u.ac.jp; q=dns; s=alpha Received: from tet.garrigue.jp (58x158x128x157.ap58.ftth.ucom.ne.jp [58.158.128.157]) by mailhost.math.nagoya-u.ac.jp (Postfix) with ESMTPSA id CAAA32503; Mon, 6 Oct 2014 11:11:14 +0900 (JST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) From: Jacques Garrigue In-Reply-To: Date: Mon, 6 Oct 2014 11:11:14 +0900 Cc: OCaML List Mailing Content-Transfer-Encoding: quoted-printable Message-Id: <8BB0EB51-3516-4CB9-94EC-513BA87CD4FF@math.nagoya-u.ac.jp> References: To: Gabriel Scherer X-Mailer: Apple Mail (2.1878.6) Subject: Re: [Caml-list] Feedback on -safe-string migration attempts Hi Gabriel, I think this is an interesting proposal. We didn=E2=80=99t consider it when adding the -safe-string option, but I do not think it is too late to change things in the compiler: keeping compatibility with pre-4.02 code is essential, but Bytes itself is still experimental, so changing the bytes type should be ok. Actually, I think no decision was reached whether the ability to have different internal representations for string and bytes is really important. This may matter when you use javascript as backend, but otherwise? If we decided to keep the same representation by default, then a reasonable approach would be to adopt your proposal with the following extra twist: * in safe-string mode, string is an alias for immut bytes type 'a bytes type string =3D immut bytes * in legacy mode, bytes is an alias for string type string type 'a bytes =3D string To keep good compatibility, the functions in the String module would only have monomorphic types. The notation "s.[n]" is a subtle case, but a solution could be to have it expanded to the monomorphic String.get in legacy mode, and to the polymorph= ic Bytes.get in safe-string mode. This would allow to keep the "s.[n] <- e=E2= =80=9D notation too. I think this would be more comfortable to use than the current state. Jacques On 2014/10/06 02:19, Gabriel Scherer wrote: >=20 > Hi list, >=20 > I recently converted Extlib to work with safe-string ( the patch can > be found in the ocaml-lib-devel archives, > http://sourceforge.net/p/ocaml-lib/mailman/message/32877133/ ), and > while it mostly went smoothly, there was a pain point that I think > would be worth discussing. >=20 > The question is, when converting an existing library interface, how to > decide whether any given part of the API should remain a "string" or > be moved to "bytes" ( > http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bytes.html ) -- or > maybe provide two functions, one for each type. >=20 > # The problem >=20 > The new distinction between bytes and string, added in 4.02, actually > plays on two different intuitions: > - bytes represents (1) mutable (2) sequences of bytes > - string represents (1) immutable (2) end-user text (which happen to > be represented as sequence of bytes, but we could think of > representing them as eg. Javascript strings in the future and with > js_of_ocaml, or with ropes, etc.) >=20 > The problem is that aspects (1) and (2) are somewhat orthogonal. I > don't think we're interested in mutable end-user texts, but I > encountered a few notable cases of (1) immutable (2) sequences of > bytes. The problem is: should those be typed as string, or bytes? >=20 > (There may be a difference between functions that assume their > arguments are immutable, and function that simply guarantee that they > won't themselves mutate their arguments. For now I'll assume those two > cases count as "immutable sequences of bytes"). >=20 > Right now, the standard library itself does a strange job of making a > choice. The Marshal module ( > http://caml.inria.fr/pub/docs/manual-ocaml/libref/Marshal.html ) > appears to favor the choice of "bytes" for non-mutated byte sequences > (eg. data_size, total_size), while the Digest module ( > http://caml.inria.fr/pub/docs/manual-ocaml/libref/Digest.html ) > remained in the land of strings. >=20 >=20 > # An ideal solution >=20 > In an ideal world, I claim the best solution would be the following. > Given that it is clear (to me) that mutable byte sequences and > immutable byte sequences share the same representation, we should use > phantom type to distinguish them: >=20 > type mut > type immut > type 'a bytes >=20 > val get : 'a bytes -> int -> char > val set : mut bytes -> int -> char > Digest.t =3D immut bytes >=20 > Using phantom types had been considered at the time of the > bytes/string split, but rejected because suddenly adding polymorphism > to string literals and string functions broke a lot of code ("The type > of this expression, ..., contains a type variable that cannot be > generalized", or suddenly-polymorphic method return types). More > importantly, we do not want to enforce string and bytes to always have > the same underlying representation. Neither arguments hold for > mutable/immutable bytes. >=20 >=20 > # Going forward >=20 > It is probably a bit too late to change the "bytes" type in the > compiler standard library. (Well, feel free to disagree on this.) > And maybe we don't need to: just as more featureful, higher-level > libraries have been developed outside the OCaml distribution, we could > think of having a safer, higher-level phantom representation of byte > sequences, as an external library. >=20 > Regardless of what we do about this, I would recommend that immutable > byte sequences (things that are, by design, not text) be represented > as "bytes" rather than "string"=C2=B9. If/whenever a consensus on a safer > phantom representation appear, it will be possible to convert to it > without changing the representation. > Similarly, if your bytes-taking function does not mutate or capture > its input, you should mention it informally in its > specification/documentation (and maybe express this with a phantom > type later): this is important to reason about, for example, (un)safe > conversions on those byte sequences. >=20 > =C2=B9: a dissenting opinion could suggest that it is more important to g= et > the type-checker help re. mutability than expose the distinction > between byte-level data and text (which should be an abstract type in > some UTF8 library anyway), and thus immutable anything should rather > be "string". I think the phantom type approach is superior, and we > should design interfaces with it in mind.