caml-list - the Caml user's mailing list
* [Caml-list] Feedback on -safe-string migration attempts
@ 2014-10-05 17:19 Gabriel Scherer
  2014-10-06  2:11 ` Jacques Garrigue
  2014-10-06 10:03 ` Gerd Stolpmann
  0 siblings, 2 replies; 5+ messages in thread
From: Gabriel Scherer @ 2014-10-05 17:19 UTC (permalink / raw)
  To: caml users

Hi list,

I recently converted Extlib to work with safe-string ( the patch can
be found in the ocaml-lib-devel archives,
http://sourceforge.net/p/ocaml-lib/mailman/message/32877133/ ), and
while it mostly went smoothly, there was a pain point that I think
would be worth discussing.

The question is, when converting an existing library interface, how to
decide whether any given part of the API should remain a "string" or
be moved to "bytes" (
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bytes.html ) -- or
maybe provide two functions, one for each type.

# The problem

The new distinction between bytes and string, added in 4.02, actually
plays on two different intuitions:
- bytes represents (1) mutable (2) sequences of bytes
- string represents (1) immutable (2) end-user text (which happens to
be represented as a sequence of bytes, but we could think of
representing it as e.g. JavaScript strings in the future with
js_of_ocaml, or with ropes, etc.)

The problem is that aspects (1) and (2) are somewhat orthogonal. I
don't think we're interested in mutable end-user texts, but I
encountered a few notable cases of (1) immutable (2) sequences of
bytes. The problem is: should those be typed as string, or bytes?

(There may be a difference between functions that assume their
arguments are immutable, and functions that simply guarantee that they
won't themselves mutate their arguments. For now I'll assume those two
cases count as "immutable sequences of bytes".)

Right now, the standard library itself is inconsistent in making this
choice. The Marshal module (
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Marshal.html )
appears to favor the choice of "bytes" for non-mutated byte sequences
(e.g. data_size, total_size), while the Digest module (
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Digest.html )
remained in the land of strings.
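
To make the inconsistency concrete, here is a small sketch, assuming
OCaml 4.02 or later. The value names are only illustrative; the library
calls are the ones from the Marshal and Digest modules linked above.

```ocaml
(* Marshal produces and inspects [bytes], although it only reads them. *)
let payload : bytes = Marshal.to_bytes [1; 2; 3] []

(* total_size reads the header of [payload]; nothing is mutated. *)
let size : int = Marshal.total_size payload 0

(* Digest.t is an alias for string, even though an MD5 digest is raw
   binary data rather than text. *)
let d : Digest.t = Digest.string "hello"

(* A bytes entry point exists as well and computes the same digest. *)
let same : bool = (d = Digest.bytes (Bytes.of_string "hello"))
```

Both modules handle non-mutated binary data, yet one exposes it as bytes
and the other as string.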


# An ideal solution

In an ideal world, I claim the best solution would be the following.
Given that it is clear (to me) that mutable byte sequences and
immutable byte sequences share the same representation, we should use
a phantom type to distinguish them:

  type mut
  type immut
  type 'a bytes

  val get : 'a bytes -> int -> char
  val set : mut bytes -> int -> char -> unit
  Digest.t = immut bytes

Using phantom types had been considered at the time of the
bytes/string split, but rejected because suddenly adding polymorphism
to string literals and string functions broke a lot of code ("The type
of this expression, ..., contains a type variable that cannot be
generalized", or suddenly-polymorphic method return types). More
importantly, we do not want to enforce string and bytes to always have
the same underlying representation. Neither argument holds for
mutable/immutable bytes.
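
A minimal sketch of the phantom-type idea, written as an ordinary
module on top of the existing Bytes. The module name Phantom_bytes and
the create/freeze functions are assumptions made for the example; only
mut, immut, get and set come from the signature above.

```ocaml
module Phantom_bytes : sig
  type mut
  type immut
  type 'a t                 (* 'a is a phantom: it is absent at runtime *)
  val create : int -> char -> mut t
  val freeze : mut t -> immut t   (* copies, so the result stays fixed *)
  val get : 'a t -> int -> char
  val set : mut t -> int -> char -> unit
end = struct
  type mut
  type immut
  type 'a t = bytes         (* one representation for both flavours *)
  let create = Bytes.make
  let freeze = Bytes.copy
  let get = Bytes.get
  let set = Bytes.set
end

let b = Phantom_bytes.create 3 'a'
let () = Phantom_bytes.set b 0 'z'
let frozen = Phantom_bytes.freeze b
let () = Phantom_bytes.set b 1 'y'   (* [frozen] is unaffected *)
(* Phantom_bytes.set frozen 0 'q' would be rejected by the type checker. *)
```

Since get is polymorphic in the phantom parameter, reading works on both
flavours, while set is only available at type mut t.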


# Going forward

It is probably a bit too late to change the "bytes" type in the
compiler standard library. (Well, feel free to disagree on this.)
And maybe we don't need to: just as more featureful, higher-level
libraries have been developed outside the OCaml distribution, we could
think of having a safer, higher-level phantom representation of byte
sequences, as an external library.

Regardless of what we do about this, I would recommend that immutable
byte sequences (things that are, by design, not text) be represented
as "bytes" rather than "string"¹. If and when a consensus on a safer
phantom representation appears, it will be possible to convert to it
without changing the representation.
Similarly, if your bytes-taking function does not mutate or capture
its input, you should mention it informally in its
specification/documentation (and maybe express this with a phantom
type later): this is important to reason about, for example, (un)safe
conversions on those byte sequences.
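
As an illustration of why this documentation matters, consider the
following sketch. count_byte and count_in_string are hypothetical
helpers, not part of any library; Bytes.iter and Bytes.unsafe_of_string
are the standard functions.

```ocaml
(* [count_byte b c] counts occurrences of [c] in [b].
   Documented contract: it neither mutates [b] nor keeps a reference
   to it after returning. *)
let count_byte (b : bytes) (c : char) : int =
  let n = ref 0 in
  Bytes.iter (fun x -> if x = c then incr n) b;
  !n

(* Thanks to that contract, a caller holding an immutable string can
   skip the copy: the unsafe view is never written to and never
   escapes. Without the contract, only Bytes.of_string would be
   justified here. *)
let count_in_string (s : string) (c : char) : int =
  count_byte (Bytes.unsafe_of_string s) c
```

The type bytes -> char -> int alone does not tell the caller whether the
unsafe conversion is acceptable; only the informal contract does.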

¹: a dissenting opinion could suggest that it is more important to get
the type-checker's help regarding mutability than to expose the
distinction between byte-level data and text (which should be an
abstract type in some UTF8 library anyway), and thus that anything
immutable should rather be "string". I think the phantom type approach
is superior, and we should design interfaces with it in mind.


* Re: [Caml-list] Feedback on -safe-string migration attempts
  2014-10-05 17:19 [Caml-list] Feedback on -safe-string migration attempts Gabriel Scherer
@ 2014-10-06  2:11 ` Jacques Garrigue
  2014-10-06  8:15   ` Alain Frisch
  2014-10-06 11:08   ` Gabriel Scherer
  2014-10-06 10:03 ` Gerd Stolpmann
  1 sibling, 2 replies; 5+ messages in thread
From: Jacques Garrigue @ 2014-10-06  2:11 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: OCaML List Mailing

Hi Gabriel,

I think this is an interesting proposal.
We didn’t consider it when adding the -safe-string option, but I do
not think it is too late to change things in the compiler:
keeping compatibility with pre-4.02 code is essential, but Bytes
itself is still experimental, so changing the bytes type should be ok.

Actually, I think no decision was reached on whether the ability to have
different internal representations for string and bytes is really important.
This may matter when you use javascript as backend, but otherwise?
If we decided to keep the same representation by default, then a
reasonable approach would be to adopt your proposal with the following
extra twist:

* in safe-string mode, string is an alias for immut bytes
	type 'a bytes
	type string = immut bytes
* in legacy mode, bytes is an alias for string
	type string
	type 'a bytes = string

To keep good compatibility, the functions in the String module would
only have monomorphic types.
The notation "s.[n]" is a subtle case, but a solution could be to have it
expand to the monomorphic String.get in legacy mode, and to the polymorphic
Bytes.get in safe-string mode. This would allow keeping the "s.[n] <- e"
notation too.

I think this would be more comfortable to use than the current state.

	Jacques


* Re: [Caml-list] Feedback on -safe-string migration attempts
  2014-10-06  2:11 ` Jacques Garrigue
@ 2014-10-06  8:15   ` Alain Frisch
  2014-10-06 11:08   ` Gabriel Scherer
  1 sibling, 0 replies; 5+ messages in thread
From: Alain Frisch @ 2014-10-06  8:15 UTC (permalink / raw)
  To: Jacques Garrigue, Gabriel Scherer; +Cc: OCaML List Mailing

On 10/06/2014 04:11 AM, Jacques Garrigue wrote:
> Actually, I think no decision was reached on whether the ability to have
> different internal representations for string and bytes is really important.
> This may matter when you use javascript as backend, but otherwise?

I think it's good to keep the freedom to change the representation of 
immutable text.  This could simplify some migration path towards Unicode 
and/or allow using more clever representations (such as ropes, strings 
with lazy concatenation, etc).

Generally speaking, I agree with Gabriel that the distinction between 
strings and bytes is more about "text" vs "compact byte array" data, 
than between immutable vs mutable.  I'm not sure about the need to track
immutability (or read-only permission) on "byte arrays", though.  Why
would this be more important than on other kinds of arrays?  In
particular, for numerical code, it would be equally useful to specify
immutability of vectors/matrices.  Do we want to go in this direction
of tracking mutation of arrays?

Actually, I think it would be interesting to introduce a module type 
"ARRAY" for arrays with a fixed type "elt" of elements, add a Make 
functor to the Array module, and arrange so that "Bytes" is a subtype of
"ARRAY with type elt = char" (in particular, its "t" type shouldn't be
more parametric than the one in the ARRAY signature).  We could
similarly provide a compact "BoolArray" implementation, and if we ever 
decide to drop the automatic runtime unboxing of float arrays, we would 
of course provide a "FloatArray" replacement. "Polymorphic" algorithms 
on arrays could be parametrized by a first-class ARRAY module argument 
(and this would possibly work nicely with modular implicits, and 
possibly with a more aggressive inliner).
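
A rough sketch of what this could look like. The exact contents of
ARRAY, the names Bytes_array and Int_array, and the reverse_in_place
function are assumptions made for the example; nothing like this exists
in the standard library.

```ocaml
module type ARRAY = sig
  type elt
  type t
  val length : t -> int
  val get : t -> int -> elt
  val set : t -> int -> elt -> unit
  val make : int -> elt -> t
end

(* Bytes fits the signature once [elt] is fixed to [char]. *)
module Bytes_array : ARRAY with type elt = char and type t = bytes =
struct
  type elt = char
  include Bytes
end

(* A hypothetical Make functor wrapping ordinary arrays. *)
module Make (E : sig type t end) :
  ARRAY with type elt = E.t and type t = E.t array =
struct
  type elt = E.t
  type t = E.t array
  let length = Array.length
  let get = Array.get
  let set = Array.set
  let make = Array.make
end

(* A "polymorphic" algorithm taking a first-class ARRAY module. *)
let reverse_in_place (type a e)
    (module A : ARRAY with type t = a and type elt = e) (arr : a) : unit =
  let n = A.length arr in
  for i = 0 to n / 2 - 1 do
    let tmp = A.get arr i in
    A.set arr i (A.get arr (n - 1 - i));
    A.set arr (n - 1 - i) tmp
  done

module Int_array = Make (struct type t = int end)

let b = Bytes.of_string "abc"
let () = reverse_in_place (module Bytes_array) b
let xs = [| 1; 2; 3; 4 |]
let () = reverse_in_place (module Int_array) xs
```

Note that the "t" type of Bytes_array has no parameter, which is why a
phantom parameter on bytes would not fit this signature directly.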

This is quite independent from the current discussion, but perhaps it 
shows that we shouldn't treat "Bytes" too specifically compared to other 
kinds of arrays.


Alain


* Re: [Caml-list] Feedback on -safe-string migration attempts
  2014-10-05 17:19 [Caml-list] Feedback on -safe-string migration attempts Gabriel Scherer
  2014-10-06  2:11 ` Jacques Garrigue
@ 2014-10-06 10:03 ` Gerd Stolpmann
  1 sibling, 0 replies; 5+ messages in thread
From: Gerd Stolpmann @ 2014-10-06 10:03 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: caml users


On Sunday, 05.10.2014, at 19:19 +0200, Gabriel Scherer wrote:
> The question is, when converting an existing library interface, how to
> decide whether any given part of the API should remain a "string" or
> be moved to "bytes" (
> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Bytes.html ) -- or
> maybe provide two functions, one for each type.
> 
> # The problem
> 
> The new distinction between bytes and string, added in 4.02, actually
> plays on two different intuitions:
> - bytes represents (1) mutable (2) sequences of bytes
> - string represents (1) immutable (2) end-user text (which happens to
> be represented as a sequence of bytes, but we could think of
> representing it as e.g. JavaScript strings in the future with
> js_of_ocaml, or with ropes, etc.)

Well, I think there are different views on this: In the OCaml stdlib
there is no distinction between character and byte, and it is left to
the user how to represent text (e.g. to use multibyte UTF-8 text). From
that point of view it is clear that "string" is for immutable data, no
matter whether text or bytes, and "bytes" is for mutable data of either
kind. However, as you found out, this is sometimes impractical. If you
have some data, you don't want to draw a somewhat arbitrary line between
mutable and immutable appearances of it.

I will also have to convert a fair amount of code, and especially for
Ocamlnet I really don't see how to do this cleanly, because the kind of
data suddenly changes. For instance, the HTTP client reads bytes into a
bytes buffer, but the HTTP headers have more of the characteristics of
text.

> The problem is that aspects (1) and (2) are somewhat orthogonal. I
> don't think we're interested in mutable end-user texts, but I
> encountered a few notable cases of (1) immutable (2) sequences of
> bytes. The problem is: should those be typed as string, or bytes?
> 
> (There may be a difference between functions that assume their
> arguments are immutable, and functions that simply guarantee that they
> won't themselves mutate their arguments. For now I'll assume those two
> cases count as "immutable sequences of bytes".)
> 
> Right now, the standard library itself is inconsistent in making this
> choice. The Marshal module (
> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Marshal.html )
> appears to favor the choice of "bytes" for non-mutated byte sequences
> (e.g. data_size, total_size), while the Digest module (
> http://caml.inria.fr/pub/docs/manual-ocaml/libref/Digest.html )
> remained in the land of strings.

I also noticed that some functionality is now available over two
interfaces. In particular, there are now functions for writing from a
bytes buffer and for writing from a string buffer (e.g. Unix.write and
Unix.write_substring). My thinking is that all stdlib functions should
now be provided in that manner.
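
For reference, the Unix pair has the types
Unix.write : file_descr -> bytes -> int -> int -> int and
Unix.write_substring : file_descr -> string -> int -> int -> int.
The same pairing exists in Buffer, which I use in the small sketch below
only because it does not need the unix library; the data is made up.

```ocaml
let buf = Buffer.create 16

(* String entry points. *)
let () = Buffer.add_string buf "GET / "
let () = Buffer.add_substring buf "\r\n--" 0 2

(* Bytes entry points, with the same argument order. *)
let () = Buffer.add_bytes buf (Bytes.of_string "HTTP/1.1")
let () = Buffer.add_subbytes buf (Bytes.of_string "xx\r\n") 2 2

let contents : string = Buffer.contents buf
```

With both variants available the caller never has to copy just to
satisfy the type of the function.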

> # An ideal solution
> 
> In an ideal world, I claim the best solution would be the following.
> Given that it is clear (to me) that mutable byte sequences and
> immutable byte sequences share the same representation, we should use
> a phantom type to distinguish them:
> 
>   type mut
>   type immut
>   type 'a bytes
> 
>   val get : 'a bytes -> int -> char
>   val set : mut bytes -> int -> char -> unit
>   Digest.t = immut bytes
> 
> Using phantom types had been considered at the time of the
> bytes/string split, but rejected because suddenly adding polymorphism
> to string literals and string functions broke a lot of code ("The type
> of this expression, ..., contains a type variable that cannot be
> generalized", or suddenly-polymorphic method return types). More
> importantly, we do not want to enforce string and bytes to always have
> the same underlying representation. Neither argument holds for
> mutable/immutable bytes.

A new variant! The problem is still that it introduces polymorphism.

For OCamlnet I was thinking more of providing all internal string
functionality through a string_reader interface as well. A string_reader
is a little abstraction on top of string/bytes/char bigarray that
abstracts the representation, and provides the most needed functions (at
least what String/Bytes have, plus extensions like searching,
conversions, maybe even regexps). I think that's the missing piece to
make the "bytes" change acceptable:

module String_reader : sig
  type t
  val for_string : string -> t
  val for_bytes : bytes -> t
  val for_memory : (char,...,...) Bigarray.Array1.t -> t

  val get : t -> int -> char
  val sub_string : t -> int -> int -> string
  val sub_bytes : t -> int -> int -> bytes
  val blit_to_bytes : t -> int -> bytes -> int -> int -> unit
  val blit_to_memory : t -> int -> (char,...) Bigarray.Array1.t -> int -> int -> unit

  val index : t -> char -> int

  val search_leftmost : t -> t -> int -> int
  val search_rightmost : t -> t -> int -> int

  val get_int32_le : t -> int -> int32
  val get_int32_be : t -> int -> int32
  val get_int64_le : t -> int -> int64
  val get_int64_be : t -> int -> int64

  (* plus more ... not sure yet what to cover exactly *)

end
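
One possible way to implement a subset of this, as a sketch only: the
variant representation is an assumption, the Bigarray case is left out
to keep the example short, and only a few of the functions are covered.

```ocaml
module String_reader : sig
  type t
  val for_string : string -> t
  val for_bytes : bytes -> t
  val get : t -> int -> char
  val sub_string : t -> int -> int -> string
  val index : t -> char -> int
  val get_int32_le : t -> int -> int32
end = struct
  (* One constructor per backing store. [for_bytes] does not copy, so
     the reader observes later mutations of the underlying bytes. *)
  type t = S of string | B of bytes

  let for_string s = S s
  let for_bytes b = B b

  let get t i =
    match t with S s -> String.get s i | B b -> Bytes.get b i

  let sub_string t pos len =
    match t with
    | S s -> String.sub s pos len
    | B b -> Bytes.sub_string b pos len

  let index t c =
    match t with S s -> String.index s c | B b -> Bytes.index b c

  (* Assemble a little-endian int32 from four single-byte reads. *)
  let get_int32_le t pos =
    let byte k = Int32.of_int (Char.code (get t (pos + k))) in
    Int32.logor (byte 0)
      (Int32.logor (Int32.shift_left (byte 1) 8)
         (Int32.logor (Int32.shift_left (byte 2) 16)
            (Int32.shift_left (byte 3) 24)))
end

let r = String_reader.for_string "\x01\x02\x03\x04abc"
let header : int32 = String_reader.get_int32_le r 0
let rb = String_reader.for_bytes (Bytes.of_string "xyz")
```

Code written against String_reader.t then works unchanged whether the
data arrived as string or as bytes.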


Gerd


-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
My OCaml site:          http://www.camlcity.org
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
------------------------------------------------------------



* Re: [Caml-list] Feedback on -safe-string migration attempts
  2014-10-06  2:11 ` Jacques Garrigue
  2014-10-06  8:15   ` Alain Frisch
@ 2014-10-06 11:08   ` Gabriel Scherer
  1 sibling, 0 replies; 5+ messages in thread
From: Gabriel Scherer @ 2014-10-06 11:08 UTC (permalink / raw)
  To: Jacques Garrigue; +Cc: OCaML List Mailing

I'm strongly convinced that allowing a difference in representation is
the right choice. I'm not actively pushing for using another
representation, but in my experience thinking of them as distinct is
a very good thought-experiment to design better interfaces.

I would compare this to the idea that "data created at an immutable
type might be allocated in read-only memory": I'm not pushing for this
to be implemented, but it's an excellent thought-experiment to explain
why certain uses of Obj.magic are assuredly wrong.

(In my experience so far working with the stdlib, Extlib and
Batteries, I have not felt that the distinct representations were
painful. In particular, I needed Bytes.get rarely enough that the lack
of s.[n] syntax was never an issue.)


I would thus rather suggest the following interfaces:
  type 'a bytes (* = string  under -unsafe-string *)
  type string

An interesting interface change would be the conversion functions. We
currently have:

  Bytes.of_string : string -> bytes
  Bytes.to_string : bytes -> string

  Bytes.copy : bytes -> bytes

  (* see Bytes documentation *)
  Bytes.unsafe_of_string : string -> bytes
  Bytes.unsafe_to_string : bytes -> string

We could instead have something like the following:

  Bytes.immut_of_string : string -> immut bytes
  Bytes.immut_to_string : immut bytes -> string

  Bytes.copy : 'a bytes -> 'b bytes

  (* same usage restrictions as Bytes.unsafe_{of,to}_string,
     see the documentation *)
  Bytes.unsafe_mut : immut bytes -> mut bytes
  Bytes.unsafe_immut : mut bytes -> immut bytes

with the aliases

  Bytes.of_string : string -> 'a bytes
  of_string s = copy (immut_of_string s)

  Bytes.to_string : 'a bytes -> string
  to_string s = immut_to_string (copy s)

  Bytes.unsafe_to_string : mut bytes -> string
  unsafe_to_string s = immut_to_string (unsafe_immut s)

(unsafe_of_string could go away: it can only be correctly used if the
resulting bytes is used immutably, so it is superseded by the safe
immut_of_string function.)
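
A sketch of how these signatures could be realized on top of the current
Bytes module. The module name Phantom_conv is made up, 'a bytes is
written 'a t, and make/get/set are included only so the example can be
exercised.

```ocaml
module Phantom_conv : sig
  type mut
  type immut
  type 'a t
  val make : int -> char -> mut t
  val get : 'a t -> int -> char
  val set : mut t -> int -> char -> unit
  val immut_of_string : string -> immut t
  val immut_to_string : immut t -> string
  val copy : 'a t -> 'b t
  val unsafe_mut : immut t -> mut t
  val unsafe_immut : mut t -> immut t
  val of_string : string -> 'a t
  val to_string : 'a t -> string
  val unsafe_to_string : mut t -> string
end = struct
  type mut
  type immut
  type 'a t = bytes
  let make = Bytes.make
  let get = Bytes.get
  let set = Bytes.set
  (* No copy is needed: an [immut t] offers no way to write. *)
  let immut_of_string = Bytes.unsafe_of_string
  let immut_to_string = Bytes.unsafe_to_string
  let copy = Bytes.copy
  let unsafe_mut b = b
  let unsafe_immut b = b
  (* The aliases, defined exactly as in the equations above. *)
  let of_string s = copy (immut_of_string s)
  let to_string b = immut_to_string (copy b)
  let unsafe_to_string b = immut_to_string (unsafe_immut b)
end

let frozen = Phantom_conv.immut_of_string "abc"
(* The annotation fixes the phantom parameter of the fresh copy. *)
let m : Phantom_conv.mut Phantom_conv.t = Phantom_conv.copy frozen
let () = Phantom_conv.set m 0 'X'
```

The annotation on m is needed because copy returns a result whose
phantom parameter is otherwise left undetermined.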

