caml-list - the Caml user's mailing list
* [Caml-list] Safe UTF-8 string literals and pattern matching for OCaml
@ 2015-05-01 18:15 Daniel Bünzli
  2015-05-01 20:13 ` Gabriel Scherer
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Bünzli @ 2015-05-01 18:15 UTC (permalink / raw)
  To: Ocaml Mailing List

Hello, 

If you ever wanted to have that *safely* you may want to check out this experimental ppx:

https://github.com/dbuenzli/ppx_utf8_lit

Here's the design rationale: 

https://github.com/dbuenzli/ppx_utf8_lit#rationale

Feedback welcome,

Daniel




* Re: [Caml-list] Safe UTF-8 string literals and pattern matching for OCaml
  2015-05-01 18:15 [Caml-list] Safe UTF-8 string literals and pattern matching for OCaml Daniel Bünzli
@ 2015-05-01 20:13 ` Gabriel Scherer
  2015-05-01 20:42   ` Daniel Bünzli
  0 siblings, 1 reply; 4+ messages in thread
From: Gabriel Scherer @ 2015-05-01 20:13 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: Ocaml Mailing List


I wonder whether a system based on extensions [%u "..."] rather than
attributes "..." [@u] could be easier to extend in the future. For example,
you might want to introduce a different annotation `u16` that generates an
integer array representing a UTF-16-encoded literal (or an abstract type of
your liking, but then not in pattern position). Having an annotation change
the type of the code would not be very nice.




* Re: [Caml-list] Safe UTF-8 string literals and pattern matching for OCaml
  2015-05-01 20:13 ` Gabriel Scherer
@ 2015-05-01 20:42   ` Daniel Bünzli
  2015-05-01 21:19     ` Daniel Bünzli
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Bünzli @ 2015-05-01 20:42 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Ocaml Mailing List

On Friday, 1 May 2015 at 22:13, Gabriel Scherer wrote:
> I wonder whether a system based on extensions [%u "..."] rather than attributes "..." [@u] could be easier to extend in the future. For example, you might want to introduce a different annotation `u16` that generates an integer array representing a UTF-16-encoded literal (or an abstract type of your liking, but then not in pattern position). Having an annotation change the type of the code would not be very nice.

I don't think that multiplying *representations* is a good idea; a single canonical one should eventually be chosen for this in OCaml. So personally I'm not interested in that kind of extension.

In any case desugaring the same notation to integer arrays is part of the design space I consider (though if we imagine a compiler integration it would be much less convenient). But I would never use UTF-16 for that. Using Unicode scalar values directly is less costly and leads to a conceptually correct type from a Unicode processing point of view.
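
The difference between the two integer-array representations can be sketched as follows. This is only an illustration: `utf16_units`, `scalars` and `utf16` are hypothetical names, not part of ppx_utf8_lit. With scalar values there is one array element per character; with UTF-16, a character outside the Basic Multilingual Plane takes two code units (a surrogate pair).

```ocaml
(* Hypothetical helper: encode one Unicode scalar value as UTF-16 code
   units. Assumes [u] is a valid scalar value (not a surrogate). *)
let utf16_units (u : int) : int list =
  if u < 0x10000 then [ u ]
  else
    let v = u - 0x10000 in
    (* High surrogate carries the top 10 bits, low surrogate the rest. *)
    [ 0xD800 lor (v lsr 10); 0xDC00 lor (v land 0x3FF) ]

(* U+0061, U+00E9 and U+1F42B as Unicode scalar values:
   one array element per character. *)
let scalars : int array = [| 0x0061; 0x00E9; 0x1F42B |]

(* The same text as UTF-16 code units: U+1F42B becomes two elements. *)
let utf16 : int array =
  Array.of_list (List.concat (List.map utf16_units (Array.to_list scalars)))
```

Here `scalars` has three elements while `utf16` has four, so code that pattern matches on the UTF-16 array has to reason about surrogate pairs rather than characters.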

Best,

Daniel




* Re: [Caml-list] Safe UTF-8 string literals and pattern matching for OCaml
  2015-05-01 20:42   ` Daniel Bünzli
@ 2015-05-01 21:19     ` Daniel Bünzli
  0 siblings, 0 replies; 4+ messages in thread
From: Daniel Bünzli @ 2015-05-01 21:19 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Ocaml Mailing List

On Friday, 1 May 2015 at 22:42, Daniel Bünzli wrote:
> I don't think that multiplying *representations* is a good idea; a single canonical one should eventually be chosen for this in OCaml. So personally I'm not interested in that kind of extension.

Just to make that clearer.

From a programming language design perspective and a Unicode processing point of view, you are not interested in dealing with *encodings* of Unicode scalar values in your programs. You are interested in dealing with the Unicode scalar values themselves, the "characters". Encoding is only something that happens at the boundary of your programs, when you IO your scalar values to sequences of bytes, and/or something that is *hidden* behind your Unicode string abstraction. That's the reason why I don't want to have more than one representation.
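
A minimal sketch of "encoding only at the boundary", assuming OCaml >= 4.14 for `String.get_utf_8_uchar` and the `Uchar.utf_decode_*` functions. The names `decode_utf_8` and `encode_utf_8` are hypothetical; the program in between would only ever see `Uchar.t` values.

```ocaml
(* Input boundary: UTF-8 bytes to Unicode scalar values.
   Malformed byte sequences decode to U+FFFD (Uchar.rep). *)
let decode_utf_8 (s : string) : Uchar.t list =
  let rec loop i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      loop (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  loop 0 []

(* Output boundary: Unicode scalar values back to UTF-8 bytes. *)
let encode_utf_8 (us : Uchar.t list) : string =
  let b = Buffer.create 16 in
  List.iter (Buffer.add_utf_8_uchar b) us;
  Buffer.contents b
```

For valid UTF-8 input, `encode_utf_8 (decode_utf_8 s)` gives back `s`; the choice of encoding is invisible to everything between the two calls.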

In the ppx_utf8_lit proposal this unique representation happens to be a (sadly but practically) non-abstract, particular encoding, namely UTF-8. The advantage of this representation is an excellent, easy and costless integration with OCaml's existing string design, which dates from a time when the character encoding business was still a very messy and fragmented landscape.
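
To illustrate that integration with plain OCaml strings only (this sketch does not use ppx_utf8_lit, and `greeting` and `describe` are hypothetical names): a UTF-8 encoded literal is an ordinary `string`, and pattern matching on it is ordinary byte-wise string matching. The validity check assumes OCaml >= 4.14 for `String.is_valid_utf_8`.

```ocaml
(* U+00E9 is written with explicit byte escapes so that the example does
   not depend on the encoding of the source file. *)
let greeting : string = "caf\xC3\xA9"

(* Matching on UTF-8 literals is plain string pattern matching. *)
let describe (s : string) : string =
  match s with
  | "caf\xC3\xA9" -> "coffee"
  | _ -> "unknown"

(* Nothing in the string type itself stops a literal from holding
   invalid UTF-8, for example a lone Latin-1 byte 0xE9. *)
let valid : bool = String.is_valid_utf_8 greeting
let invalid : bool = String.is_valid_utf_8 "caf\xE9"
```

The second literal type checks like any other string, which is the kind of mistake a compile-time check on literals is meant to catch.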

Best,

Daniel



