caml-list - the Caml user's mailing list
From: "Daniel Bünzli" <daniel.buenzli@erratique.ch>
To: Christophe TROESTLER <Christophe.Troestler+ocaml@umh.ac.be>
Cc: OCaml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] GSoC: better UTF-8 support
Date: Mon, 28 Feb 2011 15:11:56 +0100
Message-ID: <AANLkTinchxD5oZk9_Su7_JKvHeUmg3oUSM-L98XtgaEv@mail.gmail.com>
In-Reply-To: <20110228.143157.1265982603697554449.Christophe.Troestler+ocaml@umons.ac.be>

> Thinking more about this, one could introduce a new type (say “utf8”
> or “ustring”) for these UTF-8 strings.  It should be compatible with
> the way UTF-8 strings are handled on the C side for interoperability
> but “optimized” — e.g. should they contain their length (number of
> unicode chars)?
>
> Another thing: it could be a nice way to transition to *immutable*
> unicode strings.  This is not possible for (standard) strings because,
> as you all know, they are both used as strings and as buffers.  The
> introduction of unicode strings may be the right opportunity to
> distinguish both [1].

Frankly, I see no benefit in introducing this half-baked UTF-8 support
into the standard library (which is what this proposal is about).

This will just add more noise to the interfaces. Even worse,
developers will think they handle Unicode properly when in fact they
do not, bringing more confusion to an already confusing topic (I'm
always surprised by how little programmers know about Unicode). Again,
pretending to support Unicode character-level processing by replacing
Latin-1 character-level processing the way you suggest is just plain
wrong: a code point is not a user-perceived character, and the same
text can be encoded as different code point sequences.
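To make the point concrete, here is a minimal OCaml sketch. The helper name `utf8_code_point_count` and the sample strings are illustrative only, not from any library. It counts code points the way a cached "length" field might (assuming the input is already valid UTF-8), and compares two encodings of "é" that render identically: precomposed U+00E9, and U+0065 followed by the combining acute accent U+0301.

```ocaml
(* Hypothetical helper: count Unicode code points in a string assumed to
   be valid UTF-8. Each code point starts with exactly one byte that is
   not a continuation byte (continuation bytes look like 0b10xxxxxx). *)
let utf8_code_point_count (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* "é" written two ways that a user cannot tell apart on screen. *)
let precomposed = "\xC3\xA9"  (* U+00E9 *)
let decomposed = "e\xCC\x81"  (* U+0065 U+0301 *)

let () =
  Printf.printf "bytes: %d vs %d\n"
    (String.length precomposed) (String.length decomposed);
  Printf.printf "code points: %d vs %d\n"
    (utf8_code_point_count precomposed) (utf8_code_point_count decomposed);
  Printf.printf "equal: %b\n" (String.equal precomposed decomposed)
```

This prints `bytes: 2 vs 3`, `code points: 1 vs 2` and `equal: false`. Counting code points instead of bytes does not make length or equality match what the user sees; that requires normalization, which is why option 1 below asks for normal forms.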

As I see it, either you:

1) Provide full Unicode support in the standard library, with at least
normalization and collation support, in a new API separate from the
existing String and Char modules.

2) Leave full Unicode support to a third-party library and keep the
current state, with some improvements for coping with UTF-8 encoded
string literals and for interacting correctly with file systems.
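One modest improvement of the kind option 2 has in mind is simply checking that a byte string is well-formed UTF-8, since file names and legacy Latin-1 data need not be. The following is a hand-rolled sketch following the byte ranges of RFC 3629; the function name `is_valid_utf8` is hypothetical. (OCaml 4.14 and later also ship `String.is_valid_utf_8` in the standard library.)

```ocaml
(* Hypothetical helper: true iff [s] is well-formed UTF-8 per RFC 3629,
   i.e. no truncated sequences, no overlong encodings, no surrogates
   (U+D800..U+DFFF) and nothing above U+10FFFF. *)
let is_valid_utf8 (s : string) : bool =
  let len = String.length s in
  let byte i = Char.code s.[i] in
  (* [cont i] holds if index [i] exists and holds a 10xxxxxx byte. *)
  let cont i = i < len && byte i land 0xC0 = 0x80 in
  (* [second i lo hi] checks the byte after the lead byte at [i] against
     the narrower range some lead bytes require. *)
  let second i lo hi =
    i + 1 < len && byte (i + 1) >= lo && byte (i + 1) <= hi in
  let rec loop i =
    if i >= len then true
    else
      let b = byte i in
      if b < 0x80 then loop (i + 1) (* ASCII *)
      else if b < 0xC2 then false (* stray continuation or overlong C0/C1 *)
      else if b < 0xE0 then cont (i + 1) && loop (i + 2)
      else if b < 0xF0 then
        (let lo, hi =
           if b = 0xE0 then (0xA0, 0xBF)      (* reject overlong forms *)
           else if b = 0xED then (0x80, 0x9F) (* reject surrogates *)
           else (0x80, 0xBF)
         in
         second i lo hi && cont (i + 2) && loop (i + 3))
      else if b < 0xF5 then
        (let lo, hi =
           if b = 0xF0 then (0x90, 0xBF)      (* reject overlong forms *)
           else if b = 0xF4 then (0x80, 0x8F) (* cap at U+10FFFF *)
           else (0x80, 0xBF)
         in
         second i lo hi && cont (i + 2) && cont (i + 3) && loop (i + 4))
      else false (* F5..FF never occur in UTF-8 *)
  in
  loop 0
```

For example, `is_valid_utf8 "\xC3\xA9"` (UTF-8 "é") is true, while `is_valid_utf8 "\xE9"` (Latin-1 "é") is false. Such a check only tells you the bytes decode; it says nothing about normalization or collation.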

Given the signals sent in the past by the OCaml dev team, 2) seems
more likely to be accepted.

Best,

Daniel
