caml-list - the Caml user's mailing list
From: Raoul Duke <raould@gmail.com>
To: OCaml <caml-list@inria.fr>
Subject: Re: [Caml-list] Immutable strings
Date: Tue, 8 Jul 2014 12:27:57 -0700
Message-ID: <CAJ7XQb5FX61FpAes9cuBxsBQZXQjgP=ohp4Z0XicT4Nrf9iOKw@mail.gmail.com>
In-Reply-To: <A55AB3EABAC8457B8D777447055C49DA@erratique.ch>

jawohl, n'est-ce pas, это жизнь (yes indeed, isn't it, that's life).
based on my experience with strings and stuff over the years, what
Daniel posted resonates with me. :-) things like UTF-whatever are
baseline requirements, but beyond that (a) nobody has it right and
(b) Unicode sucks. :-)

On Tue, Jul 8, 2014 at 12:24 PM, Daniel Bünzli
<daniel.buenzli@erratique.ch> wrote:
> On Tuesday, 8 July 2014 at 19:15, mattiasw@gmail.com wrote:
>> My two cents:
>>
>> To me it seems very strange to introduce a new string type and not make it
>> UTF-8 from the start.
>
> No new string type was introduced. A bytes type was introduced.
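To make the string/bytes distinction concrete, here is a minimal sketch, assuming OCaml 4.03 or later (the `Bytes` module arrived in 4.02, `Char.uppercase_ascii` in 4.03). The function name `with_first_upper` is made up for illustration. As far as I understand it, the two types are only kept strictly apart when compiling with `-safe-string`.

```ocaml
(* Illustrative helper (hypothetical name): return a copy of [s] whose
   first byte is ASCII-uppercased. All mutation happens on a [bytes]
   copy; the [string] argument is never modified. *)
let with_first_upper (s : string) : string =
  if s = "" then s
  else begin
    (* Bytes.of_string copies, so [s] stays untouched. *)
    let b = Bytes.of_string s in
    Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    (* Bytes.to_string copies again, yielding an immutable string. *)
    Bytes.to_string b
  end

let () =
  let original = "hello" in
  let upper = with_first_upper original in
  assert (original = "hello");
  print_endline upper
```

Note that this works on bytes, not on characters in any Unicode sense: it only touches the first byte and only if that byte is an ASCII letter.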
>
>> OCaml will be the last language that doesn't have standardized Unicode
>> support.
>
> What do you mean by standardized Unicode support in the language *exactly*?
>
> I'd be genuinely interested in knowing the actual level of Unicode support in these languages, beyond saying "our string is a UTF-X encoded sequence of scalar values". For example, do these other languages perform Unicode normalisation on string literals/patterns (and identifiers, if they choose that craze)? This would be absolutely necessary for any kind of real-world processing on Unicode strings, but then there is more than one normalisation form, and the one you want depends on the context. Do they have a notation to indicate which form the literal/pattern should be in?
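A small sketch of why normalisation matters for literals and patterns, using only the standard library and byte escapes so the result does not depend on how an editor saves the file. The names `precomposed`, `decomposed` and `matches_precomposed` are made up for illustration. Both strings display as "é", but they are different byte sequences, so plain comparison and pattern matching treat them as different.

```ocaml
(* "é" as a single scalar value: U+00E9, UTF-8 bytes C3 A9. *)
let precomposed = "\xC3\xA9"

(* "é" as base letter plus combining accent: U+0065 U+0301,
   UTF-8 bytes 65 CC 81. Canonically equivalent to the above. *)
let decomposed = "e\xCC\x81"

(* A string pattern compares bytes; no normalisation is applied. *)
let matches_precomposed (s : string) : bool =
  match s with
  | "\xC3\xA9" -> true
  | _ -> false

let () =
  assert (precomposed <> decomposed);
  assert (matches_precomposed precomposed);
  assert (not (matches_precomposed decomposed))
```

Making the two compare equal requires normalising both sides to the same form first, which the standard library does not do; that would need an external library.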
>
>> Even old languages like Erlang have gone the UTF-8 way, and that
>> includes program code.
>
> For a very, very long time it has been possible to write UTF-8 encoded literals in your OCaml sources, unnormalized or normalized according to whatever normal form your editor uses; you just had to drop the idea of using Latin-1 identifiers, which have been deprecated since 4.01 anyway.
>
> As for being able to write Unicode *identifiers* in the language, I'm actually quite glad OCaml doesn't have that: there are both too many arrow characters to use in Unicode and too many unreasonable programmers out there.
>
>> Bytes and strings have nothing in common, but str.[4] is still relevant for
>> UTF-8 strings.
>
> Direct indexing is rarely relevant in Unicode, as you usually want those indexes to correspond to user-perceived characters (e.g. to align things in text formatting), and a user-perceived character may be written as a single Unicode scalar value or as a sequence of them (even in normal forms, since an arbitrary number of combining characters can be applied to a base character). The Unicode text segmentation algorithm allows you to find these boundaries; simple indexing doesn't, and is mostly worthless in Unicode processing.
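A minimal standard-library sketch of the three different "lengths" involved. The function `utf8_scalar_count` and the value `sample` are made up for illustration, and the function assumes its input is valid UTF-8 (it does no validation). The sample spells "été" with the first "é" decomposed, so it has 6 bytes, 4 scalar values and 3 user-perceived characters.

```ocaml
(* Count Unicode scalar values in a string ASSUMED to be valid UTF-8:
   every byte that is not a continuation byte (10xxxxxx) starts one. *)
let utf8_scalar_count (s : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* "été": e, U+0301 (combining acute), t, U+00E9. *)
let sample = "e\xCC\x81t\xC3\xA9"

let () =
  (* Byte length: 1 + 2 + 1 + 2. *)
  assert (String.length sample = 6);
  (* Scalar values: e, U+0301, t, U+00E9. *)
  assert (utf8_scalar_count sample = 4);
  (* Indexing yields a byte, here the first byte of U+0301,
     not a character a user would recognise. *)
  assert (sample.[1] = '\xCC')
```

Neither 6 nor 4 is the 3 a reader would count. Getting that number means finding grapheme cluster boundaries with the segmentation algorithm, which needs a dedicated library (Uuseg is one such library for OCaml) rather than indexing.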
>
> Best,
>
> Daniel
