caml-list - the Caml user's mailing list
From: YAMAGATA yoriyuki <yoriyuki@ms.u-tokyo.ac.jp>
To: mattias.waldau@abc.se
Cc: caml-list@inria.fr, xavier.leroy@inria.fr
Subject: RE: [Caml-list] Non-mutable strings
Date: Thu, 17 Jan 2002 18:56:34 +0900
Message-ID: <20020117185634Y.yoriyuki@ms.u-tokyo.ac.jp>
In-Reply-To: <AAEBJHFJOIPMMIILCEPBKEPBDGAA.mattias.waldau@abc.se>

From: "Mattias Waldau" <mattias.waldau@abc.se>
Subject: RE: [Caml-list] Non-mutable strings
Date: Wed, 16 Jan 2002 20:22:36 +0100

> Thus, introducing Unicode strings (or something similar, I heard that Asians
> don't like Unicode at all) and introducing non-mutable strings should
> preferably be done simultaneously.

Unicode has its critics (most of the criticism is aimed at Han
unification, which merges all regional variants of ideographs into a
single set of characters), but as far as I know, it is the only
international character set for which standard methods of string
matching, comparison and sorting are defined.  Pattern matching is
important to Caml, so I think Unicode is the better choice.

> P.s. Microsoft NT, 2000, XP handles double byte chars everywhere, it is
> called BSTR and in order to make string comparison etc library-routines are
> called all the time. However, since Unicode can be 4 byte, I don't know how
> that is encoded into 2 bytes.

In the 16-bit form of the Unicode standard (UTF-16), a character is
handled as one or two 16-bit integers (code units).  If a character
does not fit in 16 bits, it is represented as a surrogate pair: two
16-bit integers taken from ranges reserved for this purpose (high
surrogates 0xD800-0xDBFF, low surrogates 0xDC00-0xDFFF).  A surrogate
pair can reach at most U+10FFFF, a 21-bit value, so Unicode in its
narrow sense needs at most 3 bytes per character.  I don't know
whether Windows supports surrogates, but since MS is one of the
founding members of the Unicode consortium, they will be supported in
the future anyway.
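To make the surrogate mechanism concrete, here is a small OCaml sketch.  The function names are hypothetical, and 16-bit code units are simply stored in ordinary ints.  A code point above U+FFFF has 0x10000 subtracted, leaving 20 bits, which are split into a high half and a low half of 10 bits each:

```ocaml
(* Hypothetical helper: encode one Unicode scalar value as UTF-16 code
   units, each 16-bit unit stored in an ordinary OCaml int. *)
let utf16_of_code_point cp =
  if cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) then
    invalid_arg "utf16_of_code_point: not a Unicode scalar value"
  else if cp < 0x10000 then
    [cp]                                     (* fits in one code unit *)
  else
    let v = cp - 0x10000 in                  (* 20 bits remain *)
    let high = 0xD800 lor (v lsr 10) in      (* top 10 bits *)
    let low = 0xDC00 lor (v land 0x3FF) in   (* bottom 10 bits *)
    [high; low]

(* Hypothetical helper: decode a surrogate pair back into a code point. *)
let code_point_of_surrogates high low =
  if high land 0xFC00 <> 0xD800 || low land 0xFC00 <> 0xDC00 then
    invalid_arg "code_point_of_surrogates: not a surrogate pair"
  else
    0x10000 + ((high land 0x3FF) lsl 10) + (low land 0x3FF)
```

For example, U+20000 (the first character of CJK Unified Ideographs Extension B) encodes as the pair 0xD840, 0xDC00, and the largest code point, U+10FFFF, encodes as 0xDBFF, 0xDFFF.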

However, what is customarily called Unicode also has a sibling
standard, ISO-UCS (ISO/IEC 10646).  ISO-UCS allows characters up to 31
bits long, and ISO seems to recommend representing every character as
a 32-bit integer (UCS-4).

Clearly, the ISO approach is simpler and allows fast indexing, since
the n-th character is just the n-th 32-bit integer.  On the other
hand, Unicode is more widely used and provides better algorithms for
case mapping, character classification, etc.
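A rough sketch of the indexing difference, assuming (purely for illustration) that both representations keep their units in a plain OCaml int array; the function names are hypothetical:

```ocaml
(* UCS-4 style: one int per character, so the n-th character is a single
   array access (constant time). *)
let ucs4_get (s : int array) n = s.(n)

(* A high surrogate is a 16-bit unit in the range 0xD800..0xDBFF. *)
let is_high_surrogate u = u land 0xFC00 = 0xD800

(* UTF-16 style: a character takes one or two code units, so finding the
   n-th character means walking from the start (linear time).
   Assumes well-formed input: a high surrogate is followed by a low one. *)
let utf16_get (s : int array) n =
  let len = Array.length s in
  let rec walk i k =
    if i >= len then invalid_arg "utf16_get: index out of range"
    else begin
      let pair = is_high_surrogate s.(i) && i + 1 < len in
      if k = n then
        (if pair then
           0x10000 + ((s.(i) land 0x3FF) lsl 10) + (s.(i + 1) land 0x3FF)
         else s.(i))
      else walk (if pair then i + 2 else i + 1) (k + 1)
    end
  in
  if n < 0 then invalid_arg "utf16_get: index out of range" else walk 0 0
```

With "A", U+20000, "B" stored as UTF-16 units [|0x41; 0xD840; 0xDC00; 0x42|], utf16_get has to step over the surrogate pair to reach index 2, while the UCS-4 array [|0x41; 0x20000; 0x42|] answers directly.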

For Caml, in my very humble opinion, the language should hide such
differences (16-bit or 32-bit) and, where they cannot be hidden (as
with case mapping), offer users a choice.
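One way such hiding might look in OCaml is an abstract signature that exposes characters only as code points.  The signature USTRING, the module Ucs4 and the functor Count below are hypothetical names for illustration, not an existing library; a UTF-16 module could satisfy the same signature without any change to client code:

```ocaml
(* Hypothetical abstract interface: callers see code points only,
   never the underlying 16-bit or 32-bit encoding. *)
module type USTRING = sig
  type t
  val of_code_points : int list -> t
  val length : t -> int        (* number of characters, not code units *)
  val get : t -> int -> int    (* n-th character as a code point *)
end

(* One possible implementation: UCS-4, one int per character. *)
module Ucs4 : USTRING = struct
  type t = int array
  let of_code_points = Array.of_list
  let length = Array.length
  let get = Array.get
end

(* Client code written against the signature works with any
   implementation chosen by the user. *)
module Count (S : USTRING) = struct
  let count_char (s : S.t) c =
    let n = ref 0 in
    for i = 0 to S.length s - 1 do
      if S.get s i = c then incr n
    done;
    !n
end

module C = Count (Ucs4)
```

Because Ucs4.t is abstract outside the module, no client can depend on the array representation, which is what would let the encoding be swapped later.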

Regards
--
YAMAGATA, yoriyuki (doctoral student)
Department of Mathematical Science, University of Tokyo.
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr



Thread overview: 16+ messages
2002-01-04  2:55 [Caml-list] Stop at exception Magesh Kannan
2002-01-04 13:46 ` Xavier Leroy
2002-01-05 11:19   ` [Caml-list] Non-mutable strings Mattias Waldau
2002-01-05 22:01     ` YAMAGATA yoriyuki
2002-01-10 17:56     ` Xavier Leroy
2002-01-10 18:25       ` [Caml-list] Float and OCaml C interface Christophe Raffalli
2002-01-12 21:12         ` David Mentre
2002-01-12 21:32           ` David Mentre
2002-01-23 15:07         ` [Caml-list] " Xavier Leroy
2002-01-23 16:02           ` David Monniaux
2002-01-10 18:41       ` [Caml-list] Non-mutable strings Patrick M Doane
2002-01-10 18:50         ` Brian Rogoff
2002-01-13 20:05           ` Nicolas George
2002-01-16 19:22       ` Mattias Waldau
2002-01-17  9:56         ` YAMAGATA yoriyuki [this message]
2002-01-17 10:19         ` Jerome Vouillon
