caml-list - the Caml user's mailing list
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: caml-list@inria.fr
Subject: Re: localization, internationalization and Caml
Date: Sun, 17 Oct 1999 16:29:17 +0200
Message-ID: <19991017162917.48773@pauillac.inria.fr>
In-Reply-To: <199910151406.QAA07501@yana.inria.fr>; from Gerard Huet on Fri, Oct 15, 1999 at 03:53:15PM +0200

Wow, there's nothing like internationalization to spark lively
discussions.  Since even Gérard Huet (oops, sorry for that 8859-1
accent, couldn't resist) and Francis Dupont broke their
vows of silence, I guess I have to say something too.

The support for ISO-8859-1 in Caml Light and OCaml is essentially an
historical and geographical accident.  The first books on Caml were
written in French, and it was nice to be able to use accented French
words as identifiers.  Also, that was at a time (1991-1992) when
Unicode and the like didn't even exist.

The choice of ISO-8859-1 is not that politically incorrect either: it
works not only for western Europe, but also for Latin America, many
Pacific countries, and large parts of Africa.  If we were to choose an
8-bit character set based on the number of OCaml programmers that
actually need it, I guess ISO-8859-1 (or its newer incarnation with
the Euro sign whose name I can't remember) would still win.  (At least
until we get OCaml in the Chinese curriculum...)

Notice also that Caml doesn't prevent the programmer from putting any
character set that includes ASCII (ISO-8859-x, but also UTF8-encoded
Unicode) in character strings and in comments.  
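
To make the point about strings concrete, here is a minimal sketch,
assuming a current OCaml compiler.  The names latin1, utf8 and
utf8_length are made up for this illustration; they are not part of
any library.  The string type is just a sequence of bytes, so the same
word can be stored in either encoding, and String.length counts bytes,
not characters.

```ocaml
(* "Gérard" written with explicit byte escapes, so the example does not
   depend on the encoding of the source file itself. *)
let latin1 = "G\xe9rard"        (* ISO-8859-1: e-acute is the byte 0xE9 *)
let utf8   = "G\xc3\xa9rard"    (* UTF-8: e-acute is the bytes 0xC3 0xA9 *)

(* Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumption: the input is well-formed UTF-8; nothing is validated. *)
let utf8_length s =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

let () =
  Printf.printf "latin1: %d bytes\n" (String.length latin1);
  Printf.printf "utf8:   %d bytes, %d code points\n"
    (String.length utf8) (utf8_length utf8)
```

The ISO-8859-1 version is 6 bytes long and the UTF-8 version is 7
bytes long for the same 6 characters; the language does not care
either way, it only stores the bytes.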

There are several ways to internationalize further.  One is to support
other 8-bit character sets the POSIX way (the LC_CTYPE stuff).  There
are several problems with this:
- It's not enough for Asian languages.
- The POSIX localization stuff isn't supported under Windows.
- It's badly supported on all Unixes I know (e.g. to get French, I
  need to set LC_CTYPE to different values under Linux, Solaris, and
  Digital Unix; it gets worse for other languages such as Japanese).
- Handling of mixed-language texts is a nightmare.

Unicode / ISO10646 is probably a better approach.  However, it has its
own problems:
- There's 16-bit Unicode and 32-bit Unicode.  Early adopters of that
  technology (Windows, Java) chose 16-bit Unicode; late adopters (Unix)
  chose 32-bit Unicode.   (That's the great thing about standards:
  there are so many to choose from...)
- Apparently, not everyone agrees on multi-byte encodings (UTF8) either.
  E.g. Java seems to have its own variant of UTF8.  How are we going
  to interoperate?
- I/O is a nightmare.  The API has to handle at least byte streams,
  wide character streams, and UTF8-encoded streams.
- Support for Unicode / UTF8 files in today's operating systems and GUIs
  is very low.  When will I be able to do "more" on a UTF8 file and see my
  French accented letters? 
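
As an illustration of the interoperability worry, here is a rough
sketch of a UTF8 encoder next to the variant Java uses in its
DataInput / DataOutput streams ("modified UTF-8").  The function names
are invented for this example.  The two encodings differ in two
places: the variant writes U+0000 as the two bytes C0 80, and it
writes code points above U+FFFF as a 16-bit surrogate pair, each half
encoded on three bytes.  Inputs are assumed to be valid code points
(0 to 0x10FFFF); no checking is done.

```ocaml
(* Append the standard UTF-8 encoding of one code point to buf. *)
let utf8_encode buf cp =
  let add n = Buffer.add_char buf (Char.chr n) in
  if cp < 0x80 then add cp
  else if cp < 0x800 then begin
    add (0xC0 lor (cp lsr 6));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x10000 then begin
    add (0xE0 lor (cp lsr 12));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else begin
    add (0xF0 lor (cp lsr 18));
    add (0x80 lor ((cp lsr 12) land 0x3F));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end

(* Append the Java-style "modified UTF-8" encoding of one code point. *)
let modified_utf8_encode buf cp =
  if cp = 0 then Buffer.add_string buf "\xc0\x80"
  else if cp >= 0x10000 then begin
    (* Split into a 16-bit surrogate pair, then encode each half. *)
    let v = cp - 0x10000 in
    utf8_encode buf (0xD800 lor (v lsr 10));
    utf8_encode buf (0xDC00 lor (v land 0x3FF))
  end else utf8_encode buf cp

(* Helper: run an encoder on one code point and return the bytes. *)
let encode_with encoder cp =
  let buf = Buffer.create 8 in
  encoder buf cp;
  Buffer.contents buf

let () =
  List.iter
    (fun cp ->
       Printf.printf "U+%04X: %d bytes standard, %d bytes modified\n" cp
         (String.length (encode_with utf8_encode cp))
         (String.length (encode_with modified_utf8_encode cp)))
    [0x0000; 0x00E9; 0x10000]
```

For ordinary text such as U+00E9 the two encoders agree byte for
byte, so the mismatch only shows up on NUL and on characters outside
the 16-bit range, which is exactly where the 16-bit versus 32-bit
split bites.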

My conclusion is that I18N is such a mess that I don't think we'll do
much about it in Caml anytime soon.  Perhaps some basic support for
wide characters and wide character strings will be added at some
point, if only because COM interoperability requires it.

- Xavier Leroy



