caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Xavier Leroy <Xavier.Leroy@inria.fr>
To: Jens Wagner <jwagner@handshake.de>, caml-list@inria.fr
Subject: Re: Unicode-Support in Ocaml?
Date: Fri, 12 Mar 1999 16:26:10 +0100	[thread overview]
Message-ID: <19990312162610.21453@pauillac.inria.fr> (raw)
In-Reply-To: <36E6C8E6.45008AEF@handshake.de>; from Jens Wagner on Wed, Mar 10, 1999 at 08:32:54PM +0100

> Are there any plans of supporting Unicode in objective caml?
> A support for Unicode characters is very important for writing
> Web-Applications and would be a great improvement for other applications
> too (IMHO)!

Several applications need Unicode support, indeed.  For instance, my
CamlIDL COM interface will eventually need Unicode strings, because
many Windows API use them.  Here are my thoughts about it.

Switching everything to use Unicode strings is too much work, so we'd
need a separate type "wstring" of wide-character strings in addition
to the standard "string", with conversion functions between the two.

Now, I can see two ways to go about it:

- Build on the ISO C "wide character" and "wide string" abstractions.
  Pros:

  * all recent C libraries support those, and provide powerful
    functions for conversion between wide strings and multi-byte strings
    (e.g. Unicode <-> UTF8), for comparing wide strings, etc;
  * the use of Unicode and UTF8 is not hardwired, most systems provide
    32-bit characters and support several multi-byte encodings.
    I've been told this is better for e.g. Japanese, which has
    several popular encodings that are not Unicode-based.

  Cons:

  * the use of Unicode and UTF8 is not hardwired, so different systems
    represent wide strings differently (e.g. as Unicode under Windows
    and 32-bit chars under Unix), precluding binary exchange of
    wide strings via output_value/input_value;
  * the various multi-byte encodings provided differ from Unix vendor
    to Unix vendor (different encodings, different names for the same
    encoding, ...);
  * while in theory Unicode and UTF8 are just one encoding among others,
    I don't know of any Unix system that actually provides "locale"
    files describing Unicode/UTF8:  Linux with glibc 2.0 doesn't,
    Digital Unix 4 doesn't seem to either, and ditto for Solaris 2.5.  

- Decide that Caml will use Unicode on every platform (like Java does).
  Pros:

  * binary compatibility between platforms;
  * proper Unicode support is guaranteed.

  Cons:

  * lots of code to write by hand (Unicode/UTF8 conversion,
    comparisons, upper/lowercasing, etc);
  * the Japanese might not be happy about it.

> Id could also be interesting having a standard signature for the DOM (
> http://www.w3.org/ ) in ocaml.
> Is there any organization defining signatures of existing API's? Of
> course one could use an IDL->stub converter, but there is much more
> possible using signatures than IDL-Files.

In the great tradition of free software, whomever writes the code (to
interface to an existing API) gets to choose the signature.  However,
you can ask for feedback or suggestions on this list.

- Xavier Leroy




      parent reply	other threads:[~1999-03-12 17:10 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1999-03-10 19:32 Jens Wagner
1999-03-11 12:50 ` Pierpaolo Bernardi
1999-03-12 15:26 ` Xavier Leroy [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19990312162610.21453@pauillac.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=jwagner@handshake.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).