From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA05830 for caml-red; Sat, 20 Jan 2001 16:32:29 +0100 (MET)
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id SAA05906 for <caml-list@pauillac.inria.fr>; Thu, 18 Jan 2001 18:51:10 +0100 (MET)
Received: from mailserver.cli.di.unipi.it (crudelia.cli.di.unipi.it [131.114.11.37])
	by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f0IHp8n26454
	for <caml-list@inria.fr>; Thu, 18 Jan 2001 18:51:08 +0100 (MET)
Organization:  Centro di Calcolo - Dipartimento di Informatica di Pisa - Italy
Received: from fire.cli.di.unipi.it (fire-ext.cli.di.unipi.it [131.114.11.52])
	by mailserver.cli.di.unipi.it (8.9.3/8.9.3) with SMTP id SAA13412
	for <caml-list@inria.fr>; Thu, 18 Jan 2001 18:48:57 +0100
Received: (qmail 4941 invoked by uid 7794); 18 Jan 2001 17:51:00 -0000
Received: from carlotta.cli.di.unipi.it(131.114.11.15)
	via SMTP by crudelia.cli.di.unipi.it, id smtpda04889; Thu Jan 18 18:48:49 2001
Received: from localhost (bernardp@localhost) by carlotta.cli.di.unipi.it (8.8.5/8.6.12) with SMTP id SAA02562; Thu, 18 Jan 2001 18:49:04 +0100 (MET)
X-Authentication-Warning: carlotta.cli.di.unipi.it: bernardp owned process doing -bs
Date: Thu, 18 Jan 2001 18:49:04 +0100 (MET)
From: Pierpaolo BERNARDI <bernardp@cli.di.unipi.it>
To: John Max Skaller <skaller@ozemail.com.au>
cc: OCAML <caml-list@inria.fr>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
In-Reply-To: <3A65F487.3F91E502@ozemail.com.au>
Message-ID: <Pine.GSO.4.00.10101181843090.1886-100000@carlotta.cli.di.unipi.it>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: weis@pauillac.inria.fr


On Thu, 18 Jan 2001, John Max Skaller wrote:

> Pierpaolo BERNARDI wrote:
> > 
> > On Thu, 11 Jan 2001, Dave Berry wrote:
> > 
> > > I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> > > range 0-2^16.
> > 
> > No. ISO-10646 and Unicode contains exactly the same code points.
> > Unicode has room for about 2^20 code points. The ISO committee has
> > agreed to limit ISO-10646 to the same range.
> 
> 	Unless it has changed recently, the first 64K code points of ISO-10646
> are known as the Basic Multilingual Plane (BMP), which corresponds
> to ISO-10646. The other 'planes' are not currently used AFAIK,
> but they exist. 

Let me repeat: ISO has formally agreed not to use code points outside
the range that Unicode can address.  This leaves room for about 2^20
characters.  A draft of Unicode 3.1 was published today (the definitive
version is due out in a couple of months), and it already uses code
points outside the BMP.  See the Unicode FAQs at www.unicode.org for
more information.
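
As a rough illustration of where "about 2^20" comes from, here is a
minimal sketch of the arithmetic.  It assumes the code space is 17
planes of 2^16 code points each (U+0000 through U+10FFFF), which is the
range UTF-16 surrogate pairs can reach.  The names PLANE_SIZE,
NUM_PLANES and plane_of are made up for the example.

```python
# Assumption: the shared Unicode / ISO-10646 code space is 17 planes
# of 2**16 code points each, i.e. U+0000 .. U+10FFFF.
PLANE_SIZE = 2 ** 16
NUM_PLANES = 17

total_code_points = PLANE_SIZE * NUM_PLANES   # BMP plus 16 other planes
max_code_point = total_code_points - 1        # highest addressable value

# Code points outside the BMP (plane 0): exactly 2**20 of them.
supplementary_code_points = total_code_points - PLANE_SIZE


def plane_of(code_point):
    """Return the plane number (0 = BMP) of a code point (hypothetical helper)."""
    if not 0 <= code_point <= max_code_point:
        raise ValueError("code point outside U+0000..U+10FFFF")
    return code_point >> 16


print(total_code_points)            # total size of the code space
print(hex(max_code_point))          # highest code point
print(plane_of(0x0041))             # 'A' lives in the BMP
print(plane_of(0x1D11E))            # a code point outside the BMP
```

So the 2^20 figure is the number of code points beyond the BMP, and the
full range is 2^20 + 2^16.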

> Indeed, some code points from the BMP are reserved
> so Unicode can use multi-word encodings of the lower 4 planes.

Unicode can be encoded in several ways, for example UTF-8, UTF-16,
UTF-32, UCS-2, etc.  The choice of encoding has nothing to do with the
number of characters that can be encoded.
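
To make this concrete, here is a small sketch that takes one code point
outside the BMP and serialises it in three encodings.  The code point
U+1D11E and the helper surrogate_pair are chosen for the example; the
codecs are the ones built into Python.  The code point stays the same,
only the byte sequence differs.

```python
# One code point outside the BMP, chosen for illustration.
code_point = 0x1D11E
char = chr(code_point)

# Three serialisations of the same code point (big-endian, no BOM).
utf8_bytes = char.encode("utf-8")
utf16_bytes = char.encode("utf-16-be")
utf32_bytes = char.encode("utf-32-be")


def surrogate_pair(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair.

    Hypothetical helper: the high surrogate carries the top 10 bits of
    (cp - 0x10000), the low surrogate carries the bottom 10 bits.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for name, data in (("UTF-8", utf8_bytes),
                   ("UTF-16", utf16_bytes),
                   ("UTF-32", utf32_bytes)):
    # Decoding always gives back the same single code point.
    decoded = data.decode(name.lower() + ("" if name == "UTF-8" else "-be"))
    print(name, data.hex(), hex(ord(decoded)))

print([hex(unit) for unit in surrogate_pair(code_point)])
```

UTF-8 and UTF-32 happen to use four bytes here, and UTF-16 uses two
16-bit units taken from the reserved surrogate range in the BMP.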

Cheers,
  Pierpaolo