Date: Thu, 18 Jan 2001 18:49:04 +0100 (MET)
From: Pierpaolo BERNARDI
To: John Max Skaller
cc: OCAML
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
In-Reply-To: <3A65F487.3F91E502@ozemail.com.au>
Organization: Centro di Calcolo - Dipartimento di Informatica di Pisa - Italy

On Thu, 18 Jan 2001, John Max Skaller wrote:

> Pierpaolo BERNARDI wrote:
> >
> > On Thu, 11 Jan 2001, Dave Berry wrote:
> >
> > > I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> > > range 0-2^16.
> >
> > No. ISO-10646 and Unicode contains exactly the same code points.
> > Unicode has room for about 2^20 code points. The ISO committee has
> > agreed to limit ISO-10646 to the same range.
>
> Unless it has changed recently, the first 64K code points of ISO-10646
> are known as the Basic Multilingual Plane (BMP), which corresponds
> to ISO-10646. The other 'planes' are not currently used AFAIK,
> but they exist.

Let me repeat: ISO has formally agreed not to use code points outside
the range that Unicode can encode. This leaves room for about 2^20
characters.

A draft of Unicode 3.1 was published today (the definitive version is
due out in a couple of months), and it already uses code points outside
the BMP. See the Unicode FAQs at www.unicode.org for more information.

> Indeed, some code points from the BMP are reserved
> so Unicode can use multi-word encodings of the lower 4 planes.

Unicode can be encoded in several ways, for example UTF-8, UTF-16,
UTF-32, UCS2, etc. This has nothing to do with the number of
characters that can be encoded.

Cheers,
Pierpaolo
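
P.S. For readers who want to check the arithmetic, here is a minimal
sketch in Python. It is an illustration only, not taken from the
Unicode FAQ: the helper name utf16_units and the sample code point
U+10400 are assumptions chosen for the example. It shows where the
"about 2^20" figure comes from (the surrogate ranges reserved in the
BMP pair up to address 2^20 further code points) and that one and the
same code point can be written in several encoding forms.

```python
# Illustration only: utf16_units is a hypothetical helper, not a library call.

BMP_SIZE = 0x10000         # 2^16 code points in the Basic Multilingual Plane
MAX_CODE_POINT = 0x10FFFF  # highest code point reachable through UTF-16


def utf16_units(cp):
    """Return the UTF-16 code units (as integers) for one code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are reserved for UTF-16")
    if cp < BMP_SIZE:
        return [cp]                  # BMP: a single 16-bit unit
    offset = cp - BMP_SIZE           # 20-bit value, 0 .. 2^20 - 1
    high = 0xD800 + (offset >> 10)   # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits select the low surrogate
    return [high, low]


# 1024 high surrogates x 1024 low surrogates = 2^20 code points beyond the BMP.
supplementary = 0x400 * 0x400

# The same code point, written in three different encoding forms.
ch = chr(0x10400)  # sample code point outside the BMP (an assumption)
encodings = {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

if __name__ == "__main__":
    print(supplementary == 2 ** 20)
    print([hex(u) for u in utf16_units(0x10400)])
    for name, data in encodings.items():
        print(name, data.hex())
```

The byte sequences differ in length and layout, but each decodes back
to the same single code point, which is the sense in which the choice
of encoding form is separate from the size of the code space.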