caml-list - the Caml user's mailing list
From: John Max Skaller <skaller@ozemail.com.au>
To: Pierpaolo BERNARDI <bernardp@cli.di.unipi.it>
Cc: OCAML <caml-list@inria.fr>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
Date: Tue, 23 Jan 2001 07:27:47 +1100
Message-ID: <3A6C97C3.C109DC15@ozemail.com.au>
In-Reply-To: <Pine.GSO.4.00.10101181843090.1886-100000@carlotta.cli.di.unipi.it>

Pierpaolo BERNARDI wrote:

> Let me repeat: ISO has formally agreed to not use code points outside of
> the Unicode possibility.  

	OK, accepted.

> This leaves room for about 2^20 characters.

> > Indeed, some code points from the BMP are reserved
> > so Unicode can use multi-word encodings of the lower 4 planes.
> 
> Unicode can be encoded in several ways, for example, UTF-8, UTF-16,
> UTF-32, UCS2, etc..  This has nothing to do with the number of characters
> that can be encoded.

	This is not quite right. Unicode is 16-bit: it supports
only 2^16 code points (again, unless this has changed
recently). However, some of those code points are reserved
for the UTF-16 encoding of a larger space of 2^20 code points
(another four bits; I was wrong, this covers 16 planes, not 4).
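	For what it's worth, the "another four bits" arithmetic can be
checked in a few lines of OCaml. This is only a sketch; it assumes
the standard UTF-16 surrogate ranges (high U+D800..U+DBFF, low
U+DC00..U+DFFF), which are not spelled out above.

```ocaml
(* Size of each surrogate range (assumed standard UTF-16 ranges). *)
let high_count = 0xDBFF - 0xD800 + 1   (* 1024 = 2^10 *)
let low_count  = 0xDFFF - 0xDC00 + 1   (* 1024 = 2^10 *)

(* Every (high, low) pair names one code point beyond the BMP. *)
let supplementary = high_count * low_count   (* 2^20 *)

(* One plane holds 2^16 code points, so 2^20 is exactly 16 planes. *)
let plane_size = 0x10000
let extra_planes = supplementary / plane_size

(* BMP plus the 16 extra planes; the last reachable code point. *)
let max_code_point = plane_size + supplementary - 1

let () =
  Printf.printf "pairs=%d planes=%d max=U+%X\n"
    supplementary extra_planes max_code_point
```

Running it prints "pairs=1048576 planes=16 max=U+10FFFF": 1024
high values times 1024 low values give the 2^20 extra code points.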

	So it is not quite true that the encoding has 'nothing to do
with the number of characters that can be encoded', since some
of the code points of the BMP are reserved precisely for
two-word encodings of a larger space. (I think these are
the High and Low Surrogates: U+D800..U+DBFF and U+DC00..U+DFFF
respectively.)
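	A minimal OCaml sketch of how such a two-word encoding works,
assuming the standard UTF-16 scheme: subtract 0x10000 from the code
point and split the remaining 20 bits between a high and a low
surrogate. The names to_surrogates and of_surrogates are mine, purely
for illustration.

```ocaml
(* Split a code point in U+10000..U+10FFFF into a surrogate pair:
   subtract 0x10000, leaving a 20-bit value; its top 10 bits go into
   the high surrogate and its bottom 10 bits into the low one. *)
let to_surrogates cp =
  if cp < 0x10000 || cp > 0x10FFFF then
    invalid_arg "to_surrogates: not a supplementary code point";
  let v = cp - 0x10000 in
  (0xD800 lor (v lsr 10), 0xDC00 lor (v land 0x3FF))

(* Inverse: recombine a high/low pair into the original code point. *)
let of_surrogates (hi, lo) =
  if hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF then
    invalid_arg "of_surrogates: not a valid surrogate pair";
  0x10000 + (((hi - 0xD800) lsl 10) lor (lo - 0xDC00))

let () =
  let (hi, lo) = to_surrogates 0x1F600 in
  Printf.printf "U+1F600 -> %04X %04X -> U+%X\n"
    hi lo (of_surrogates (hi, lo))
```

For example, U+1F600 becomes the pair D83D DE00, and decoding that
pair gives back U+1F600.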

	Note that this is only loosely connected with
the encoding of _characters_, since some code points are
not characters (such as 'newline'), and some sequences
of code points represent a single (accented) character :-)
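	To make the last point concrete, here is a rough OCaml sketch
that counts code points in two spellings of the same accented
letter. It assumes well-formed UTF-8 input, and count_code_points is
a hypothetical helper, not a library function.

```ocaml
(* Count code points in a well-formed UTF-8 string: each code point
   has exactly one lead byte that is not a continuation byte
   (10xxxxxx). No validation is done; valid UTF-8 is assumed. *)
let count_code_points s =
  let n = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr n) s;
  !n

(* The same accented 'e' spelled two ways: precomposed U+00E9, and
   'e' followed by U+0301 COMBINING ACUTE ACCENT. *)
let precomposed = "\xC3\xA9"
let decomposed  = "e\xCC\x81"

let () =
  Printf.printf "precomposed: %d code point(s), decomposed: %d code point(s)\n"
    (count_code_points precomposed) (count_code_points decomposed)
```

Both strings display as one accented character, yet the first is one
code point and the second is two.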

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




Thread overview: 16+ messages
2001-01-11 12:58 Dave Berry
2001-01-11 18:49 ` Xavier Leroy
2001-01-12  9:24   ` John Max Skaller
2001-01-12 12:05   ` Pierpaolo BERNARDI
     [not found]   ` <3A5F7685.FF2593BB@snob.spb.ru>
2001-01-12 21:33     ` Nickolay Semyonov
2001-01-17 19:47       ` John Max Skaller
2001-01-12  0:19 ` Pierpaolo BERNARDI
2001-01-17 19:37   ` John Max Skaller
2001-01-18 17:49     ` Pierpaolo BERNARDI
2001-01-22 20:27       ` John Max Skaller [this message]
2001-01-22 21:44         ` Pierpaolo BERNARDI
2001-01-24 13:41           ` John Max Skaller
2001-01-12  8:33 ` John Max Skaller
     [not found]   ` <3A5F77B7.52D8F933@snob.spb.ru>
2001-01-12 21:33     ` Nickolay Semyonov
2001-01-12 21:25 ` Nickolay Semyonov
     [not found] <Pine.GSO.4.00.10101222155260.697-100000@carlotta.cli.di.unipi.it>
2001-01-22 21:57 ` Pierpaolo BERNARDI
