Unicode (was RE: JIT-compilation for OCaml?)

caml-list - the Caml user's mailing list

* Unicode (was RE: JIT-compilation for OCaml?)
@ 2001-01-11 12:58 Dave Berry
  2001-01-11 18:49 ` Xavier Leroy
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Dave Berry @ 2001-01-11 12:58 UTC (permalink / raw)
  To: John Max Skaller; +Cc: Markus Mottl, OCAML

I thought Unicode was a recognised subset of ISO-10646, corresponding to the
range 0-2^16.  Also, don't Windows NT/2000 use Unicode?  

My knowledge of C/C++ is probably out of date, but I thought they just used
the wide character type, without requiring a particular internal
representation.  In what way do ISO C/C++ support ISO-10646?

(I realise this isn't directly on-topic, but it may be relevant for future
extensions to OCaml?)

Dave.


-----Original Message-----
From: John Max Skaller [mailto:skaller@ozemail.com.au]
Sent: Thursday, January 11, 2001 7:01
To: Dave Berry
Cc: Markus Mottl; OCAML
Subject: Re: JIT-compilation for OCaml?


Dave Berry wrote:
> 
> This view seems extreme to me.  Certainly the Java type system has faults --
> lack of generics being one, lack of enumerated types another, and various
> other points as well.  But surely Unicode is a useful de facto standard?

	No. Unicode was abandoned years ago: there is an 'official'
ISO Standard: ISO-10646. There are 2^31 code points, unlike
Unicode's 2^16, which is already barely adequate. ISO C and ISO C++
support ISO-10646. Linux runs ISO-10646 (via UTF-8).




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-11 12:58 Unicode (was RE: JIT-compilation for OCaml?) Dave Berry
@ 2001-01-11 18:49 ` Xavier Leroy
  2001-01-12  9:24   ` John Max Skaller
                     ` (2 more replies)
  2001-01-12  0:19 ` Pierpaolo BERNARDI
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 16+ messages in thread
From: Xavier Leroy @ 2001-01-11 18:49 UTC (permalink / raw)
  To: Dave Berry; +Cc: John Max Skaller, Markus Mottl, OCAML

> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16.  Also, don't Windows NT/2000 use Unicode?  

Yes, Win32 (i.e. 95, 98, ME, NT, 2000 and whatnot) uses 16-bit
characters.  Java too.  But Unix C libraries that support wide chars
seem to prefer 32-bit characters.  Remember:

    "Standards are great: there are so many to choose from."

> (I realise this isn't directly on-topic, but it may be relevant for future
> extensions to OCaml?)

It is very relevant indeed.  We've been contemplating adding some
simple support for wide characters and wide strings, e.g. as two new
library modules, but the stumbling point is whether to use 16-bit or
32-bit wide characters.  While 32 bits is probably the wave of the
future, 16 bits is what we need to interface easily with Java and with
many Microsoft products (e.g. COM dispatch components, Visual Basic,
various Win32 APIs).

Shall we "do it right" (for some notion of "right") or favor
interoperability?  Hard question.  My current answer is to
procrastinate...  Actually, multi-byte encoded strings (UTF-8) are not
so bad and already have full support in OCaml :-)

- Xavier Leroy


* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-11 12:58 Unicode (was RE: JIT-compilation for OCaml?) Dave Berry
  2001-01-11 18:49 ` Xavier Leroy
@ 2001-01-12  0:19 ` Pierpaolo BERNARDI
  2001-01-17 19:37   ` John Max Skaller
  2001-01-12  8:33 ` John Max Skaller
  2001-01-12 21:25 ` Nickolay Semyonov
  3 siblings, 1 reply; 16+ messages in thread
From: Pierpaolo BERNARDI @ 2001-01-12  0:19 UTC (permalink / raw)
  To: OCAML

On Thu, 11 Jan 2001, Dave Berry wrote:

> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16.  

No. ISO-10646 and Unicode contain exactly the same code points.
Unicode has room for about 2^20 code points. The ISO committee has
agreed to limit ISO-10646 to the same range.

The current version of Unicode, 3.0.1 (and consequently of ISO-10646), has
fewer than 2^16 code points assigned.  The next version, due
out in a couple of months, will contain about 100,000 characters.

> My knowledge of C/C++ is probably out of date, but I thought they just used
> the wide character type, without requiring a particular internal
> representation.  

This is correct.

> In what way do ISO C/C++ support ISO-10646?

I have not been following this discussion, so I missed the message you are
replying to. One can say that C supports ISO-10646 in the sense that a C
environment *can* use ISO-10646 as its internal representation for wide
chars, if the C implementor so chooses.  Many compilers do just this, in
fact.

P.


* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-11 12:58 Unicode (was RE: JIT-compilation for OCaml?) Dave Berry
  2001-01-11 18:49 ` Xavier Leroy
  2001-01-12  0:19 ` Pierpaolo BERNARDI
@ 2001-01-12  8:33 ` John Max Skaller
       [not found]   ` <3A5F77B7.52D8F933@snob.spb.ru>
  2001-01-12 21:25 ` Nickolay Semyonov
  3 siblings, 1 reply; 16+ messages in thread
From: John Max Skaller @ 2001-01-12  8:33 UTC (permalink / raw)
  To: Dave Berry; +Cc: Markus Mottl, OCAML

Dave Berry wrote:
> 
> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16.  Also, don't Windows NT/2000 use Unicode?

	Yes and Yes. More precisely, Unicode is often 'ahead' of ISO,
adding new characters which make it into new versions of ISO-10646
later.

> My knowledge of C/C++ is probably out of date, but I thought they just used
> the wide character type, without requiring a particular internal
> representation.  In what way do ISO C/C++ support ISO-10646?

	There are, for example, both 16 and 31 bit escapes.
What the compiler does with them is implementation defined I think,
that is, it can silently truncate to 16 or even 8 bits, but
the programmer can still encode any ISO-10646 character.

	The type 'wchar_t' has implementation defined size in C++
(like all the other integral types). This doesn't exclude using
32 bit characters.

> (I realise this isn't directly on-topic, but it may be relevant for future
> extensions to OCaml?)

	I think it is. In particular, Ocaml supports 8 bit characters,
and even allows the high 128 bytes to be used in identifiers
(to allow French names :-)

	When and if this support is upgraded, Ocaml should go to
full ISO-10646 support: for identifiers this is easily done by
using UTF-8 (and providing a codec to convert Latin-1 for
backward compatibility). Supporting 2^31 code points in regular
expressions is more difficult. Collation is a nightmare :-)

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-11 18:49 ` Xavier Leroy
@ 2001-01-12  9:24   ` John Max Skaller
  2001-01-12 12:05   ` Pierpaolo BERNARDI
       [not found]   ` <3A5F7685.FF2593BB@snob.spb.ru>
  2 siblings, 0 replies; 16+ messages in thread
From: John Max Skaller @ 2001-01-12  9:24 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Dave Berry, Markus Mottl, OCAML

Xavier Leroy wrote:

> Shall we "do it right" (for some notion of "right") or favor
> interoperability?  Hard question.  My current answer is to
> procrastinate...  Actually, multi-byte encoded strings (UTF-8) are not
> so bad and already have full support in OCaml :-)

	I personally think this is the first step, since no
new data types are required. Instead, what is needed would seem to be
simple. What I believe is required is

	1. changes to the lexer to support \uXXXX and \UXXXXXXXX escapes
(in strings, and probably in identifiers)

	2. changes to the lexer to recognize the 'letters'
which can be used in identifiers. The letters which should be
allowed are specified in an ISO document. 

	3. Provide a codec to convert Latin-1 to UTF-8.
[One can argue about whether it is applied by default or not :-]
You might provide other codecs too, such as UCS-16 -> UTF-8

	I guess most of the rest can be done in Ocaml or C
without impacting the compiler/run-time, and when it is right,
the compiler/run-time can be tuned to make more efficient
representations possible. [For example, to generate inline
code to compare 16/31 bit unsigned integers, rather than
call a C routine]
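
Item 3 is small enough to sketch. A minimal version of such a
Latin-1 to UTF-8 codec, assuming only the standard Buffer module; the name
latin1_to_utf8 is hypothetical and not an existing library function:

```ocaml
(* Sketch: convert a Latin-1 (ISO-8859-1) string to UTF-8.
   [latin1_to_utf8] is a hypothetical name, not part of the OCaml library. *)
let latin1_to_utf8 (s : string) : string =
  let b = Buffer.create 16 in
  String.iter
    (fun c ->
      let n = Char.code c in
      if n < 0x80 then
        (* ASCII bytes are unchanged in UTF-8. *)
        Buffer.add_char b c
      else begin
        (* Latin-1 bytes 0x80..0xFF are code points U+0080..U+00FF,
           which take two bytes in UTF-8: 110xxxxx 10xxxxxx. *)
        Buffer.add_char b (Char.chr (0xC0 lor (n lsr 6)));
        Buffer.add_char b (Char.chr (0x80 lor (n land 0x3F)))
      end)
    s;
  Buffer.contents b
```

Because every Latin-1 byte maps directly to the code point with the same
number, no lookup table is needed; other 8-bit character sets would need one.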

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-11 18:49 ` Xavier Leroy
  2001-01-12  9:24   ` John Max Skaller
@ 2001-01-12 12:05   ` Pierpaolo BERNARDI
       [not found]   ` <3A5F7685.FF2593BB@snob.spb.ru>
  2 siblings, 0 replies; 16+ messages in thread
From: Pierpaolo BERNARDI @ 2001-01-12 12:05 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Dave Berry, John Max Skaller, Markus Mottl, OCAML


On Thu, 11 Jan 2001, Xavier Leroy wrote:

> It is very relevant indeed.  We've been contemplating adding some
> simple support for wide characters and wide strings, e.g. as two new
> library modules, but the stumbling point is whether to use 16-bit or
> 32-bit wide characters.  While 32 bits is probably the wave of the
> future, 16 bits is what we need to interface easily with Java and with
> many Microsoft products (e.g. COM dispatch components, Visual Basic,
> various Win32 APIs).



"Recruits" as Bob wants to call them, will come from the pubic, not here.
 -- Robert J. Petry, C.L.  -  March 24, 2000




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-11 12:58 Unicode (was RE: JIT-compilation for OCaml?) Dave Berry
                   ` (2 preceding siblings ...)
  2001-01-12  8:33 ` John Max Skaller
@ 2001-01-12 21:25 ` Nickolay Semyonov
  3 siblings, 0 replies; 16+ messages in thread
From: Nickolay Semyonov @ 2001-01-12 21:25 UTC (permalink / raw)
  To: Dave Berry; +Cc: caml-list

Dave Berry wrote:
> 
> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16.  Also, don't Windows NT/2000 use Unicode?
> 
> My knowledge of C/C++ is probably out of date, but I thought they just used
> the wide character type, without requiring a particular internal
> representation.  In what way do ISO C/C++ support ISO-10646?
> 
> (I realise this isn't directly on-topic, but it may be relevant for future
> extensions to OCaml?)
> 

Actually, I think the default OCaml string must be Unicode (or
ISO-10XXX, it doesn't matter which).  It is really hard to write multilingual
applications in OCaml now.

Nickolay




* Re: Unicode (was RE: JIT-compilation for OCaml?)
       [not found]   ` <3A5F7685.FF2593BB@snob.spb.ru>
@ 2001-01-12 21:33     ` Nickolay Semyonov
  2001-01-17 19:47       ` John Max Skaller
  0 siblings, 1 reply; 16+ messages in thread
From: Nickolay Semyonov @ 2001-01-12 21:33 UTC (permalink / raw)
  To: caml-list

Nickolay Semyonov wrote:
 
Xavier Leroy wrote:
 
 > Shall we "do it right" (for some notion of "right") or favor
 > interoperability?  Hard question.  My current answer is to
 > procrastinate...  Actually, multi-byte encoded strings (UTF-8) are not
 > so bad and already have full support in OCaml :-)
 >
 
 Er, where? Did I miss something?
 
 Nickolay




* Re: Unicode (was RE: JIT-compilation for OCaml?)
       [not found]   ` <3A5F77B7.52D8F933@snob.spb.ru>
@ 2001-01-12 21:33     ` Nickolay Semyonov
  0 siblings, 0 replies; 16+ messages in thread
From: Nickolay Semyonov @ 2001-01-12 21:33 UTC (permalink / raw)
  To: caml-list

John Max Skaller wrote:

>         I think it is. In particular, Ocaml supports 8 bit characters,
> and even allows the high 128 bytes to be used in identifiers
> (to allow French names :-)
>
 
# String.lowercase "SOME CYRILLIC SYMBOLS" ;;
- : string = "SOME CYRILLIC SYMBOLS"
#
 
Supporting 8 bits is not enough to handle different languages.
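
The same effect can be reproduced with concrete bytes. A small sketch,
assuming a recent OCaml where String.lowercase_ascii is available (it is not
the String.lowercase used in the session above), and assuming the string
holds UTF-8 rather than an 8-bit Cyrillic code page:

```ocaml
(* The Cyrillic capital letter De (U+0414) is the two bytes D0 94 in UTF-8;
   its lowercase form (U+0434) would be D0 B4. *)
let s = "ABC \xD0\x94"

(* String.lowercase_ascii only maps the ASCII letters A-Z, so the
   Cyrillic bytes pass through unchanged. *)
let lowered = String.lowercase_ascii s

let () = Printf.printf "%S -> %S\n" s lowered
```

Case mapping for non-ASCII text needs the Unicode character database, which
is why it cannot be done by a byte-wise function.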
 
Nickolay




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-12  0:19 ` Pierpaolo BERNARDI
@ 2001-01-17 19:37   ` John Max Skaller
  2001-01-18 17:49     ` Pierpaolo BERNARDI
  0 siblings, 1 reply; 16+ messages in thread
From: John Max Skaller @ 2001-01-17 19:37 UTC (permalink / raw)
  To: Pierpaolo BERNARDI; +Cc: OCAML

Pierpaolo BERNARDI wrote:
> 
> On Thu, 11 Jan 2001, Dave Berry wrote:
> 
> > I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> > range 0-2^16.
> 
> No. ISO-10646 and Unicode contain exactly the same code points.
> Unicode has room for about 2^20 code points. The ISO committee has
> agreed to limit ISO-10646 to the same range.

	Unless it has changed recently, the first 64K code points of ISO-10646
are known as the Basic Multilingual Plane (BMP), which corresponds
to Unicode. The other 'planes' are not currently used AFAIK,
but they exist. Indeed, some code points from the BMP are reserved
so Unicode can use multi-word encodings of the lower 4 planes.


-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-12 21:33     ` Nickolay Semyonov
@ 2001-01-17 19:47       ` John Max Skaller
  0 siblings, 0 replies; 16+ messages in thread
From: John Max Skaller @ 2001-01-17 19:47 UTC (permalink / raw)
  To: Nickolay Semyonov; +Cc: caml-list

Nickolay Semyonov wrote:
> 
> Nickolay Semyonov wrote:
> 
> Xavier Leroy wrote:
> 
>  > Shall we "do it right" (for some notion of "right") or favor
>  > interoperability?  Hard question.  My current answer is to
>  > procrastinate...  Actually, multi-byte encoded strings (UTF-8) are not
>  > so bad and already have full support in OCaml :-)
>  >
> 
>  Er, where? Did I miss something?


	Ocaml is '8 bit clean', which means that strings of bytes
may be interpreted by user code as UTF-8 encoded ISO-10646 code points.
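
What "interpreted by user code" might look like, as a rough sketch: the
function utf8_decode below is a hypothetical helper, not a library function.
It checks continuation bytes but, to stay short, does not reject overlong
forms or encoded surrogates.

```ocaml
(* Sketch: decode a UTF-8 byte string into a list of code points.
   [utf8_decode] is a hypothetical helper, not part of the OCaml library. *)
let utf8_decode (s : string) : int list =
  let len = String.length s in
  (* Read [n] continuation bytes (10xxxxxx) starting at index [i],
     shifting their 6 payload bits into [acc]. *)
  let rec cont acc i n =
    if n = 0 then (acc, i)
    else if i >= len then invalid_arg "utf8_decode: truncated sequence"
    else
      let b = Char.code s.[i] in
      if b land 0xC0 <> 0x80 then invalid_arg "utf8_decode: bad continuation"
      else cont ((acc lsl 6) lor (b land 0x3F)) (i + 1) (n - 1)
  in
  let rec go i acc =
    if i >= len then List.rev acc
    else
      let b = Char.code s.[i] in
      let cp, next =
        if b < 0x80 then (b, i + 1)                                   (* 0xxxxxxx *)
        else if b land 0xE0 = 0xC0 then cont (b land 0x1F) (i + 1) 1  (* 110xxxxx *)
        else if b land 0xF0 = 0xE0 then cont (b land 0x0F) (i + 1) 2  (* 1110xxxx *)
        else if b land 0xF8 = 0xF0 then cont (b land 0x07) (i + 1) 3  (* 11110xxx *)
        else invalid_arg "utf8_decode: bad lead byte"
      in
      go next (cp :: acc)
  in
  go 0 []
```

The string type itself stays a plain byte sequence; only this user-level
function gives the bytes a meaning as code points.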

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-17 19:37   ` John Max Skaller
@ 2001-01-18 17:49     ` Pierpaolo BERNARDI
  2001-01-22 20:27       ` John Max Skaller
  0 siblings, 1 reply; 16+ messages in thread
From: Pierpaolo BERNARDI @ 2001-01-18 17:49 UTC (permalink / raw)
  To: John Max Skaller; +Cc: OCAML

On Thu, 18 Jan 2001, John Max Skaller wrote:

> Pierpaolo BERNARDI wrote:
> > 
> > On Thu, 11 Jan 2001, Dave Berry wrote:
> > 
> > > I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> > > range 0-2^16.
> > 
> > No. ISO-10646 and Unicode contain exactly the same code points.
> > Unicode has room for about 2^20 code points. The ISO committee has
> > agreed to limit ISO-10646 to the same range.
> 
> 	Unless it has changed recently, the first 64K code points of ISO-10646
> are known as the Basic Multilingual Plane (BMP), which corresponds
> to Unicode. The other 'planes' are not currently used AFAIK,
> but they exist. 

Let me repeat: ISO has formally agreed not to use code points outside of
the range that Unicode can address.  This leaves room for about 2^20 characters.
A draft of Unicode 3.1 was published today (the definitive version is due
out in a couple of months), and it already uses code points outside of the
BMP.  See the Unicode FAQs at www.unicode.org for more information.

> Indeed, some code points from the BMP are reserved
> so Unicode can use multi-word encodings of the lower 4 planes.

Unicode can be encoded in several ways, for example, UTF-8, UTF-16,
UTF-32, UCS2, etc..  This has nothing to do with the number of characters
that can be encoded.

Cheers,
  Pierpaolo


* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-18 17:49     ` Pierpaolo BERNARDI
@ 2001-01-22 20:27       ` John Max Skaller
  2001-01-22 21:44         ` Pierpaolo BERNARDI
  0 siblings, 1 reply; 16+ messages in thread
From: John Max Skaller @ 2001-01-22 20:27 UTC (permalink / raw)
  To: Pierpaolo BERNARDI; +Cc: OCAML

Pierpaolo BERNARDI wrote:

> Let me repeat: ISO has formally agreed not to use code points outside of
> the range that Unicode can address.

	OK, accepted.

> This leaves room for about 2^20 characters.

> > Indeed, some code points from the BMP are reserved
> > so Unicode can use multi-word encodings of the lower 4 planes.
> 
> Unicode can be encoded in several ways, for example, UTF-8, UTF-16,
> UTF-32, UCS2, etc..  This has nothing to do with the number of characters
> that can be encoded.

	This is not quite right. Unicode is 16 bit, it supports
only 2^16 code points: again, unless this has
changed recently. However, some of the code points are reserved
for UCS-16 encoding of a larger space of 2^20 code points (another
four bits -- I was wrong, this is the lower 16 (not 4) planes).

	So it is not quite true that it has 'nothing to do with
the number of characters that can be encoded', since some of
the code points of the BMP are reserved precisely for the purpose
of two word encodings of a larger space. (I think these are
the High and Low Surrogates: U+d8xx, U+dcxx respectively).
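
The two-word scheme can be written out as arithmetic. A sketch, assuming the
surrogate ranges U+D800..U+DBFF (high) and U+DC00..U+DFFF (low) as the
Unicode standard defines them for UTF-16; the function names are hypothetical:

```ocaml
(* Sketch: split a code point above U+FFFF into a pair of 16-bit surrogates.
   [to_surrogates] and [of_surrogates] are hypothetical helper names. *)
let to_surrogates (cp : int) : int * int =
  if cp < 0x10000 || cp > 0x10FFFF then
    invalid_arg "to_surrogates: not a supplementary code point";
  let v = cp - 0x10000 in                 (* 20-bit offset past the BMP *)
  let high = 0xD800 lor (v lsr 10) in     (* top 10 bits *)
  let low = 0xDC00 lor (v land 0x3FF) in  (* bottom 10 bits *)
  (high, low)

(* Inverse: rebuild the code point from a (high, low) pair. *)
let of_surrogates ((high, low) : int * int) : int =
  0x10000 + (((high - 0xD800) lsl 10) lor (low - 0xDC00))
```

Each surrogate carries 10 bits, so a pair addresses 2^20 code points beyond
the BMP, which is where the "another four bits" (16 more planes) comes from.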

	Note that this is only loosely connected with
the encoding of _characters_, since some code points are
not characters (such as 'newline'), and some sequences
of code points represent a single (accented) character :-)

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-22 20:27       ` John Max Skaller
@ 2001-01-22 21:44         ` Pierpaolo BERNARDI
  2001-01-24 13:41           ` John Max Skaller
  0 siblings, 1 reply; 16+ messages in thread
From: Pierpaolo BERNARDI @ 2001-01-22 21:44 UTC (permalink / raw)
  To: John Max Skaller; +Cc: OCAML

On Tue, 23 Jan 2001, John Max Skaller wrote:

> > Unicode can be encoded in several ways, for example, UTF-8, UTF-16,
> > UTF-32, UCS2, etc..  This has nothing to do with the number of characters
> > that can be encoded.
> 
> 	This is not quite right. Unicode is 16 bit, it supports
> only 2^16 code points: again, unless this has
> changed recently. 

Sorry, no. The idea that "Unicode is 16 bit" is a relic of the
prehistory of Unicode, when it was thought that 64K characters would
suffice.

Unicode associates abstract characters with a numeric index (scalar
value). These scalar values, to be stored in a computer, must be
serialized. Some serialization methods are, for example: UTF-8 (based
on 8-bit chunks), UTF-16 (based on 16-bit chunks), UTF-32 (based on
32-bit chunks). With UTF-32 there's a simple correspondence between
Unicode scalar values and chunks: the numeric value of the 32-bit
chunk is the scalar value; with the other two formats, Unicode
characters use a variable number of chunks: 1 to 4 for UTF-8, 1 or 2
for UTF-16. There are other serialization methods, like, for example
the archaic UCS-2 (which uses surrogates, and which I think is used by
Microsoft), UTF-7, UTF-EBCDIC, UTF-7,5, etc.
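
The chunk counts for the three main formats follow directly from the size of
the scalar value. A minimal sketch, assuming the input is already a valid
scalar value; the function names are hypothetical:

```ocaml
(* Sketch: number of chunks (code units) needed for one scalar value.
   The functions assume [cp] is a valid Unicode scalar value. *)

(* UTF-8: 1 to 4 chunks of 8 bits. *)
let utf8_units (cp : int) : int =
  if cp < 0x80 then 1
  else if cp < 0x800 then 2
  else if cp < 0x10000 then 3
  else 4

(* UTF-16: 1 chunk of 16 bits inside the BMP, 2 outside it. *)
let utf16_units (cp : int) : int = if cp < 0x10000 then 1 else 2

(* UTF-32: always exactly 1 chunk of 32 bits. *)
let utf32_units (_ : int) : int = 1
```

For instance, a BMP character such as U+20AC takes three chunks in UTF-8 but
one in UTF-16, while anything above U+FFFF takes four and two respectively.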

Unicode scalar values range from 0x0000 to 0x100000. The forthcoming
Unicode 3.1 uses 94,140 of these values.

And the same is true for ISO. There are some discrepancies which are due
only to the different times of publication of the respective standards.

> However, some of the code points are reserved
> for UCS-16 encoding of a larger space of 2^20 code points (another

You mean UCS-2.  UCS-2 is just one encoding among many others, it is not
identical to Unicode. Yes, there are code points whose only purpose is to
be used with this particular encoding.

> 	Note that this is only loosely connected with
> the encoding of _characters_, since some code points are
> not characters (such as 'newline'), and some sequences
> of code points represent a single (accented) character :-)

Yes, although the word 'character' has many meanings:

  Character. (1) The smallest component of written language that has
  semantic value; refers to the abstract meaning and/or shape, rather than
  a specific shape (see also glyph), though in code tables some form of
  visual representation is essential for the reader's understanding. (2)
  Synonym for abstract character. (See Definition D3 in Section 3.3,
  Characters and Coded Representations .) (3) The basic unit of encoding
  for the Unicode character encoding. (4) The English name for the
  ideographic written elements of Chinese origin. (See ideograph (2).)
     (from the Unicode Glossary)

You mean meaning 1. Usually, in computer related discussions meaning 3 is
intended.

If you are interested in sorting out the details, you can read UTR-17:

 http://www.unicode.org/unicode/reports/tr17/

Hope this helps.

  Pierpaolo


* Re: Unicode (was RE: JIT-compilation for OCaml?)
  2001-01-22 21:44         ` Pierpaolo BERNARDI
@ 2001-01-24 13:41           ` John Max Skaller
  0 siblings, 0 replies; 16+ messages in thread
From: John Max Skaller @ 2001-01-24 13:41 UTC (permalink / raw)
  To: Pierpaolo BERNARDI; +Cc: OCAML

Pierpaolo BERNARDI wrote:

> Sorry, no. The idea that "Unicode is 16 bit" is a relic of the
> prehistory of Unicode, when it was thought that 64K characters would
> suffice.

	When people talk about Unicode in industry they mean
16 bit code points.

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Unicode (was RE: JIT-compilation for OCaml?)
       [not found] <Pine.GSO.4.00.10101222155260.697-100000@carlotta.cli.di.unipi.it>
@ 2001-01-22 21:57 ` Pierpaolo BERNARDI
  0 siblings, 0 replies; 16+ messages in thread
From: Pierpaolo BERNARDI @ 2001-01-22 21:57 UTC (permalink / raw)
  To: John Max Skaller; +Cc: OCAML


On Mon, 22 Jan 2001, I wrote:

> Unicode scalar values range from 0x0000 to 0x100000. 

Sorry, the range is: from 0x0000 to 0x10FFFF.
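
As a quick check of what that range amounts to, here is a small sketch; the
names are hypothetical, and the exclusion of the surrogate range from the
scalar values follows the Unicode standard's definition:

```ocaml
(* Sketch: the code space 0x0000..0x10FFFF, seen as planes of 2^16 values. *)
let max_code_point = 0x10FFFF

(* Plane number: 0 is the BMP, 1..16 are the supplementary planes. *)
let plane (cp : int) : int = cp lsr 16

(* Total size of the code space: 17 planes of 65,536 code points. *)
let code_space_size = max_code_point + 1

(* A scalar value is any code point in range except the surrogates. *)
let is_scalar_value (cp : int) : bool =
  cp >= 0 && cp <= max_code_point && not (cp >= 0xD800 && cp <= 0xDFFF)
```

The total is 2^20 + 2^16 = 1,114,112 code points, which is the "about 2^20"
mentioned earlier in the thread.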

P.



