caml-list - the Caml user's mailing list
From: John Max Skaller <skaller@ozemail.com.au>
To: Dave Berry <dave@kal.com>
Cc: Markus Mottl <mottl@miss.wu-wien.ac.at>, OCAML <caml-list@inria.fr>
Subject: Re: Unicode (was RE: JIT-compilation for OCaml?)
Date: Fri, 12 Jan 2001 19:33:54 +1100
Message-ID: <3A5EC172.CE9FBA65@ozemail.com.au>
In-Reply-To: <3145774E67D8D111BE6E00C0DF418B663AD720@nt.kal.com>

Dave Berry wrote:
> 
> I thought Unicode was a recognised subset of ISO-10646, corresponding to the
> range 0-2^16.  Also, don't Windows NT/2000 use Unicode?

	Yes and Yes. More precisely, Unicode is often 'ahead' of ISO,
adding new characters which make it into new versions of ISO-10646
later.

> My knowledge of C/C++ is probably out of date, but I thought they just used
> the wide character type, without requiring a particular internal
> representation.  In what way do ISO C/C++ support ISO-10646?

	There are, for example, both 16 and 31 bit escapes
(\uXXXX and \UXXXXXXXX). What the compiler does with them is
implementation defined, I think: it can silently truncate to 16
or even 8 bits, but the programmer can still encode any
ISO-10646 character.

	The type 'wchar_t' has implementation defined size in C++
(like all the other integral types). This doesn't exclude using
32 bit characters.

> (I realise this isn't directly on-topic, but it may be relevant for future
> extensions to OCaml?)

	I think it is. In particular, Ocaml supports 8 bit characters,
and even allows the upper 128 byte values to be used in identifiers
(to allow French names :-).

	When and if this support is upgraded, Ocaml should go to
full ISO-10646 support: for identifiers this is easily done by
using UTF-8 (and providing a codec to convert Latin-1 for
backward compatibility). Supporting 2^31 code points in regular
expressions is more difficult. Collation is a nightmare :-)
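
	To make the Latin-1 codec idea concrete, here is a minimal
sketch of the conversion, assuming the input is a plain Latin-1
byte string. The name latin1_to_utf8 is made up for illustration
and is not part of the Ocaml standard library. Every Latin-1 byte
is its own code point, so bytes below 0x80 pass through unchanged
and the upper 128 values each become a two byte UTF-8 sequence.

```ocaml
(* Convert a Latin-1 (ISO-8859-1) string to UTF-8.
   latin1_to_utf8 is a hypothetical helper name, not a stdlib function. *)
let latin1_to_utf8 (s : string) : string =
  (* Worst case: every input byte expands to two output bytes. *)
  let buf = Buffer.create (String.length s * 2) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 0x80 then
        (* ASCII range: identical in Latin-1 and UTF-8. *)
        Buffer.add_char buf c
      else begin
        (* 0x80..0xFF: two byte form 110xxxxx 10xxxxxx. *)
        Buffer.add_char buf (Char.chr (0xC0 lor (code lsr 6)));
        Buffer.add_char buf (Char.chr (0x80 lor (code land 0x3F)))
      end)
    s;
  Buffer.contents buf
```

	For example, the Latin-1 byte 0xE9 (e with acute accent) comes
out as the two bytes 0xC3 0xA9. The reverse direction would have to
reject any code point above 0xFF, which this sketch does not cover.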

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net


