From: "Joel C. Salomon" <joelcsalomon@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
Subject: Re: [9fans] About The Codes Beyond Unicode-BMP
Date: Thu, 13 Mar 2008 14:44:46 -0400
Message-ID: <7871fcf50803131144g30d3ac4bp18bb6f662a6997bd@mail.gmail.com>
In-Reply-To: <b615db319cecd764c8e0946fdc806c79@coraid.com>
On Thu, Mar 13, 2008 at 10:55 AM, erik quanstrom <quanstro@coraid.com> wrote:
> plan 9 supports utf16. that is codepoints u+0000 – u+ffff.
To be pedantic, UTF-16 can represent characters in the
'astral planes' via surrogate pairs (pairs of 16-bit code units in the
range U+D800–U+DFFF); Plan 9's charset is approximately UCS-2.
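For anyone who wants the surrogate arithmetic spelled out, here is a minimal sketch in portable C. The names (tosurrogates, fromsurrogates and the enum constants) are made up for illustration; nothing like this is in Plan 9's libc.

```c
#include <assert.h>
#include <stdint.h>

enum {
	SurrHi  = 0xD800,	/* first high (lead) surrogate */
	SurrLo  = 0xDC00,	/* first low (trail) surrogate */
	SurrEnd = 0xDFFF,	/* last low surrogate */
	Astral0 = 0x10000,	/* first code point beyond the BMP */
	Runemax = 0x10FFFF,	/* last Unicode code point */
};

/* Split an astral code point into a UTF-16 surrogate pair. */
void
tosurrogates(uint32_t c, uint16_t *hi, uint16_t *lo)
{
	assert(c >= Astral0 && c <= Runemax);
	c -= Astral0;				/* now a 20-bit value */
	*hi = (uint16_t)(SurrHi + (c >> 10));	/* top 10 bits */
	*lo = (uint16_t)(SurrLo + (c & 0x3FF));	/* bottom 10 bits */
}

/* Recombine a surrogate pair into the code point it stands for. */
uint32_t
fromsurrogates(uint16_t hi, uint16_t lo)
{
	assert(hi >= SurrHi && hi < SurrLo);
	assert(lo >= SurrLo && lo <= SurrEnd);
	return Astral0 + (((uint32_t)(hi - SurrHi) << 10) | (uint32_t)(lo - SurrLo));
}
```

With these, U+1D11E (the musical G clef) splits into the pair 0xD834 0xDD1E, and the pair recombines to the same code point.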
Java has the same trouble; its astral plane characters are first
encoded as UTF-16 surrogate pairs, then those 16-bit values are
encoded as UTF-8.
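A rough sketch of that two-step encoding, in C rather than Java so it sits next to the Plan 9 code. The function names put16 and putsurrogateutf8 are hypothetical. This is the scheme usually called CESU-8; Java's "modified UTF-8" additionally treats U+0000 specially, which the sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Encode one 16-bit value as 1 to 3 bytes of UTF-8; returns the byte count. */
int
put16(uint8_t *p, uint16_t c)
{
	if(c < 0x80){
		p[0] = (uint8_t)c;
		return 1;
	}
	if(c < 0x800){
		p[0] = (uint8_t)(0xC0 | (c >> 6));
		p[1] = (uint8_t)(0x80 | (c & 0x3F));
		return 2;
	}
	p[0] = (uint8_t)(0xE0 | (c >> 12));
	p[1] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
	p[2] = (uint8_t)(0x80 | (c & 0x3F));
	return 3;
}

/*
 * Encode a code point by first splitting astral values into a
 * surrogate pair, then encoding each 16-bit half separately.
 * Astral characters therefore take 6 bytes instead of UTF-8's 4.
 * The buffer p must have room for at least 6 bytes.
 */
int
putsurrogateutf8(uint8_t *p, uint32_t c)
{
	int n;

	assert(c <= 0x10FFFF);
	if(c < 0x10000)
		return put16(p, (uint16_t)c);
	c -= 0x10000;
	n = put16(p, (uint16_t)(0xD800 + (c >> 10)));
	n += put16(p + n, (uint16_t)(0xDC00 + (c & 0x3FF)));
	return n;
}
```

For U+1D11E this yields the six bytes ED A0 B4 ED B4 9E, where plain UTF-8 would use the four bytes F0 9D 84 9E.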
> to support larger characters, the starting point would be changing Rune
> from ushort to ulong and changing constants like UTFmax and fixing
> chartorune and runetochar. (and finding all the places that assume that
> UTFmax really is 3.)
>
> it's all very doable, but it would be a very invasive change.
Not really, since only the 2²⁰+2¹⁶ values from 0–0x10FFFF are needed
and UTFmax only needs to go up to 4. An advantage of a wider Rune would
be that out-of-band symbols like EOF and yacc terminals could be
represented in the same data type as the characters.
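To make the size of the change concrete, here is a sketch of what a 4-byte encoder might look like. Everything named here (Rune32, runetochar32, UTFmax4, Badrune, Runeeof) is hypothetical and is not the actual Plan 9 libc interface; substituting U+FFFD for invalid input is likewise my assumption.

```c
#include <stdint.h>

typedef uint32_t Rune32;	/* hypothetical widened Rune */

enum {
	UTFmax4 = 4,		/* longest encoding, in bytes */
	Runemax = 0x10FFFF,	/* last code point: 2^20 + 2^16 values in all */
	Badrune = 0xFFFD,	/* substituted for invalid input (assumption) */
	Runeeof = 0x110000,	/* example out-of-band value, just past Runemax */
};

/* Encode r as 1 to 4 bytes of UTF-8 at s; returns the byte count. */
int
runetochar32(char *s, Rune32 r)
{
	unsigned char *p = (unsigned char*)s;

	/* surrogates and anything past Runemax are not characters */
	if(r > Runemax || (r >= 0xD800 && r <= 0xDFFF))
		r = Badrune;
	if(r < 0x80){
		p[0] = (unsigned char)r;
		return 1;
	}
	if(r < 0x800){
		p[0] = (unsigned char)(0xC0 | (r >> 6));
		p[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if(r < 0x10000){
		p[0] = (unsigned char)(0xE0 | (r >> 12));
		p[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		p[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	p[0] = (unsigned char)(0xF0 | (r >> 18));
	p[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	p[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	p[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}
```

Only the last branch is new relative to a 3-byte encoder, and a 32-bit type leaves every value above 0x10FFFF free for things like Runeeof.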
On the other hand, there are more useful bits of Unicode that are
unimplemented in Plan 9. Mañana (as in /sys/doc/utf.{html,ps,pdf})
never did come.
--Joel