9fans - fans of the OS Plan 9 from Bell Labs
From: "Joel C. Salomon" <joelcsalomon@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
Subject: Re: [9fans] About The Codes Beyond Unicode-BMP
Date: Thu, 13 Mar 2008 14:44:46 -0400
Message-ID: <7871fcf50803131144g30d3ac4bp18bb6f662a6997bd@mail.gmail.com>
In-Reply-To: <b615db319cecd764c8e0946fdc806c79@coraid.com>

On Thu, Mar 13, 2008 at 10:55 AM, erik quanstrom <quanstro@coraid.com> wrote:
> plan 9 supports utf16.  that is codepoints u+0000–u+ffff.

To be pedantic, UTF-16 can represent characters in the 'astral
planes' via surrogate pairs (pairs of 16-bit code units in the range
U+D800–U+DFFF); Plan 9's charset is approximately UCS-2.
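
For concreteness, here is a minimal sketch of the surrogate-pair
arithmetic.  The name utf16encode is my own invention for illustration
and is not anything in Plan 9's libc.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of Plan 9): encode one code point as
 * UTF-16.  Returns the number of 16-bit units written (1 or 2), or 0
 * if cp is itself a surrogate or lies beyond U+10FFFF.
 */
int
utf16encode(uint32_t cp, uint16_t out[2])
{
	if(cp >= 0xD800 && cp <= 0xDFFF)
		return 0;		/* surrogates are not characters */
	if(cp > 0x10FFFF)
		return 0;		/* outside the Unicode code space */
	if(cp < 0x10000){
		out[0] = (uint16_t)cp;	/* BMP: one unit, same as UCS-2 */
		return 1;
	}
	cp -= 0x10000;			/* 20 significant bits remain */
	out[0] = (uint16_t)(0xD800 | (cp >> 10));	/* high surrogate: top 10 bits */
	out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));	/* low surrogate: bottom 10 bits */
	return 2;
}
```

With this, U+1D11E (the musical G clef) comes out as the pair
0xD834 0xDD1E, while anything in the BMP is a single unit.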

Java has the same trouble; its astral plane characters are first
encoded as UTF-16 surrogate pairs, then those 16-bit values are
encoded as UTF-8.
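
A rough sketch of that two-step encoding, as I understand it (this is
the scheme usually called CESU-8; the function names below are
hypothetical and are not Java's or Plan 9's):

```c
#include <stdint.h>

/* Write one 16-bit unit in the range 0x0800-0xFFFF as three UTF-8-style bytes. */
static void
put3(uint16_t u, unsigned char *p)
{
	p[0] = (unsigned char)(0xE0 | (u >> 12));
	p[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
	p[2] = (unsigned char)(0x80 | (u & 0x3F));
}

/*
 * Hypothetical illustration: encode an astral code point
 * (U+10000-U+10FFFF) by first splitting it into a surrogate pair and
 * then encoding each surrogate as if it were a character.
 * Writes 6 bytes and returns 6, or returns 0 if cp is out of range.
 */
int
astraltosurrogateutf8(uint32_t cp, unsigned char out[6])
{
	uint32_t v;

	if(cp < 0x10000 || cp > 0x10FFFF)
		return 0;
	v = cp - 0x10000;
	put3((uint16_t)(0xD800 | (v >> 10)), out);		/* high surrogate */
	put3((uint16_t)(0xDC00 | (v & 0x3FF)), out + 3);	/* low surrogate */
	return 6;
}
```

For U+1D11E this yields the six bytes ED A0 B4 ED B4 9E, where plain
UTF-8 would use the four bytes F0 9D 84 9E.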

> to support larger characters, the starting point would be changing Rune
> from ushort to ulong and changing constants like UTFmax and fixing
> chartorune and runetochar.  (and finding all the places that assume that
> UTFmax really is 3.)
>
> it's all very doable, but it would be a very invasive change.

Not really, since only the 2²⁰+2¹⁶ values from 0–0x10FFFF are needed
and UTFmax only needs to go up to 4.  An advantage would be that
out-of-band symbols like EOF and yacc terminals could be represented
in the same data type as the characters.
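
To give a feel for the size of the change, here is a sketch of what a
four-byte runetochar might look like.  The names Rune32, UTFmax4,
Runemax, Roob and runetochar4 are assumptions of mine, chosen so as not
to clash with the real Rune, UTFmax and runetochar; the choice of
0xFFFD as the error rune is likewise an assumption.

```c
#include <stdint.h>

typedef uint32_t Rune32;	/* hypothetical wider Rune */

enum {
	UTFmax4   = 4,		/* longest UTF-8 sequence needed for U+10FFFF */
	Runemax   = 0x10FFFF,	/* last code point; 2^20 + 2^16 values in all */
	Runeerror = 0xFFFD,	/* assumed replacement rune for bad input */
	Roob      = Runemax + 1	/* first value free for out-of-band symbols (hypothetical) */
};

/* Encode r as UTF-8 into s (at least UTFmax4 bytes); return the byte count. */
int
runetochar4(char *s, Rune32 r)
{
	unsigned char *p = (unsigned char*)s;

	/* out-of-range values and lone surrogates become Runeerror */
	if(r > Runemax || (r >= 0xD800 && r <= 0xDFFF))
		r = Runeerror;
	if(r < 0x80){
		p[0] = (unsigned char)r;
		return 1;
	}
	if(r < 0x800){
		p[0] = (unsigned char)(0xC0 | (r >> 6));
		p[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if(r < 0x10000){
		p[0] = (unsigned char)(0xE0 | (r >> 12));
		p[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		p[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	/* the only new case: 21 bits spread over four bytes */
	p[0] = (unsigned char)(0xF0 | (r >> 18));
	p[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	p[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	p[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}
```

The first three cases are the familiar ones; only the last branch is
new.  Values from Roob upward never reach the encoder as characters,
which is what leaves them free for things like EOF.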

On the other hand, there are more useful bits of Unicode that are
unimplemented in Plan 9.  Mañana (as in /sys/doc/utf.{html,ps,pdf})
never did come.

--Joel

Thread overview: 6+ messages
2008-03-13 14:28 Hongzheng Wang
2008-03-13 14:55 ` erik quanstrom
2008-03-13 15:02   ` Hongzheng Wang
2008-03-13 18:23   ` Russ Cox
2008-03-13 18:38     ` erik quanstrom
2008-03-13 18:44   ` Joel C. Salomon [this message]
