9fans - fans of the OS Plan 9 from Bell Labs
From: "Joel Salomon" <salomo3@cooper.edu>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] UTF-8 criticism?
Date: Mon, 19 Jul 2004 17:35:54 -0400
Message-ID: <1556.63.165.50.175.1090272954.squirrel@wish.cooper.edu>
In-Reply-To: <007b01c46dd6$89a0c420$8efa7d50@SOMA>

>> Would moving to 32 bit signed (and only 0 -- 2^21 allowed, plus -1 for
>> EOF) as in the more recent revisions of Unicode take care of the
>> surrogates problem?
>
> this has nothing to do with EOF.

Sorry if I was unclear; let me try again. Would moving to 32-bit signed
Runes (with only 0 -- 2^21 allowed), so that every character now reached
through surrogates is in the directly accessible character set, solve the
problem?
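
To make the question concrete, here is a minimal sketch of the arithmetic
that a wider Rune would make unnecessary: folding a UTF-16 surrogate pair
into the single code point it stands for. The names Rune32 and frompair are
invented for this illustration and are not part of any Plan 9 library.

```c
#include <stdint.h>

/* Hypothetical wide rune type for this sketch; not the Plan 9 Rune. */
typedef int32_t Rune32;

/*
 * Combine a UTF-16 surrogate pair into one code point.
 * hi must be a high surrogate (0xD800-0xDBFF) and lo a low
 * surrogate (0xDC00-0xDFFF); otherwise -1 is returned.
 */
Rune32
frompair(uint16_t hi, uint16_t lo)
{
	if(hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
		return -1;
	/* 10 bits from each half, offset past the 16-bit range */
	return 0x10000 + (((Rune32)(hi - 0xD800) << 10) | (Rune32)(lo - 0xDC00));
}
```

With a Rune wide enough to hold the result directly, code that walks text
one Rune at a time would never see the two halves separately.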

Yes, this does open a new can of worms, but how much more difficult would
it be to move from 16-bit Runes to 21/32-bit Runes than it was to move
from 7-bit ASCII to Unicode in the first place?

As an aside, the way I understand the Unicode standard (4.0), 21-bit
characters are encoded in 1, 2, 3, or 4 bytes in UTF-8. If text is
represented internally as int32, some out-of-band information (such as EOF,
or bad UTF with the original bytes preserved) can be carried along.
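
A rough sketch of that idea, under stated assumptions: code points are held
in a signed 32-bit value, -1 means EOF, and a byte that did not decode is
kept as a distinct negative value so it can be written back unchanged.
Every name here (Rune32, REOF, RBADBASE, badrune, badbyte, isbad, encode)
and the choice of negative range are made up for the example; nothing like
this is claimed to exist in the Plan 9 libraries.

```c
#include <stdint.h>

typedef int32_t Rune32;		/* hypothetical wide rune */

enum {
	REOF = -1,		/* out-of-band: end of input */
	RBADBASE = -0x100	/* hypothetical: bad byte b is stored as RBADBASE - b */
};

/* Wrap an undecodable byte as a negative rune, and unwrap it again. */
Rune32
badrune(unsigned char b)
{
	return RBADBASE - b;
}

int
isbad(Rune32 r)
{
	return r <= RBADBASE && r >= RBADBASE - 0xFF;
}

unsigned char
badbyte(Rune32 r)
{
	return (unsigned char)(RBADBASE - r);
}

/*
 * Write r to buf (at least 4 bytes) and return the number of bytes
 * written: 1 to 4 for a code point, 1 for a preserved bad byte,
 * 0 for REOF or anything else that has no encoding.
 */
int
encode(Rune32 r, unsigned char *buf)
{
	if(isbad(r)){
		buf[0] = badbyte(r);	/* original byte passes through */
		return 1;
	}
	if(r < 0 || r > 0x10FFFF || (r >= 0xD800 && r <= 0xDFFF))
		return 0;		/* EOF, out of range, or a lone surrogate */
	if(r < 0x80){
		buf[0] = (unsigned char)r;
		return 1;
	}
	if(r < 0x800){
		buf[0] = (unsigned char)(0xC0 | (r >> 6));
		buf[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if(r < 0x10000){
		buf[0] = (unsigned char)(0xE0 | (r >> 12));
		buf[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	buf[0] = (unsigned char)(0xF0 | (r >> 18));
	buf[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	buf[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	buf[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}
```

Since valid code points are never negative, the sign bit is free to carry
the extra cases, which is the point of using a signed 32-bit type here.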

--Joel


Thread overview: 18+ messages
2004-07-18 17:31 Jack Johnson
2004-07-18 18:27 ` Rob Pike
2004-07-18 18:39 ` boyd, rounin
2004-07-18 19:05   ` Rob Pike
2004-07-18 19:06     ` boyd, rounin
2004-07-19  9:00       ` Douglas A. Gwyn
2004-07-19 15:34         ` Skip Tavakkolian
2004-07-18 19:34     ` boyd, rounin
2004-07-19  7:40       ` Charles Forsyth
2004-07-19  8:39         ` Geoff Collyer
2004-07-19 21:01     ` Joel Salomon
2004-07-19 21:22       ` boyd, rounin
2004-07-19 21:35         ` Joel Salomon [this message]
2004-07-19 21:56           ` Joel Salomon
2004-07-19 21:42       ` andrey mirtchovski
2004-07-19 21:43         ` Tengwar " Joel Salomon
2004-07-20  8:32       ` Douglas A. Gwyn
2004-07-19 21:35 ` rog
