9fans - fans of the OS Plan 9 from Bell Labs
From: "Joel Salomon" <salomo3@cooper.edu>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
Subject: Re: [9fans] UTF-8 criticism?
Date: Mon, 19 Jul 2004 17:56:12 -0400
Message-ID: <1583.63.165.50.175.1090274172.squirrel@wish.cooper.edu>
In-Reply-To: <1556.63.165.50.175.1090272954.squirrel@wish.cooper.edu>

Joel Salomon said:
> As an aside, the way I've understood the Unicode standard (4.0), 21 bit
> characters can be encoded in 1, 2, 3, or 4 bytes in UTF-8 and if text is
> internally represented by int32, some out-of-band information (like EOF,
> or bad UTF (but preserving the original bytes)) can be carried along.
>

And here's where the out-of-band encoding might come in useful:

rog@vitanuova.com said:
> you do have to be a bit careful with utf-8, as many possible byte
> sequences map down to the same rune (error), so if you
> do your comparisons too early, you run the risk of inconsistency.
>
> for instance, you can exploit this (at least, i *think* this is the
> cause) to create a file that can never be removed on ken's fileserver:
<snip>

But if "error" becomes 0x80000000 | XX, where XX is the original (bad, or
out-of-place) byte, we never lose the ability to retrieve/delete the file.
This would be an extension to Unicode, possibly a dangerous one, but maybe
worth considering.
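
A rough sketch of the idea, in Python rather than Plan 9 C. The flag value
0x80000000 and the names BAD_BYTE_FLAG, decode_preserving and
encode_preserving are made up for illustration; nothing like this is part
of Unicode or of any Plan 9 library. It leans on Python's own
"surrogateescape" error handler to find the bad bytes, then moves each one
into the out-of-band range of a 32-bit value:

```python
# Hypothetical out-of-band marker: the high bit of a 32-bit value, far
# above the 21-bit Unicode range, flags "this is a raw bad byte".
BAD_BYTE_FLAG = 0x80000000


def decode_preserving(data):
    """Decode UTF-8 bytes into a list of ints.

    Valid characters become their code points. Each byte that is not part
    of valid UTF-8 becomes BAD_BYTE_FLAG | byte, so it is never lost.
    """
    # surrogateescape maps every undecodable byte 0xNN to U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(BAD_BYTE_FLAG | (cp - 0xDC00))
        else:
            out.append(cp)
    return out


def encode_preserving(values):
    """Inverse of decode_preserving: rebuild the exact original bytes."""
    parts = []
    for v in values:
        if v & BAD_BYTE_FLAG:
            parts.append(bytes([v & 0xFF]))  # restore the raw bad byte
        else:
            parts.append(chr(v).encode("utf-8"))
    return b"".join(parts)


# Two different (invalid) names that a lossy decoder cannot tell apart:
name1 = b"a\xff"
name2 = b"a\xfe"
lossy_equal = (name1.decode("utf-8", errors="replace")
               == name2.decode("utf-8", errors="replace"))
kept_distinct = decode_preserving(name1) != decode_preserving(name2)
```

With the lossy decoder both names collapse to "a" followed by the error
rune, which is the inconsistency rog describes; with the flagged values
they stay distinct and encode_preserving gives back the original bytes, so
the file could still be named and removed.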

--Joel



