9fans - fans of the OS Plan 9 from Bell Labs
From: "Douglas A. Gwyn" <DAGwyn@null.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] wchar_t in ANSI C (was "Announce: port")
Date: Tue, 30 Apr 2002 09:40:27 +0000
Message-ID: <3CCE1366.EAB27A38@null.net>
In-Reply-To: <991a7d99caeee7b2557f759e7b5a8a77@caldo.demon.co.uk>

forsyth@caldo.demon.co.uk wrote:
> does it insist that it be `self-synchronising' ... ?

The C language standard doesn't insist on much at all for multibyte
encodings, because they are not under control of the programming
language.  It happens that almost any encoding scheme *will*
self-synchronize within a few more coded characters after a coding
error; in fact there is a cute "mind-reading" magic trick that
exploits the underlying phenomenon:  Spread out a deck of 52 cards
in a row face-up and ask the victim to pick any card among the first
ten, count forward *mentally* by that card's value (J=10, etc.), and
repeat from each card reached until the next count would run him
past the end of the deck.  When he says he's done, you instantly
tell him the last card he reached.  The trick is that you perform
the same procedure using your own choice of starting card; odds are
good that the two sequences merge before the end.
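
The counting procedure can be sketched in C as follows.  This is only
an illustration: the function names are invented here, and the deck is
assumed to be an array of card values, each at least 1, with face
cards counted as 10 (one reading of "J=10, etc.").  Each card value
plays the role of a length code: a reader who starts at the wrong
position is like a decoder that has suffered a coding error, and it
is back in step as soon as it lands on a card the other reader also
visits.

```c
#include <stddef.h>

/*
 * Follow the counting procedure from index `start` (must be < n).
 * Every deck[i] must be >= 1.  Returns the index of the last card
 * reached before the next count would run past the end of the deck.
 */
size_t last_card_reached(const int *deck, size_t n, size_t start)
{
    size_t pos = start;

    while (pos + (size_t)deck[pos] < n)
        pos += (size_t)deck[pos];
    return pos;
}

/*
 * Count how many of the first `k` starting positions finish on the
 * same card as the performer's own starting position `mine`.
 */
size_t count_matching_starts(const int *deck, size_t n, size_t k, size_t mine)
{
    size_t target = last_card_reached(deck, n, mine);
    size_t matches = 0;

    for (size_t s = 0; s < k && s < n; s++)
        if (last_card_reached(deck, n, s) == target)
            matches++;
    return matches;
}
```

Calling count_matching_starts(deck, 52, 10, 0) on a shuffled deck
reports how many of the victim's ten possible choices the performer
would guess correctly when starting from the first card.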

The one real constraint the C standard imposed on multibyte encodings
was that there be no embedded 0-valued bytes.  The idea was that
(before 1994) multibyte sequences were expected to be copied, etc.,
using the char-oriented legacy functions, and we all know that
the 0 byte has special meaning there.  Unfortunately, with the spread
of UTF-16 as an external encoding, this constraint has led to a real
problem, which is being worked on by interested parties.
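
A small C sketch of why embedded 0 bytes matter to the char-oriented
functions.  The array and function names are made up for this
example, and the byte values assume the two-character text "Hi" in
little-endian UTF-16 and in UTF-8 respectively.

```c
#include <stddef.h>
#include <string.h>

/* "Hi" as UTF-16LE: every ASCII character carries a 0 high byte. */
const unsigned char utf16le_hi[] = { 0x48, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* "Hi" as UTF-8: no 0 byte until the terminator. */
const unsigned char utf8_hi[] = { 0x48, 0x69, 0x00 };

/* Length as seen by a legacy function that stops at the first 0 byte. */
size_t legacy_length(const unsigned char *bytes)
{
    return strlen((const char *)bytes);
}
```

Here legacy_length(utf16le_hi) is 1, because the scan stops at the
high byte of 'H', while legacy_length(utf8_hi) is 2.  strcpy and
friends would truncate the UTF-16 data in the same way.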

Different people can draw different conclusions from such situations.
For example, I take it as one more example of the evil of stealing
perfectly legitimate code values for in-band control purposes.

