From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <b954b2060803130728v79b2933bq121ecdd430440766@mail.gmail.com>
Date: Thu, 13 Mar 2008 22:28:54 +0800
From: "Hongzheng Wang" <wanghz@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_Part_14674_20653817.1205418534236"
Subject: [9fans]  About The Codes Beyond Unicode-BMP
Topicbox-Message-UUID: 774295d4-ead3-11e9-9d60-3106f5b1d025

------=_Part_14674_20653817.1205418534236
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi,

I did an experiment to test whether Plan 9 programs support code
points beyond the Unicode BMP (Basic Multilingual Plane).
The result is not so good.

To reproduce it:

Take U+010000 as an example. Create a file containing only the single
character U+010000, encoded in UTF-8.

Note that this can be done with Vim on Linux, since Vim supports UTF-8
well, and the result can be double-checked with Nvi. The UTF-8 byte
sequence for U+010000 is F0908080 [1].
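For reference, the four-byte form follows directly from the UTF-8 bit
layout described in [1]. Below is a minimal C sketch of a generic
encoder; the names utf8_encode and print_utf8_hex are hypothetical (not
from any Plan 9 library), and it assumes the input is a valid code
point no larger than U+10FFFF and not a surrogate.

```c
#include <stddef.h>
#include <stdio.h>

/* Encode code point c (assumed <= 0x10FFFF and not a surrogate) into
   buf, which must hold at least 4 bytes. Returns the byte count. */
size_t
utf8_encode(unsigned long c, unsigned char *buf)
{
	if (c < 0x80) {			/* 0xxxxxxx */
		buf[0] = c;
		return 1;
	}
	if (c < 0x800) {		/* 110xxxxx 10xxxxxx */
		buf[0] = 0xC0 | (c >> 6);
		buf[1] = 0x80 | (c & 0x3F);
		return 2;
	}
	if (c < 0x10000) {		/* 1110xxxx 10xxxxxx 10xxxxxx: the BMP */
		buf[0] = 0xE0 | (c >> 12);
		buf[1] = 0x80 | ((c >> 6) & 0x3F);
		buf[2] = 0x80 | (c & 0x3F);
		return 3;
	}
	/* Beyond the BMP: 11110xxx followed by three continuation bytes. */
	buf[0] = 0xF0 | (c >> 18);
	buf[1] = 0x80 | ((c >> 12) & 0x3F);
	buf[2] = 0x80 | ((c >> 6) & 0x3F);
	buf[3] = 0x80 | (c & 0x3F);
	return 4;
}

/* Print the encoding of c as upper-case hex on one line. */
void
print_utf8_hex(unsigned long c)
{
	unsigned char buf[4];
	size_t i, n = utf8_encode(c, buf);

	for (i = 0; i < n; i++)
		printf("%02X", buf[i]);
	printf("\n");
}
```

Calling print_utf8_hex(0x10000) prints F0908080, the same bytes Vim
writes for U+010000.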

Open the file with ed, sam, or acme. Of course, the character cannot
be displayed correctly, since no font on the system covers that code
point yet. Then simply write the file back out, and open it again with
a non-Plan 9 program, say, Nvi on Linux. The bytes have become
EFBFBDEFBFBDEFBFBDEFBFBD, i.e. four copies of EFBFBD, the UTF-8
encoding of U+FFFD (REPLACEMENT CHARACTER), as many as there were
bytes in the original sequence. That is, ed, sam, and acme all failed
to recognize U+010000 encoded in UTF-8, and destroyed it when writing.
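The observed output is consistent with a decoder that only understands
UTF-8 sequences of up to three bytes (runes that fit in 16 bits),
replaces anything else with U+FFFD one byte at a time, and then writes
the replacements back out. The C sketch below simulates that
hypothesis; it is an assumption, not the actual Plan 9 libc code, and
the names bmp_decode, bmp_encode, and bmp_roundtrip are hypothetical.

```c
#include <stddef.h>

#define BMP_RUNEERROR 0xFFFD	/* U+FFFD REPLACEMENT CHARACTER */

/* Decode one rune from s (n > 0 bytes available), accepting only the
   1-, 2- and 3-byte forms, i.e. code points up to U+FFFF. On any other
   input, store U+FFFD and consume a single byte. Returns bytes used. */
static size_t
bmp_decode(const unsigned char *s, size_t n, unsigned int *r)
{
	if (s[0] < 0x80) {
		*r = s[0];
		return 1;
	}
	if (s[0] >= 0xC2 && s[0] < 0xE0 && n >= 2 && (s[1] & 0xC0) == 0x80) {
		*r = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
		return 2;
	}
	if (s[0] >= 0xE0 && s[0] < 0xF0 && n >= 3 &&
	    (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		*r = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
		if (*r >= 0x800)	/* reject overlong forms */
			return 3;
	}
	/* A 4-byte lead byte (F0..F4), a stray continuation byte, or
	   anything else a 16-bit rune cannot represent. */
	*r = BMP_RUNEERROR;
	return 1;
}

/* Encode a BMP rune (r <= 0xFFFF) into out; returns bytes written. */
static size_t
bmp_encode(unsigned int r, unsigned char *out)
{
	if (r < 0x80) {
		out[0] = r;
		return 1;
	}
	if (r < 0x800) {
		out[0] = 0xC0 | (r >> 6);
		out[1] = 0x80 | (r & 0x3F);
		return 2;
	}
	out[0] = 0xE0 | (r >> 12);
	out[1] = 0x80 | ((r >> 6) & 0x3F);
	out[2] = 0x80 | (r & 0x3F);
	return 3;
}

/* Simulate reading a file and writing it back through the BMP-only
   decoder. out must hold 3 * n bytes; returns bytes written. */
size_t
bmp_roundtrip(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i = 0, o = 0;
	unsigned int r;

	while (i < n) {
		i += bmp_decode(in + i, n - i, &r);
		o += bmp_encode(r, out + o);
	}
	return o;
}
```

Feeding it the four bytes F0 90 80 80 yields twelve bytes, EFBFBD
repeated four times, which matches what Nvi showed, while BMP text
such as U+20AC (E2 82 AC) passes through unchanged.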

So does Plan 9 actually support only the code points in the Unicode BMP?

BTW: the attachment is the gzipped test file, containing only U+010000
encoded in UTF-8.

[1] http://en.wikipedia.org/wiki/UTF-8

--
HZ

------=_Part_14674_20653817.1205418534236
Content-Type: application/x-gzip; name=test.gz
Content-Transfer-Encoding: base64
X-Attachment-Id: f_fdrf1w8s0
Content-Disposition: attachment; filename=test.gz

H4sICNc52UcAA3Rlc3QA+zChoYELADEsNQkFAAAA
------=_Part_14674_20653817.1205418534236--