From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Thu, 13 Mar 2008 22:28:54 +0800
From: "Hongzheng Wang"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_14674_20653817.1205418534236"
Subject: [9fans] About The Codes Beyond Unicode-BMP
Topicbox-Message-UUID: 774295d4-ead3-11e9-9d60-3106f5b1d025

------=_Part_14674_20653817.1205418534236
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi,

I ran an experiment to test whether the programs in Plan 9 support
code points beyond the Unicode BMP. The result is not so good. To
repeat it:

Take U+010000 as an example. Create a file containing only the single
character U+010000, encoded in UTF-8. This can be done with Vim on
Linux, since Vim has good support for such characters; Nvi can be used
to double-check. The UTF-8 byte sequence for U+010000 is F0908080 [1].

Open the file with ed, sam, or acme. Of course, it cannot be displayed
correctly, since no font on the system covers such a code point yet.
Then just write the file back out, and open it again with a non-Plan 9
program, say, Nvi on Linux. The byte sequence has become
EFBFBDEFBFBDEFBFBDEFBFBD.

That is, ed, sam, and acme all failed to recognize U+010000 encoded in
UTF-8, and destroyed it when writing. So, does Plan 9 actually support
only the code points in the Unicode BMP?

BTW: the attachment is the gzipped test file containing only U+010000
encoded in UTF-8.

[1] http://en.wikipedia.org/wiki/UTF-8

-- 
HZ

------=_Part_14674_20653817.1205418534236
Content-Type: application/x-gzip; name=test.gz
Content-Transfer-Encoding: base64
X-Attachment-Id: f_fdrf1w8s0
Content-Disposition: attachment; filename=test.gz

H4sICNc52UcAA3Rlc3QA+zChoYELADEsNQkFAAAA
------=_Part_14674_20653817.1205418534236--
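
The two byte sequences quoted in the message can be reproduced with a small Python sketch. It is only a model and is not taken from the Plan 9 sources: it assumes a decoder that accepts UTF-8 sequences of at most three bytes (enough for the BMP) and turns every byte it cannot use into U+FFFD, the replacement character, whose UTF-8 encoding is EF BF BD. The names decode_bmp_only and roundtrip are hypothetical, and overlong encodings are not checked.

```python
REPLACEMENT = "\ufffd"  # U+FFFD; its UTF-8 encoding is EF BF BD


def is_cont(b: int) -> bool:
    """True for a UTF-8 continuation byte (10xxxxxx)."""
    return 0x80 <= b <= 0xBF


def decode_bmp_only(data: bytes) -> str:
    """Hypothetical decoder that understands only 1- to 3-byte sequences.

    Any byte that does not start such a sequence (a 4-byte lead such as
    F0, or a stray continuation byte) becomes one U+FFFD.
    """
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            # Single-byte (ASCII) sequence.
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b <= 0xDF and i + 1 < n and is_cont(data[i + 1]):
            # Two-byte sequence.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
            continue
        if (0xE0 <= b <= 0xEF and i + 2 < n
                and is_cont(data[i + 1]) and is_cont(data[i + 2])):
            # Three-byte sequence; reject surrogates, which are not characters.
            cp = ((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
            if not 0xD800 <= cp <= 0xDFFF:
                out.append(chr(cp))
                i += 3
                continue
        # Anything else is consumed one byte at a time as an error.
        out.append(REPLACEMENT)
        i += 1
    return "".join(out)


def roundtrip(data: bytes) -> bytes:
    """Model of 'open the file, then write it back out'."""
    return decode_bmp_only(data).encode("utf-8")


original = "\U00010000".encode("utf-8")
print(original.hex().upper())             # F0908080
print(roundtrip(original).hex().upper())  # EFBFBDEFBFBDEFBFBDEFBFBD
```

Under this assumption each of the four input bytes is rejected separately, which would explain why one character comes back as four replacement characters, while text inside the BMP survives the round trip unchanged.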