9fans - fans of the OS Plan 9 from Bell Labs
From: la-ninpre <aaoth@aaoth.xyz>
To: 9fans@9fans.net
Subject: [9fans] utf-8 handling oddities
Date: Fri, 13 Oct 2023 20:29:17 +0000
Message-ID: <1597A7B3-09D5-443F-B372-8B28F5F2B059@aaoth.xyz>

greetings, 9fans.

recently i have been studying the utf-8 encoding and decided to look at how it is handled in plan 9. since plan 9 was the first system to use this encoding, it made sense to look at its implementation. the fact that this implementation was written by the designers of the encoding themselves only reinforced that decision.

so i grabbed the last release tarball from p9f.org and studied it. but when i was testing some other implementations to compare how each handles encoding/decoding errors, i noticed that the same code linked with plan9port's lib9 handles surrogate halves differently (or, may i say, incorrectly) from the original plan 9 implementation. i started digging through archived versions of the same code, only to find that the implementation changed only after the release of the fourth edition. specifically, i looked at the /sys/src/libc/port/rune.c file. the version that i studied was taken from the so-called 'latest release' on the p9f page. the timestamp on that file says it was last modified in 2013, while the rest of the code is timestamped 2002. the inferno os source code also had this change ported to it around the same time.
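
to make the surrogate case concrete, here is a minimal sketch in c of a utf-8 decoder with a switch for rejecting surrogate halves. it is not the code from rune.c or from lib9: the names decode_utf8 and RUNE_ERROR and the strict flag are made up for illustration, and the sketch makes no claim about which behaviour either library has. it only shows what the two possible treatments of a sequence like ED A0 80 look like.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical name; U+FFFD is the usual replacement character */
enum { RUNE_ERROR = 0xFFFD };

/*
 * decode one code point from s, with n bytes available.
 * stores the result in *out and returns the number of bytes consumed.
 * on malformed input, stores RUNE_ERROR and returns 1 (0 if n is 0).
 * if strict is nonzero, surrogate halves (U+D800..U+DFFF) are rejected.
 */
int
decode_utf8(const unsigned char *s, size_t n, int strict, uint32_t *out)
{
	/* smallest code point allowed for each sequence length */
	static const uint32_t min[] = {0, 0, 0x80, 0x800, 0x10000};
	uint32_t c;
	int len, i;

	*out = RUNE_ERROR;
	if (n == 0)
		return 0;
	c = s[0];
	if (c < 0x80) {
		*out = c;
		return 1;
	} else if ((c & 0xE0) == 0xC0) {
		len = 2; c &= 0x1F;
	} else if ((c & 0xF0) == 0xE0) {
		len = 3; c &= 0x0F;
	} else if ((c & 0xF8) == 0xF0) {
		len = 4; c &= 0x07;
	} else
		return 1;	/* stray continuation byte or invalid lead byte */
	if (n < (size_t)len)
		return 1;	/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 1;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min[len] || c > 0x10FFFF)
		return 1;	/* overlong form or out of range */
	if (strict && c >= 0xD800 && c <= 0xDFFF)
		return 1;	/* surrogate half */
	*out = c;
	return len;
}
```

with the bytes ED A0 80, a strict call stores RUNE_ERROR and consumes one byte, while a lax call stores 0xD800 and consumes three. a four-byte sequence such as F0 9F 98 80 decodes to U+1F600 either way.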

if i understand correctly, unicode was extended past the BMP in 1996 with the release of unicode 2.0. plan 9 had two editions released after that but, assuming that the archives on p9f are indeed correct, the code didn't reflect the change until 2013 (and that's why the old code propagated to both plan9port and 9front). so, does anyone know why that is the case? i'd appreciate any input on this, or pointers to any information resources that you may know of.

best regards,
la ninpre.

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T8384b8174eb88096-M127761f645d18b8419fc4f9b
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
