9fans - fans of the OS Plan 9 from Bell Labs
From: LdBeth <andpuke@foxmail.com>
To: 9fans <9fans@9fans.net>
Subject: Re: [9fans] utf-8 handling oddities
Date: Fri, 13 Oct 2023 23:56:29 -0500
Message-ID: <tencent_52F608630CE8D45815FDE9BB4D750B3B820A@qq.com>
In-Reply-To: <1597A7B3-09D5-443F-B372-8B28F5F2B059@aaoth.xyz>

>>>>> In <1597A7B3-09D5-443F-B372-8B28F5F2B059@aaoth.xyz> 
>>>>>   la-ninpre <aaoth@aaoth.xyz> wrote:

la-ninpre> if i understand it correctly, unicode extended past the BMP
la-ninpre> in 1996 with the release of unicode 2.0. plan 9 had two
la-ninpre> editions released after that, but, of course assuming that
la-ninpre> archives on p9f are indeed correct, the implementation
la-ninpre> didn't reflect the change in the code until 2013 (and
la-ninpre> that's why that old code propagated to both plan9port and
la-ninpre> 9front). so, maybe someone knows why is that the case? i'd
la-ninpre> appreciate any input on this or some pointers to
la-ninpre> information resources that you may know of.

Fun fact, "the underlying Xerces parser used by most systems never
implemented XML 1.0 fifth edition" (which was released in 2008).

It is not uncommon for implementors to decide not to cover new
features that are of lesser interest to them.

Also, UTF-8 is **not required** by the Unicode standard to handle
surrogates, and Rob Pike said in a relevant golang thread:

> It's correct to reject them

https://golang-dev.narkive.com/4Zves5rC/surrogate-halves-and-utf-8

which also explains the rationale of the Plan9 code.
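
To make the surrogate point concrete, here is a small sketch. It uses
Python's built-in strict UTF-8 codec purely as a stand-in (it is not
the Plan 9 or Go code), and the helper names is_surrogate and
strict_decode_ok are made up for illustration. It shows a code point
beyond the BMP encoding to four bytes, while the three bytes that
would naively encode the surrogate half U+D800 are rejected.

```python
# Illustrative only: Python's built-in codec, not Plan 9's libc or Go's utf8.

SURROGATE_MIN = 0xD800
SURROGATE_MAX = 0xDFFF


def is_surrogate(cp: int) -> bool:
    """True if cp is a UTF-16 surrogate half (hypothetical helper)."""
    return SURROGATE_MIN <= cp <= SURROGATE_MAX


def strict_decode_ok(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts data (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+1F600 lies outside the BMP and takes four bytes in UTF-8.
astral = "\U0001F600".encode("utf-8")

# ED A0 80 is what a naive three-byte encoding of U+D800 would look like.
surrogate_bytes = bytes([0xED, 0xA0, 0x80])

if __name__ == "__main__":
    print(len(astral), strict_decode_ok(astral))                     # 4 True
    print(is_surrogate(0xD800), strict_decode_ok(surrogate_bytes))   # True False
```

So handling code points past the BMP and rejecting surrogate halves
are separate questions: a decoder can do the former and still refuse
the latter.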

la-ninpre> best regards,
la-ninpre> la ninpre.


---
ldbeth


------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T8384b8174eb88096-M50e3a04b5272c6334c10d2af
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
