From: "quiekaizam via 9fans" <9fans@9fans.net>
To: 9fans <9fans@9fans.net>, Shawn Rutledge <lists@ecloud.org>
Subject: Re: [9fans] Why does utfutf() exist?
Date: Fri, 19 Dec 2025 00:50:59 +0900
Message-ID: <BEAEAC17-D4E1-49E3-911E-8FCF8D1B52BE@wilsonb.com>
In-Reply-To: <BCC16A4B-CD61-45EB-B8D0-277D9064BDC2@ecloud.org>
> I would assume converting to a rune would turn out the same either way:
This sounds wrong to me. IIUC Runes are just Unicode code points. A glyph may have multiple representations in Unicode, of which your ü is a good example. Mapping those representations onto one another is a question of Unicode normalization, however, and that involves lots of fiddly questions whose answers depend on the particular use case. As such, conversion to Runes cannot reasonably perform normalization, AFAIU.
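To make that concrete, here is a rough sketch in Python rather than Plan 9 C, only because Python ships a normalizer in its standard library. The helper names code_points and utf8_hex are made up for illustration and have nothing to do with the Plan 9 utf* functions. It shows that decoding the two spellings of ü yields different code point sequences, and that they only compare equal after an explicit normalization step:

```python
import unicodedata

precomposed = "\u00fc"   # ü as the single code point U+00FC
decomposed = "u\u0308"   # u followed by U+0308 COMBINING DIAERESIS


def code_points(s):
    """Hypothetical helper: list the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]


def utf8_hex(s):
    """Hypothetical helper: show the UTF-8 encoding of s as hex bytes."""
    return " ".join(f"{b:02X}" for b in s.encode("utf-8"))


if __name__ == "__main__":
    for name, s in (("precomposed", precomposed), ("decomposed", decomposed)):
        print(name, code_points(s), utf8_hex(s))

    # Decoding alone does not make the two spellings equal.
    print("equal as decoded:", precomposed == decomposed)

    # A substring search at the code point level misses the other spelling.
    print("needle found:", decomposed in "f" + precomposed + "r")

    # Only an explicit normalization step (NFC here) maps one onto the other.
    print("equal after NFC:",
          unicodedata.normalize("NFC", decomposed) == precomposed)
```

Which normalization form to apply (NFC, NFD, or the compatibility forms) is exactly the kind of use-case-specific decision I mean, so I would not expect a decoding routine to make it for the caller.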
On 2025-12-18 18:53:35 JST, Shawn Rutledge <lists@ecloud.org> wrote:
>> On Dec 17, 2025, at 22:17, Jacob Moody <moody@posixcafe.org> wrote:
>>
>> I've been poking at some of the utf* functions lately and utfutf is a bit puzzling.
>> At face value, strstr() should be sufficient for handling utf8 encoded strings just as strcmp() is.
>
> Maybe normalization could be the reason: there can be multiple representations, for example, ü might be one code point (Unicode: U+00FC, UTF-8: C3 BC), or might be u with a combining umlaut. I would assume converting to a rune would turn out the same either way: then you can compare them even if the haystack is represented one way in utf8 and the needle is the other way. (Disclaimer: I’m not a unicode expert, even less so on 9)
>