Date: Thu, 18 Dec 2025 11:13:47 -0600
From: Jacob Moody
To: 9fans@9fans.net
Subject: Re: [9fans] Why does utfutf() exist?
References: <2ae07915-6e27-49f6-9424-d3eacc73e9e7@posixcafe.org>
List-Id: "9fans" <9fans.9fans.net>
Reply-To: 9fans <9fans@9fans.net>

On 12/18/25 03:53, Shawn Rutledge wrote:
>> On Dec 17,
>> 2025, at 22:17, Jacob Moody wrote:
>>
>> I've been poking at some of the utf* functions lately and utfutf is a bit puzzling.
>> At face value, strstr() should be sufficient for handling utf8 encoded strings just as strcmp() is.
>
> Maybe normalization could be the reason: there can be multiple representations, for example, ü might be one code point (Unicode: U+00FC, UTF-8: C3 BC), or might be u with a combining umlaut. I would assume converting to a rune would turn out the same either way: then you can compare them even if the haystack is represented one way in utf8 and the needle is the other way. (Disclaimer: I’m not a unicode expert, even less so on 9)

No, normalization is completely orthogonal to this.
First of all, when these functions were written, Plan 9 did not handle detached codepoints or decomposed sequences at all, so I'd find it quite surprising if the intention was to handle them here (or in chartorune).

Also, from a design standpoint, your UTF decoding is not the correct place to implement normalization, for a large number of reasons. To name a few:

1. Normalization requires the context of multiple codepoints. It would be quite complex for chartorune to do this, since by the standard's definition a normalization context can technically be unbounded.
2. It would be quite surprising if your goal is to read in a file and write it back out, and codepoints were silently converted along the way.
3. Normalization is not exactly cheap to perform, and chartorune is in the hot path of a lot of code.
4. One form is not inherently more correct than the other; the Unicode standard says you should treat composed and decomposed forms as equivalent.

If you want more context on normalization specifically, I wrote a paper about my normalization implementation for 9front that I presented at the last IWP9.
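
To make the point about decoding concrete, here is a minimal sketch in portable C. It uses the two spellings of "ü" mentioned in the quoted message. decode_one and decode_all are hypothetical stand-ins for a decoder such as chartorune, not Plan 9's implementation, and they assume well-formed UTF-8 with no validation.

```c
#include <stddef.h>
#include <string.h>

/* Two spellings of "ü" as UTF-8 bytes. */
const char composed[]   = "\xC3\xBC";      /* U+00FC */
const char decomposed[] = "u" "\xCC\x88";  /* U+0075 followed by U+0308 */

/* Hypothetical stand-in for a decoder like chartorune(); NOT Plan 9's code.
 * Decodes one code point from well-formed UTF-8 and returns the number of
 * bytes consumed. Assumption: the input is valid, so nothing is checked. */
int decode_one(const char *s, unsigned long *cp)
{
    const unsigned char *p = (const unsigned char *)s;

    if (p[0] < 0x80) {                      /* 1 byte: ASCII */
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0) {            /* 2 bytes */
        *cp = ((unsigned long)(p[0] & 0x1F) << 6)
            |  (unsigned long)(p[1] & 0x3F);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0) {            /* 3 bytes */
        *cp = ((unsigned long)(p[0] & 0x0F) << 12)
            | ((unsigned long)(p[1] & 0x3F) << 6)
            |  (unsigned long)(p[2] & 0x3F);
        return 3;
    }
    /* 4 bytes */
    *cp = ((unsigned long)(p[0] & 0x07) << 18)
        | ((unsigned long)(p[1] & 0x3F) << 12)
        | ((unsigned long)(p[2] & 0x3F) << 6)
        |  (unsigned long)(p[3] & 0x3F);
    return 4;
}

/* Decodes a NUL-terminated string into out[], writing at most max code
 * points, and returns how many were written. */
size_t decode_all(const char *s, unsigned long *out, size_t max)
{
    size_t n = 0;

    while (*s != '\0' && n < max) {
        s += decode_one(s, &out[n]);
        n++;
    }
    return n;
}
```

With these definitions, strstr() on the bytes of "grün" ("gr\xC3\xBC" "n") finds composed but not decomposed. decode_all() yields one code point (0xFC) for composed and two (0x75, 0x308) for decomposed, so decoding on its own never makes the two spellings compare equal. Bridging them would take a separate normalization pass over the decoded code points.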

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T8831073f8b8bb351-M0acd2a42356729165fa7d00b
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription