From: Jacob Moody <moody@posixcafe.org>
To: 9fans@9fans.net
Date: Wed, 17 Dec 2025 15:17:04 -0600
Subject: [9fans] Why does utfutf() exist?
Message-ID: <2ae07915-6e27-49f6-9424-d3eacc73e9e7@posixcafe.org>

I've been poking at some of the utf* functions lately and utfutf is a bit puzzling.
At face value, strstr() should be sufficient for handling utf8 encoded strings just as strcmp() is.
These functions have largely been the same since 9front imported them, so modifying them starts to drift into
"touching the artwork" dangers, so I wanted to think aloud (more like ramble) here and see if folks agree.

So first, the implementation of utfutf itself, to see if it's doing something different than strstr:

	char*
	utfutf(char *s1, char *s2)
	{
		char *p;
		long f, n1, n2;
		Rune r;

		n1 = chartorune(&r, s2);
		f = r;
		if(f <= Runesync)		/* represents self */
			return strstr(s1, s2);

		n2 = strlen(s2);
		for(p=s1; p=utfrune(p, f); p+=n1)
			if(strncmp(p, s2, n2) == 0)
				return p;
		return 0;
	}

We do see that in the case of a leading ascii byte we do indeed just use strstr().
However do note that the check should be < not <=, as Runeself is 0x80 and U+0080 already encodes as two bytes.
If we do start with a multi-byte utf8 sequence we take a normal strstr-like approach but use utfrune().
So let's take a look at utfrune():

	char*
	utfrune(char *s, long c)
	{
		long c1;
		Rune r;
		int n;

		if(c < Runesync)		/* not part of utf sequence */
			return strchr(s, c);

		for(;;) {
			c1 = *(uchar*)s;
			if(c1 < Runeself) {	/* one byte rune */
				if(c1 == 0)
					return 0;
				if(c1 == c)
					return s;
				s++;
				continue;
			}
			n = chartorune(&r, s);
			if(r == c)
				return s;
			s += n;
		}
	}

So we can ignore the < Runesync case, since we won't hit that. What's left is a simple
iteration and check against the passed value.
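
As a quick check on the 0x80 boundary mentioned above, here is a minimal sketch in
portable C. Note that encode_rune() and RUNESELF are hypothetical stand-ins for
runetochar() and Runeself, written for illustration only; this is not the libc code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the Plan 9 constant, which the post gives as 0x80. */
enum { RUNESELF = 0x80 };

/*
 * Hypothetical stand-in for runetochar(): encode code point c as UTF-8
 * into buf (at least 4 bytes) and return the number of bytes written.
 * Surrogates and out-of-range values are not checked in this sketch.
 */
int
encode_rune(unsigned char *buf, unsigned long c)
{
	assert(buf != NULL);
	if(c < RUNESELF){		/* one byte: represents itself */
		buf[0] = (unsigned char)c;
		return 1;
	}
	if(c < 0x800){			/* two bytes: 110xxxxx 10xxxxxx */
		buf[0] = (unsigned char)(0xC0 | (c >> 6));
		buf[1] = (unsigned char)(0x80 | (c & 0x3F));
		return 2;
	}
	if(c < 0x10000){		/* three bytes */
		buf[0] = (unsigned char)(0xE0 | (c >> 12));
		buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (c & 0x3F));
		return 3;
	}
	/* four bytes */
	buf[0] = (unsigned char)(0xF0 | (c >> 18));
	buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
	buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
	buf[3] = (unsigned char)(0x80 | (c & 0x3F));
	return 4;
}
```

Under these assumptions 0x7F is the last value that is its own encoding, while 0x80
comes out as the two bytes C2 80, which is why a strict < is the natural test for
"represents self".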

So let's look at strstr and see if there's a reason to avoid it:

	char*
	strstr(char *s1, char *s2)
	{
		char *p, *pa, *pb;
		int c0, c;

		c0 = *s2;
		if(c0 == 0)
			return s1;
		s2++;
		for(p=strchr(s1, c0); p; p=strchr(p+1, c0)) {
			pa = p;
			for(pb=s2;; pb++) {
				c = *pb;
				if(c == 0)
					return p;
				if(c != *++pa)
					break;
			}
		}
		return 0;
	}

By my reading nothing here breaks when dealing with utf8. It is not as efficient, because
each iteration calls strchr with p+1 and so has to skip through the remaining bytes
of the current sequence, but compared to calling chartorune() on each non-ascii character
I think it'll still come out on top. (Would like to verify though.)
Reading the remaining bytes is safe because the beginning of a valid utf8 sequence can never
be confused with the middle of one by definition.

Ok, so my thought here is perhaps this is for handling invalid utf-8 strings. So let's walk through that.
In utfutf, we only check the first rune of s2, so assuming that is invalid and we get Runeerror, we
then call utfrune(), which does its own chartorune and can return Runeerror for a different
invalid sequence of bytes. That seems too strange to be intentional but I am unsure.
Additionally this only happens for the first sequence; further sequences are compared as-is with
the strncmp() call, so again this doesn't seem intentional.

With that being said, I have a proposed cleanup of these two functions:

	char*
	utfrune(char *s, long c)
	{
		Rune r;
		char buf[UTFmax + 1] = {0};

		if(c < Runesync)		/* not part of utf sequence */
			return strchr(s, c);
		r = c;
		runetochar(buf, &r);
		return strstr(s, buf);
	}

	/* might as well keep it for old code */
	char*
	utfutf(char *s1, char *s2)
	{
		return strstr(s1, s2);
	}

A quick grep of the 9front source tree shows no use of utfutf(). qwx pointed me to
some other use cases in sources and around github, however a cursory look showed
that none of them were relying on behavior that strstr would not satisfy.
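
The claim above that a byte-wise strstr() is safe on valid utf8 can be spot-checked
with a small portable C sketch. The names find_utf8() and is_continuation() are
hypothetical helpers invented for this illustration, and the sketch assumes both
strings are valid, NUL-terminated utf8; it says nothing about invalid input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if byte b is a UTF-8 continuation byte (10xxxxxx). */
int
is_continuation(unsigned char b)
{
	return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: search haystack for needle with plain strstr()
 * and return the byte offset of the match, or -1 if there is none.
 * Assumes both arguments are valid, NUL-terminated UTF-8.
 */
long
find_utf8(const char *haystack, const char *needle)
{
	const char *p;

	assert(haystack != NULL && needle != NULL);
	p = strstr(haystack, needle);
	if(p == NULL)
		return -1;
	/* With valid input, a match can never begin inside a sequence. */
	assert(!is_continuation((unsigned char)*p));
	return (long)(p - haystack);
}
```

For example, searching the three bytes E2 82 AC (the euro sign) for the two bytes
C2 AC (the not sign) finds nothing even though both end in AC, because the lead
byte C2 never appears in the haystack.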

So I am asking here for any historical context, if there is some.

Thanks,
moody

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T8831073f8b8bb351-M19c2dede80e2b8439fa4c68e
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription