From: Joey McCollum via ntg-context
Newsgroups: gmane.comp.tex.context
Subject: Checking for a Unicode prefix of a Unicode string
Date: Fri, 26 Nov 2021 01:42:00 -0500
To: mailing list for ConTeXt users

I wasn't aware of a general-purpose "doifstartswith" macro in ConTeXt (the \doifnextcharelse macro only works one character at a time, and the \doifinstring macros may capture substrings that are not prefixes), and I'd like to develop one for something I'm working on. I've been trying to do this in Lua, as that seemed like the most natural approach. Normally, something like this would work fine as a foundation:

```
  function isprefix(prefix, str)
    if string.sub(str, 1, string.len(prefix)) == prefix then
      return true
    end
    return false
  end
```

Unfortunately, if I want to check for prefixes that include two-byte characters like § and ¶, then the positions and string lengths that I specify no longer correspond to the actual byte offsets and lengths that Lua uses. I'm aware of the utf8 plugin that was intended to address this issue, but the following code also isn't working:

```
  function sbl.isprefix(prefix, str)
    -- lua is devious and measures string length in bytes, not chars,
    -- so we can't just use string.sub and string.len as we normally would.
    local i = utf8.offset(str, 1)
    local j = utf8.offset(str, utf8.len(prefix) + 1) - 1
    if string.sub(str, i, j) == prefix then
      return true
    end
    return false
  end
```

The only other detail that may be relevant is that I'm passing a macro as the "str" input. But this should be expanded when the Lua code is manipulating it, right?

I'm sure there's something obvious that I'm doing wrong, but I've been trying to get this to work for at least a couple hours now, and I don't know what else to try. If there is a simple fix to my Lua code or an existing ConTeXt macro that will get the job done, I'd appreciate it!

Joey
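For reference, here is a minimal standalone sketch of the two prefix checks discussed in the message, assuming Lua 5.3 or later (where the standard utf8 library is available) and valid UTF-8 input. The names isprefix_bytes and isprefix_chars are hypothetical and are not ConTeXt functions. Two points it tries to illustrate: in valid UTF-8 a byte-wise comparison of the first #prefix bytes should already agree with a character-wise comparison, and utf8.offset returns nil when str has fewer characters than prefix, so that result needs a guard before subtracting 1.

```lua
-- Hypothetical helper names for illustration; not part of ConTeXt.
-- Defined as globals to match the style of the post; in a real module
-- they would more likely be local or live in a namespace table.

-- Byte-wise check: compares the first #prefix bytes of str with prefix.
function isprefix_bytes(prefix, str)
  return string.sub(str, 1, #prefix) == prefix
end

-- Character-wise check using the standard utf8 library (Lua 5.3+).
function isprefix_chars(prefix, str)
  local n = utf8.len(prefix)
  if not n then
    return false -- prefix is not valid UTF-8
  end
  -- Byte position where character n + 1 of str starts;
  -- nil when str has fewer than n characters.
  local next_start = utf8.offset(str, n + 1)
  if not next_start then
    return false
  end
  return string.sub(str, 1, next_start - 1) == prefix
end

-- Usage: both checks are expected to agree on valid UTF-8 strings.
print(isprefix_bytes("§", "§ 12"), isprefix_chars("§", "§ 12"))
print(isprefix_bytes("§ 1", "§"), isprefix_chars("§ 1", "§"))
```

If both checks give the same answer in a standalone Lua session but not inside the document, it may be worth printing the string that actually reaches Lua, since the difference could then lie in how the macro argument is passed rather than in the string handling.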