From: BPJ
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: More on changing the case of the first character in Lua
Date: Thu, 5 Oct 2023 16:01:42 +0200

How to add functions for setting the case of the first char in a string in Lua. This takes advantage of two features of `string.gsub`:

- If the third argument, the replacement, is a function, it will be passed each match, or the capture(s) of the match if any, as argument(s), and the replacement is whatever it returns (which must be a string or a number; if it returns `nil` or `false` the match is left unchanged).
- If passed a (non-negative) integer as fourth argument, at most that many substitutions will be made, so by passing `1` as argument \#4 we can substitute just the first match.
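Both features can be tried in isolation with stock Lua, no Pandoc needed. This is only a small sketch; the sample strings and variable names are made up for illustration:

```lua
-- Feature 1: a function as replacement is called with each match
-- (here the whole match, since the pattern has no captures).
local shouted = string.gsub("one two three", "%a+", string.upper)
-- shouted == "ONE TWO THREE"

-- With captures, the function receives the captures instead.
local swapped = string.gsub("key=value", "(%w+)=(%w+)", function(k, v)
  return v .. "=" .. k
end)
-- swapped == "value=key"

-- Feature 2: the fourth argument limits the number of substitutions.
-- gsub also returns how many substitutions it actually made.
local first_only, count = string.gsub("one two three", "%a+", string.upper, 1)
-- first_only == "ONE two three", count == 1
```

Note that `gsub` always returns two values, the new string and the substitution count, which matters whenever its result is passed straight on to another function.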

``` lua
local charpat = utf8.charpattern
for _, case in ipairs({ 'upper', 'lower' }) do
  -- pandoc.text.upper or pandoc.text.lower
  local case_fun = pandoc.text[case]
  local name = case .. '_first'
  pandoc.text[name] = function(s)
    local stype = type(s)
    if 'string' ~= stype then
      error("Argument must be string, not " .. stype)
    end
    -- Set case of the first match against charpat in s.
    -- The parentheses drop gsub's second return value (the match count).
    return (s:gsub(charpat, case_fun, 1))
  end
end
```
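Why match against `utf8.charpattern` rather than just taking the first byte? A minimal sketch in stock Lua 5.3 or later (the sample word and variable names are mine, not part of the snippet above):

```lua
-- "élan", written with an escape so the example does not depend on the
-- encoding of the file it is saved in.
local s = "\u{E9}lan"

-- A plain byte-based substring cuts the two-byte "é" in half ...
local first_byte = s:sub(1, 1)
-- ... while utf8.charpattern matches the complete UTF-8 sequence.
local first_char = s:match(utf8.charpattern)
-- #first_byte == 1, #first_char == 2

-- With a limit of 1 only that first character is handed to the function.
local marked = s:gsub(utf8.charpattern, function(c)
  return "[" .. c .. "]"
end, 1)
-- marked == "[élan" with the closing bracket after the é, i.e. "[é]lan"
```

Inside a Pandoc Lua filter the function passed to `gsub` would be `pandoc.text.upper` or `pandoc.text.lower`, which are Unicode-aware; plain `string.upper` generally leaves non-ASCII bytes alone.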

If we could use lua-utf8 <https://github.com/starwing/luautf8> we could match the first letter instead, since in its gsub `%l` == Unicode General Category Letter! We could even say

``` lua
lutf8.gsub(s, '%f[_%w]%l', lutf8.upper, 1)
```

to uppercase the first letter in the first word!
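lua-utf8 is not part of stock Lua, but the shape of that pattern can be tried with the built-in `string.gsub`, where the character classes are ASCII-only in the default C locale and `%l` is the lowercase class. A small sketch with made-up sample strings; whether lua-utf8's classes behave the same way on non-ASCII text is not checked here:

```lua
local patt = "%f[_%w]%l"

-- %f[_%w] is a frontier: the position where a run of "_" or alphanumerics
-- begins. %l must then match right at that position.
local a = string.gsub("hello world", patt, string.upper, 1)
-- a == "Hello world"

-- A word that starts with an underscore or a digit is skipped, because its
-- first character is not matched by %l.
local b = string.gsub("_foo 9bar baz", patt, string.upper, 1)
-- b == "_foo 9bar Baz"

-- Likewise a word that already starts with an uppercase letter.
local c = string.gsub("Hello world", patt, string.upper, 1)
-- c == "Hello World"
```

So with the stock classes the pattern uppercases the first word that begins with a lowercase letter, which is not always the first word of the string.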

Theoretically you could use a Unicode-aware regex library. The lrexlib-Oniguruma binding

<http://rrthomas.github.io/lrexlib/manual.html>
<http://rrthomas.github.io/lrexlib/manual.html#oniguruma-only-functions-and-methods>

would probably be a good choice, because Oniguruma has a very full-featured (not Lua-like!) regex syntax and some other very useful features. lrexlib doesn’t support all of them, but on the other hand it adds some of its own to all its bindings, notably Lua-like `match`, `find`, `gmatch` and `gsub` functions, a `split` function and a `tfind` function which returns a table with captures, including named captures if any/supported by the library.[^1]

However, luautf8 and the lrexlib libraries can’t be used with the statically linked Pandoc binaries, and I don’t expect that any of them can be included with Pandoc as they are rather big.

Let’s see how `lpeg.utfR` fares with hugeish alternations like “all LGC (uppercase/lowercase) letters”, which break memory with the `lpeg.P(char) + lpeg.P(char) + ...` approach. Unfortunately the fact that most extension blocks list letters in uppercase/lowercase pairs probably speaks against it.

[^1]: FWIW I wouldn’t mind a ~~`tgsub`~~ function which would take a function as argument \#3 and pass it a table in the style of `tfind`, although you can fake it by looping over a string with `tfind` in Lua.
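
The looping idea from the footnote can be sketched in stock Lua (5.3 or later). Everything here is hypothetical: `tfind` below is a stand-in built on `string.find` with Lua patterns that only mimics the shape of lrexlib's `tfind` (start, end, table of captures), and `tgsub` is just the name from the footnote, not an existing function.

```lua
-- Stand-in for lrexlib's tfind, using Lua patterns instead of regexes:
-- returns start, end and a table of captures, or nil if there is no match.
local function tfind(s, patt, init)
  local res = table.pack(s:find(patt, init))
  if res[1] == nil then return nil end
  return res[1], res[2], { table.unpack(res, 3, res.n) }
end

-- Hypothetical tgsub: like gsub, but fun receives the captures as one table.
-- If fun returns nil or false the match is kept, as with gsub.
local function tgsub(s, patt, fun, max)
  local out, pos, count = {}, 1, 0
  while pos <= #s + 1 and (not max or count < max) do
    local from, to, caps = tfind(s, patt, pos)
    if not from then break end
    out[#out + 1] = s:sub(pos, from - 1)          -- text before the match
    out[#out + 1] = fun(caps) or s:sub(from, to)  -- replacement or original
    count = count + 1
    if to < from then
      -- Empty match: copy one byte and step over it to avoid looping forever.
      out[#out + 1] = s:sub(from, from)
      pos = from + 1
    else
      pos = to + 1
    end
  end
  out[#out + 1] = s:sub(pos)                      -- remaining tail
  return table.concat(out), count
end

local swapped, n = tgsub("a=1, b=2", "(%w+)=(%w+)", function(caps)
  return caps[2] .. "=" .. caps[1]
end)
-- swapped == "1=a, 2=b", n == 2
```

This sketch steps over empty matches bytewise and does not treat a leading `^` anchor the way `gsub` does, so it is meant to show the loop, not to be a drop-in replacement.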
