From: Albert Krewinkel
Newsgroups: gmane.text.pandoc
Subject: Re: regex captures on a Header element
Date: Thu, 25 Aug 2022 19:21:23 +0200
Message-ID: <3EFC2EF5-A5CA-46A6-AEB6-80ECFD05A3B2@zeitkraut.de>
References: <03fcdfd9-2811-4622-897e-98d2303e54e1n@googlegroups.com> <79D67508-3478-4C1D-9637-7084AE959EDD@gmail.com> <85a8a23d-7e4e-4f1a-b54e-b2c0fb4e6182n@googlegroups.com>

If you have markup in your headings then you may want to iterate over `el.content` to find the separator, then you don't have to worry about Unicode. Something along the lines of

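~~~lua
-- Inside a `Header(el)` filter function: split the heading's inlines
-- at a standalone '|' Str element.
local sep_seen = false
local en = pandoc.Inlines{}  -- inlines before the separator
local zh = pandoc.Inlines{}  -- inlines after the separator
for _, v in ipairs(el.content) do
  if sep_seen then
    zh:insert(v)
  elseif v.t == 'Str' and v.text == '|' then
    sep_seen = true
  else
    en:insert(v)
  end
end
~~~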
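The same splitting idea can be tried outside pandoc. In this stand-alone sketch, plain Lua tables of the form `{t = 'Str', text = ...}` and `{t = 'Space'}` stand in for pandoc's inline elements (an assumption made only so the code runs without pandoc), `split_at_separator` is a hypothetical helper name, and the `Space` elements next to the separator are dropped so neither half keeps a leading or trailing blank.

```lua
-- Hypothetical stand-in for pandoc inlines: plain tables with a `t` tag.
-- Splits the list at the first Str whose text is exactly '|'.
local function split_at_separator(inlines)
  local before, after = {}, {}
  local sep_seen = false
  for _, v in ipairs(inlines) do
    if sep_seen then
      after[#after + 1] = v
    elseif v.t == 'Str' and v.text == '|' then
      sep_seen = true
    else
      before[#before + 1] = v
    end
  end
  -- Drop the Space elements that surround the separator.
  if before[#before] and before[#before].t == 'Space' then
    table.remove(before)
  end
  if after[1] and after[1].t == 'Space' then
    table.remove(after, 1)
  end
  return before, after, sep_seen
end

-- The heading "A header | 中文标题" as a list of mock inlines.
local heading = {
  {t = 'Str', text = 'A'}, {t = 'Space'}, {t = 'Str', text = 'header'},
  {t = 'Space'}, {t = 'Str', text = '|'}, {t = 'Space'},
  {t = 'Str', text = '中文标题'},
}
local en, zh, found = split_at_separator(heading)
-- en holds 3 inlines ("A", Space, "header"); zh holds 1 ("中文标题")
```

Because the split works on whole inline elements, any markup (emphasis, code) on either side of the `|` would be kept intact rather than flattened to a string.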

You may also like the function `pandoc.utils.stringify` as an alternative to using `table.concat`.

https://pandoc.org/lua-filters#pandoc.utils.stringify

Randy Josleyn <randy.josleyn-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote on 25.08.2022 at 05:29 CEST:

Thank you for the heads-up. I just tested out `utf8.charpattern`, but I could only get it to match one character and not a contiguous string of them; I'll default to `.+` for now.
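`utf8.charpattern` describes exactly one UTF-8 encoded character (a lead byte followed by any continuation bytes), which is why it cannot simply be repeated with `+`. A minimal sketch, assuming Lua 5.3 or later for the `utf8` library: iterate over the characters with `string.gmatch`, or, to grab a contiguous run of non-ASCII characters, use the byte class `[\128-\255]+`. Every byte of a multibyte UTF-8 character is 128 or above, so on valid UTF-8 such a run always starts and ends on character boundaries. `utf8_chars` is a hypothetical helper name.

```lua
-- Requires Lua 5.3+ for the built-in `utf8` library.
local title = "A header | 中文标题"

-- utf8.charpattern matches ONE character; gmatch applies it repeatedly.
local function utf8_chars(s)
  local chars = {}
  for ch in s:gmatch(utf8.charpattern) do
    chars[#chars + 1] = ch
  end
  return chars
end

-- A contiguous run of non-ASCII characters: a byte class (unlike
-- utf8.charpattern) can be repeated with `+`.
local chinese = title:match("[\128-\255]+")

print(#utf8_chars(title))  -- number of characters, not bytes
print(chinese)
```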
After more experimentation, I was able to get it doing what I wanted. I realized I should have been using `table.concat` instead of `unpack`. My final code is below for reference. My next task is to get pairs of paragraphs and put their contents in a custom LaTeX command which typesets them in parallel. Thank you for your help!
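Why `table.unpack` failed where `table.concat` works: when a function call is used inside a larger expression (for example indexed with `.text`, or wrapped in parentheses), Lua truncates its results to the first value, so `table.unpack(el.content).text` only ever looks at the first inline of the heading. A plain-Lua sketch, with a list of strings standing in for the text of the heading's inlines (an assumption for illustration):

```lua
-- Stand-in for the text of each inline in "A header | 中文标题".
local pieces = {"A", " ", "header", " ", "|", " ", "中文标题"}

-- A call used inside an expression is truncated to its first result,
-- so only "A" survives here.
local first_only = (table.unpack(pieces))

-- table.concat joins every element of the sequence into one string.
local joined = table.concat(pieces)

print(first_only)
print(joined)
```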

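~~~lua
function Header(el)
  -- Anchored pattern: `.-` lets the English part contain several words
  -- (`%a+` would capture only the last word before the '|').
  local pattern = "^(.-)%s*|%s*(.+)$"
  local content = {}
  for _, v in ipairs(el.content) do
    if v.t == 'Str' then
      content[#content + 1] = v.text
    elseif v.t == 'Space' then
      content[#content + 1] = ' '
    end
    -- Other inlines (Emph, Code, ...) are dropped; appending with
    -- #content + 1 avoids holes that table.concat may reject.
  end
  local headertext = table.concat(content)
  local headers, n = string.gsub(headertext, pattern, '{%1}{%2}')
  if n == 0 then
    return nil  -- no '|' separator: leave the header unchanged
  end
  -- A Header is a block, so replace it with a raw LaTeX block.
  return pandoc.RawBlock('latex', '\\bisection' .. headers)
end
~~~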
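A caveat about the pattern `(%a+)%s+|%s+(.+)`: `%a+` matches only a single run of ASCII letters, and `gsub` copies unmatched text through unchanged, so for a multi-word title such as `A header | 中文标题` the match starts at `header` and the leading `A ` is left outside the braces. A sketch of an alternative, assuming the separator is the first `|` in the heading: anchor the pattern and use the lazy `.-` for the first part. The second return value of `gsub` tells whether a separator was found at all.

```lua
local heading = "A header | 中文标题"

-- Original pattern: `%a+` captures one word only; the unmatched "A " is kept.
local single_word = (heading:gsub("(%a+)%s+|%s+(.+)", "{%1}{%2}"))

-- Anchored alternative: `.-` captures everything up to the separator,
-- and `%s*` absorbs the blanks on either side of '|'.
local pattern = "^(.-)%s*|%s*(.+)$"
local whole, n = heading:gsub(pattern, "{%1}{%2}")

print(single_word)
print(whole, n)  -- n == 0 would mean the heading has no separator
```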

On Thursday, August 25, 2022 at 12:42:25 AM UTC+8 fiddlosopher wrote:

One thing to keep in mind is that Lua's string functions are not Unicode-aware.
So things like %a+ are probably not going to work as expected on Chinese text.
Lua 5.3 (which is the default version we include) has some support for UTF-8,
see https://www.lua.org/manual/5.3/manual.html#6.5
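What "not Unicode-aware" means in practice: the `string` functions count and index bytes, and character classes such as `%a` rely on the C library's `isalpha`, which in the default C locale accepts only ASCII letters. A small sketch, assuming Lua 5.3 or later for `utf8.len` and `utf8.offset`:

```lua
local zh = "中文标题"

local bytes = #zh                -- the length operator counts bytes
local chars = utf8.len(zh)       -- utf8.len counts characters
local letters = zh:match("%a+")  -- %a does not match these bytes

-- First two characters: utf8.offset gives the byte index where the
-- third character starts, so everything before it is two characters.
local first_two = zh:sub(1, utf8.offset(zh, 3) - 1)

print(bytes, chars, letters, first_two)
```

Byte-based slicing such as `zh:sub(1, 2)` would cut the first character in half, which is why `utf8.offset` is used to find the boundary.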


> On Aug 24, 2022, at 5:30 AM, Randy Josleyn <randy....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> Hi group,
>
> I am writing a multilingual document and I want to convert a markdown header to a latex command like so:
>
> ## A header | 中文标题
> ->
> \bisection{A header}{中文标题}
>
> Using this example about man pages from the documentation, I have come up with something like the following filter:
>
> ~~~lua
> local text = pandoc.text
> local raw = function (content)
>   return pandoc.RawInline('latex', content)
> end
>
> function Header(el)
>   local pattern = "(%a+)%s+|%s+(.*)"
>   headertext = table.unpack(el.content).text
>   local _, _, enh, zhh = string.find(headertext, pattern)
>   return raw('\\bisection{'..enh..'}{'..zhh..'}')
> end
> ~~~
>
> However, Lua tells me I'm trying to concatenate a nil value `zhh`. I guess it could be that my regex is wrong, or that I'm using string.find incorrectly; I copied the pattern from the Lua manual Section 20.3, "Captures". Can anyone give me any pointers?
>
> Thank you all!
>
> Randy
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/03fcdfd9-2811-4622-897e-98d2303e54e1n%40googlegroups.com.
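On the `nil` error in the quoted question: `string.find` returns `nil` when the pattern does not match, so `enh` and `zhh` both end up `nil` and the concatenation fails. Checking the captures before using them lets a non-matching heading pass through harmlessly. A plain-Lua sketch; `bisection` is a hypothetical helper name, and the pattern is an anchored variant (`^(.-)%s*|%s*(.+)$`) rather than the original one, so that multi-word English titles are captured whole:

```lua
-- Hypothetical helper: build the LaTeX command, or return nil when the
-- text does not have the "English | Chinese" shape.
local function bisection(headertext)
  local _, _, enh, zhh = string.find(headertext, "^(.-)%s*|%s*(.+)$")
  if not enh then
    return nil  -- no match: string.find returned nil
  end
  return '\\bisection{' .. enh .. '}{' .. zhh .. '}'
end

-- With table.unpack, only the first inline's text ("A") reached string.find.
local from_first_inline = bisection("A")
local from_full_text = bisection("A header | 中文标题")

print(from_first_inline)
print(from_full_text)
```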
