* regex captures on a Header element
@ 2022-08-24 12:30 Randy Josleyn
[not found] ` <03fcdfd9-2811-4622-897e-98d2303e54e1n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Randy Josleyn @ 2022-08-24 12:30 UTC (permalink / raw)
To: pandoc-discuss
Hi group,
I am writing a multilingual document and I want to convert a Markdown
header to a LaTeX command like so:
## A header | 中文标题
->
\bisection{A header}{中文标题}
Using this example
<https://pandoc.org/lua-filters.html#modifying-pandocs-manual.txt-for-man-pages>
about man pages from the documentation, I have come up with something like
the following filter:
~~~lua
local text = pandoc.text
local raw = function (content)
  return pandoc.RawInline('latex', content)
end
function Header(el)
  local pattern = "(%a+)%s+|%s+(.*)"
  headertext = table.unpack(el.content).text
  local _, _, enh, zhh = string.find(headertext, pattern)
  return raw('\\bisection{'..enh..'}{'..zhh..'}')
end
~~~
However, Lua tells me I'm trying to concatenate a nil value `zhh`. I guess
it could be that my regex is wrong, or that I'm using string.find
incorrectly; I copied the pattern from Programming in Lua, Section 20.3,
"Captures" <https://www.lua.org/pil/20.3.html>. Can anyone give me any
pointers?
Thank you all!
Randy
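A possible explanation for the nil, sketched in plain Lua: `table.unpack(el.content).text` reads `.text` from the first inline only, so the pattern never sees the separator. The `content` table below is a hypothetical stand-in for `el.content` (real pandoc elements are not plain tables), and the sketch assumes Lua 5.3 or later.

~~~lua
-- Hypothetical stand-in for el.content of "## A header | 中文标题".
-- These tables only mimic the .t and .text fields of pandoc inlines.
local content = {
  { t = 'Str', text = 'A' },
  { t = 'Space' },
  { t = 'Str', text = 'header' },
  { t = 'Space' },
  { t = 'Str', text = '|' },
  { t = 'Space' },
  { t = 'Str', text = '中文标题' },
}

local pattern = "(%a+)%s+|%s+(.*)"

-- A function call followed by a field access is truncated to its first
-- result, so this is the same as content[1].text.
local headertext = table.unpack(content).text
print(headertext)  --> A

-- "A" contains no " | ", so string.find returns nil and both
-- captures are nil.
local _, _, enh, zhh = string.find(headertext, pattern)
print(enh == nil, zhh == nil)
~~~

Joining all inlines into one string first (for example with `table.concat` or `pandoc.utils.stringify`) would give the pattern the whole heading to work on.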
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/03fcdfd9-2811-4622-897e-98d2303e54e1n%40googlegroups.com.
* Re: regex captures on a Header element
[not found] ` <03fcdfd9-2811-4622-897e-98d2303e54e1n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-08-24 16:42 ` John MacFarlane
[not found] ` <79D67508-3478-4C1D-9637-7084AE959EDD-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2022-08-24 16:42 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
One thing to keep in mind is that Lua's string functions are not Unicode-aware.
So things like %a+ are probably not going to work as expected on Chinese text.
Lua 5.3 (which is the default version we include) has some support for UTF-8,
see https://www.lua.org/manual/5.3/manual.html#6.5
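A small illustration of the point, as a sketch that assumes a standalone Lua 5.3+ interpreter, a UTF-8 encoded source file, and the default "C" locale (results for `%a` can differ under other locales):

~~~lua
local zh = '中文标题'

-- The length operator counts bytes; utf8.len counts characters.
print(#zh)           -- 12 (each of these characters takes 3 bytes in UTF-8)
print(utf8.len(zh))  -- 4

-- %a is tested byte by byte; in the "C" locale none of the bytes
-- above 127 counts as a letter, so there is no match.
print(string.find(zh, '%a+'))  -- nil

-- utf8.codes walks the string one character at a time.
for pos, cp in utf8.codes(zh) do
  print(pos, utf8.char(cp))
end
~~~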
* Re: regex captures on a Header element
[not found] ` <79D67508-3478-4C1D-9637-7084AE959EDD-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-08-25 3:29 ` Randy Josleyn
[not found] ` <85a8a23d-7e4e-4f1a-b54e-b2c0fb4e6182n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Randy Josleyn @ 2022-08-25 3:29 UTC (permalink / raw)
To: pandoc-discuss
Thank you for the heads-up. I just tested out `utf8.charpattern`, but I
could only get it to match one character and not a contiguous string of
them; I'll default to `.+` for now.
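For what it's worth, `utf8.charpattern` describes exactly one character, and a Lua quantifier can only follow a single character class, so the pattern cannot be repeated as a unit by appending `+`. Two workarounds, sketched for a standalone Lua 5.3+ interpreter with UTF-8 input (the sample string is only an illustration):

~~~lua
local text = 'A header | 中文标题'

-- Option 1: collect characters one at a time with gmatch.
local chars = {}
for ch in text:gmatch(utf8.charpattern) do
  chars[#chars + 1] = ch
end
print(#chars)  -- 15 characters (the string is 23 bytes long)

-- Option 2: match a contiguous run of non-ASCII bytes. In valid UTF-8
-- every byte of a non-ASCII character is in the range 0x80-0xFF.
local run = text:match('[\x80-\xFF]+')
print(run)  -- 中文标题
~~~

Option 2 treats any non-ASCII character as part of the run, so it would also pick up accented Latin letters or full-width punctuation.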
After more experimentation, I was able to get it doing what I wanted. I
realized I should have been using `table.concat` instead of `unpack`. My
final code is below for reference. My next task is to get pairs of
paragraphs and put their contents in a custom LaTeX command which typesets
them in parallel. Thank you for your help!
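~~~lua
function Header(el)
  -- Note: %a+ captures one run of letters (a single word) before the separator.
  local pattern = "(%a+)%s+|%s+(.+)"
  -- Flatten the header's inlines into a plain string.
  local content = {}
  for _, v in ipairs(el.content) do
    -- Append rather than index by position, so skipped elements leave no holes.
    if v.t == 'Str' then
      content[#content + 1] = v.text
    elseif v.t == 'Space' then
      content[#content + 1] = ' '
    end
  end
  local headertext = table.concat(content)
  -- The parentheses keep only gsub's first result (the string, not the count).
  local headers = (string.gsub(headertext, pattern, '{%1}{%2}'))
  return pandoc.RawInline('latex', '\\bisection' .. headers)
end
~~~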
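One caveat with the pattern in the filter above: `(%a+)` captures only the last word before the separator, so a heading like "A header | 中文标题" would lose the "A". A possible variant is sketched below. The helper name `split_bilingual` is made up for this example, returning a `RawBlock` (since a header is a block) is a choice of this sketch rather than something from the thread, and the `Header` part is untested and only runs inside pandoc.

~~~lua
-- Split "English | 中文" at the first " | " into two strings.
-- Returns nil when there is no separator.
local function split_bilingual(text)
  -- "^(.-)" lazily takes everything before the separator, spaces included;
  -- "(.+)$" takes the rest. "|" is an ordinary character in Lua patterns.
  local en, zh = text:match('^(.-)%s+|%s+(.+)$')
  return en, zh
end

function Header(el)
  -- pandoc.utils.stringify flattens the header's inlines to plain text.
  local en, zh = split_bilingual(pandoc.utils.stringify(el))
  if not en then
    return nil  -- no separator: leave the header unchanged
  end
  return pandoc.RawBlock('latex', '\\bisection{' .. en .. '}{' .. zh .. '}')
end

print(split_bilingual('A header | 中文标题'))
~~~

Note that the stringified text is inserted into the LaTeX source without escaping, so headings containing characters such as `%` or `&` would need extra care.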
* Re: regex captures on a Header element
[not found] ` <85a8a23d-7e4e-4f1a-b54e-b2c0fb4e6182n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-08-25 17:21 ` Albert Krewinkel
0 siblings, 0 replies; 4+ messages in thread
From: Albert Krewinkel @ 2022-08-25 17:21 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
If you have markup in your headings, you may want to iterate over `el.content` to find the separator; then you don't have to worry about Unicode. Something along the lines of:
~~~lua
local sep_seen = false
local en = pandoc.Inlines{}
local zh = pandoc.Inlines{}
for i, v in ipairs(el.content) do
  if sep_seen then
    -- everything after the separator belongs to the Chinese title
    zh:insert(v)
  elseif v.text == '|' then
    sep_seen = true
  else
    en:insert(v)
  end
end
~~~
You may also like the function `pandoc.utils.stringify` as an alternative to using table.concat.
https://pandoc.org/lua-filters#pandoc.utils.stringify
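The loop above could be wrapped into a complete filter roughly as follows. This is a sketch, not a tested filter: the helper name `split_at_bar`, the trimming of the spaces around the separator, and the idea of wrapping the inlines in raw LaTeX inside a `Plain` block are all assumptions added here. The helper works on any array of values with `.t` and `.text` fields, so it can be tried in plain Lua; the `Header` function only runs inside pandoc.

~~~lua
-- Split a list of inlines at the first Str whose text is "|".
local function split_at_bar(inlines)
  local en, zh = {}, {}
  local sep_seen = false
  for _, v in ipairs(inlines) do
    if sep_seen then
      zh[#zh + 1] = v
    elseif v.t == 'Str' and v.text == '|' then
      sep_seen = true
    else
      en[#en + 1] = v
    end
  end
  -- Drop the Space elements that surrounded the separator.
  if #en > 0 and en[#en].t == 'Space' then table.remove(en) end
  if #zh > 0 and zh[1].t == 'Space' then table.remove(zh, 1) end
  return en, zh, sep_seen
end

function Header(el)
  local en, zh, found = split_at_bar(el.content)
  if not found then
    return nil  -- no separator: leave the header unchanged
  end
  -- Keep the original inlines (and any markup in them) between raw
  -- LaTeX fragments, so pandoc still renders and escapes the contents.
  local out = pandoc.List{ pandoc.RawInline('latex', '\\bisection{') }
  out:extend(en)
  out:insert(pandoc.RawInline('latex', '}{'))
  out:extend(zh)
  out:insert(pandoc.RawInline('latex', '}'))
  return pandoc.Plain(out)
end
~~~

Because the heading contents stay as pandoc inlines, emphasis or code inside either title should survive the conversion, which the string-based approach cannot offer.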