* regex captures on a Header element @ 2022-08-24 12:30 Randy Josleyn

From: Randy Josleyn @ 2022-08-24 12:30 UTC
To: pandoc-discuss

Hi group,

I am writing a multilingual document and I want to convert a markdown header to a latex command like so:

    ## A header | 中文标题
    ->
    \bisection{A header}{中文标题}

Using this example <https://pandoc.org/lua-filters.html#modifying-pandocs-manual.txt-for-man-pages> about man pages from the documentation, I have come up with something like the following filter:

~~~lua
local text = pandoc.text
local raw = function (content)
  return pandoc.RawInline('latex', content)
end

function Header(el)
  local pattern = "(%a+)%s+|%s+(.*)"
  headertext = table.unpack(el.content).text
  local _, _, enh, zhh = string.find(headertext, pattern)
  return raw('\\bisection{'..enh..'}{'..zhh..'}')
end
~~~

However, Lua tells me I'm trying to concatenate a nil value `zhh`. I guess it could be that my regex is wrong, or that I'm using string.find incorrectly; I copied the pattern from the Lua manual Section 20.3, "Captures" <https://www.lua.org/pil/20.3.html>. Can anyone give me any pointers?

Thank you all!

Randy

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/03fcdfd9-2811-4622-897e-98d2303e54e1n%40googlegroups.com.
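The nil is most likely produced one line before the pattern is used. `table.unpack(el.content)` returns every inline in the header, but indexing the call with `.text` keeps only its first value, so `headertext` is just `A`; `string.find` then finds no match and every capture is nil. The plain-Lua sketch below reproduces this without pandoc, using ordinary tables as hypothetical stand-ins for the `Str` and `Space` elements the markdown reader is assumed to produce for this header (assuming no other markup).

```lua
-- Hypothetical stand-ins for the inlines of "## A header | 中文标题"
-- (plain tables, not real pandoc elements).
local content = {
  { t = 'Str', text = 'A' },
  { t = 'Space' },
  { t = 'Str', text = 'header' },
  { t = 'Space' },
  { t = 'Str', text = '|' },
  { t = 'Space' },
  { t = 'Str', text = '中文标题' },
}

-- table.unpack returns seven values, but indexing the call expression
-- truncates the result to the first one, so only "A" survives.
local headertext = table.unpack(content).text
print(headertext)  --> A

-- "A" contains no "|", so string.find returns nil and both
-- capture variables are nil as well.
local _, _, enh, zhh = string.find(headertext, "(%a+)%s+|%s+(.*)")
print(enh, zhh)    -- both nil

-- Joining the text of every element recovers the full heading.
local parts = {}
for _, v in ipairs(content) do
  parts[#parts + 1] = v.text or ' '  -- Space has no text field
end
print(table.concat(parts))  --> A header | 中文标题
```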
* Re: regex captures on a Header element @ 2022-08-24 16:42 John MacFarlane

From: John MacFarlane @ 2022-08-24 16:42 UTC
To: pandoc-discuss

One thing to keep in mind is that Lua's string functions are not unicode-aware. So things like `%a+` are probably not going to work as expected on Chinese text.

Lua 5.3 (which is the default version we include) has some support for UTF-8, see https://www.lua.org/manual/5.3/manual.html#6.5
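A quick way to see the problem described above: Lua strings are sequences of bytes, each of these Chinese characters occupies three bytes in UTF-8, and classes such as `%a` test one byte at a time. The sketch below, assuming Lua 5.3 or later and the default C locale of the standalone interpreter, compares byte and code-point counts and walks the string with `utf8.codes` from the library linked above.

```lua
local zh = "中文标题"

print(#zh)                      --> 12  (bytes)
print(utf8.len(zh))             --> 4   (code points)

-- %a classifies single bytes; in the C locale none of the UTF-8
-- bytes of these characters counts as a letter, so nothing matches.
print(string.match(zh, "%a+"))  --> nil

-- utf8.codes walks the string one code point at a time,
-- yielding the byte offset and the code point found there.
for pos, cp in utf8.codes(zh) do
  print(pos, utf8.char(cp))
end
```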
* Re: regex captures on a Header element @ 2022-08-25 3:29 Randy Josleyn

From: Randy Josleyn @ 2022-08-25 3:29 UTC
To: pandoc-discuss

Thank you for the heads-up. I just tested out `utf8.charpattern`, but I could only get it to match one character and not a contiguous string of them; I'll default to `.+` for now.

After more experimentation, I was able to get it doing what I wanted. I realized I should have been using `table.concat` instead of `unpack`. My final code is below for reference. My next task is to get pairs of paragraphs and put their contents in a custom latex command which typesets them in parallel.

Thank you for your help!

~~~lua
function Header(el)
  -- Lazy "(.-)" anchored at the start keeps every word before " | ";
  -- "(.+)" takes the rest, whatever script it is written in.
  local pattern = "^(.-)%s+|%s+(.+)$"
  -- Flatten the header's Str and Space inlines into one string,
  -- appending so that other inline types leave no holes in the table.
  local content = {}
  for _, v in ipairs(el.content) do
    if v.t == 'Str' then
      content[#content + 1] = v.text
    elseif v.t == 'Space' then
      content[#content + 1] = ' '
    end
  end
  local headertext = table.concat(content)
  -- gsub returns (string, count); the parentheses keep only the string.
  local headers = (string.gsub(headertext, pattern, '{%1}{%2}'))
  -- A Header is a block element, so it is replaced by a raw LaTeX block.
  return pandoc.RawBlock('latex', '\\bisection' .. headers)
end
~~~
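Lua pattern quantifiers such as `+` apply to a single character class and there are no groups to repeat, so a multi-class pattern like `utf8.charpattern` matches exactly one character and cannot be extended into a run. It can still be applied repeatedly with `string.gmatch`. The same plain-Lua sketch (Lua 5.3 or later) also shows what the `(%a+)%s+|%s+` capture used earlier in the thread does to a multi-word English title: `%a+` cannot cross the space, so only the last word is captured and the leading `A ` stays outside the replacement, whereas a lazy `(.-)` anchored with `^` keeps the whole title.

```lua
local heading = "A header | 中文标题"

-- utf8.charpattern matches exactly one UTF-8 character; gmatch
-- applies it repeatedly to collect the characters one by one.
local chars = {}
for ch in string.gmatch("中文标题", utf8.charpattern) do
  chars[#chars + 1] = ch
end
print(#chars)  --> 4

-- "(%a+)" only matches letters, so the first successful match starts
-- at "header" and the leading "A " is copied through unchanged.
print((string.gsub(heading, "(%a+)%s+|%s+(.+)", "{%1}{%2}")))
--> A {header}{中文标题}

-- An anchored, lazy "(.-)" keeps everything before the separator.
print((string.gsub(heading, "^(.-)%s+|%s+(.+)$", "{%1}{%2}")))
--> {A header}{中文标题}
```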
* Re: regex captures on a Header element @ 2022-08-25 17:21 Albert Krewinkel

From: Albert Krewinkel @ 2022-08-25 17:21 UTC
To: pandoc-discuss

If you have markup in your headings, you may want to iterate over `el.content` to find the separator; that way you don't have to worry about Unicode. Something along the lines of

~~~lua
-- inside the Header(el) function:
local sep_seen = false
local en = pandoc.Inlines{}   -- inlines before the separator
local zh = pandoc.Inlines{}   -- inlines after the separator
for _, v in ipairs(el.content) do
  if sep_seen then
    zh:insert(v)
  elseif v.t == 'Str' and v.text == '|' then
    sep_seen = true
  else
    en:insert(v)
  end
end
~~~

You may also like the function `pandoc.utils.stringify` as an alternative to using table.concat.
https://pandoc.org/lua-filters#pandoc.utils.stringify
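The approach of splitting the inline list itself, rather than a flattened string, can also be tried outside pandoc. The sketch below is a plain-Lua version under stated assumptions: inlines are modelled as hypothetical tables with `t` and `text` fields, `split_at` is a hypothetical helper that also drops the `Space` elements on either side of the separator, and `inline_text` is a simplified stand-in for `pandoc.utils.stringify` that only understands `Str` and `Space`. In a real filter the two halves would be `pandoc.Inlines` lists as in the snippet above, and `pandoc.utils.stringify` would do the flattening.

```lua
-- Hypothetical helper: split a list of inline-like tables at the first
-- Str whose text equals `sep`. Returns the two halves and whether the
-- separator was found; Spaces adjacent to the separator are dropped.
local function split_at(inlines, sep)
  local before, after = {}, {}
  local seen = false
  for _, v in ipairs(inlines) do
    if seen then
      after[#after + 1] = v
    elseif v.t == 'Str' and v.text == sep then
      seen = true
    else
      before[#before + 1] = v
    end
  end
  if before[#before] and before[#before].t == 'Space' then
    before[#before] = nil
  end
  if after[1] and after[1].t == 'Space' then
    table.remove(after, 1)
  end
  return before, after, seen
end

-- Hypothetical, simplified stand-in for pandoc.utils.stringify.
local function inline_text(inlines)
  local parts = {}
  for _, v in ipairs(inlines) do
    parts[#parts + 1] = (v.t == 'Space' and ' ') or v.text or ''
  end
  return table.concat(parts)
end

local content = {
  { t = 'Str', text = 'A' }, { t = 'Space' },
  { t = 'Str', text = 'header' }, { t = 'Space' },
  { t = 'Str', text = '|' }, { t = 'Space' },
  { t = 'Str', text = '中文标题' },
}

local en, zh, found = split_at(content, '|')
print('\\bisection{' .. inline_text(en) .. '}{' .. inline_text(zh) .. '}')
--> \bisection{A header}{中文标题}
```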