* Lua filter process LaTeX
From: Thomas Hodgson @ 2020-10-26 10:28 UTC
To: pandoc-discuss

I have been learning how to write filters with Lua. I started with the tikz.lua example, and changed it to search for any figure. I want to put captions in the alt text of the images. So, I added this:

```
local caption = string.match(el.text, "\\caption{(.-)}")
if caption then
  return pandoc.Plain({pandoc.Image({pandoc.Str(caption)}, fname)})
end
```

That works, but it turns the raw LaTeX into a string, and my regex only goes as far as the first closing brace; this is a regex issue. So, I get things like this as alt text:

`foo' \emph{bar

Is there a way to have the filter turn that LaTeX into Pandoc's native format? I tried to think about pandoc.read, but didn't get very far.

Thanks.

Tom

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/021778da-effc-47ab-b69b-1d33e16a041fn%40googlegroups.com.
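
A minimal plain-Lua sketch (no pandoc needed) of why the alt text gets cut short: `.-` is the lazy quantifier in Lua patterns, so the capture ends at the first `}` it meets. The sample string is made up for illustration and only stands in for `el.text`.

```lua
-- Hypothetical raw LaTeX, standing in for el.text.
local text = "\\begin{figure}\n\\caption{`foo' \\emph{bar} baz}\n\\end{figure}"

-- `.-` matches as few characters as possible, so the capture
-- stops at the first closing brace, which belongs to \emph{...}.
local lazy = string.match(text, "\\caption{(.-)}")

print(lazy)
```

The capture ends inside the nested `\emph{...}` group, which matches the truncated alt text described in the message.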
* Re: Lua filter process LaTeX
From: Thomas Hodgson @ 2020-10-26 11:44 UTC
To: pandoc-discuss

I think that the regex was partly the problem. This does what I want, as long as there is a line break after the end of the `\caption`. The errors I was getting from pandoc.read were because the input wasn't valid LaTeX.

```
local caption = string.match(el.text, "\\caption{(.+)}\n")
if caption then
  local alt_text = pandoc.utils.stringify(pandoc.read(caption, 'latex').blocks)
  return pandoc.Plain({pandoc.Image({pandoc.Str(alt_text)}, fname)})
end
```
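
The conversion step can be sketched as a small helper, under a few assumptions. The name `caption_to_text` is hypothetical, and the helper stringifies the whole parsed document rather than its `.blocks` field as in the message. The guard is there only so the snippet also loads outside pandoc, where the `pandoc` global does not exist; in that case it returns its input unchanged.

```lua
-- Convert a LaTeX fragment to plain text (hypothetical helper).
-- Assumption: `pandoc` is only defined when the script runs inside
-- pandoc, e.g. as a Lua filter; otherwise the input is returned as-is.
local function caption_to_text(latex)
  if pandoc and pandoc.read and pandoc.utils then
    -- Parse the fragment as LaTeX, then strip all formatting.
    local doc = pandoc.read(latex, 'latex')
    return pandoc.utils.stringify(doc)
  end
  return latex
end

print(caption_to_text("foo \\emph{bar}"))
```

Inside a filter, the result could be wrapped in `pandoc.Str` and used as the image's alt text, as in the message above.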
* Re: Lua filter process LaTeX
From: William Lupton @ 2020-10-26 12:41 UTC
To: pandoc-discuss

Sorry to comment on the Lua rather than on the pandoc, but note that you can replace this:

    string.match(el.text, "\\caption{(.+)}\n")

with this:

    el.text:match("\\caption{(.+)}\n")

(note the colon), i.e., you can treat match() as a string method. I definitely prefer to do this!
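
A small plain-Lua check of this equivalence. It holds because all strings share a metatable whose `__index` is the `string` table, so `s:match(p)` is sugar for `string.match(s, p)`. The sample text is invented.

```lua
-- Hypothetical input, standing in for el.text.
local text = "\\caption{A simple caption}\n"
local pattern = "\\caption{(.+)}\n"

-- Function-call form and method-call form of the same match.
local via_function = string.match(text, pattern)
local via_method = text:match(pattern)

print(via_function == via_method)
```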
* Re: Lua filter process LaTeX
From: Thomas Hodgson @ 2020-10-26 15:52 UTC
To: pandoc-discuss

Thanks. The only Lua I know is what I have read in Pandoc filters, and it probably shows.
* Re: Lua filter process LaTeX
From: Thomas Hodgson @ 2020-10-30 10:32 UTC
To: pandoc-discuss

This is what I finally came up with:

```
local caption = el.text:match('\\caption{(.+)}\n')
if caption then
  local caption_text = pandoc.utils.stringify(pandoc.read(caption, 'latex').blocks)
  return pandoc.Para({pandoc.Image({pandoc.Str(caption_text)}, fname, 'fig:')})
else
  return pandoc.Para({pandoc.Image({}, fname, 'fig:')})
end
```

This does what I want, which is to produce HTML where the <img> is wrapped in a <figure>, and if my LaTeX had a \caption I get a <figcaption> with that text. Have I done it in a reasonable way?
* Re: Lua filter process LaTeX
From: Albert Krewinkel @ 2020-10-30 16:11 UTC
To: pandoc-discuss

Thomas Hodgson writes:

> This is what I finally came up with:
>
> ```
> local caption = el.text:match('\\caption{(.+)}\n')
> ⋮
> ```
>
> This does what I want, which is produces HTML where the <img> is wrapped in
> a <figure>, and if my LaTeX had a \caption I get a <figcaption> with that
> text. Have I done it in a reasonable way?

Looks good! If you'd like to make the caption match a little more robust, you could use '\\caption(%b{})' as the pattern. This will work even if there are additional commands on the same line, as in

    \caption{the {\large image} caption}\label{fig:myimg}

The `%b{}` ensures that braces in the result are balanced. The result will include the enclosing braces, but that won't matter in your case. One could get rid of them by appending `:sub(2, -2)` to the line.

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
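
The difference between the two patterns can be sketched in plain Lua. The input is a made-up figure fragment built around the example line above. Note that `.` in a Lua pattern matches any character, newlines included, so the greedy `(.+)}\n` runs to the last `}` that is followed by a line break.

```lua
-- Hypothetical figure body with a \label right after the caption.
local text = "\\caption{the {\\large image} caption}\\label{fig:myimg}\n\\end{figure}"

-- Greedy match: it needs "}\n" after the capture, so it overshoots
-- past the caption and swallows the start of \label{...}.
local greedy = text:match("\\caption{(.+)}\n")

-- Balanced match: %b{} runs from "{" to its matching "}".
local balanced = text:match("\\caption(%b{})")

-- Strip the enclosing braces from the balanced result.
local inner = balanced:sub(2, -2)

print(greedy)
print(balanced)
print(inner)
```

In a filter one would still check that the match is not `nil` before calling `:sub`, since a figure without a `\caption` yields no match.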
* Re: Lua filter process LaTeX
From: Thomas Hodgson @ 2020-11-20 22:21 UTC
To: pandoc-discuss

Thanks. I wondered about matching braces, but it seemed to be very complicated to do. That's very neat.
Thread overview: 7+ messages

2020-10-26 10:28 Lua filter process LaTeX Thomas Hodgson
2020-10-26 11:44 ` Thomas Hodgson
2020-10-26 12:41   ` William Lupton
2020-10-26 15:52     ` Thomas Hodgson
2020-10-30 10:32       ` Thomas Hodgson
2020-10-30 16:11         ` Albert Krewinkel
2020-11-20 22:21           ` Thomas Hodgson