public inbox archive for pandoc-discuss@googlegroups.com
* Lua filter process LaTeX
@ 2020-10-26 10:28 Thomas Hodgson
       [not found] ` <021778da-effc-47ab-b69b-1d33e16a041fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Hodgson @ 2020-10-26 10:28 UTC (permalink / raw)
  To: pandoc-discuss


I have been learning how to write filters with Lua. I started with the 
tikz.lua example, and changed it to search for any figure. I want to put 
captions in the alt text of the images. So, I added this:

```
-- Lazy capture: stops at the first closing brace after \caption{
local caption = string.match(el.text, "\\caption{(.-)}")
if caption then
    return pandoc.Plain({pandoc.Image({pandoc.Str(caption)}, fname)})
end
```

That works, but it leaves the raw LaTeX as a plain string, and my regex only
goes as far as the first closing brace; this is a regex issue. So, I get
things like this as alt text:

`foo' \emph{bar
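
The truncation can be reproduced in plain Lua, outside pandoc. This is only a sketch: the sample string is made up and stands in for `el.text`. The lazy `.-` expands just far enough for the following `}` to match, so the capture ends at the brace that closes `\emph{bar}`:

```lua
-- Made-up figure body, standing in for el.text in the filter.
local text = "\\begin{figure}\n\\caption{`foo' \\emph{bar} baz}\n\\end{figure}\n"

-- '.-' is the lazy counterpart of '.*': it stops as soon as the rest
-- of the pattern (here a literal '}') can match.
local caption = string.match(text, "\\caption{(.-)}")

print(caption)  --> `foo' \emph{bar
```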

Is there a way to have the filter turn that LaTeX into Pandoc's native 
format? I tried to think about pandoc.read, but didn't get very far.

Thanks.

Tom

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/021778da-effc-47ab-b69b-1d33e16a041fn%40googlegroups.com.

* Re: Lua filter process LaTeX
       [not found] ` <021778da-effc-47ab-b69b-1d33e16a041fn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-10-26 11:44   ` Thomas Hodgson
       [not found]     ` <59cbbd11-ca9d-4334-a348-92277bb459c2n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Hodgson @ 2020-10-26 11:44 UTC (permalink / raw)
  To: pandoc-discuss


I think that the regex was partly the problem. This does what I want, as 
long as there is a line break after the end of the `\caption`. The errors I 
was getting from pandoc.read were because the input wasn't good LaTeX.

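```
-- Greedy capture: needs a '}' followed directly by a line break
local caption = string.match(el.text, "\\caption{(.+)}\n")
if caption then
    -- Parse the caption as LaTeX, then flatten it to plain text
    local alt_text = pandoc.utils.stringify(pandoc.read(caption, 'latex').blocks)
    return pandoc.Plain({pandoc.Image({pandoc.Str(alt_text)}, fname)})
end
```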
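
One caveat about the greedy `.+`, sketched in plain Lua with made-up input strings: in Lua patterns `.` also matches newlines, so `(.+)}\n` runs to the last `}` that is directly followed by a line break anywhere in the text, not only on the caption line. The helper name `caption_of` is just for this demonstration:

```lua
-- Same pattern as above, wrapped in a helper for the demonstration.
local function caption_of(text)
  return text:match("\\caption{(.+)}\n")
end

-- Made-up input: the caption line is the last one ending in '}' + newline.
local good = "\\begin{figure}\n\\includegraphics{img}\n"
  .. "\\caption{`foo' \\emph{bar} baz}\n\\end{figure}"

-- Made-up input: a \label line follows the caption.
local bad = "\\begin{figure}\n\\caption{a \\emph{b} c}\n"
  .. "\\label{fig:x}\n\\end{figure}"

print(caption_of(good))  --> `foo' \emph{bar} baz
print(caption_of(bad))   -- the capture also swallows the \label line
```

So the pattern behaves as intended only when no later line in the figure ends with a closing brace.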

* Re: Lua filter process LaTeX
       [not found]     ` <59cbbd11-ca9d-4334-a348-92277bb459c2n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-10-26 12:41       ` William Lupton
       [not found]         ` <CAEe_xxi3KVqos4O4Q4Ng1vx=WB9wT8dV+-z1DMq7FsxGVRkF3w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: William Lupton @ 2020-10-26 12:41 UTC (permalink / raw)
  To: pandoc-discuss

Sorry to comment on the Lua rather than on the pandoc side, but note that
you can replace this:

string.match(el.text, "\\caption{(.+)}\n")

with this:

el.text:match("\\caption{(.+)}\n")

(note the colon) i.e., you can treat match() as a string method. I
definitely prefer to do this!
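
A quick way to check that the two spellings are equivalent, in plain Lua with a made-up sample string. The colon form works because Lua string values share a metatable whose `__index` is the `string` table:

```lua
-- Made-up sample line, standing in for el.text.
local text = "\\caption{A caption}\n"
local pattern = "\\caption{(.+)}\n"

local a = string.match(text, pattern)  -- function-call form
local b = text:match(pattern)          -- method form (note the colon)

print(a, b)
```

Both calls return the same capture, so the choice is purely stylistic.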

* Re: Lua filter process LaTeX
       [not found]         ` <CAEe_xxi3KVqos4O4Q4Ng1vx=WB9wT8dV+-z1DMq7FsxGVRkF3w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-10-26 15:52           ` Thomas Hodgson
       [not found]             ` <09ac0613-ca74-459f-8cd1-e09c276308dbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Hodgson @ 2020-10-26 15:52 UTC (permalink / raw)
  To: pandoc-discuss


Thanks. The only Lua I know is what I have read in Pandoc filters, and it
probably shows.

* Re: Lua filter process LaTeX
       [not found]             ` <09ac0613-ca74-459f-8cd1-e09c276308dbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-10-30 10:32               ` Thomas Hodgson
       [not found]                 ` <ca58cb60-eda3-4ae0-86b6-b6c27aba7c3bn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Thomas Hodgson @ 2020-10-30 10:32 UTC (permalink / raw)
  To: pandoc-discuss


This is what I finally came up with:

```
local caption = el.text:match('\\caption{(.+)}\n')
if caption then
    -- Parse the caption as LaTeX, then flatten it to plain text
    local caption_text = pandoc.utils.stringify(pandoc.read(caption, 'latex').blocks)
    return pandoc.Para({pandoc.Image({pandoc.Str(caption_text)}, fname, 'fig:')})
else
    -- No caption: still emit the image as a figure
    return pandoc.Para({pandoc.Image({}, fname, 'fig:')})
end
```

This does what I want, which is to produce HTML where the <img> is wrapped
in a <figure>, and if my LaTeX had a \caption I get a <figcaption> with that
text. Have I done it in a reasonable way?
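
For context, here is how the snippet might sit inside a complete filter function. This is a sketch under assumptions, not the filter from the thread: `image_path_for` is a hypothetical placeholder for however `fname` is derived (that part is not shown above), and the `fig:` title is the convention used above, which pandoc versions of that period rendered as a figure. pandoc 3 introduced a dedicated Figure element, so newer versions may need a different approach.

```lua
-- Pure helper: return the caption's LaTeX source, or nil if there is none.
local function caption_source(text)
  return text:match('\\caption{(.+)}\n')
end

-- Hypothetical placeholder: the thread does not show how fname is built.
local function image_path_for(el)
  return 'figure.svg'
end

-- Filter function that pandoc calls for every raw block.
function RawBlock(el)
  -- Leave raw blocks that are not figures untouched.
  if not el.text:match('\\begin{figure}') then
    return nil
  end
  local fname = image_path_for(el)
  local inlines = {}
  local source = caption_source(el.text)
  if source then
    -- Parse the caption as LaTeX, then flatten it to plain text.
    local doc = pandoc.read(source, 'latex')
    inlines = { pandoc.Str(pandoc.utils.stringify(doc.blocks)) }
  end
  -- An image alone in a Para with a 'fig:' title, as in the snippet above.
  return pandoc.Para({ pandoc.Image(inlines, fname, 'fig:') })
end
```

Keeping the pattern match in its own helper means it can be tried in a plain Lua interpreter, without running pandoc.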

* Re: Lua filter process LaTeX
       [not found]                 ` <ca58cb60-eda3-4ae0-86b6-b6c27aba7c3bn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-10-30 16:11                   ` Albert Krewinkel
       [not found]                     ` <87pn4zlmna.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Albert Krewinkel @ 2020-10-30 16:11 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Thomas Hodgson writes:

> This is what I finally came up with:
>
> ```
> local caption = el.text:match('\\caption{(.+)}\n')
> ⋮
> ```
>
> This does what I want, which is produces HTML where the <img> is wrapped in
> a <figure>, and if my LaTeX had a \caption I get a <figcaption> with that
> text. Have I done it in a reasonable way?

Looks good! If you'd like to make the caption match a little more
robust, you could use '\\caption(%b{})' as the pattern.  This will work
even if there are additional commands on the same line, as in

    \caption{the {\large image} caption}\label{fig:myimg}

The `%b{}` ensures that braces in the result are balanced. The result
will include the enclosing braces, but that won't matter in your case.
One could get rid of them by appending `:sub(2, -2)` to the line.
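
Applied to the example line above, in plain Lua (a small sketch; the variable names are only for illustration):

```lua
-- The example line from above, with a trailing line break.
local line = "\\caption{the {\\large image} caption}\\label{fig:myimg}\n"

-- %b{} matches from the opening brace to its balancing closing brace.
local braced = line:match("\\caption(%b{})")

-- Strip the enclosing braces; 'braced' is nil if nothing matched.
local caption = braced and braced:sub(2, -2)

print(braced)   --> {the {\large image} caption}
print(caption)  --> the {\large image} caption
```

The `braced and ...` guard avoids calling `sub` on nil when a figure has no caption.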

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124

* Re: Lua filter process LaTeX
       [not found]                     ` <87pn4zlmna.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2020-11-20 22:21                       ` Thomas Hodgson
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Hodgson @ 2020-11-20 22:21 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks. I wondered about matching braces, but it seemed to be very
complicated to do. That's very neat.
