public inbox archive for pandoc-discuss@googlegroups.com
From: "'lukeflo' via pandoc-discuss" <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Lua filter to change macro for figure caption md -> latex
Date: Mon, 11 Dec 2023 14:04:40 -0800 (PST)
Message-ID: <78503784-88bd-4d34-b5f5-a8634d667ba0n@googlegroups.com>
In-Reply-To: <CACTSqG5L4o7npfcUV1iGPMi5fbUrqgPc+5ttPQmZhNTFW7Vsng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>


Dear William, dear Pablo,

thanks for your replies. They do help me understand how such 
writers/filters work. Unfortunately, I still can't produce a working filter 
on my own.

@William: I tried using 'fig.caption' instead of 'fig.content.caption[1]', 
but then I get a different error message: *Error running Lua: attempt to 
concatenate a table value (field 'caption')*.
This confuses me a little, since the HTML output at the end of your 
logging output suggests that most fields are processed correctly.

@Pablo: Your filters are really interesting. The float code gave me some 
useful insights. I also tried to adapt your LaTeX code:

``` lua
function Figure(elem)
  raw_elem = '\\begin{figure*}\n' ..
              '\\centering\n' .. 
              '\\includegraphics[width=0.9\\textwidth]{' .. elem.content .. 
'}\n' .. 
              '\\caption{' .. elem.caption .. '}\n' .. 
              '\\end{figure*}'
  return pandoc.RawBlock('latex', raw_elem)
end
```
I ran it as a filter rather than as a writer, but I get the same error as above.
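
For reference, a minimal sketch of why the concatenation fails and one way around it, assuming pandoc 3.x and the figure structure shown in William's log further down. `elem.caption` is a Caption table (with `long` and an optional `short`) and `elem.content` is a list of blocks, while Lua's `..` only joins strings and numbers. The helper name `figure_star` is made up for illustration. `pandoc.utils.stringify` flattens the caption to plain text, so formatting inside the caption is lost and LaTeX special characters are not escaped.

```lua
-- Hypothetical helper (not part of pandoc): builds the LaTeX for a
-- starred figure from two plain strings.
function figure_star(src, caption)
  return '\\begin{figure*}\n' ..
         '\\centering\n' ..
         '\\includegraphics[width=0.9\\textwidth]{' .. src .. '}\n' ..
         '\\caption{' .. caption .. '}\n' ..
         '\\end{figure*}'
end

function Figure(elem)
  -- Only rewrite figures when the target format is LaTeX.
  if not FORMAT:match('latex') then return nil end

  -- Find the first Image nested inside the figure and keep its path.
  local src
  elem:walk {
    Image = function(img) src = src or img.src end
  }
  if not src then return nil end  -- no image found: leave the figure as is

  -- elem.caption.long is a list of blocks; stringify turns it into a string.
  local caption = pandoc.utils.stringify(elem.caption.long)
  return pandoc.RawBlock('latex', figure_star(src, caption))
end
```

Saved under an arbitrary name such as `figure-star.lua`, it would be run with `pandoc input.md -L figure-star.lua -t latex`.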

Generally, my biggest problem is that I don't fully understand how to 
access the values of the different AST fields. How, for example, can I use 
the values of the different fields of a Pandoc Figure block 
<https://pandoc.org/lua-filters.html#type-figure>? Caption is defined there 
as a key-value pair <https://pandoc.org/lua-filters.html#type-caption> for 
long and short captions, but as mentioned, 'fig.caption' or 'elem.caption' 
throws the table value error. Might that be due to the key-value form? And 
is there a difference between accessing these fields in a filter and in a 
writer?
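
A minimal sketch of reading the individual fields, assuming pandoc 3.x: fields are reached with ordinary Lua indexing, `fig.caption.long` is a list of blocks, `fig.caption.short` is either nil or a list of inlines, and the image in the test document sits at `fig.content[1].content[1]`. To keep emphasis and other formatting, the caption can be rendered with `pandoc.write` rather than concatenated directly. The helper names `to_latex` and `sidecaption_macro` are invented here. Going by the documented writer example, the element fields look the same in a filter and in a writer; the writer simply passes the same kind of filter table to `doc:walk`.

```lua
-- Hypothetical helper: assemble \sidecaption[short]{long} from strings.
function sidecaption_macro(long, short)
  if short and short ~= '' then
    return '\\sidecaption[' .. short .. ']{' .. long .. '}'
  end
  return '\\sidecaption{' .. long .. '}'
end

-- Hypothetical helper: render a list of blocks as a LaTeX string.
function to_latex(blocks)
  local tex = pandoc.write(pandoc.Pandoc(blocks), 'latex')
  return (tex:gsub('%s+$', ''))  -- drop trailing whitespace, if any
end

function Figure(fig)
  if not FORMAT:match('latex') then return nil end

  local long = to_latex(fig.caption.long)
  local short = fig.caption.short            -- nil or a list of inlines
    and to_latex({ pandoc.Plain(fig.caption.short) })

  -- Assumes the layout from the log: Figure > Plain > Image.
  local image = fig.content[1].content[1]

  return pandoc.RawBlock('latex',
    '\\begin{figure}\n\\centering\n' ..
    '\\includegraphics{' .. image.src .. '}\n' ..
    sidecaption_macro(long, short) .. '\n' ..
    '\\end{figure}')
end
```

The direct indexing in `fig.content[1].content[1]` will raise an error for figures with a different layout (for example several images), so a real filter would need a check there.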

If someone has some information on that, it would be great; even better 
would be a (short) framework or example that shows how to access a value and 
wrap it in surrounding LaTeX code. The docs are very 
detailed, but I can't find an explanation of these basics that makes sense 
to me...
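
A minimal sketch of the writer route Julien suggested further down, assuming pandoc 3.x: let pandoc produce its normal LaTeX and rename the macro afterwards with a Lua pattern. `swap_caption` is a name invented for this sketch. The pattern only touches `\caption` when it is directly followed by `[` or `{`, so commands such as `\captionsetup` are left alone, but table captions would be renamed as well. How the result interacts with a custom template has not been checked here.

```lua
-- Hypothetical helper: rename \caption to \sidecaption in a LaTeX string.
-- '%1' re-inserts the captured '[' or '{' that followed the macro name.
function swap_caption(latex)
  return (latex:gsub('\\caption([%[{])', '\\sidecaption%1'))
end

-- Entry point of a custom writer, following the pattern in the pandoc docs.
function Writer(doc, opts)
  return swap_caption(pandoc.write(doc, 'latex', opts))
end
```

It would be called like the earlier writer attempt: `pandoc --template=../custom.template -t test-writer.lua ast-test.txt -o ast-test.tex`.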

Thanks again and all the best!

Pablo Serrati wrote on Monday, 11 December 2023 at 15:53:38 UTC+1:

> Hello Lukeflo
>
> I have been working for a few months on a project to enable the 
> publication of scientific journals (https://github.com/estedeahora/guri). 
> The project has a workflow that takes `docx` and generates the final files 
> (`pdf`, `xml` and `html`), using `markdown` (md) as an intermediate 
> file. One of the issues I had to address was the floating elements and in 
> particular the use of subcaptions for "sources" and "notes". While the 
> project is still in the development stage, I think some of the filters I 
> use may help you (see `files/filters/`). 
>
> So far we are using the proposal in the journal [Quid 16](
> https://publicaciones.sociales.uba.ar/index.php/quid16/issue/current), 
> if you are interested in seeing the results. My plan is to generate a 
> stable release that can be used by other journals in the coming months. I 
> am open to new ideas and contributions to improve the project.
>
> Best,
>
> Pablo 
>
> On Mon, 11 Dec 2023 at 08:30, 'William Lupton' via pandoc-discuss (
> pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org) wrote:
>
>> Luke,
>>
>> Filter functions take a single argument (the element in question) so you 
>> need the second form: Figure(fig).
>>
>> I think the crash is because the caption is in fig.caption (not in 
>> fig.content[1].caption). See 
>> https://pandoc.org/lua-filters.html#type-figure.
>>
>> Perhaps I can put in a plug for https://github.com/pandoc-ext/logging. I 
>> think you might find this helpful for gaining insight into the element 
>> structure. See example below.
>>
>> Hope this helps.
>>
>> Cheers,
>> William
>>
>> --------
>>
>> With this filter in rep.lua:
>>
>> local logging = require 'logging'
>>
>> function Pandoc(pandoc)
>>     if logging.loglevel > 0 then
>>         logging.temp('meta', pandoc.meta)
>>     end
>>     logging.temp('blocks', pandoc.blocks)
>> end
>>
>> ...your input gives this:
>>
>> % pandoc luke.md -L rep.lua
>>
>> (#) blocks Blocks[3] {
>>   [1] Header {
>>     attr: Attr {
>>       attributes: AttributeList {}
>>       classes: List {}
>>       identifier: "headline"
>>     }
>>     content: Inlines[1] {
>>       [1] Str "Headline"
>>     }
>>     level: 1
>>   }
>>   [2] Para {
>>     content: Inlines[3] {
>>       [1] Str "Some"
>>       [2] Space
>>       [3] Str "text"
>>     }
>>   }
>>   [3] Figure {
>>     attr: Attr {
>>       attributes: AttributeList {}
>>       classes: List {}
>>       identifier: ""
>>     }
>>     caption: {
>>       long: Blocks[1] {
>>         [1] Plain {
>>           content: Inlines[7] {
>>             [1] Str "caption"
>>             [2] Space
>>             [3] Str "to"
>>             [4] Space
>>             [5] Str "an"
>>             [6] Space
>>             [7] Str "image"
>>           }
>>         }
>>       }
>>     }
>>     content: Blocks[1] {
>>       [1] Plain {
>>         content: Inlines[1] {
>>           [1] Image {
>>             attr: Attr {
>>               attributes: AttributeList {}
>>               classes: List {}
>>               identifier: ""
>>             }
>>             caption: Inlines[7] {
>>               [1] Str "caption"
>>               [2] Space
>>               [3] Str "to"
>>               [4] Space
>>               [5] Str "an"
>>               [6] Space
>>               [7] Str "image"
>>             }
>>             src: "counter_plot_new_periods.png"
>>             title: ""
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>>
>> <h1 id="headline">Headline</h1>
>> <p>Some text</p>
>> <figure>
>> <img src="counter_plot_new_periods.png" alt="caption to an image" />
>> <figcaption aria-hidden="true">caption to an image</figcaption>
>> </figure>
>>
>> On Mon, 11 Dec 2023 at 10:36, 'lukeflo' via pandoc-discuss <
>> pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>>
>>> PS: I know I'm using two very different approaches to calling the Figure 
>>> function with arguments. That's because I'm not sure which way 
>>> is the right one...
>>>
>>> lukeflo wrote on Monday, 11 December 2023 at 11:33:21 UTC+1:
>>>
>>>> So far, I took the Markdown writer example from the pandoc docs 
>>>> <https://pandoc.org/custom-writers.html#example-modified-markdown-writer> 
>>>> to try out how writers work in general. It works, and I think that I 
>>>> understand the general usage.
>>>>
>>>> But figures in particular (in the LaTeX writer and presumably in general) are 
>>>> relatively complex. Here are two things I have tried so far, both of which 
>>>> produced an error:
>>>>
>>>> ``` lua
>>>> function Writer (doc, opts)
>>>>    local filter = {
>>>>       function Figure (caption, image, attr)
>>>>          local figcap = '\sidecaption{' .. caption .. '}'
>>>>          return '\\begin{figure}\n' .. image .. '\n' .. figcap .. '\n' '\\end{figure}\n'
>>>>       end
>>>>    }
>>>>    return pandoc.write(doc:walk(filter), 'latex', opts)
>>>> end
>>>> ```
>>>> If I run this writer with my custom template from the CLI using *pandoc 
>>>> --template=../custom.template -t test-writer.lua ast-test.txt -o 
>>>> ast-test.tex* I get 
>>>> *Error running Lua:test-writer.lua:27: '(' expected near 'Figure'*.
>>>>
>>>> Furthermore, I tried running the following code just to understand how 
>>>> those writers work. Here I just wanted to replace {figure} with the starred 
>>>> version {figure*} (not sidecaption):
>>>>
>>>> ``` lua
>>>> function Writer (doc, opts)
>>>>   local filter = {
>>>>     Figure = function (fig)
>>>>       local tester = '\\begin{figure*}\n' ..
>>>>         fig.content[1].caption[1].attributes[1] .. '\\end{figure*}\n'
>>>>       return pandoc.RawBlock('latex', tester)
>>>>     end
>>>>   }
>>>>   return pandoc.write(doc:walk(filter), 'latex', opts)
>>>> end
>>>> ```
>>>> But I also got an error:
>>>>
>>>> *Error running Lua:
>>>> test-writer.lua:28: attempt to index a nil value (field 'caption')
>>>> stack traceback:
>>>>   [C]: in ?
>>>>   [C]: in method 'walk'
>>>>   test-writer.lua:32: in function 'Writer'
>>>> stack traceback:
>>>>   test-writer.lua:32: in function 'Writer'*
>>>>
>>>> I'm aware that I might be missing something very basic and maybe even 
>>>> very simple. But I'm getting a bit lost among all the 
>>>> functions, modules etc., as well as the general framework of such writers.
>>>>
>>>> Thus, any help explaining my errors, and maybe suggesting some better 
>>>> code, is much appreciated!
>>>>
>>>> The test file in both cases is very simple:
>>>>
>>>> ``` markdown
>>>> ---
>>>> title: A title
>>>> ---
>>>>
>>>> # Headline
>>>>
>>>> Some text
>>>>
>>>> ![caption to an image](counter_plot_new_periods.png)
>>>> ```
>>>>
>>>> Thanks in advance!
>>>> lukeflo wrote on Friday, 8 December 2023 at 08:35:48 UTC+1:
>>>>
>>>>> Hello Julien,
>>>>>
>>>>> thanks for the reply. Unfortunately, as mentioned in the Stack Overflow 
>>>>> post, your suggested LaTeX code won't work.
>>>>>
>>>>> The \caption macro is very complex in the backend and cannot be 
>>>>> copied on the fly via \let, \NewCommandCopy or something similar. 
>>>>> Even after doing so with e.g. \NewCommandCopy{\oldcaption}{\caption} 
>>>>> and then setting \RenewDocumentCommand{\caption}{o 
>>>>> m}{\sidecaption[#1]{#2}} nothing changes and the definition of 
>>>>> \caption, checked with \meaning or something similar, stays the same 
>>>>> as before (even \DeclareDocumentCommand doesn't work).
>>>>>
>>>>> In the end, it might be possible to somehow change the \caption macro 
>>>>> itself. But the effort might not be worth the result (and it's more of a 
>>>>> question for TeX.SE).
>>>>>
>>>>> Using a custom writer to build LaTeX figures and replace the 
>>>>> \caption string inside would be a great solution. I read through the writer 
>>>>> manual, but didn't really understand how the AST works and which values 
>>>>> have to be used in such a writer. Furthermore, I'm using a custom LaTeX 
>>>>> template for exporting (based on the default.template.latex), which has to 
>>>>> be integrated with such a writer.
>>>>>
>>>>> Therefore, I would really appreciate a Lua framework to understand 
>>>>> which functions have to be edited etc. to accomplish the substitution.
>>>>>
>>>>> Best
>>>>>
>>>>> Julien Dutant wrote on Tuesday, 5 December 2023 at 17:09:19 UTC+1:
>>>>>
>>>>>> Lua filters only change Pandoc's AST representation of your document, 
>>>>>> i.e. before it is then converted to LaTeX. A Raw block filter will not act 
>>>>>> on Pandoc's LaTeX output, but only on Raw LaTeX blocks that are in the 
>>>>>> markdown itself.
>>>>>>
>>>>>> A Pandoc solution would be to write a custom Lua *writer* 
>>>>>> <https://pandoc.org/custom-writers.html>. The writer would use 
>>>>>> pandoc.write to generate Pandoc's own LaTeX output (body only) and modify 
>>>>>> it with regular expressions or Lua patterns. To replace just a command name 
>>>>>> this is fairly easy, though longer than the third solution below.
>>>>>>
>>>>>> A LaTeX solution is to redefine \caption as \sidecaption:
>>>>>> \renewcommand{\caption}{\sidecaption} 
>>>>>>
>>>>>> You can keep this enclosed in groups ({...}) to ensure that the 
>>>>>> redefinition only applies locally.
>>>>>>
>>>>>> A hybrid Pandoc/LaTeX solution is a Lua filter that inserts LaTeX code 
>>>>>> to redefine \caption around figures:
>>>>>>
>>>>>> ``` lua
>>>>>> if FORMAT:match 'latex' then
>>>>>>   function Figure (elem)
>>>>>>     return {
>>>>>>       -- open a group and redefine \caption inside it
>>>>>>       pandoc.RawBlock('latex', '{\\renewcommand{\\caption}{\\sidecaption}'),
>>>>>>       elem,
>>>>>>       -- close the group so the redefinition stays local
>>>>>>       pandoc.RawBlock('latex', '}')
>>>>>>     }
>>>>>>   end
>>>>>> end
>>>>>> ```
>>>>>>
>>>>>> This replaces any 'Figure' block element with a list (succession) of 
>>>>>> three blocks: a raw LaTeX block, the original figure, and a closing 
>>>>>> raw LaTeX block. The output should look like:
>>>>>> {\renewcommand{\caption}{\sidecaption} 
>>>>>> ... Pandoc's LaTeX for the figure ... 
>>>>>> } 
>>>>>>
>>>>>> Reposted from 
>>>>>> https://stackoverflow.com/questions/77504584/pandoc-md-latex-write-lua-filter-to-change-latex-macro-used-for-caption/77607636#77607636
>>>>>>
>>>>>> On Monday, November 20, 2023 at 7:06:57 AM UTC+11 lukeflo wrote:
>>>>>>
>>>>>>> Hi everybody,
>>>>>>>
>>>>>>> I have written a custom LaTeX `.cls' file to establish a typesetting
>>>>>>> workflow for the scientific journals of my research institute. The
>>>>>>> texts should be written in Markdown and then be processed with `pandoc'
>>>>>>> to LaTeX.
>>>>>>>
>>>>>>> I already have an elaborate pandoc template to produce the LaTeX
>>>>>>> preamble etc. So far it's working great.
>>>>>>>
>>>>>>> But for the figures I need the caption from the Markdown file to be
>>>>>>> set with `\sidecaption' instead of `\caption' in LaTeX, as well as with
>>>>>>> an optional argument (short-caption) for the image attribution in the
>>>>>>> list of figures.
>>>>>>>
>>>>>>> To get the latter working I use the following filter from a GitHub
>>>>>>> discussion in the [pandoc repo]:
>>>>>>>
>>>>>>> ┌────
>>>>>>> │ PANDOC_VERSION:must_be_at_least '3.1'
>>>>>>> │
>>>>>>> │ if FORMAT:match 'latex' then
>>>>>>> │   function Figure(f)
>>>>>>> │     local short = f.content[1].content[1].attributes['short-caption']
>>>>>>> │     if short and not f.caption.short then
>>>>>>> │       f.caption.short = pandoc.Inlines(short)
>>>>>>> │     end
>>>>>>> │     return f
>>>>>>> │   end
>>>>>>> │ end
>>>>>>> └────
>>>>>>>
>>>>>>> That works without any flaws.
>>>>>>>
>>>>>>> But now I need to figure out how to change the LaTeX macro used for
>>>>>>> the caption. The older [approach of pre pandoc version 3.0 posted] by
>>>>>>> tarleb is really intuitive and I could have easily adapted it to my
>>>>>>> needs. But since pandoc 3.0 there is the new [/complex figures/]
>>>>>>> approach and, so far, I couldn't figure out how to change the LaTeX
>>>>>>> macro used for the captions with this new behaviour.
>>>>>>>
>>>>>>> I tried something like this (adapted from [here]):
>>>>>>>
>>>>>>> ┌────
>>>>>>> │ if FORMAT:match 'latex' then
>>>>>>> │   function RawBlock (raw)
>>>>>>> │     local caption = raw.text:match('\\caption')
>>>>>>> │     if caption then
>>>>>>> │        raw:gsub('\\caption', '\\sidecaption')
>>>>>>> │     end
>>>>>>> │     return raw
>>>>>>> │   end
>>>>>>> │ end
>>>>>>> └────
>>>>>>>
>>>>>>> But nothing happened.
>>>>>>>
>>>>>>> The main challenge for me is my more-or-less non-existent Lua
>>>>>>> skills. I just never had to use it for my daily tasks. I thought about
>>>>>>> using `awk' or `sed' to edit the `.tex' file itself with a regex
>>>>>>> substitution, but that should remain an absolute stopgap, since it
>>>>>>> makes the whole workflow less portable.
>>>>>>>
>>>>>>> Thus, I'm hoping for a hint or a solution in the form of a pandoc Lua
>>>>>>> script which 1. helps me to achieve the goal, and 2. improves my
>>>>>>> understanding of Lua and the /complex figures/ approach for similar
>>>>>>> future tasks.
>>>>>>>
>>>>>>> I appreciate any tip!
>>>>>>>
>>>>>>> Best,
>>>>>>> Lukeflo
>>>>>>>
>>>>>>> This question is also posted on Stack Overflow: 
>>>>>>> https://stackoverflow.com/q/77504584/19647155
>>>>>>>
>>>>>>> [pandoc repo]
>>>>>>> <https://github.com/jgm/pandoc/issues/7915#issuecomment-1427113349>
>>>>>>>
>>>>>>> [approach of pre pandoc version 3.0 posted]
>>>>>>> <https://github.com/jgm/pandoc/issues/7915#issuecomment-1039370851>
>>>>>>>
>>>>>>> [/complex figures/] <https://github.com/jgm/pandoc/releases?page=2>
>>>>>>>
>>>>>>> [here] <https://stackoverflow.com/a/71296595/19647155>
>>>>>>>
>>>>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "pandoc-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/pandoc-discuss/32dfe8eb-98ac-40ee-92d7-162528add367n%40googlegroups.com 
>>> <https://groups.google.com/d/msgid/pandoc-discuss/32dfe8eb-98ac-40ee-92d7-162528add367n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>


Thread overview: 9+ messages
2023-11-19 20:06 'lukeflo' via pandoc-discuss
     [not found] ` <51ca8210-3d60-4d5d-9af2-04c85995deb6n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-05 16:09   ` Julien Dutant
     [not found]     ` <f3fa2d12-6277-47c6-a3fc-b5fea1485600n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-08  7:35       ` 'lukeflo' via pandoc-discuss
     [not found]         ` <b565fdd5-8216-4596-a2ed-c75019aad172n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-11 10:33           ` 'lukeflo' via pandoc-discuss
     [not found]             ` <b419ae83-de83-4035-97cd-fb41cb6be647n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-11 10:36               ` 'lukeflo' via pandoc-discuss
     [not found]                 ` <32dfe8eb-98ac-40ee-92d7-162528add367n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-11 11:30                   ` 'William Lupton' via pandoc-discuss
     [not found]                     ` <CAEe_xxikobOS_G9x71nxtz0dr99VVhgBV8in=xKzXgh7JMaRcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2023-12-11 14:53                       ` Pablo Serrati
     [not found]                         ` <CACTSqG5L4o7npfcUV1iGPMi5fbUrqgPc+5ttPQmZhNTFW7Vsng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2023-12-11 22:04                           ` 'lukeflo' via pandoc-discuss [this message]
     [not found]                             ` <78503784-88bd-4d34-b5f5-a8634d667ba0n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-12 10:16                               ` 'William Lupton' via pandoc-discuss
