public inbox archive for pandoc-discuss@googlegroups.com
From: Pablo Serrati <pabloserrati-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Lua filter to change macro for figure caption md -> latex
Date: Mon, 11 Dec 2023 11:53:21 -0300
Message-ID: <CACTSqG5L4o7npfcUV1iGPMi5fbUrqgPc+5ttPQmZhNTFW7Vsng@mail.gmail.com>
In-Reply-To: <CAEe_xxikobOS_G9x71nxtz0dr99VVhgBV8in=xKzXgh7JMaRcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hello Lukeflo

I have been working for a few months on a project to enable the publication
of scientific journals (https://github.com/estedeahora/guri). The project
has a workflow that takes `docx` and generates the final files (`pdf`,
`xml` and `html`), using `markdown` (md) as an intermediate format. One of
the issues I had to address was floating elements, in particular the
use of subcaptions for "sources" and "notes". While the project is still in
the development stage, I think some of the filters I use may help you (see
`files/filters/`).

So far we are using this workflow for the journal [Quid 16](
https://publicaciones.sociales.uba.ar/index.php/quid16/issue/current), if
you are interested in seeing the results. My plan is to generate a stable
release that can be used by other journals in the coming months. I am open
to new ideas and contributions to improve the project.

Best,

Pablo

El lun, 11 dic 2023 a la(s) 08:30, 'William Lupton' via pandoc-discuss (
pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org) escribió:

> Luke,
>
> Filter functions take a single argument (the element in question) so you
> need the second form: Figure(fig).
>
> I think the crash is because the caption is in fig.caption (not in
> fig.content[1].caption). See
> https://pandoc.org/lua-filters.html#type-figure.
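>
> As a minimal sketch of that access pattern (not a tested drop-in filter):
> the handler below takes the single `fig` argument and reads the caption
> from `fig.caption.long`. The helper name `caption_text` is made up for
> illustration, and the sketch assumes `pandoc.utils.stringify` accepts the
> caption's list of blocks.
>
> ``` lua
> -- Hypothetical helper: flatten a figure's long caption to plain text.
> -- Assumes pandoc.utils.stringify accepts a list of blocks.
> local function caption_text(fig)
>   return pandoc.utils.stringify(fig.caption.long)
> end
>
> -- Filter functions receive exactly one argument: the element itself.
> function Figure(fig)
>   io.stderr:write('figure caption: ' .. caption_text(fig) .. '\n')
>   return nil -- returning nil leaves the figure unchanged
> end
> ```
>
> The function only reports the caption on stderr, so it is meant for
> checking where the data lives before writing the real transformation.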
>
> Perhaps I can put in a plug for https://github.com/pandoc-ext/logging. I
> think you might find this helpful for gaining insight into the element
> structure. See example below.
>
> Hope this helps.
>
> Cheers,
> William
>
> --------
>
> With this filter in rep.lua:
>
> local logging = require 'logging'
>
> function Pandoc(pandoc)
>     if logging.loglevel > 0 then
>         logging.temp('meta', pandoc.meta)
>     end
>     logging.temp('blocks', pandoc.blocks)
> end
>
> ...your input gives this:
>
> % pandoc luke.md -L rep.lua
> (#) blocks Blocks[3] {
>   [1] Header {
>     attr: Attr {
>       attributes: AttributeList {}
>       classes: List {}
>       identifier: "headline"
>     }
>     content: Inlines[1] {
>       [1] Str "Headline"
>     }
>     level: 1
>   }
>   [2] Para {
>     content: Inlines[3] {
>       [1] Str "Some"
>       [2] Space
>       [3] Str "text"
>     }
>   }
>   [3] Figure {
>     attr: Attr {
>       attributes: AttributeList {}
>       classes: List {}
>       identifier: ""
>     }
>     caption: {
>       long: Blocks[1] {
>         [1] Plain {
>           content: Inlines[7] {
>             [1] Str "caption"
>             [2] Space
>             [3] Str "to"
>             [4] Space
>             [5] Str "an"
>             [6] Space
>             [7] Str "image"
>           }
>         }
>       }
>     }
>     content: Blocks[1] {
>       [1] Plain {
>         content: Inlines[1] {
>           [1] Image {
>             attr: Attr {
>               attributes: AttributeList {}
>               classes: List {}
>               identifier: ""
>             }
>             caption: Inlines[7] {
>               [1] Str "caption"
>               [2] Space
>               [3] Str "to"
>               [4] Space
>               [5] Str "an"
>               [6] Space
>               [7] Str "image"
>             }
>             src: "counter_plot_new_periods.png"
>             title: ""
>           }
>         }
>       }
>     }
>   }
> }
> <h1 id="headline">Headline</h1>
> <p>Some text</p>
> <figure>
> <img src="counter_plot_new_periods.png" alt="caption to an image" />
> <figcaption aria-hidden="true">caption to an image</figcaption>
> </figure>
>
> On Mon, 11 Dec 2023 at 10:36, 'lukeflo' via pandoc-discuss <
> pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> wrote:
>
>> PS: I know I'm using two very different approaches to calling the Figure
>> function with arguments. That's because I'm not sure which way
>> is the right one...
>>
>> lukeflo schrieb am Montag, 11. Dezember 2023 um 11:33:21 UTC+1:
>>
>>> So far, I took the markdown writer example from the pandoc docs
>>> <https://pandoc.org/custom-writers.html#example-modified-markdown-writer>
>>> to try out the general function of writers. It works and I think that I
>>> understand the general usage.
>>>
>>> But figures especially (in the LaTeX writer and presumably in general) are
>>> relatively complex. Here are two things I tried out so far, but I always got
>>> an error:
>>>
>>> ``` lua
>>> function Writer (doc, opts)
>>>    local filter = {
>>>       function Figure (caption, image, attr)
>>>  local figcap = '\sidecaption{' .. caption .. '}'
>>>  return '\\begin{figure}\n' .. image .. '\n' .. figcap .. '\n'
>>> '\\end{figure}\n'
>>>       end
>>>    }
>>>    return pandoc.write(doc:walk(filter), 'latex', opts)
>>> end
>>> ```
>>> If I run this writer with my custom template from the CLI using
>>> `pandoc --template=../custom.template -t test-writer.lua ast-test.txt -o ast-test.tex`
>>> I get
>>> `Error running Lua: test-writer.lua:27: '(' expected near 'Figure'`.
>>>
>>> Furthermore, I tried running the following code just to understand how
>>> those writers work. Here I just wanted to replace {figure} with the starred
>>> version {figure*} (not sidecaption):
>>>
>>> ``` lua
>>> function Writer (doc, opts)
>>>   local filter = {
>>>      Figure = function (fig)
>>> local tester = '\\begin{figure*}\n' ..
>>> fig.content[1].caption[1].attributes[1] .. '\\end{figure*}\n'
>>> return pandoc.RawBlock('latex', tester)
>>>     end
>>>   }
>>>   return pandoc.write(doc:walk(filter), 'latex', opts)
>>> end
>>> ```
>>> But I also got an error:
>>>
>>> Error running Lua:
>>> test-writer.lua:28: attempt to index a nil value (field 'caption')
>>> stack traceback:
>>>   [C]: in ?
>>>   [C]: in method 'walk'
>>>   test-writer.lua:32: in function 'Writer'
>>> stack traceback:
>>>   test-writer.lua:32: in function 'Writer'
>>>
>>> I'm aware that I might be missing something very basic and maybe even
>>> very simple. But I'm kind of getting lost a little bit inside all
>>> functions, modules etc. as well as the general framework of such writers.
>>>
>>> Thus, any help explaining my errors and maybe suggesting some better
>>> code is much appreciated!
>>>
>>> The test file in both cases is very simple:
>>>
>>> ``` markdown
>>> ---
>>> title: A title
>>> ---
>>>
>>> # Headline
>>>
>>> Some text
>>>
>>> ![caption to an image](counter_plot_new_periods.png)
>>> ```
>>>
>>> Thanks in advance!
>>> lukeflo schrieb am Freitag, 8. Dezember 2023 um 08:35:48 UTC+1:
>>>
>>>> Hello Julien,
>>>>
>>>> thanks for the reply. Unfortunately, as mentioned in the Stack Overflow
>>>> post, your suggested LaTeX code won't work.
>>>>
>>>> The \caption macro is very complex in the backend and cannot be copied
>>>> on the fly via \let, \NewCommandCopy or something similar. Even after
>>>> doing so with e.g. \NewCommandCopy{\oldcaption}{\caption} and then
>>>> setting \RenewDocumentCommand{\caption}{o m}{\sidecaption[#1]{#2}}
>>>> nothing changes and the definition of \caption, checked with \meaning
>>>> or something similar, stays the same as before (even
>>>> \DeclareDocumentCommand doesn't work).
>>>>
>>>> In the end, it might be possible to somehow change the \caption macro
>>>> itself. But the effort might not be worth the result (and it's more of a
>>>> question for TeX.SE).
>>>>
>>>> Using a custom writer to build LaTeX figures and replace the
>>>> \caption string inside would be a great solution. I read through the writer
>>>> manual, but didn't really understand how the AST works and which values
>>>> have to be used in such a writer. Furthermore, I'm using a custom LaTeX
>>>> template for exporting (based on the default.template.latex) which has to
>>>> be integrated with such a writer.
>>>>
>>>> Therefore, I really would appreciate a Lua framework to understand which
>>>> functions have to be edited etc. to accomplish the substitution.
>>>>
>>>> Best
>>>>
>>>> Julien Dutant schrieb am Dienstag, 5. Dezember 2023 um 17:09:19 UTC+1:
>>>>
>>>>> Lua filters only change Pandoc's AST representation of your document,
>>>>> i.e. before it is then converted to LaTeX. A Raw block filter will not act
>>>>> on Pandoc's LaTeX output, but only on Raw LaTeX blocks that are in the
>>>>> markdown itself.
>>>>>
>>>>> A Pandoc solution would be to write a custom Lua *writer*
>>>>> <https://pandoc.org/custom-writers.html>. The writer would use
>>>>> pandoc.write to generate Pandoc's own LaTeX output (body only) and modify
>>>>> it with regular expressions or Lua patterns. To replace just a command name
>>>>> this is fairly easy, though longer than the third solution below.
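>>>>>
>>>>> A rough sketch of such a writer, assuming Pandoc emits the caption as
>>>>> `\caption{...}` or `\caption[...]{...}`; the helper name
>>>>> `use_sidecaption` is only illustrative and not part of Pandoc:
>>>>>
>>>>> ``` lua
>>>>> -- Hypothetical helper: rename \caption to \sidecaption in LaTeX text.
>>>>> -- The pattern requires '[' or '{' right after the command name, so
>>>>> -- longer names such as \captionsetup are left alone.
>>>>> local function use_sidecaption(tex)
>>>>>   -- Parentheses drop gsub's second return value (the match count).
>>>>>   return (tex:gsub('\\caption([%[{])', '\\sidecaption%1'))
>>>>> end
>>>>>
>>>>> -- Custom writer entry point: render with Pandoc's own LaTeX writer,
>>>>> -- then post-process the resulting string.
>>>>> function Writer (doc, opts)
>>>>>   return use_sidecaption(pandoc.write(doc, 'latex', opts))
>>>>> end
>>>>> ```
>>>>>
>>>>> The file would be passed to pandoc with `-t`, like any other custom
>>>>> writer.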
>>>>>
>>>>> A LaTeX solution is to redefine \caption as \sidecaption:
>>>>> \renewcommand{\caption}{\sidecaption}
>>>>>
>>>>> You can keep this enclosed in groups ({...}) to ensure that the
>>>>> redefinition only applies locally.
>>>>>
>>>>> A hybrid Pandoc/LaTeX solution is a Lua filter that inserts LaTeX code
>>>>> to redefine \caption around figures:
>>>>>
>>>>> ``` lua
>>>>> if FORMAT:match 'latex' then
>>>>>   function Figure (elem)
>>>>>     return {
>>>>>       pandoc.RawBlock('latex', '{\\renewcommand{\\caption}{\\sidecaption}'),
>>>>>       elem,
>>>>>       pandoc.RawBlock('latex', '}')
>>>>>     }
>>>>>   end
>>>>> end
>>>>> ```
>>>>>
>>>>> This replaces any 'Figure' block element with a list (succession) of
>>>>> three blocks: a raw LaTeX block, the figure itself, and another raw
>>>>> LaTeX block. The output should look like:
>>>>> {\renewcommand{\caption}{\sidecaption}
>>>>> ... Pandoc's LaTeX for the figure ...
>>>>> }
>>>>>
>>>>> Reposted from
>>>>> https://stackoverflow.com/questions/77504584/pandoc-md-latex-write-lua-filter-to-change-latex-macro-used-for-caption/77607636#77607636
>>>>>
>>>>> On Monday, November 20, 2023 at 7:06:57 AM UTC+11 lukeflo wrote:
>>>>>
>>>>>> Hi everybody,
>>>>>>
>>>>>> I have written a custom LaTeX `.cls' file to establish a typesetting
>>>>>> workflow for the scientific journals of my research institute. The
>>>>>> texts should be written in Markdown and then be processed with
>>>>>> `pandoc' to LaTeX.
>>>>>>
>>>>>> I already have an elaborate pandoc template to produce the LaTeX
>>>>>> preamble etc. So far it's working great.
>>>>>>
>>>>>> But for the figures I need the caption from the Markdown file to be
>>>>>> set with `\sidecaption' instead of `\caption' in LaTeX, as well as with
>>>>>> an optional argument (short-caption) for the image attribution in the
>>>>>> list of figures.
>>>>>>
>>>>>> To get the latter working I use the following template from a GitHub
>>>>>> discussion in the [pandoc repo]:
>>>>>>
>>>>>> ┌────
>>>>>> │ PANDOC_VERSION:must_be_at_least '3.1'
>>>>>> │
>>>>>> │ if FORMAT:match 'latex' then
>>>>>> │   function Figure(f)
>>>>>> │     local short = f.content[1].content[1].attributes['short-caption']
>>>>>> │     if short and not f.caption.short then
>>>>>> │       f.caption.short = pandoc.Inlines(short)
>>>>>> │     end
>>>>>> │     return f
>>>>>> │   end
>>>>>> │ end
>>>>>> └────
>>>>>>
>>>>>> That works without any flaws.
>>>>>>
>>>>>> But now I need to figure out how to change the LaTeX macro used for
>>>>>> the caption. The older [approach of pre pandoc version 3.0 posted] by
>>>>>> tarleb is really intuitive and I could have easily adapted it to my
>>>>>> needs. But since pandoc 3.0 there is the new [/complex figures/]
>>>>>> approach and, so far, I couldn't figure out how to change the LaTeX
>>>>>> macro used for the captions with this new behaviour.
>>>>>>
>>>>>> I tried something like this (adapted from [here]):
>>>>>>
>>>>>> ┌────
>>>>>> │ if FORMAT:match 'latex' then
>>>>>> │   function RawBlock (raw)
>>>>>> │     local caption = raw.text:match('\\caption')
>>>>>> │     if caption then
>>>>>> │        raw:gsub('\\caption', '\\sidecaption')
>>>>>> │     end
>>>>>> │     return raw
>>>>>> │   end
>>>>>> │ end
>>>>>> └────
>>>>>>
>>>>>> But nothing happened.
>>>>>>
>>>>>> The main challenge for me is my more-or-less non-existent Lua
>>>>>> skills. I just never had to use it for my daily tasks. I thought about
>>>>>> using `awk' or `sed' to edit the `.tex' file itself using a
>>>>>> regex substitution, but that should remain an absolute stopgap, since
>>>>>> it makes the whole workflow less portable.
>>>>>>
>>>>>> Thus, I'm hoping for a hint or a solution in the form of a pandoc Lua
>>>>>> script which 1. helps me to achieve the goal, and 2. improves my
>>>>>> understanding of Lua and the /complex figures/ approach for similar
>>>>>> future tasks.
>>>>>>
>>>>>> I appreciate any tip!
>>>>>>
>>>>>> Best,
>>>>>> Lukeflo
>>>>>>
>>>>>> This question is also posted on StackOverFlow:
>>>>>> https://stackoverflow.com/q/77504584/19647155
>>>>>>
>>>>>> [pandoc repo]
>>>>>> <https://github.com/jgm/pandoc/issues/7915#issuecomment-1427113349>
>>>>>>
>>>>>> [approach of pre pandoc version 3.0 posted]
>>>>>> <https://github.com/jgm/pandoc/issues/7915#issuecomment-1039370851>
>>>>>>
>>>>>> [/complex figures/] <https://github.com/jgm/pandoc/releases?page=2>
>>>>>>
>>>>>> [here] <https://stackoverflow.com/a/71296595/19647155>
>>>>>>
>>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "pandoc-discuss" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/pandoc-discuss/32dfe8eb-98ac-40ee-92d7-162528add367n%40googlegroups.com
>> <https://groups.google.com/d/msgid/pandoc-discuss/32dfe8eb-98ac-40ee-92d7-162528add367n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>

Thread overview: 9+ messages
2023-11-19 20:06 'lukeflo' via pandoc-discuss
     [not found] ` <51ca8210-3d60-4d5d-9af2-04c85995deb6n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-05 16:09   ` Julien Dutant
     [not found]     ` <f3fa2d12-6277-47c6-a3fc-b5fea1485600n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-08  7:35       ` 'lukeflo' via pandoc-discuss
     [not found]         ` <b565fdd5-8216-4596-a2ed-c75019aad172n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-11 10:33           ` 'lukeflo' via pandoc-discuss
     [not found]             ` <b419ae83-de83-4035-97cd-fb41cb6be647n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-11 10:36               ` 'lukeflo' via pandoc-discuss
     [not found]                 ` <32dfe8eb-98ac-40ee-92d7-162528add367n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-11 11:30                   ` 'William Lupton' via pandoc-discuss
     [not found]                     ` <CAEe_xxikobOS_G9x71nxtz0dr99VVhgBV8in=xKzXgh7JMaRcw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2023-12-11 14:53                       ` Pablo Serrati [this message]
     [not found]                         ` <CACTSqG5L4o7npfcUV1iGPMi5fbUrqgPc+5ttPQmZhNTFW7Vsng-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2023-12-11 22:04                           ` 'lukeflo' via pandoc-discuss
     [not found]                             ` <78503784-88bd-4d34-b5f5-a8634d667ba0n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2023-12-12 10:16                               ` 'William Lupton' via pandoc-discuss
