>What do you get if you run pandoc with -f latex+raw_tex -t native without any filter? My guess is that it is one of these:
>
> 1.  The whole tabular ends up inside a huge RawBlock.
>
> 2.  The \makecell command ends up inside a RawInline or RawBlock and doesn't get rendered in the output.
>
> 3. #2 + the regex doesn't see the \IPA command because it is inside the \makecell command.

It looks like it's #3: everything inside the \makecell command is completely missing from the output.

Redefining \IPA and \makecell in the latex source fixes the problem as far as pandoc is concerned. But I rely on the original definitions to generate the latex output that I want, so I don't want the redefinitions permanently in the latex source file. Is there a way to configure pandoc to insert them only when pandoc is processing the .tex file, so that I still get correct tables and no strikethrough when xelatex is processing the file?
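One possible approach, sketched here under assumptions and not tested against pandoc itself: keep the redefinitions out of the .tex file and let a small wrapper script splice them in just before `\begin{document}`, so only the copy handed to pandoc ever contains them. The names `OVERRIDES` and `inject_overrides` are hypothetical; the redefinitions are the ones suggested further down in this thread.

```python
# Hypothetical helper: add pandoc-only macro redefinitions to a LaTeX source
# string without touching the file that xelatex compiles.

# Redefinitions taken from the suggestion quoted later in this thread.
OVERRIDES = r"""
\usepackage[normalem]{ulem}
\renewcommand{\makecell}[1]{#1}
\renewcommand{\IPA}[1]{\sout{#1}}
"""


def inject_overrides(tex_source: str, overrides: str = OVERRIDES) -> str:
    """Return tex_source with overrides inserted right before \\begin{document}."""
    marker = r"\begin{document}"
    # partition() splits at the first occurrence of the marker only.
    head, sep, tail = tex_source.partition(marker)
    if not sep:
        raise ValueError(r"no \begin{document} found in the source")
    return head + overrides.strip() + "\n\n" + marker + tail


sample = (
    "\\documentclass{article}\n"
    "\\usepackage{fontspec}\n"
    "\\begin{document}\n"
    "Hello \\IPA{some IPA}\n"
    "\\end{document}\n"
)
patched = inject_overrides(sample)
print(patched)
```

The patched text could then be written to a temporary file, or piped to pandoc on stdin, while xelatex keeps compiling the untouched original.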

On Monday, December 6, 2021 at 2:34:27 AM UTC-8 BPJ wrote:
What do you get if you run pandoc with -f latex+raw_tex -t native without any filter? My guess is that it is one of these:

1.  The whole tabular ends up inside a huge RawBlock.

2.  The \makecell command ends up inside a RawInline or RawBlock and doesn't get rendered in the output.

3. #2 + the regex doesn't see the \IPA command because it is inside the \makecell command.

Also you may need a non-greedy regex: "\\IPA{(.*?)}" — and you may need the regex module for that to work.
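To illustrate the greedy versus non-greedy difference, here is a small sketch using Python's built-in `re` module, which already supports the lazy `*?` quantifier. The sample string is made up for the demonstration.

```python
import re

# Made-up input with two \IPA commands in the same string.
text = r"\IPA{foo} and \IPA{bar}"

# Greedy: (.*) runs on to the LAST closing brace in the string.
greedy = re.compile(r"\\IPA\{(.*)\}")
# Non-greedy: (.*?) stops at the FIRST closing brace after each \IPA{.
lazy = re.compile(r"\\IPA\{(.*?)\}")

greedy_hits = greedy.findall(text)
lazy_hits = lazy.findall(text)

print(greedy_hits)  # ['foo} and \\IPA{bar']
print(lazy_hits)    # ['foo', 'bar']
```

Note that the non-greedy pattern stops at the first `}` it meets, so it would cut an argument with nested braces short (for example `\IPA{a\textbf{b}c}`).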

Please try putting these definitions at the end of your preamble, after the original definitions[^0]:

``````latex
\usepackage[normalem]{ulem}

\renewcommand{\makecell}[1]{#1}

\renewcommand{\IPA}[1]{\sout{#1}}
``````

Then save the Lua code below to a file sout2ipa.lua in the current directory and run pandoc with -f latex -t html -L sout2ipa.lua

``````lua
function Strikeout (elem)
  return pandoc.Span(elem.content, { class = 'IPA' })
end
``````

Now you should get all your IPA nicely inside spans with class "IPA".

There is a gotcha: this trick requires that you don't have any actual strikeout text in your document.
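For anyone who prefers a JSON filter in Python over Lua, the same Strikeout-to-Span rewrite can be sketched on the plain JSON form of the AST using only the standard library. The function name `strikeout_to_span` and the tiny sample document (including its API version number) are made up for illustration.

```python
import json


def strikeout_to_span(node):
    """Recursively replace every Strikeout inline with a Span of class IPA."""
    if isinstance(node, list):
        return [strikeout_to_span(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Strikeout":
            # Strikeout carries a list of inlines; Span carries [attr, inlines],
            # where attr is [identifier, classes, key-value pairs].
            inlines = strikeout_to_span(node["c"])
            return {"t": "Span", "c": [["", ["IPA"], []], inlines]}
        return {key: strikeout_to_span(value) for key, value in node.items()}
    # Strings, numbers and None are returned unchanged.
    return node


# Hand-written sample document; the API version is only a placeholder.
doc = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "Hello"},
            {"t": "Space"},
            {"t": "Strikeout", "c": [{"t": "Str", "c": "IPA"}]},
        ]}
    ],
}
converted = strikeout_to_span(doc)
print(json.dumps(converted["blocks"][0]["c"][2]))
```

Used as an actual filter, the script would read the JSON document from stdin and write the converted document to stdout. The same gotcha applies: genuine strikeout text would be turned into IPA spans as well.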

@jgm there really should be an extension which makes the LaTeX reader recognise a pseudocommand `\PandocSpan{attrA=value, attrB={long value}}{content}` so that one could do redefinitions like those below and get native spans in the Pandoc AST.

``````latex
\renewcommand{\IPA}[1]{\PandocSpan{class=IPA}{#1}}

\renewcommand{\TakesTwo}[2]{\PandocSpan{class=foo}{\PandocSpan{data-foo=1}{#1}\PandocSpan{data-foo=2}{#2}}}

\renewcommand{\TakesKeyVals}[2][]{\PandocSpan{#1, class=bar}{#2}}
``````

where the reader will convert any keyval-style content in the first argument to span attributes, with later keys overriding earlier ones.

(And possibly an analogous PandocDiv command, working somewhat like the `\NewEnviron` command of the LaTeX environ package[^1] with a pseudo-command `\BODY` (or `\DIV` so as to not clash with environ!) which gets replaced with the content of the div.)

Even if such a structure isn't usable on its own it would be much easier to modify it with filters.

/bpj

[^0]: I'm not sure that \renewcommand works, but since I am on my phone ATM I can't check. If not, comment out the original \newcommand and/or \usepackage commands and define substitute commands with \newcommand as appropriate.



On Mon, 6 Dec 2021 at 05:26, Greg S <elorian...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Okay I've written a filter:

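```
#!/usr/bin/env python3
import logging
import re
from pandocfilters import toJSONFilter, RawInline

# Raw string avoids invalid-escape warnings. Matches \IPA{...};
# the greedy (.*) runs to the last closing brace of the raw inline.
ipa_regex = re.compile(r"\\IPA\{(.*)\}")

def handle(key, value, format, meta):
    # Log every element so the structure of the AST is visible on stderr.
    logging.warning(f"KEY {key} VALUE {value} format {format} META {meta}")
    # A RawInline value is [format, text]; value[1] is the raw LaTeX source.
    if key == "RawInline":
        if m := ipa_regex.match(value[1]):
            # Emit the macro argument as raw HTML (it is not HTML-escaped).
            return RawInline('html', m.group(1))

if __name__ == "__main__":
    toJSONFilter(handle)
```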

and with the `-f latex+raw_tex` option passed to pandoc it looks like this is correctly capturing the text in the IPA macro.

However, I noticed that the filter completely skips over text in the \IPA macro if that macro occurs within a latex table defined with \begin{tabular}. I'm using the makecell latex package and wrapping the cells with the \makecell command (i.e. `\makecell { \IPA{ some text } }`), but I tried removing the \makecell and the IPA macro still gets skipped in this context.


On Sunday, December 5, 2021 at 12:12:44 PM UTC-8 John MacFarlane wrote:

I should have mentioned before that you'll need to enable
the `raw_tex` extension as shown above, to allow inclusion
of RawBlock or RawInline.

% pandoc -t native -f latex+raw_tex
\IPA{hi} there
^D
[ Para
    [ RawInline (Format "latex") "\\IPA{hi}"
    , Space
    , Str "there"
    ]
]


Greg S <elorian...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> How can I write a filter that matches RawInline elements if the filter
> applies after the unknown latex macros have been applied in the parsing
> stage? I'm not seeing the text within the \IPA macro at all in the logging
> from the test filter I wrote - is there something I need to do to make that
> filter apply earlier?
>
> On Sunday, December 5, 2021 at 10:56:51 AM UTC-8 John MacFarlane wrote:
>
>>
>> You can't insert the macro with a filter, because the filter
>> is applied after parsing, and the macro would be resolved in
>> the parsing phase.
>>
>> However, you could have a filter that matches RawInline
>> elements that are "\IPA" commands, extracts their textual
>> content, and returns a Str element.
>>
>> Greg S <elorian...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>> > Is there a way I can tell pandoc to insert a new Latex macro before
>> > processing that doesn't exist in the document? Using
>> > \renewcommand{\IPA}[1]{#1} makes the text appear in the output of the
>> > latex -> html conversion, but it breaks the formatting I care about in the
>> > pdf version so I don't want to have that line permanently in the latex
>> > source file.
>> >
>> > I think I'd ultimately like to use a filter to intercept the raw latex
>> > from \IPA{...} and do something specific with it in HTML (probably put it
>> > within a <span class="IPA"> tag). I also have some other latex macros from
>> > specific packages that pandoc doesn't seem to understand, that I'd like to
>> > handle in a custom way. I tried creating a simple logging Python filter
>> > just to understand how they work.
>> >
>> > ```
>> > #!/usr/bin/env python3
>> > import logging
>> > from pandocfilters import toJSONFilter, Emph, Para
>> >
>> > def handle(key, value, format, meta):
>> >     # Log every element the filter sees; returning None leaves it unchanged.
>> >     logging.warning(f"KEY {key} VALUE {value} format {format} META {meta}")
>> >
>> > if __name__ == "__main__":
>> >     toJSONFilter(handle)
>> > ```
>> > And then running `pandoc --pdf-engine=xelatex --verbose test.tex -o
>> > test.html --filter filter.py`.
>> >
>> > But it seems like latex macros that pandoc doesn't understand are getting
>> > skipped before the filter is applied, so the `handle` function never gets
>> > called with the text contents of my \IPA macro.
>> >
>> > On Saturday, December 4, 2021 at 9:37:16 AM UTC-8 John MacFarlane wrote:
>> >
>> >>
>> >> Pandoc doesn't understand everything, especially outside of
>> >> core LaTeX. In particular, it doesn't understand
>> >>
>> >> \DeclareTextFontCommand
>> >>
>> >> from fontspec, so the \IPA macro isn't understood.
>> >>
>> >> You can work around this by adding your own macro
>> >> definition before you convert with pandoc:
>> >>
>> >> \renewcommand{\IPA}[1]{#1}
>> >>
>> >> and then the contents of \IPA will just be passed
>> >> through.
>> >>
>> >> I suppose you could alternatively redefine
>> >>
>> >> \renewcommand{\DeclareTextFontCommand}[2]{\newcommand{#1}[1]{##1}}
>> >>
>> >> before your fontspec stuff (untested and may not work).
>> >>
>> >> Another option is to use a filter and intercept the raw
>> >> LaTeX inline produced from \IPA{some text}, changing it
>> >> into textual content, but I think the first approach above
>> >> is the simplest.
>> >>
>> >>
>> >>
>> >> Greg S <elorian...@gmail.com> writes:
>> >>
>> >> > I have a minimal test latex file `test.tex`:
>> >> >
>> >> >
>> >> > \documentclass{article}
>> >> >
>> >> > \usepackage{fontspec}
>> >> >
>> >> > \newfontfamily\IPAFont{Doulos SIL}
>> >> > \DeclareTextFontCommand{\IPA}{\IPAFont}
>> >> >
>> >> > \begin{document}
>> >> >
>> >> > \section{Test}
>> >> > Hello \IPA{some IPA}
>> >> >
>> >> > \end{document}
>> >> >
>> >> >
>> >> > This builds fine with xelatex and produces a pdf I expect. When I try to
>> >> > convert this to an html document with `pandoc --pdf-engine=xelatex
>> >> > --verbose test.tex -o test.html`, I see the warnings:
>> >> >
>> >> > [INFO] Could not load include file fontspec.sty at test.tex line 3 column 22
>> >> > [INFO] Skipped '\newfontfamily' at test.tex line 5 column 15
>> >> > [INFO] Skipped '\IPAFont{Doulos SIL}' at test.tex line 5 column 35
>> >> > [INFO] Skipped '\DeclareTextFontCommand{\IPA}{\IPAFont}' at test.tex line 6 column 40
>> >> > [INFO] Skipped '\IPA{some IPA}' at test.tex line 11 column 21
>> >> >
>> >> > And the text within the custom \IPA command is skipped. How can I make
>> >> > pandoc not skip these?
>> >> >
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> Groups "pandoc-discuss" group.
>> >> > To unsubscribe from this group and stop receiving emails from it, send
>> >> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>> >> > To view this discussion on the web visit
>> >>
>> https://groups.google.com/d/msgid/pandoc-discuss/0462fc42-ae24-4c52-b267-1126ed5834edn%40googlegroups.com
>> >> .