From: BPJ
Newsgroups: gmane.text.pandoc
Subject: Re: Skipping commands in LaTeX document
Date: Mon, 6 Dec 2021 11:34:14 +0100
To: pandoc-discuss

What do you get if you run pandoc with `-f latex+raw_tex -t native` without any filter? My guess is that it is one of these:

1. The whole `tabular` ends up inside a huge RawBlock.

2. The `\makecell` command ends up inside a RawInline or RawBlock and doesn't get rendered in the output.

3. #2, plus the regex doesn't see the `\IPA` command because it is inside the `\makecell` command.
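One way to tell these cases apart is to dump the AST as JSON (`pandoc -f latex+raw_tex -t json test.tex`) and list every raw element that mentions `\IPA`. The sketch below assumes pandoc's JSON layout, where each element is an object with a `"t"` (type) key and a `"c"` (contents) key, and raw elements hold `[format, text]`; `find_raw_ipa` is a hypothetical helper name, not part of pandoc or pandocfilters.

```python
import json
import sys


def find_raw_ipa(node, found=None):
    """Collect (element type, raw text) for every RawInline/RawBlock that mentions \\IPA."""
    if found is None:
        found = []
    if isinstance(node, dict):
        if node.get("t") in ("RawInline", "RawBlock"):
            _fmt, text = node["c"]  # raw elements hold [format, text]
            if "\\IPA" in text:
                found.append((node["t"], text))
        # Recurse into every value so raw elements inside tables, lists etc. are found too.
        for value in node.values():
            find_raw_ipa(value, found)
    elif isinstance(node, list):
        for item in node:
            find_raw_ipa(item, found)
    return found


if __name__ == "__main__":
    # Assumed usage: pandoc -f latex+raw_tex -t json test.tex | python find_raw_ipa.py
    doc = json.load(sys.stdin)
    for kind, text in find_raw_ipa(doc["blocks"]):
        print(f"{kind}: {text[:60]!r}")
```

A `RawBlock` that starts with `\begin{tabular}` would point to case 1; a `RawInline` holding `\makecell{...}` would point to cases 2 or 3.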

Also, you may need a non-greedy regex: `r"\\IPA\{(.*?)\}"`. Python's standard `re` module supports non-greedy quantifiers such as `*?`, so the third-party `regex` module shouldn't be needed for that.
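To illustrate why the greedy version goes wrong: on a line with two `\IPA` commands, `(.*)` runs on to the last closing brace, while `(.*?)` stops at the first one. A minimal sketch; the sample line is made up.

```python
import re

# Raw strings keep the backslash: r"\\IPA" matches a literal "\IPA".
GREEDY = re.compile(r"\\IPA\{(.*)\}")
NON_GREEDY = re.compile(r"\\IPA\{(.*?)\}")

line = r"\IPA{tʃ} and \IPA{ʃ}"

print(GREEDY.findall(line))      # ['tʃ} and \\IPA{ʃ']
print(NON_GREEDY.findall(line))  # ['tʃ', 'ʃ']
```

Neither pattern copes reliably with nested braces such as `\IPA{a\textbf{b}}`; that would need a small brace-counting scanner instead of a regex.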

Please try putting these definitions at the top of your document body[^0]:

``````latex
\usepackage[normalem]{ulem}

\renewcommand{\makecell}[1]{#1}

\renewcommand{\IPA}[1]{\sout{#1}}
``````

Then save the Lua code below to a file `sout2ipa.lua` in the current directory and run pandoc with `-f latex -t html -L sout2ipa.lua`.

``````lua
-- Turn every Strikeout (produced by the \sout redefinition) into a span with class "IPA".
function Strikeout (elem)
  return pandoc.Span(elem.content, { class = 'IPA' })
end
``````
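If you'd rather stay in Python, as in your earlier filters, the same transformation can be written against pandoc's JSON AST (what `-t json` produces and what `--filter` programs receive on stdin). This is a sketch, not a drop-in replacement: it walks the JSON with the standard library only, `strikeout_to_ipa` is a hypothetical name, and it assumes the JSON layout where a span is `{"t": "Span", "c": [attr, inlines]}` with `attr = [identifier, classes, key-value pairs]`.

```python
import json
import sys


def strikeout_to_ipa(node):
    """Return a copy of node with every Strikeout turned into a Span with class "IPA"."""
    if isinstance(node, list):
        return [strikeout_to_ipa(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Strikeout":
            inlines = strikeout_to_ipa(node["c"])  # also handles nested strikeouts
            # Attr is [identifier, classes, key-value pairs].
            return {"t": "Span", "c": [["", ["IPA"], []], inlines]}
        return {key: strikeout_to_ipa(value) for key, value in node.items()}
    return node


if __name__ == "__main__":
    # Assumed pipeline:
    #   pandoc -f latex -t json test.tex | python sout2ipa.py | pandoc -f json -t html
    doc = json.load(sys.stdin)
    json.dump(strikeout_to_ipa(doc), sys.stdout)
```

Building a new tree instead of mutating in place keeps the function easy to test on small hand-written ASTs.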

Now you should get all your IPA nicely inside spans with class "IPA".

There is a gotcha: this trick requires that you don't have any actual strikeout text in your document.
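Since the original goal was to keep these overrides out of the source that produces the PDF, one option is to patch only a temporary copy that pandoc reads. A minimal sketch, assuming the file contains a literal `\begin{document}` and that `sout2ipa.lua` sits in the current directory; `with_overrides` and `convert` are hypothetical helper names, and the pandoc options are the ones used in this thread.

```python
import subprocess
import tempfile
from pathlib import Path

# The overrides from above; only the temporary copy ever sees them.
OVERRIDES = r"""
\usepackage[normalem]{ulem}
\renewcommand{\makecell}[1]{#1}
\renewcommand{\IPA}[1]{\sout{#1}}
"""


def with_overrides(tex, overrides=OVERRIDES):
    """Return tex with the overrides inserted right after the first \\begin{document}."""
    marker = r"\begin{document}"
    head, sep, tail = tex.partition(marker)
    if not sep:
        raise ValueError("no \\begin{document} found")
    return head + sep + "\n" + overrides.strip() + "\n" + tail


def convert(src, dest):
    """Write a patched temporary copy of src and convert it to HTML with pandoc."""
    patched = with_overrides(Path(src).read_text(encoding="utf-8"))
    with tempfile.TemporaryDirectory() as tmp:
        tmp_tex = Path(tmp) / "patched.tex"
        tmp_tex.write_text(patched, encoding="utf-8")
        subprocess.run(
            ["pandoc", "-f", "latex", "-t", "html",
             "-L", "sout2ipa.lua", str(tmp_tex), "-o", dest],
            check=True,
        )
```

For example, `convert("test.tex", "test.html")` leaves `test.tex` untouched, so the xelatex build keeps its formatting.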

@jgm there really should be an extension which makes the LaTeX reader recognise a pseudocommand `\PandocSpan{attrA=value, attrB={long value}}{content}` so that one could do redefinitions like those below and get native spans in the Pandoc AST.

``````latex
\renewcommand{\IPA}[1]{\PandocSpan{class=IPA}{#1}}

\renewcommand{\TakesTwo}[2]{\PandocSpan{class=foo}{\PandocSpan{data-foo=1}{#1}\PandocSpan{data-foo=2}{#2}}}

\renewcommand{\TakesKeyVals}[2][]{\PandocSpan{#1, class=bar}{#2}}
``````

where the reader would convert any keyval-style content in the first argument to span attributes, with later ones overriding earlier ones.
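`\PandocSpan` is only a proposal, not a pandoc feature, so here is a sketch of the attribute handling described above, written as if a filter received the first argument as a plain string. `split_top_level` and `parse_keyvals` are hypothetical names; the result is the `[identifier, classes, key-value pairs]` triple that pandoc's JSON uses for attributes.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside {...} groups."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [part.strip() for part in parts if part.strip()]


def parse_keyvals(text):
    """Turn "k1=v1, k2={long v2}" into a pandoc-style [id, classes, key-value pairs] attr.

    A later key overrides an earlier one; "id" and "class" become the
    identifier and the class list, everything else a key-value pair.
    """
    attrs = {}
    for item in split_top_level(text):
        key, _, value = item.partition("=")
        key, value = key.strip(), value.strip()
        if value.startswith("{") and value.endswith("}"):
            value = value[1:-1]  # strip one level of grouping braces
        attrs[key] = value  # later duplicates win
    identifier = attrs.pop("id", "")
    classes = attrs.pop("class", "").split()
    return [identifier, classes, [[k, v] for k, v in attrs.items()]]


# What \TakesKeyVals[class=foo, data-x=1]{...} would hand over after expansion:
print(parse_keyvals("class=foo, data-x=1, class=bar"))
# ['', ['bar'], [['data-x', '1']]]
```

With plain overriding, the fixed `class=bar` in `\TakesKeyVals` always wins over a class passed in `#1`; appending classes instead would be a different design choice.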

(And possibly an analogous PandocDiv command, working somewhat like the `\NewEnviron` command of the LaTeX environ package[^1], with a pseudo-command `\BODY` (or `\DIV`, so as not to clash with environ!) which gets replaced with the content of the div.)

Even if such a structure isn't usable on its own, it would be much easier to modify with filters.

/bpj

[^0]: I'm not sure that `\renewcommand` works, but since I'm on my phone at the moment I can't check. If not, comment out the original `\newcommand` and/or `\usepackage` commands and define substitute commands with `\newcommand` as appropriate.

[^1]: https://ctan.org/pkg/environ

On Mon, 6 Dec 2021 at 05:26, Greg S <elorian.mestec-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Okay, I've written a filter:

```
#!/usr/bin/python
import logging
import re

from pandocfilters import RawInline, toJSONFilter

# Raw string: r"\\IPA" matches a literal backslash followed by "IPA".
ipa_regex = re.compile(r"\\IPA\{(.*)\}")


def handle(key, value, format, meta):
    logging.warning(f"KEY {key} VALUE {value} format {format} META {meta}")
    if key == "RawInline":
        # For a RawInline, value is [format, text].
        if m := ipa_regex.match(value[1]):
            return RawInline('html', f"{m.group(1)}")


if __name__ == "__main__":
    toJSONFilter(handle)
```

and with the `-f latex+raw_tex` option passed to pandoc, it looks like this is correctly capturing the text in the IPA macro.

However, I noticed that the filter completely skips over text in the \IPA macro if that macro occurs within a LaTeX table defined with \begin{tabular}. I'm using the makecell LaTeX package and wrapping the cells with the \makecell command (i.e. `\makecell { \IPA{ some text } }`), but I tried removing the \makecell and the IPA macro still gets skipped in this context.


On Sunday, December 5, 2021 at 12:12:44 PM UTC-8 John MacFarlane wrote:

I should have mentioned before that you'll need to enable
the `raw_tex` extension as shown above, to allow inclusion
of RawBlock or RawInline.

% pandoc -t native -f latex+raw_tex
\IPA{hi} there
^D
[ Para
    [ RawInline (Format "latex") "\\IPA{hi}"
    , Space
    , Str "there"
    ]
]


Greg S <elorian...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> How can I write a filter that matches RawInline elements if the filter
> applies after the unknown latex macros have been applied in the parsing
> stage? I'm not seeing the text within the \IPA macro at all in the logging
> from the test filter I wrote - is there something I need to do to make that
> filter apply earlier?
>
> On Sunday, December 5, 2021 at 10:56:51 AM UTC-8 John MacFarlane wrote:
>
>>
>> You can't insert the macro with a filter, because the filter
>> is applied after parsing, and the macro would be resolved in
>> the parsing phase.
>>
>> However, you could have a filter that matches RawInline
>> elements that are "\IPA" commands, extracts their textual
>> content, and returns a Str element.
>>
>> Greg S <elorian...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>> > Is there a way I can tell pandoc to insert a new Latex macro before
>> > processing that doesn't exist in the document? Using
>> > \renewcommand{\IPA}[1]{#1} makes the text appear in the output of the
>> > latex -> html conversion, but it breaks the formatting I care about in the pdf
>> > version so I don't want to have that line permanently in the latex source
>> > file.
>> >
>> > I think I'd ultimately like to use a filter to intercept the raw latex
>> > from \IPA{...} and do something specific with it in HTML (probably put it
>> > within a <span class="IPA"> tag). I also have some other latex macros from
>> > specific packages that pandoc doesn't seem to understand, that I'd like to
>> > handle in a custom way. I tried creating a simple logging Python filter
>> > just to understand how they work.
>> >
>> > ```
>> > #!/usr/bin/python
>> > import logging
>> > from pandocfilters import toJSONFilter, Emph, Para
>> >
>> > def handle(key, value, format, meta):
>> >     logging.warning(f"KEY {key} VALUE {value} format {format} META {meta}")
>> >
>> > if __name__ == "__main__":
>> >     toJSONFilter(handle)
>> > ```
>> > And then running `pandoc --pdf-engine=xelatex --verbose test.tex -o
>> > test.html --filter filter.py`.
>> >
>> > But it seems like latex macros that pandoc doesn't understand are getting
>> > skipped before the filter is applied, so the `handle` function never gets
>> > called with the text contents of my \IPA macro.
>> >
>> > On Saturday, December 4, 2021 at 9:37:16 AM UTC-8 John MacFarlane wrote:
>> >
>> >>
>> >> Pandoc doesn't understand everything, especially outside of
>> >> core LaTeX. In particular, it doesn't understand
>> >>
>> >> \DeclareTextFontCommand
>> >>
>> >> from fontspec, so the \IPA macro isn't understood.
>> >>
>> >> You can work around this by adding your own macro
>> >> definition before you convert with pandoc:
>> >>
>> >> \renewcommand{\IPA}[1]{#1}
>> >>
>> >> and then the contents of \IPA will just be passed
>> >> through.
>> >>
>> >> I suppose you could alternatively redefine
>> >>
>> >> \renewcommand{\DeclareTextFontCommand}[2]{\newcommand{#1}[1]{##1}}
>> >>
>> >> before your fontspec stuff (untested and may not work).
>> >>
>> >> Another option is to use a filter and intercept the raw
>> >> LaTeX inline produced from \IPA{some text}, changing it
>> >> into textual content, but I think the first approach above
>> >> is the simplest.
>> >>
>> >>
>> >>
>> >> Greg S <elorian...@gmail.com> writes:
>> >>
>> >> > I have a minimal test latex file `test.tex`:
>> >> >
>> >> >
>> >> > \documentclass{article}
>> >> >
>> >> > \usepackage{fontspec}
>> >> >
>> >> > \newfontfamily\IPAFont{Doulos SIL}
>> >> > \DeclareTextFontCommand{\IPA}{\IPAFont}
>> >> >
>> >> > \begin{document}
>> >> >
>> >> > \section{Test}
>> >> > Hello \IPA{some IPA}
>> >> >
>> >> > \end{document}
>> >> >
>> >> >
>> >> > This builds fine with xelatex and produces a pdf I expect. When I try to
>> >> > convert this to an html document with `pandoc --pdf-engine=xelatex
>> >> > --verbose test.tex -o test.html`, I see the warnings:
>> >> >
>> >> > [INFO] Could not load include file fontspec.sty at test.tex line 3 column 22
>> >> > [INFO] Skipped '\newfontfamily' at test.tex line 5 column 15
>> >> > [INFO] Skipped '\IPAFont{Doulos SIL}' at test.tex line 5 column 35
>> >> > [INFO] Skipped '\DeclareTextFontCommand{\IPA}{\IPAFont}' at test.tex line 6 column 40
>> >> > [INFO] Skipped '\IPA{some IPA}' at test.tex line 11 column 21
>> >> >
>> >> > And the text within the custom \IPA command is skipped. How can I make
>> >> > pandoc not skip these?
>> >> >
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
>> >> > To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
>> >> > To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/0462fc42-ae24-4c52-b267-1126ed5834edn%40googlegroups.com.