From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.text.pandoc/29043
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: pompez <martinsifrar11-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.text.pandoc
Subject: Is there a way to change the way Pandoc parses HTML inside of
 markdown documents?
Date: Mon, 16 Aug 2021 14:43:41 -0700 (PDT)
Message-ID: <aae29ca7-60ca-4349-af03-939f0ac503efn@googlegroups.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_474_2077931839.1629150221825"
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Xref: news.gmane.io gmane.text.pandoc:29043
Archived-At: <http://permalink.gmane.org/gmane.text.pandoc/29043>

------=_Part_474_2077931839.1629150221825
Content-Type: multipart/alternative; 
	boundary="----=_Part_475_2099739335.1629150221825"

------=_Part_475_2099739335.1629150221825
Content-Type: text/plain; charset="UTF-8"


I'm starting out with Lua filters, so apologies if this question has already
been answered. You can also read this question on StackOverflow:
<https://stackoverflow.com/questions/68809527/is-there-a-way-to-change-the-way-pandoc-parses-html-inside-of-markdown-documents>.

I'm using Pandoc to convert markdown to HTML. My markdown files also 
contain some raw HTML. In the examples, I'll be using `<mark>` and `<u>`.

Let's say I want to change every `<mark>` to a `<u>` tag. We parse the 
input as HTML and look at the AST.

```
$ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=html --to native
[Plain [Underline [Str "foo"],Space,Str "&",Space,Span ("", ["mark"],[]) [Str "bar"]]]
```

On this structure, we can use a simple filter that replaces the `Span`
elements representing `<mark>` tags with `Underline` elements:

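```lua
-- Replace every Span that carries the class "mark" with an Underline.
function Span(elem)
    -- elem.classes is a pandoc List; includes() tests membership and,
    -- unlike classes[1]:gmatch(...) (whose iterator is always truthy),
    -- does not fail on Spans that have no classes at all.
    if elem.classes:includes('mark') then
        return pandoc.Underline(elem.content)
    end
end
```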

Running this filter (with `--lua-filter`) on the HTML input gives:

```
[Plain [Underline [Str "foo"],Space,Str "&",Space,Underline [Str "bar"]]]
```

This is good. But if we parse the same input as markdown, we get a much 
less convenient structure.

```
$ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=markdown+raw_html --to native
[Para [RawInline (Format "html") "<u>",Str "foo",RawInline (Format "html") "</u>",Space,Str "&",Space,RawInline (Format "html") "<mark>",Str "bar",RawInline (Format "html") "</mark>"]]
```

If the replacement of `<mark>` with `<u>` depended on additional criteria
(the content, for example), we would have to find the matching opening and
closing `RawInline` elements ourselves.
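A minimal sketch of pairing the tags inside a Lua filter, assuming bare `<mark>` and `</mark>` tags (no attributes, no nesting) that land in the same list of inlines. It uses an `Inlines` filter function to collapse each pair into a `Span` with class `mark`, the same shape the HTML reader produces, so the `Span` filter above could then be reused unchanged. `is_raw_html` is a hypothetical helper, not part of pandoc's API.

```lua
-- Hypothetical helper: is `el` a raw HTML inline whose text is exactly `tag`?
local function is_raw_html(el, tag)
    return el.t == 'RawInline' and el.format == 'html' and el.text == tag
end

-- Rewrites  RawInline "<mark>", ..., RawInline "</mark>"
-- into      Span ("", ["mark"], []) [...]
-- Assumption: only bare tags are matched and nesting is not handled.
function Inlines(inlines)
    local result = {}
    local i = 1
    while i <= #inlines do
        local el = inlines[i]
        local j = nil
        if is_raw_html(el, '<mark>') then
            -- Search forward for the first closing tag in this inline list.
            j = i + 1
            while j <= #inlines and not is_raw_html(inlines[j], '</mark>') do
                j = j + 1
            end
            if j > #inlines then j = nil end
        end
        if j then
            -- Collect everything between the two tags as the Span content.
            local content = {}
            for k = i + 1, j - 1 do
                table.insert(content, inlines[k])
            end
            table.insert(result, pandoc.Span(content, pandoc.Attr('', {'mark'})))
            i = j + 1
        else
            -- Not an opening tag, or no closing tag found: keep it unchanged.
            table.insert(result, el)
            i = i + 1
        end
    end
    return result
end
```

Within a single filter, pandoc runs functions for individual Inline elements (such as `Span`) before `Inlines`, so the Spans created here would not be seen by a `Span` function in the same file. Keeping the two steps in separate files and chaining them avoids that, e.g. `pandoc --from=markdown --to=html --lua-filter=mark-pairs.lua --lua-filter=mark-to-u.lua` (file names are placeholders).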

Are there any good solutions to this problem? Is there a way to parse HTML
in markdown the same way it would be parsed as plain HTML? Or is there a way
to solve this in a Lua filter without writing my own parsing code?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/aae29ca7-60ca-4349-af03-939f0ac503efn%40googlegroups.com.

------=_Part_475_2099739335.1629150221825--

------=_Part_474_2077931839.1629150221825--