From: pompez
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
Date: Mon, 16 Aug 2021 14:43:41 -0700 (PDT)

I'm starting out with Lua filters, so I apologize if this question has already been answered. You can also read this question on StackOverflow.

I'm using Pandoc to convert Markdown to HTML. My Markdown files also contain some raw HTML. In the examples, I'll be using `<mark>` and `<u>`.

Let's say I want to change every `<mark>` to a `<u>` tag. We parse the input as HTML and look at the AST.

```
$ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=html --to native
[Plain [Underline [Str "foo"],Space,Str "&",Space,Span ("", ["mark"],[]) [Str "bar"]]]
```

On this structure, we can use a simple filter that replaces the `Span` elements representing the `<mark>` tag with `Underline` elements.

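```
function Span(elem)
    -- classes is a pandoc List, so includes() also works when the span has
    -- no classes; classes[1]:gmatch('mark') returns an iterator (always
    -- truthy) and errors when classes is empty.
    if elem.classes:includes('mark') then
        return pandoc.Underline(elem.content)
    end
end
```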

```
[Plain [Underline [Str "foo"],Space,Str "&",Space,Underline [Str "bar"]]]
```

This is good. But if we parse the same input as Markdown, we get a much less convenient structure.

```
$ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=markdown+raw_html --to native
[Para [RawInline (Format "html") "<u>",Str "foo",RawInline (Format "html") "</u>",Space,Str "&",Space,RawInline (Format "html") "<mark>",Str "bar",RawInline (Format "html") "</mark>"]]
```

And if we had some additional criteria for replacing `<mark>` with `<u>` (the content, for example), we would have to identify the matching opening and closing `RawInline` elements ourselves.
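To make that concrete, here is a minimal sketch of the kind of pairing code I would like to avoid writing. It is an illustration under assumptions, not a tested solution: `pair_marks`, `is_raw_html` and the `wrap` callback are names made up for this example, the tags are assumed to be written exactly as `<mark>` and `</mark>` (no attributes, no nesting), and the `Inlines` hook is assumed to be available in the pandoc version in use.

```lua
-- Sketch only: pairs raw HTML <mark> ... </mark> inlines and wraps the
-- inlines between them. Assumes exact tag text, no attributes, no nesting.
local OPEN, CLOSE = '<mark>', '</mark>'

-- True if `el` is a raw HTML inline whose text is exactly `text`.
local function is_raw_html(el, text)
  return el.t == 'RawInline' and el.format == 'html' and el.text == text
end

-- `wrap` (hypothetical callback) turns a list of inlines into one inline.
local function pair_marks(inlines, wrap)
  local out, open, buffer = {}, nil, nil
  for _, el in ipairs(inlines) do
    if not open and is_raw_html(el, OPEN) then
      -- Start collecting; remember the tag in case it is never closed.
      open, buffer = el, {}
    elseif open and is_raw_html(el, CLOSE) then
      -- Matching close found: emit the wrapped content, drop both tags.
      out[#out + 1] = wrap(buffer)
      open, buffer = nil, nil
    elseif open then
      buffer[#buffer + 1] = el
    else
      out[#out + 1] = el
    end
  end
  -- Unclosed <mark>: put the tag and the collected inlines back unchanged.
  if open then
    out[#out + 1] = open
    for _, el in ipairs(buffer) do
      out[#out + 1] = el
    end
  end
  return out
end

-- Filter entry point: pandoc calls this for every list of inlines.
function Inlines(inlines)
  return pair_marks(inlines, pandoc.Underline)
end
```

Any extra criterion (the content, for example) would go where `wrap` is called, since that is the first point at which the content between the two tags is known. Even this small version ignores attributes, nesting and tags split across blocks, which is why I'm hoping there is a better way.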

I'm wondering if there are any good solutions to this problem. Is there a way to parse HTML in Markdown just as HTML would be parsed otherwise? Or is there a way to solve this in a Lua filter without writing some parsing code?
