In a sense you are in luck, because the two HTML headings are parsed as a
single raw HTML block. That is what pandoc normally does with embedded
block-level HTML content in (any kind of?) Markdown. So you could have a
Lua filter parse such blocks from HTML into native elements and replace
the raw block with those native elements, like this:
``````lua
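-- Re-parse raw HTML blocks with pandoc's HTML reader and
-- replace each one with the native blocks that result.
function RawBlock (raw)
  if 'html' == raw.format then
    local html = raw.text
    local doc = pandoc.read(html, 'html')
    if doc then return doc.blocks end
  end
  -- Returning nil leaves the element unchanged.
  return nil
end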
``````
https://pandoc.org/lua-filters.html
https://pandoc.org/lua-filters.html#pandoc.read
This does not guarantee that no raw HTML remains, since some HTML might be
unrepresentable as native elements. You will most probably get back native
elements, which may or may not contain some raw elements. In this case,
with just the two headings, the success rate will be 100%.
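
To try this with the command from your message, something like the following should work. It is only a sketch: the file name `html-to-native.lua` is made up for illustration, and the input and output names are the ones from your command. The filter is written to a file with the same logic as above, then passed to pandoc with `--lua-filter`.

```shell
# Save the filter under a hypothetical name, html-to-native.lua.
cat > html-to-native.lua <<'EOF'
function RawBlock (raw)
  if 'html' == raw.format then
    local doc = pandoc.read(raw.text, 'html')
    if doc then return doc.blocks end
  end
  return nil
end
EOF

# Same conversion as in the question, with the filter added.
# The guard skips the call when pandoc or the input file is missing.
if command -v pandoc >/dev/null 2>&1 && [ -f infile.gfm ]; then
  pandoc -f gfm -t commonmark --lua-filter=html-to-native.lua \
    -o outfile.md infile.gfm
fi
```

The filter runs on the parsed document before the commonmark writer sees it, so the headings should come out as ordinary Markdown headings rather than raw HTML.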
HTH,
/bpj
On Thu, 7 Oct 2021 at 15:32, Dominik Wujastyk wrote:
> Using
> pandoc -v
> pandoc 2.14.2
> Compiled with pandoc-types 1.22, texmath 0.12.3.1, skylighting 0.11,
> citeproc 0.5, ipynb 0.1.0.1
>
> Gfm input example:
>
> # NAK 1-1079
>
>
> Chapter-wise concordance of folios
>
>
> Prepared by Dominik Wujastyk (DW) and Andrey Klebanov (AK)
>
>
> Note that this MS (a single physical object kept at the __NAK__ under the
> accession number __1-1079__)
> was microfilmed twice, as **A 45-5 (on 16.10.1970)** and **A 1267-11 (on
> 16.11.1987)**. Digital copies
> of both microfilms are available to us.
>
> ```
>
> Command:
>
> pandoc -f gfm -t commonmark -o outfile.md infile.gfm
>
> Commonmark output:
>
> # NAK 1-1079
>
>
> Chapter-wise concordance of folios
>
>
> Prepared by Dominik Wujastyk (DW) and Andrey Klebanov (AK)
>
>
> Note that this MS (a single physical object kept at the **NAK** under
> the accession number **1-1079**) was microfilmed twice, as **A 45-5 (on
> 16.10.1970)** and **A 1267-11 (on 16.11.1987)**. Digital copies of both
> microfilms are available to us.
>
>
> I was expecting that this command would turn the HTML codes in the gfm
> file into commonmark Markdown. But it didn't. Am I doing something
> silly? Have I failed to understand what commonmark is? The HTML-coded
> text does render in Github and editors like Typora. So it seems wrong to
> treat them as raw blocks.
>
> Furthermore, a markdown-encoded table in the gfm document is converted to
> an HTML-encoded one. Why? This seems counterintuitive to me.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/eca62f3a-d4e3-4459-830c-ca4a3de2d125n%40googlegroups.com
>
> .
>