From mboxrd@z Thu Jan 1 00:00:00 1970
From: BPJ
Newsgroups: gmane.text.pandoc
Subject: Re: Why does GFM to markdown not convert HTML?
Date: Fri, 8 Oct 2021 13:34:52 +0200
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="0000000000002cd62305cdd5c5fa"

--0000000000002cd62305cdd5c5fa
Content-Type: text/plain; charset="UTF-8"

In a sense you are in luck: the two HTML headings are parsed as a single
raw HTML block, which is what pandoc normally does with embedded
block-level HTML in (any kind of?) Markdown. So you can have a Lua filter
parse that block as HTML and replace it with the resulting native
elements, like this:

``````lua
-- Re-parse raw HTML blocks with pandoc's HTML reader and
-- splice the resulting native blocks into the document.
function RawBlock (raw)
  -- Only touch raw blocks whose format is HTML.
  if 'html' == raw.format then
    local html = raw.text
    local doc = pandoc.read(html, 'html')
    -- Returning a list of blocks replaces the raw block with them.
    if doc then return doc.blocks end
  end
  -- nil leaves the element unchanged.
  return nil
end
``````

https://pandoc.org/lua-filters.html
https://pandoc.org/lua-filters.html#pandoc.read

This does not guarantee that you get no raw HTML back, since some HTML
cannot be represented as native elements. Most probably you will get
native elements, which may or may not contain some raw elements. In this
case the success rate will be 100%.

HTH,

/bpj

On Thu, 7 Oct 2021 at 15:32, Dominik Wujastyk wrote:

> Using
> pandoc -v
> pandoc 2.14.2
> Compiled with pandoc-types 1.22, texmath 0.12.3.1, skylighting 0.11,
> citeproc 0.5, ipynb 0.1.0.1
>
> Gfm input example:
>
> # NAK 1-1079
>
> <h2>
>   Chapter-wise concordance of folios
> </h2>
> <h3>
>   Prepared by Dominik Wujastyk (DW) and Andrey Klebanov (AK)
> </h3>
>
> Note that this MS (a single physical object kept at the __NAK__ under the
> accession number __1-1079__)
> was microfilmed twice, as **A 45-5 (on 16.10.1970)** and **A 1267-11 (on
> 16.11.1987)**. Digital copies
> of both microfilms are available to us.
>
> ```
>
> Command:
>
> pandoc -f gfm -t commonmark -o outfile.md infile.gfm
>
> Commonmark output:
>
> # NAK 1-1079
>
> <h2>
>   Chapter-wise concordance of folios
> </h2>
> <h3>
>   Prepared by Dominik Wujastyk (DW) and Andrey Klebanov (AK)
> </h3>
>
> Note that this MS (a single physical object kept at the **NAK** under
> the accession number **1-1079**) was microfilmed twice, as **A 45-5 (on
> 16.10.1970)** and **A 1267-11 (on 16.11.1987)**. Digital copies of both
> microfilms are available to us.
>
>
> I was expecting that this command would turn the HTML codes in the gfm
> file into commonmark Markdown. But it didn't. Am I doing something
> silly? Have I failed to understand what commonmark is? The HTML-coded
> text does render in Github and editors like Typora. So it seems wrong to
> treat them as raw blocks.
>
> Furthermore, a markdown-encoded table in the gfm document is converted to
> an HTML-encoded one. Why? This seems counterintuitive to me.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/eca62f3a-d4e3-4459-830c-ca4a3de2d125n%40googlegroups.com
> .

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhCb0_HNVMuZ0S0vOpw-RBmcb3TvV9QHYjHLvEPyRwnqqQ%40mail.gmail.com.
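
P.S. A minimal sketch of how the filter from this message might be run, assuming it is saved under the hypothetical name html-to-native.lua next to a cut-down copy of the input from the question. The file names are illustrative only; --lua-filter is pandoc's standard option for applying a Lua filter. The pandoc call is skipped when pandoc is not installed.

```shell
# Assumption: html-to-native.lua is a hypothetical file name for the filter.
cat > html-to-native.lua <<'EOF'
function RawBlock (raw)
  if 'html' == raw.format then
    local doc = pandoc.read(raw.text, 'html')
    if doc then return doc.blocks end
  end
  return nil
end
EOF

# A cut-down copy of the GFM input from the question.
cat > infile.gfm <<'EOF'
# NAK 1-1079

<h2>
  Chapter-wise concordance of folios
</h2>
<h3>
  Prepared by Dominik Wujastyk (DW) and Andrey Klebanov (AK)
</h3>
EOF

# Same command as in the question, with the filter added.
# Skipped when pandoc is not on the PATH.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f gfm -t commonmark --lua-filter=html-to-native.lua \
    -o outfile.md infile.gfm
fi
```

If the filter behaves as described, the two raw HTML headings should come out in outfile.md as Markdown headings rather than as HTML.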
--0000000000002cd62305cdd5c5fa--