From: William Lupton
To: pandoc-discuss
Cc: pompez
Newsgroups: gmane.text.pandoc
Subject: Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
Date: Tue, 17 Aug 2021 11:37:21 +0100

Could pandoc.read(markup, "html") help?
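
A rough sketch of how that suggestion might look in a filter. It assumes pandoc 2.17 or later, because it relies on `pandoc.write` and the element `walk` method to turn each paragraph back into HTML before handing it to `pandoc.read`. The names `reparse_as_html` and `mark_to_underline` are made up for illustration. Round-tripping a block through HTML may lose markdown-only constructs, so this is a starting point rather than a finished filter.

```lua
-- Sketch only: assumes pandoc >= 2.17 (pandoc.write and element:walk).
-- reparse_as_html is a hypothetical helper name, not part of pandoc.
local function reparse_as_html(block)
  -- Only touch blocks that actually contain raw HTML inlines.
  local has_raw_html = false
  block:walk {
    RawInline = function(raw)
      if raw.format == 'html' then has_raw_html = true end
    end
  }
  if not has_raw_html then return nil end  -- nil leaves the block unchanged

  -- Serialize this one block to HTML; raw HTML inlines pass through verbatim.
  local html = pandoc.write(pandoc.Pandoc({block}), 'html')
  -- Read the HTML back so that <u> and <mark> become real AST nodes.
  return pandoc.read(html, 'html').blocks
end

-- The Span filter from the question, using a class membership test.
local function mark_to_underline(span)
  if span.classes:includes('mark') then
    return pandoc.Underline(span.content)
  end
end

-- Two passes: re-parse the blocks first, then rewrite the resulting Spans.
return {
  {Para = reparse_as_html, Plain = reparse_as_html},
  {Span = mark_to_underline},
}
```

Saved as, say, `mark.lua` (a hypothetical file name), it would be run with `pandoc --from=markdown+raw_html --lua-filter=mark.lua --to native`. The two-element return table matters: within a single filter table pandoc applies inline functions before block functions, so the `Span` rewrite is placed in a second pass that runs after the blocks have been re-parsed.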

On Mon, 16 Aug 2021 at 23:09, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

I'm afraid you'll have to write some parsing code...

pompez <martinsifrar11-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I'm starting out with Lua filters and apologize for this possibly already
> answered question. You can also read this question on StackOverflow
> <https://stackoverflow.com/questions/68809527/is-there-a-way-to-change-the-way-pandoc-parses-html-inside-of-markdown-documents>.
>
> I'm using Pandoc to convert markdown to HTML. My markdown files also
> contain some raw HTML. In the examples, I'll be using `<mark>` and `<u>`.
>
> Let's say I want to change every `<mark>` to a `<u>` tag. We parse the
> input as HTML and look at the AST.
>
> ```
> $ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=html --to native
> [Plain [Underline [Str "foo"],Space,Str "&",Space,Span ("", ["mark"],[])
> [Str "bar"]]]
> ```
>
> On this structure, we can use a simple filter which replaces the `Span`
> elements representing the `<mark>` tag with `Underline` elements.
>
> ```
> function Span(elem)
>     -- string.gmatch returns an iterator, which is always truthy,
>     -- so test class membership instead
>     if elem.classes:includes('mark') then
>         return pandoc.Underline(elem.content)
>     end
> end
> ```
>
> ```
> [Plain [Underline [Str "foo"],Space,Str "&",Space,Underline [Str "bar"]]]
> ```
>
> This is good. But if we parse the same input as markdown, we get a much
> less convenient structure.
>
> ```
> $ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=markdown+raw_html --to native
> [Para [RawInline (Format "html") "<u>",Str "foo",RawInline (Format "html")
> "</u>",Space,Str "&",Space,RawInline (Format "html") "<mark>",Str
> "bar",RawInline (Format "html") "</mark>"]]
> ```
>
> And if we had some additional criteria by which to replace `<mark>` with
> `<u>` (the content for example), we would have to identify the opening and
> closing `RawInline` elements.
>
> I'm wondering if there are any good solutions to this problem. Is there a
> way to parse HTML in markdown just as HTML would be parsed otherwise? Or is
> there a way to solve this in a Lua filter without writing some parsing code?
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/aae29ca7-60ca-4349-af03-939f0ac503efn%40googlegroups.com.
