From: Bastien DUMONT
Newsgroups: gmane.text.pandoc
Subject: Re: Is there a way to change the way Pandoc parses HTML inside of markdown documents?
Date: Tue, 17 Aug 2021 11:24:32 +0000
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

> On this structure, we can use a simple filter which replaces `Span`
> elements representing the `<mark>` tag with `Underline` elements.
>
> ```
> function Span(elem)
>     -- `includes` tests class membership; `gmatch` returns an iterator,
>     -- which is always truthy, so it cannot be used as the condition.
>     if elem.classes:includes('mark') then
>         return pandoc.Underline(elem.content)
>     end
> end
> ```

To apply the same code to a Markdown input file, you can use inline spans
like this: `[foo]{.underline} & [bar]{.mark}`.

On Tuesday 17 August 2021 at 11:37:21AM, William Lupton wrote:
> Could [1]pandoc.read(markup, "html") help?
>
> On Mon, 16 Aug 2021 at 23:09, John MacFarlane <[2]jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
>
> I'm afraid you'll have to write some parsing code...
>
> pompez <[3]martinsifrar11-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> > I'm starting out with Lua filters and apologize for this possibly
> > already answered question. You can also read this question on
> > StackOverflow
> > <[4]https://stackoverflow.com/questions/68809527/is-there-a-way-to-change-the-way-pandoc-parses-html-inside-of-markdown-documents>.
> >
> > I'm using Pandoc to convert markdown to HTML. My markdown files also
> > contain some raw HTML. In the examples, I'll be using `<u>` and
> > `<mark>`.
> >
> > Let's say I want to change every `<mark>` to a `<u>` tag. We parse the
> > input as HTML and look at the AST.
> >
> > ```
> > $ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=html --to native
> > [Plain [Underline [Str "foo"],Space,Str "&",Space,Span ("", ["mark"],[])
> > [Str "bar"]]]
> > ```
> >
> > On this structure, we can use a simple filter which replaces `Span`
> > elements representing the `<mark>` tag with `Underline` elements.
> >
> > ```
> > function Span(elem)
> >     -- `includes` tests class membership; `gmatch` returns an iterator,
> >     -- which is always truthy, so it cannot be used as the condition.
> >     if elem.classes:includes('mark') then
> >         return pandoc.Underline(elem.content)
> >     end
> > end
> > ```
> >
> > ```
> > [Plain [Underline [Str "foo"],Space,Str "&",Space,Underline [Str "bar"]]]
> > ```
> >
> > This is good. But if we parse the same input as markdown, we get a
> > much less convenient structure.
> >
> > ```
> > $ echo '<u>foo</u> & <mark>bar</mark>' | pandoc --from=markdown+raw_html --to native
> > [Para [RawInline (Format "html") "<u>",Str "foo",RawInline (Format "html")
> > "</u>",Space,Str "&",Space,RawInline (Format "html") "<mark>",Str
> > "bar",RawInline (Format "html") "</mark>"]]
> > ```
> >
> > And if we had some additional criteria by which to replace `<mark>`
> > with `<u>` (the content, for example), we would have to identify the
> > opening and closing `RawInline` elements.
> >
> > I'm wondering if there are any good solutions to this problem. Is
> > there a way to parse HTML in markdown just as HTML would be parsed
> > otherwise? Or is there a way to solve this in a Lua filter without
> > writing some parsing code?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [5]pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit
> > [6]https://groups.google.com/d/msgid/pandoc-discuss/aae29ca7-60ca-4349-af03-939f0ac503efn%40googlegroups.com.
>
> References:
>
> [1] https://pandoc.org/lua-filters.html#pandoc.read
> [2] mailto:jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org
> [3] mailto:martinsifrar11-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
> [4] https://stackoverflow.com/questions/68809527/is-there-a-way-to-change-the-way-pandoc-parses-html-inside-of-markdown-documents
> [5] mailto:pandoc-discuss%2Bunsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> [6] https://groups.google.com/d/msgid/pandoc-discuss/aae29ca7-60ca-4349-af03-939f0ac503efn%40googlegroups.com
> [7] mailto:pandoc-discuss%2Bunsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> [8] https://groups.google.com/d/msgid/pandoc-discuss/yh480k1r6tt53d.fsf%40johnmacfarlane.net
> [9] mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> [10] https://groups.google.com/d/msgid/pandoc-discuss/CAEe_xxj-kp22oToH4o5J54s16W4WzMkiaEicOy%2BTuqDZf5LP3g%40mail.gmail.com?utm_medium=email&utm_source=footer

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/YRuccFhI3anHPRPc%40localhost.
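
For anyone who has to keep the raw HTML in the Markdown source, here is a
minimal sketch of the kind of parsing code the thread refers to: it pairs
the opening and closing `RawInline` elements for `<mark>` and wraps what
lies between them in `Underline`. It is an illustration, not a tested
solution from the thread. It assumes pandoc 2.10 or later (for
`pandoc.Underline`) and Markdown input read with `raw_html`; the helper
name `is_tag` is made up for this example, and nested `<mark>` elements
are not handled.

```lua
-- Sketch only. Assumes pandoc >= 2.10 and Markdown input with raw_html.
-- `is_tag` is a hypothetical helper, not part of the pandoc API.

-- True if `el` is a raw HTML inline whose text matches the Lua pattern.
local function is_tag(el, pattern)
  return el.t == 'RawInline' and el.format == 'html'
    and el.text:match(pattern) ~= nil
end

-- Walk each inline list, pair every opening <mark> with the next closing
-- </mark>, and replace the pair and its content with one Underline.
function Inlines(inlines)
  local out = pandoc.List()
  local i = 1
  while i <= #inlines do
    local close = nil
    if is_tag(inlines[i], '^<mark[%s>]') then
      for j = i + 1, #inlines do
        if is_tag(inlines[j], '^</mark%s*>') then
          close = j
          break
        end
      end
    end
    if close then
      -- Collect the inlines strictly between the two tags.
      local content = pandoc.List()
      for k = i + 1, close - 1 do
        content:insert(inlines[k])
      end
      out:insert(pandoc.Underline(content))
      i = close + 1
    else
      -- Not an opening tag, or no closing tag in this list: keep as is.
      out:insert(inlines[i])
      i = i + 1
    end
  end
  return out
end
```

Saved under a name of your choice (say `mark-to-underline.lua`), it would
be run with `pandoc --from=markdown+raw_html --lua-filter=mark-to-underline.lua`.
Any extra criteria on the content can be checked on the collected
`content` list before deciding whether to build the `Underline`.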