From: Kevin Keegan
Newsgroups: gmane.text.pandoc
To: pandoc-discuss
Subject: Re: Ignore link attributes and always match a hyperlink or image
Date: Wed, 18 Oct 2023 23:30:58 -0700 (PDT)
Message-ID: <41091039-be55-4692-bed4-e87aef240f14n@googlegroups.com>
References: <1fa1b803-eced-48d5-b96d-153068eacd2bn@googlegroups.com> <3BE27726-13AE-4F51-8BB9-E729A21A62B8@gmail.com>

Thanks, I didn't expect that from reading the `raw_html` documentation.

On Thursday, October 19, 2023 at 8:02:08 AM UTC+2 John MacFarlane wrote:

> You can try disabling raw_html: -t markdown_strict-raw_html
>
> > On Oct 18, 2023, at 10:35 PM, Kevin Keegan wrote:
> >
> > I am trying to convert some naive HTML snippets to Markdown. Everything
> > works great except for this strange behaviour, and I am curious to know
> > whether I am missing something in pandoc or need to fix it myself.
> >
> > Having this HTML snippet:
> > ```
> > <p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>
> > ```
> >
> > Using the `link_attributes` extension, it returns:
> > ```
> > $ printf '<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>' | pandoc --from html --to markdown_strict+link_attributes
> > Lorem [ipsum](#) dolor [sit](#){.a} amet.
> > ```
> >
> > Without the extension, it returns:
> > ```
> > $ printf '<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>' | pandoc --from html --to markdown_strict
> > Lorem [ipsum](#) dolor <a href="#" class="a">sit</a> amet.
> > ```
> >
> > I was wondering whether there is a way, without the `link_attributes`
> > extension, to still convert hyperlinks that carry extra attributes into
> > Markdown links, ignoring those attributes. The desired result would be:
> > ```
> > $ printf '<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>' | pandoc --from html --to markdown_strict
> > Lorem [ipsum](#) dolor [sit](#) amet.
> > ```
> >
> > Thank you.
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to pandoc-discus...@googlegroups.com.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/pandoc-discuss/1fa1b803-eced-48d5-b96d-153068eacd2bn%40googlegroups.com.

--
To view this discussion on the web visit
https://groups.google.com/d/msgid/pandoc-discuss/41091039-be55-4692-bed4-e87aef240f14n%40googlegroups.com.
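
For reference, the suggestion from this thread (disabling `raw_html` on the output format) can be tried with the snippet from the original question. This is only a minimal sketch: it assumes `pandoc` is installed and on `PATH`, and the names `html` and `to_strict_md` are illustrative helpers, not part of pandoc.

```shell
# The HTML snippet from the original question.
html='<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>'

# Hypothetical helper: convert HTML on stdin to strict Markdown with the
# raw_html extension disabled, as suggested in the thread, so that links
# carrying attributes are not written out as raw <a> tags.
to_strict_md() {
  pandoc --from html --to markdown_strict-raw_html
}

# Run the conversion only if pandoc is available (assumption: it is on PATH).
if command -v pandoc >/dev/null 2>&1; then
  printf '%s' "$html" | to_strict_md
else
  echo "pandoc not found on PATH; skipping conversion" >&2
fi
```

Going by the thread, the second link should then come out as an ordinary Markdown link, with the `class` attribute dropped rather than preserved as raw HTML.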