public inbox archive for pandoc-discuss@googlegroups.com
* Ignore link attributes and always match a hyperlink or image
From: Kevin Keegan @ 2023-10-19  5:35 UTC
  To: pandoc-discuss


I am trying to convert some naive HTML snippets to Markdown. Everything
works great except for one behaviour, and I would like to know whether I
am missing something in pandoc or need to work around it myself.

Given this HTML snippet:
```
<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>
```

With the `link_attributes` extension enabled, pandoc returns:
```
$ printf '<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>' | pandoc --from html --to markdown_strict+link_attributes
Lorem [ipsum](#) dolor [sit](#){.a} amet.
```

Without the extension, pandoc returns:
```
$ printf '<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>' | pandoc --from html --to markdown_strict
Lorem [ipsum](#) dolor <a href="#" class="a">sit</a> amet.
```

Is there a way, without enabling the `link_attributes` extension, to still
convert hyperlinks that carry extra attributes into Markdown links, simply
dropping the attributes? The desired result would be:
```
$ printf '<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>' | pandoc --from html --to markdown_strict
Lorem [ipsum](#) dolor [sit](#) amet.
```

Thank you.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/1fa1b803-eced-48d5-b96d-153068eacd2bn%40googlegroups.com.


* Re: Ignore link attributes and always match a hyperlink or image
From: John MacFarlane @ 2023-10-19  6:01 UTC
  To: pandoc-discuss

You can try disabling raw_html:  -t markdown_strict-raw_html
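
A minimal sketch of this suggestion applied to the snippet from the original message, assuming `pandoc` is installed and on `PATH`. The function name `html_to_plain_md` and the `html` variable are hypothetical helpers introduced here for illustration; only the `markdown_strict-raw_html` output format comes from the thread. With `raw_html` disabled the writer cannot emit the `<a>` tag verbatim, so links that carry attributes should fall back to ordinary Markdown links.

```shell
#!/bin/sh
# Sample input taken from the original message.
html='<p>Lorem <a href="#">ipsum</a> dolor <a href="#" class="a">sit</a> amet.</p>'

# html_to_plain_md is a hypothetical helper name, not part of pandoc.
# It converts an HTML fragment to strict Markdown with the raw_html
# extension disabled, as suggested above.
html_to_plain_md() {
  printf '%s' "$1" | pandoc --from html --to markdown_strict-raw_html
}

# Only run the conversion when pandoc is actually available.
if command -v pandoc >/dev/null 2>&1; then
  html_to_plain_md "$html"
else
  echo 'pandoc not found on PATH' >&2
fi
```

Disabling `raw_html` applies to the whole output, so other constructs that would otherwise be written as raw HTML may be simplified or dropped as well.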


* Re: Ignore link attributes and always match a hyperlink or image
From: Kevin Keegan @ 2023-10-19  6:30 UTC
  To: pandoc-discuss


Thanks, I didn't expect that from reading the `raw_html` documentation.
