* HTML to Markdown and &gt;/&lt; entities
@ 2017-07-28 10:14 Benjamin Ullrich
[not found] ` <6e17b71f-4ce2-4e88-9bc3-fbdbb4d74365-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Benjamin Ullrich @ 2017-07-28 10:14 UTC (permalink / raw)
To: pandoc-discuss
I'm new to pandoc and am experimenting with converting HTML to Markdown. For this I created the following sample input text:
text <strong>strong</strong> <em>emphasize</em> &gt; &amp; &lt; &copy; text <a href="http://domain.test">Link Title</a>
My expectation for the result would be:
text **strong** *emphasize* > & < © text [Link Title](http://domain.test)
Instead, running pandoc like:
pandoc -f html -t markdown_strict --normalize --wrap none /tmp/pandoc_input
where /tmp/pandoc_input contains the above-mentioned text, produces the following output:
text **strong** *emphasize* &gt; & &lt; © text [Link Title](http://domain.test)
All looks fine except that &gt; and &lt; are not converted as I would expect.
My current system:
Ubuntu 16.04 LTS
pandoc 1.16.0.2
I also tried with pandoc 1.19.1 from Ubuntu Artful, same problem.
I also tried with pandoc 1.12.0.2 on Ubuntu 14.04 LTS. There &gt; and &lt; are converted, but preceded by a \, which is also not what I would expect:
text **bold** *italic* \> & \< © text [Link Title](http://domain.test)
Am I just missing something here? Should it work as I expect or is my expectation wrong? Any input on what could be the problem or how to circumvent the issue would be appreciated.
Regards,
Benjamin
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6e17b71f-4ce2-4e88-9bc3-fbdbb4d74365%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: HTML to Markdown and &gt;/&lt; entities
[not found] ` <6e17b71f-4ce2-4e88-9bc3-fbdbb4d74365-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2017-07-28 11:53 ` Ivan Lazar Miljenovic
[not found] ` <CA+u6gbzgquNuuGXLdsCtcdo+JhS2BchV9zQbyJNmaRVa7_UsVQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Ivan Lazar Miljenovic @ 2017-07-28 11:53 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
On 28 July 2017 at 20:14, Benjamin Ullrich <bulirich-S0/GAf8tV78@public.gmane.org> wrote:
> I'm new to pandoc and currently experimenting to convert from HTML to
> Markdown. For this I have created the following sample input text:
>
> text <strong>strong</strong> <em>emphasize</em> &gt; &amp; &lt;
> &copy; text <a href="http://domain.test">Link Title</a>
>
> My expectation for the result would be:
>
> text **strong** *emphasize* > & < © text [Link Title](http://domain.test)
>
> Instead, running pandoc like:
>
> pandoc -f html -t markdown_strict --normalize --wrap none /tmp/pandoc_input
>
> where /tmp/pandoc_input contains the above mentioned text produces the
> following output:
>
> text **strong** *emphasize* &gt; & &lt; © text [Link
> Title](http://domain.test)
>
> All looks fine except &gt; and &lt; are not converted as I would expect.
>
> My current system:
>
> Ubuntu 16.04 LTS
> pandoc 1.16.0.2
>
> I also tried with pandoc 1.19.1 from Ubuntu Artful, same problem.
>
> I also tried with pandoc 1.12.0.2 on Ubuntu 14.04 LTS. There &gt; and &lt;
> are converted but preceded with a \ which is also not what I would expect:
>
> text **bold** *italic* \> & \< © text [Link Title](http://domain.test)
>
>
> Am I just missing something here? Should it work as I expect or is my
> expectation wrong? Any input on what could be the problem or how to
> circumvent the issue would be appreciated.
Markdown allows inline HTML, so literal > and < characters do need to be
escaped. How the Markdown writer escapes them does seem to have changed
between versions, though.
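If the literal characters from the original post are wanted and the generated file will not be parsed as Markdown again, one workaround is to decode the two entities after pandoc has run. This is only a minimal sketch under that assumption; `decode_angle_entities` is a hypothetical helper, not part of pandoc, and the result may no longer be safe Markdown:

```python
def decode_angle_entities(text: str) -> str:
    """Turn the entities &lt; and &gt; back into literal < and >.

    Assumption: the result is treated as plain text afterwards. In real
    Markdown the bare characters could be read as raw HTML or autolinks.
    """
    return text.replace("&lt;", "<").replace("&gt;", ">")


if __name__ == "__main__":
    print(decode_angle_entities("text &gt; & &lt; text"))
```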
>
> Regards,
> Benjamin
--
Ivan Lazar Miljenovic
Ivan.Miljenovic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
http://IvanMiljenovic.wordpress.com
* Re: HTML to Markdown and &gt;/&lt; entities
[not found] ` <CA+u6gbzgquNuuGXLdsCtcdo+JhS2BchV9zQbyJNmaRVa7_UsVQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-07-30 20:42 ` John MacFarlane
[not found] ` <20170730204232.GF11715-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2017-07-30 20:42 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
We need to escape `<` and `>` in general, since these
can have special meanings in Markdown (autolinks,
raw HTML).
We could use either `\<` or `&lt;`; we used the latter
because the former doesn't work with the original
Markdown.pl and derivatives (such as showdown,
python markdown, etc.).
I think it would make sense, though, to use `\<`
and `\>` when `all_symbols_escapable` is set.
I'm going to implement that change.
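The choice described above can be illustrated with a small sketch. This is not pandoc's implementation (pandoc is written in Haskell); `escape_angle_brackets` and its boolean parameter are hypothetical names that only mirror the `all_symbols_escapable` extension mentioned in this message:

```python
def escape_angle_brackets(text: str, all_symbols_escapable: bool) -> str:
    """Escape literal < and > so a Markdown reader does not treat them as markup.

    With all_symbols_escapable, backslash escapes are used. Otherwise
    entities are used, since Markdown.pl-style parsers do not accept \\<.
    """
    if all_symbols_escapable:
        table = {"<": r"\<", ">": r"\>"}
    else:
        table = {"<": "&lt;", ">": "&gt;"}
    # Replace character by character so no replacement is escaped twice.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(escape_angle_brackets("a < b > c", all_symbols_escapable=True))
    print(escape_angle_brackets("a < b > c", all_symbols_escapable=False))
```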
* Re: HTML to Markdown and &gt;/&lt; entities
[not found] ` <20170730204232.GF11715-9Rnp8PDaXcadBw3G0RLmbRFnWt+6NQIA@public.gmane.org>
@ 2017-08-01 22:10 ` Melroch
0 siblings, 0 replies; 4+ messages in thread
From: Melroch @ 2017-08-01 22:10 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
On 30 Jul 2017 at 22:43, "John MacFarlane" <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
> We need to escape `<` and `>` in general, since these
> can have special meanings in Markdown (autolinks,
> raw HTML).
> We could use either `\<` or `&lt;`; we used the latter
> because the former doesn't work with the original
> Markdown.pl and derivatives (such as showdown,
> python markdown, etc.).
> I think it would make sense, though, to use `\<`
> and `\>` when `all_symbols_escapable` is set.
> I'm going to implement that change.
I'm glad to hear that. I've been post-filtering the generated Markdown to
achieve that for some time.
Thanks!
/bpj
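A post-filter of the kind mentioned above might look roughly like the following. The actual filter used is not shown in this thread, so this is only a sketch under stated assumptions: `backslash_escape_angles` is a hypothetical name, fenced blocks are recognised only by a line starting with three backticks, and only single-backtick inline code spans are skipped:

```python
import re

FENCE = "`" * 3                       # marker that opens or closes a fenced block
CODE_SPAN = re.compile(r"(`[^`]*`)")  # single-backtick inline code spans only
ENTITY = re.compile(r"&(lt|gt);")
BACKSLASH = {"&lt;": r"\<", "&gt;": r"\>"}


def backslash_escape_angles(markdown: str) -> str:
    """Rewrite &lt; and &gt; as backslash escapes outside of code."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            out.append(line)
            continue
        if in_fence:
            out.append(line)
            continue
        # Splitting on a capturing group puts code spans at odd indices.
        parts = CODE_SPAN.split(line)
        for i in range(0, len(parts), 2):
            parts[i] = ENTITY.sub(lambda m: BACKSLASH[m.group(0)], parts[i])
        out.append("".join(parts))
    return "\n".join(out)


if __name__ == "__main__":
    print(backslash_escape_angles("text &gt; & &lt; and `&lt;` in code"))
```

The output only makes sense for readers that accept backslash escapes for these characters, which, as noted earlier in the thread, the original Markdown.pl does not.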