* HTML to Markdown and >/< entities
From: Benjamin Ullrich @ 2017-07-28 10:14 UTC
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 1963 bytes --]

I'm new to pandoc and am experimenting with converting HTML to Markdown. For this I created the following sample input text:

    text <strong>strong</strong> <em>emphasize</em> &gt; & &lt; © text <a href="http://domain.test">Link Title</a>

My expectation for the result would be:

    text **strong** *emphasize* > & < © text [Link Title](http://domain.test)

Instead, running pandoc like this:

    pandoc -f html -t markdown_strict --normalize --wrap none /tmp/pandoc_input

where /tmp/pandoc_input contains the text above, produces the following output:

    text **strong** *emphasize* &gt; & &lt; © text [Link Title](http://domain.test)

Everything looks fine except that &gt; and &lt; are not converted as I would expect.

My current system:

    Ubuntu 16.04 LTS
    pandoc 1.16.0.2

I also tried pandoc 1.19.1 from Ubuntu Artful and got the same result.

I also tried pandoc 1.12.0.2 on Ubuntu 14.04 LTS. There, &gt; and &lt; are converted but preceded by a \, which is also not what I would expect:

    text **bold** *italic* \> & \< © text [Link Title](http://domain.test)

Am I just missing something here? Should it work as I expect, or is my expectation wrong? Any input on what the problem could be, or on how to work around it, would be appreciated.

Regards,
Benjamin

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6e17b71f-4ce2-4e88-9bc3-fbdbb4d74365%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: HTML to Markdown and >/< entities
From: Ivan Lazar Miljenovic @ 2017-07-28 11:53 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 28 July 2017 at 20:14, Benjamin Ullrich <bulirich-S0/GAf8tV78@public.gmane.org> wrote:
> I'm new to pandoc and currently experimenting to convert from HTML to
> Markdown. For this I have created the following sample input text:
>
> text <strong>strong</strong> <em>emphasize</em> &gt; & &lt;
> © text <a href="http://domain.test">Link Title</a>
>
> My expectation for the result would be:
>
> text **strong** *emphasize* > & < © text [Link Title](http://domain.test)
>
> Instead, running pandoc like:
>
> pandoc -f html -t markdown_strict --normalize --wrap none /tmp/pandoc_input
>
> where /tmp/pandoc_input contains the above mentioned text produces the
> following output:
>
> text **strong** *emphasize* &gt; & &lt; © text [Link
> Title](http://domain.test)
>
> All looks fine except &gt; and &lt; are not converted as I would expect.
>
> My current system:
>
> Ubuntu 16.04 LTS
> pandoc 1.16.0.2
>
> I also tried with pandoc 1.19.1 from Ubuntu Artful, same problem.
>
> I also tried with pandoc 1.12.0.2 on Ubuntu 14.04 LTS. There &gt; and &lt;
> are converted but preceded with a \ which is also not what I would expect:
>
> text **bold** *italic* \> & \< © text [Link Title](http://domain.test)
>
>
> Am I just missing something here? Should it work as I expect or is my
> expectation wrong? Any input on what could be the problem or how to
> circumvent the issue would be appreciated.

Markdown allows inline HTML, so a literal > or < does need to be escaped. How the Markdown writer escapes them seems to have changed between versions, though.
>
> Regards,
> Benjamin

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
http://IvanMiljenovic.wordpress.com
* Re: HTML to Markdown and >/< entities
From: John MacFarlane @ 2017-07-30 20:42 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

We need to escape `<` and `>` in general, since these can have special meanings in Markdown (autolinks, raw HTML). We could use either `\<` or `&lt;`; we used the latter because the former doesn't work with the original Markdown.pl and derivatives (such as showdown, python markdown, etc.).

I think it would make sense, though, to use `\<` and `\>` when `all_symbols_escapable` is set. I'm going to implement that change.
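The choice described above can be illustrated with a minimal Python sketch. It is not pandoc's code (pandoc is written in Haskell); the function name `escape_angle_brackets` and its boolean parameter are hypothetical and only model the two escaping styles discussed: entity references for output that must work with Markdown.pl-style processors, and backslash escapes when `all_symbols_escapable` is enabled.

```python
def escape_angle_brackets(text: str, all_symbols_escapable: bool) -> str:
    """Escape literal < and > so a Markdown reader does not treat them
    as the start of an autolink or raw HTML.

    Hypothetical helper for illustration only; it is not pandoc's
    implementation and handles nothing but the two angle brackets.
    """
    if all_symbols_escapable:
        # Backslash escapes: readable, but not understood by the
        # original Markdown.pl and its derivatives.
        replacements = {"<": "\\<", ">": "\\>"}
    else:
        # Entity references: understood by Markdown.pl-style processors.
        replacements = {"<": "&lt;", ">": "&gt;"}
    return "".join(replacements.get(ch, ch) for ch in text)


if __name__ == "__main__":
    sample = "text > & < text"
    print(escape_angle_brackets(sample, all_symbols_escapable=False))
    print(escape_angle_brackets(sample, all_symbols_escapable=True))
```

With the flag off, the sketch reproduces the entity form reported for pandoc 1.16 and 1.19; with it on, it produces the backslash form reported for pandoc 1.12.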
* Re: HTML to Markdown and >/< entities
From: Melroch @ 2017-08-01 22:10 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1795 bytes --]

On 30 Jul 2017 at 22:43, "John MacFarlane" <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

> We need to escape `<` and `>` in general, since these can have special
> meanings in Markdown (autolinks, raw HTML). We could use either `\<` or
> `&lt;`; we used the latter because the former doesn't work with the
> original Markdown.pl and derivatives (such as showdown, python markdown, etc.).
>
> I think it would make sense, though, to use `\<` and `\>` when
> `all_symbols_escapable` is set. I'm going to implement that change.

I'm glad to hear that. I've been post-filtering generated Markdown to achieve that for some time. Thanks!

/bpj
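A post-filter of the kind mentioned above might look like the following Python sketch. This is an assumption for illustration, not the filter actually used by anyone in this thread, and the name `postfilter` is hypothetical. It is deliberately naive: it rewrites every `&lt;` and `&gt;` in the text, including any inside code spans or code blocks, where a backslash would be taken literally.

```python
import re

# Map each entity reference to its backslash-escaped equivalent.
_ENTITY_TO_ESCAPE = {"&lt;": "\\<", "&gt;": "\\>"}

# Matches exactly the two entity references handled here.
_ENTITY_PATTERN = re.compile(r"&(?:lt|gt);")


def postfilter(markdown: str) -> str:
    """Replace &lt; and &gt; in generated Markdown with \\< and \\>.

    Hypothetical helper: it does not parse Markdown, so it cannot tell
    ordinary text apart from code spans or raw HTML.
    """
    return _ENTITY_PATTERN.sub(lambda m: _ENTITY_TO_ESCAPE[m.group(0)], markdown)
```

Such a function could be applied to pandoc's output by a small script that reads the generated Markdown from standard input and writes the filtered text to standard output.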