public inbox archive for pandoc-discuss@googlegroups.com
* HTML to Markdown and &gt;/&lt; entities
@ 2017-07-28 10:14 Benjamin Ullrich
From: Benjamin Ullrich @ 2017-07-28 10:14 UTC
  To: pandoc-discuss


I'm new to pandoc and am currently experimenting with converting HTML to Markdown. For this I have created the following sample input text:


text&nbsp;&nbsp;<strong>strong</strong> <em>emphasize</em> &gt; &amp; &lt; &copy; text <a href="http://domain.test">Link Title</a>

My expectation for the result would be: 

text  **strong** **emphasize** > & < © text [Link Title](http://domain.test) 


Instead, running pandoc like this:

pandoc -f html -t markdown_strict --normalize --wrap none /tmp/pandoc_input 

(where /tmp/pandoc_input contains the sample text above) produces the following output:


text  **strong** **emphasize** &gt; & &lt; © text [Link Title](http://domain.test)

Everything looks fine, except that &gt; and &lt; are not converted as I would expect.
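
To make the expectation concrete, here is a minimal Python sketch that applies the decoding I expected to the output line quoted above. It is not pandoc code and says nothing about how pandoc works internally; the helper name `decode_entities` is made up for illustration, and it only relies on `html.unescape` from the Python standard library.

```python
import html


def decode_entities(markdown_line: str) -> str:
    """Decode HTML character references (e.g. &gt; and &lt;) in one line of text.

    Hypothetical helper for illustration only; a bare "&" that is not part
    of a character reference is left unchanged by html.unescape.
    """
    return html.unescape(markdown_line)


# The output line as quoted in this message (pandoc 1.16.0.2, markdown_strict).
actual = "text  **strong** **emphasize** &gt; & &lt; © text [Link Title](http://domain.test)"

print(decode_entities(actual))
```

Note that decoding everything blindly like this may not be safe in general: a literal < in Markdown can start an inline HTML tag, so the escaping in pandoc's output may well be intentional.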

My current system: 

Ubuntu 16.04 LTS 
pandoc 1.16.0.2 

I also tried pandoc 1.19.1 from Ubuntu Artful, with the same result.

I also tried pandoc 1.12.0.2 on Ubuntu 14.04 LTS. There, &gt; and &lt; are converted but preceded by a backslash (\), which is also not what I would expect:


text  **bold** **italic** \> & \< © text [Link Title](http://domain.test) 
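
As a rough illustration of the text I would have expected from the older version, the following Python sketch strips the backslash in front of > and < from the line quoted above. The function name `strip_angle_escapes` is hypothetical, and the approach is an assumption on my side, not something pandoc offers: it is a plain regular-expression substitution and does not try to understand Markdown.

```python
import re


def strip_angle_escapes(markdown_line: str) -> str:
    """Remove a backslash that directly precedes '<' or '>'.

    Hypothetical post-processing helper. It does not handle an escaped
    backslash followed by '<' or '>' (i.e. "\\\\>"), and it ignores code
    spans, so it is only suitable for simple lines like the one below.
    """
    return re.sub(r"\\([<>])", r"\1", markdown_line)


# The output line as quoted in this message (pandoc 1.12.0.2).
old_output = r"text  **bold** **italic** \> & \< © text [Link Title](http://domain.test)"

print(strip_angle_escapes(old_output))
```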


Am I just missing something here? Should it work as I expect or is my expectation wrong? Any input on what could be the problem or how to circumvent the issue would be appreciated. 


Regards, 
Benjamin



