* How to convert from OPML to Markdown without escaping the Markdown
From: Patrick Kenny @ 2019-04-11 17:39 UTC
To: pandoc-discuss

I'm relatively new to Pandoc and hoping someone can point me in the right
direction.

I created an outline in some software that exports to OPML. This outline
contains HTML tags and Markdown (for example, lists made of *).

So, when I export the outline to OPML, it looks like this:

    <outline text="List title">
      <outline text="* Item 1" />
      <outline text="* Item 2" />
      <outline text="* Item 3" />
    </outline>

I converted from OPML to Markdown like this:

    pandoc -s -o myfile.md myfile.opml --from=opml --to=commonmark

However, this results in the Markdown being escaped:

    List title

    \* Item 1

    \* Item 2

    \* Item 3

HTML tags are escaped similarly.

How can I turn this escaping off? I want the Markdown in the OPML file to
be preserved (treated as Markdown) upon conversion to Markdown.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7dd34cfd-e19f-47b2-a20c-0e62f3901ba4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

* Re: How to convert from OPML to Markdown without escaping the Markdown
From: John MacFarlane @ 2019-04-11 23:58 UTC
To: Patrick Kenny, pandoc-discuss

The way the OPML reader and writer work is:

- <outline> elements correspond to section headings.
  The text attribute is the heading text.
- The contents of the _note attribute, if present, are
  parsed as Markdown and treated as text under the
  heading.

I have never used OPML myself, so I don't have a good
sense why it's this way; maybe someone else does.

* Re: How to convert from OPML to Markdown without escaping the Markdown
From: Patrick Kenny @ 2019-04-12 9:41 UTC
To: pandoc-discuss

Ok, thanks, that's helpful.

The problem may stem from the OPML spec being vague and OPML having two
rather different use cases: cataloging RSS feeds and storing outlines of
books, task lists, and things in outlining software.

That said, the OPML spec <http://dev.opml.org/spec2.html> says this about
the <outline> element:

*Text attribute <http://dev.opml.org/spec2.html#textAttribute>*

> Every outline element must have at least a *text* attribute, which is
> what is displayed when an outliner
> <http://support.opml.org/basicOutlining> opens the OPML file. To omit the
> text attribute would render the outline useless in an outliner. This is
> what the user would see
> <http://images.scripting.com/archiveScriptingCom/2005/10/14/badopml2.gif> --
> clearly an unacceptable user experience. Part of the purpose of producing
> OPML is to give users the power to accumulate and organize related
> information in an outliner. This is as important a use for OPML as data
> interchange.
> A missing text attribute in any outline element is an error.
> Text attributes may contain encoded HTML markup.

So HTML tags should be allowed. If HTML tags are allowed, then Markdown
should be allowed as well, since it's basically a shorthand for HTML.

Upon further investigation, it's not just the * lists that get escaped.
Ordered lists 1. 2. 3. etc. are also escaped as 1\. 2\., and headers are
escaped as \#\#\#\#.

So it seems to me that OPML is not handled correctly during conversion.

I can write a shell script to get around this, but maybe there's a better
way?

* Re: How to convert from OPML to Markdown without escaping the Markdown
From: John MacFarlane @ 2019-04-12 15:06 UTC
To: Patrick Kenny, pandoc-discuss

Interesting about the spec allowing HTML formatting.
We could implement that. I've created an issue:

https://github.com/jgm/pandoc/issues/5444

But that's not the only issue here. The other issue
is that pandoc maps all outline elements to *section
headings*. I get the impression that you're expecting
something else, since you're inserting content that
doesn't make sense in a heading (like lists).

Content that goes under headings should go in the
_note attribute. Currently this accepts markdown
formatting. We might want to change that to HTML,
as noted in the issue.

> So HTML tags should be allowed. If HTML tags are allowed, then Markdown
> should be allowed as well, since it's basically a shorthand for HTML.

No, it's not (look at what pandoc is used for), and
no, it shouldn't be allowed. The meaning of a document
is different if it's interpreted as HTML or as
Markdown. We should accord with the spec and
interpret the text attribute as HTML. We should
consider doing the same for the _note attribute, but
I'd want to hear from current OPML users before making
a change.

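One way to act on the advice about the _note attribute, sketched in Python: before running pandoc, move list-like child outlines into the parent's _note, so the reader treats them as Markdown body text rather than heading text. This is an illustrative workaround, not something pandoc provides; the function names and the `LIST_MARKER` heuristic are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heuristic: a child is "list-like" if its text starts with a
# Markdown bullet (*, +, -) or an ordered-list number such as "1.".
LIST_MARKER = re.compile(r"(?:[*+-]|\d+\.)\s")


def fold_list_children_into_note(parent):
    """Move leaf children that look like list items into parent's _note."""
    # Handle the deepest outlines first.
    for child in list(parent):
        fold_list_children_into_note(child)
    if parent.tag != "outline":
        return
    items = [
        c for c in parent
        if c.tag == "outline"
        and len(c) == 0                  # leaf: no nested outlines
        and c.get("_note") is None       # do not discard an existing note
        and LIST_MARKER.match(c.get("text", ""))
    ]
    if not items:
        return
    block = "\n".join(c.get("text") for c in items)
    existing = parent.get("_note")
    # Keep any existing note and separate it from the list with a blank line.
    parent.set("_note", f"{existing}\n\n{block}" if existing else block)
    for c in items:
        parent.remove(c)


def rewrite_file(src, dst):
    """Read an OPML file, fold list-like children, and write the result."""
    tree = ET.parse(src)
    fold_list_children_into_note(tree.getroot())
    tree.write(dst, encoding="utf-8", xml_declaration=True)
```

With the outline from the first message, the three "* Item" children would end up in the _note of "List title", which the OPML reader is described above as parsing as Markdown. Outlines that mix list items and real sub-sections keep the sub-sections as headings.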
* Re: How to convert from OPML to Markdown without escaping the Markdown
From: Patrick Kenny @ 2019-04-12 16:26 UTC
To: pandoc-discuss

Thank you for the clarification about the distinction between HTML and
Markdown; I was confused.

As for pandoc mapping the outline elements to section headings, that's not
ideal in my case (or for others who are using an app like Workflowy or
Dynalist to export outlines that contain lots of markdown formatting as
OPML), but the Pandoc documentation on filters has a great example of using
behead.hs to adjust the formatting for section headers below a certain
threshold, which is all I needed to fix that part for my case.

* Re: How to convert from OPML to Markdown without escaping the Markdown
From: BP Jonsson @ 2019-04-12 19:35 UTC
To: pandoc-discuss, Patrick Kenny

On 2019-04-12 at 18:26, Patrick Kenny wrote:
> As for pandoc mapping the outline elements to section headings, that's not
> ideal in my case (or for others who are using an app like Workflowy or
> Dynalist to export outlines that contain lots of markdown formatting as
> OPML), but the Pandoc documentation on filters has a great example of using
> behead.hs to adjust the formatting for section headers below a certain
> threshold, which is all I needed to fix that part for my case.

I use the attached Perl script to sanitize Markdown produced by
Pandoc from Dynalist OPML:

md-dyna-clean.pl - clean up Markdown produced by Pandoc from Dynalist.io OPML

    $ pandoc input.opml -w markdown --atx-headers \
        | perl md-dyna-clean.pl [OPTIONS] >output.md

OPTIONS:

    ------------------------------ ------------------------------------
    -l <INT>, --list-level=<INT>   Highest heading level to turn into
                                   nested unordered list. Default: 3
    -t <INT>, --tab-stop=<INT>     Number of spaces per tab. Default: 4
    -h, --help                     Show this help text and exit.
    --------------------------------------------------------------------

DESCRIPTION:

This Perl script does the following to clean up Markdown produced by
Pandoc <http://pandoc.org> from OPML exported from Dynalist
<https://dynalist.io>:

- Remove backslash escapes in headings (because Pandoc escapes
  punctuation in the text attribute).
- Turn headings at a certain threshold and below into nested
  unordered lists.

The highest level to "listify" can be set with the -l option. If you
want a number of spaces other than 4 per indent level, you can set it
with the -t option.

CAVEAT:

This is not a Pandoc JSON filter, but a line-by-line text filter which
modifies the Markdown text output by Pandoc.

[Attachment: md-dyna-clean.pl, application/x-perl, 2468 bytes]

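The attached Perl script is not reproduced in this archive. As a rough illustration of the two steps its help text describes, here is a minimal Python sketch. It is not the author's script; the function name, the regular expressions, and the parameter names (`list_level`, `tab_stop`, mirroring the -l and -t options) are assumptions made for this example, and it expects ATX-style headings as input.

```python
import re

# An ATX heading: one or more '#', whitespace, then the heading text.
HEADING = re.compile(r"^(#+)\s+(.*)$")
# A backslash followed by one ASCII punctuation character.
ESCAPE = re.compile(r"\\([!-/:-@\[-`{-~])")


def clean_markdown(text, list_level=3, tab_stop=4):
    """Unescape heading text and turn deep headings into nested list items."""
    out = []
    for line in text.splitlines():
        m = HEADING.match(line)
        if not m:
            out.append(line)  # non-heading lines pass through unchanged
            continue
        level = len(m.group(1))
        title = ESCAPE.sub(r"\1", m.group(2))  # drop the backslash escapes
        if level >= list_level:
            # Headings at list_level and deeper become indented bullets.
            indent = " " * (tab_stop * (level - list_level))
            out.append(f"{indent}- {title}")
        else:
            out.append(f"{'#' * level} {title}")
    return "\n".join(out)
```

Blank lines between the original headings are kept, so the resulting list is a loose one; removing them would be a separate step.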
* Re: How to convert from OPML to Markdown without escaping the Markdown [not found] ` <m2mukvb6va.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org> 2019-04-12 16:26 ` Patrick Kenny @ 2019-04-13 10:38 ` BP Jonsson [not found] ` <bdf2bba3-35f6-4358-8340-550b9f2b6cac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 1 sibling, 1 reply; 11+ messages in thread From: BP Jonsson @ 2019-04-13 10:38 UTC (permalink / raw) To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, John MacFarlane, Patrick Kenny Den 2019-04-12 kl. 17:06, skrev John MacFarlane: > > Interesting about the spec allowing HTML formatting. > We could implement that. I've created an issue: > > https://github.com/jgm/pandoc/issues/5444 > > But that's not the only issue here. The other issue > is that pandoc maps all outline elements to *section > headings*. I get the impression that you're expecting > something else, since you're inserting content that > doesn't make sense in a heading (like lists). > > Content that goes under headings should go in the > _note attribute. Currently this accepts markdown > formatting. We might want to change that to HTML, > as noted in the issue. Much of the confusion and trouble is due to Dynalist's interface and how Dynalist exports and imports OPML. Dynalist documents are displayed as (what looks like) nested HTML lists. Each list item corresponds to an OPML outline element where the list item text is in the text attribute. You can also have notes associated with list items but it takes extra effort: you have to open a menu, press an extra key (Shift+Enter) while just Enter creates a new item (outline text), so the interface invites you to ignore/not use notes. As for markup Dynalist supports what ostensibly is a subset of markdown[^1] inside both items and notes, but *not* raw HTML. It exports OPML containing the literal "markdown" without converting it to HTML. 
The result is that you get backslash escaped "markdown" punctuation in headings when converting Dynalist OPML to Markdown with Pandoc, and typically the document will consist almost entirely of headings to a great depth (which Pandoc happily converts to headings below level 6!). [^1]: Although their markup is weird, they have `` __italic__ **bold** `code` `` --- note the double underscores for italic! I don't correct that automatically in the script I posted yesterday because I generally want to inspect things first and then do a `:%s/\v_(_.{-}_)_/\1/gc` in Vim to get better control. The attached screen capture shows what a "document" looks like in the Dynalist web interface. When exporting this as OPML it looks like this: ``` {.xml} <?xml version="1.0" encoding="utf-8"?> <opml version="2.0"> <head> <title></title> <flavor>dynalist</flavor> <source>https://dynalist.io</source> <ownerName>...</ownerName> <ownerEmail>...</ownerEmail> </head> <body> <outline text="Untitled"> <outline text="Soluta Laborum"> <outline text="Et Modi" _note="Porro Aut Pariatur Velit"> <outline text="Sint Laboriosam" _note="In Sed"/> </outline> <outline text="Debitis Harum"> <outline text="Velit Libero" _note="Maxime __provident__ **nemo** `assumenda`."/> </outline> <outline text="Impedit Voluptas"/> </outline> </outline> </body> </opml> ``` Maybe whether Pandoc shall interpret the contents of the `text` and `_note` attributes could be made subject to the `raw_html` extension --- which could then be on by default for OPML. (I have posted this in the GitHub issue as well.) > >> So HTML tags should be allowed. If HTML tags are allowed, then Markdown >> should be allowed as well, since it's basically a shorthand for HTML. > > No, it's not (look at what pandoc is used for), and > no, it shouldn't be allowed. The meaning of a document > is different if it's interpreted as HTML or as > Markdown. We should accord with the spec and > interpret the text attribute as HTML. 
We should > consider doing the same for the _note attribute, but > I'd want to hear from current OPML users before making > a change. > > > Patrick Kenny <ptmkenny-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: > >> Ok, thanks, that's helpful. >> >> The problem may stem from the OPML spec being vague and OPML having two >> rather different use cases: cataloging RSS feeds and storing outlines of >> books, task lists, and things in outlining software. >> >> That said, the OPML spec <http://dev.opml.org/spec2.html> says this about >> the <outline> element: >> >> *Text attribute <http://dev.opml.org/spec2.html#textAttribute>* >>> Every outline element must have at least a *text* attribute, which is >>> what is displayed when an outliner >>> <http://support.opml.org/basicOutlining> opens the OPML file. To omit the >>> text attribute would render the outline useless in an outliner. This is >>> what the user would see >>> <http://images.scripting.com/archiveScriptingCom/2005/10/14/badopml2.gif> -- >>> clearly an unacceptable user experience. Part of the purpose of producing >>> OPML is to give users the power to accumulate and organize related >>> information in an outliner. This is as important a use for OPML as data >>> interchange. >>> A missing text attribute in any outline element is an error. >>> Text attributes may contain encoded HTML markup. >> >> So HTML tags should be allowed. If HTML tags are allowed, then Markdown >> should be allowed as well, since it's basically a shorthand for HTML. >> >> Upon further investigation, it's not just the * lists that get escaped. >> >> Ordered lists 1. 2. 3. etc. are also escaped as 1\. 2\., and headers are >> escaped as \#\#\#\#. >> >> So it seems to me that OPML is not handled correctly during conversion. >> >> I can write a shell script to get around this, but maybe there's a better >> way? 
^ permalink raw reply [flat|nested] 11+ messages in thread
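The Vim substitution mentioned in the footnote of the message above (`:%s/\v_(_.{-}_)_/\1/gc`) can be approximated non-interactively. The following is only a sketch in Python, not the script referred to in the thread; the function name `fix_dynalist_italics` is made up for illustration, and unlike Vim's `c` flag it does not ask for confirmation, so it will also rewrite double underscores inside code spans.

```python
import re

# Matches __text__ and captures _text_, mirroring the Vim pattern \v_(_.{-}_)_
# (".*?" is Python's non-greedy counterpart of Vim's ".{-}").
DOUBLE_UNDERSCORE = re.compile(r"_(_.*?_)_")


def fix_dynalist_italics(text: str) -> str:
    """Turn Dynalist-style __italic__ into single-underscore _italic_.

    Hypothetical helper: every match is replaced without confirmation.
    """
    return DOUBLE_UNDERSCORE.sub(r"\1", text)


if __name__ == "__main__":
    print(fix_dynalist_italics("Maxime __provident__ **nemo** `assumenda`."))
    # prints: Maxime _provident_ **nemo** `assumenda`.
```

Inspecting the result by hand, as the footnote suggests, remains the safer route when the outline contains identifiers with double underscores.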
[parent not found: <bdf2bba3-35f6-4358-8340-550b9f2b6cac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: How to convert from OPML to Markdown without escaping the Markdown [not found] ` <bdf2bba3-35f6-4358-8340-550b9f2b6cac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2019-04-13 10:59 ` Benct Philip Jonsson 2019-04-13 11:17 ` Benct Philip Jonsson 1 sibling, 0 replies; 11+ messages in thread From: Benct Philip Jonsson @ 2019-04-13 10:59 UTC (permalink / raw) To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Patrick Kenny [-- Attachment #1: Type: text/plain, Size: 670 bytes --]

On 2019-04-13 at 12:38, BP Jonsson wrote:
> The attached screen capture

Which I forgot. Here it is!

[-- Attachment #2: dynalist-capture.png --]
[-- Type: image/png, Size: 17452 bytes --]

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to convert from OPML to Markdown without escaping the Markdown [not found] ` <bdf2bba3-35f6-4358-8340-550b9f2b6cac-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2019-04-13 10:59 ` Benct Philip Jonsson @ 2019-04-13 11:17 ` Benct Philip Jonsson [not found] ` <0605902c-ffad-1916-b5af-bf96041dc266-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 1 sibling, 1 reply; 11+ messages in thread From: Benct Philip Jonsson @ 2019-04-13 11:17 UTC (permalink / raw) To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Patrick Kenny

Followup:

The Dynalist HTML export

````html
<!DOCTYPE html><html><body><ul><li>Untitled<ul><li>Soluta Laborum<ul><li>Et Modi<br>Porro Aut<br><br>Pariatur Velit<ul><li>Sint Laboriosam<br>In Sed</li></ul></li><li>Debitis Harum<ul><li>Velit Libero<br>Maxime <i>provident</i> <b>nemo</b> `assumenda`.</li></ul></li><li>Impedit Voluptas</li></ul></li></ul></li></ul></body></html>
````

or with some line breaks added for readability

````html
<!DOCTYPE html>
<html>
<body>
<ul>
<li>Untitled
<ul>
<li>Soluta Laborum
<ul>
<li>Et Modi
<br>Porro Aut
<br>
<br>Pariatur Velit
<ul>
<li>Sint Laboriosam
<br>In Sed</li></ul></li>
<li>Debitis Harum
<ul>
<li>Velit Libero
<br>Maxime <i>provident</i> <b>nemo</b> `assumenda`.</li></ul></li>
<li>Impedit Voluptas</li></ul></li></ul></li></ul></body></html>
````

in some ways gives results that are probably closer to what the OP expects when converted to Markdown with Pandoc

````markdown
- Untitled
    - Soluta Laborum
        - Et Modi\
          Porro Aut\
          \
          Pariatur Velit
            - Sint Laboriosam\
              In Sed
        - Debitis Harum
            - Velit Libero\
              Maxime *provident* **nemo** \`assumenda\`.
        - Impedit Voluptas
````

There are still some issues, like hard line breaks instead of paragraphs, and the code showing up as raw markdown in the HTML instead of inside `<code>` tags.

^ permalink raw reply [flat|nested] 11+ messages in thread
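For outlines whose `text` and `_note` attributes already contain Markdown, one way to get a nested list like the one above without any escaping is to bypass Pandoc's OPML reader and walk the XML directly. The sketch below uses only Python's standard `xml.etree.ElementTree`; the function names `outline_to_markdown` and `opml_to_markdown` and the four-space indentation are assumptions made for illustration, not something taken from the thread or from Pandoc.

```python
import xml.etree.ElementTree as ET


def outline_to_markdown(node, depth=0):
    """Yield Markdown list lines for one <outline> element and its children."""
    indent = "    " * depth
    # The text attribute is copied verbatim, so any Markdown in it survives.
    yield f"{indent}- {node.get('text', '')}"
    note = node.get("_note")
    if note:
        # Emit the note as a separate paragraph inside the list item.
        yield ""
        for line in note.splitlines():
            yield f"{indent}    {line}"
        yield ""
    for child in node.findall("outline"):
        yield from outline_to_markdown(child, depth + 1)


def opml_to_markdown(opml_text):
    """Convert an OPML string to a nested Markdown bullet list."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    if body is None:
        return ""
    lines = []
    for top in body.findall("outline"):
        lines.extend(outline_to_markdown(top))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        '<opml version="2.0"><body>'
        '<outline text="Debitis Harum">'
        '<outline text="Velit Libero" _note="Maxime **nemo** `assumenda`."/>'
        "</outline>"
        "</body></opml>"
    )
    print(opml_to_markdown(sample), end="")
```

The output can then be passed to Pandoc as Markdown input if further conversion is needed. Note that a `text` value that itself starts with a list marker, such as `* Item 1` in the original post, would be read as a nested bullet inside the generated item, so such markers may need to be stripped first.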
[parent not found: <0605902c-ffad-1916-b5af-bf96041dc266-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: How to convert from OPML to Markdown without escaping the Markdown [not found] ` <0605902c-ffad-1916-b5af-bf96041dc266-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2019-04-17 11:21 ` Patrick Kenny 0 siblings, 0 replies; 11+ messages in thread From: Patrick Kenny @ 2019-04-17 11:21 UTC (permalink / raw) To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw [-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

Thanks for sharing the script; that's a very helpful reference.

I'm actually using Outlinely and Workflowy, which produce OPML that has slightly different issues, but by referencing your script I was able to make some progress and I've almost got it working now, so thank you!

[-- Attachment #2: Type: text/html, Size: 4026 bytes --]

^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: How to convert from OPML to Markdown without escaping the Markdown [not found] ` <7dd34cfd-e19f-47b2-a20c-0e62f3901ba4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> 2019-04-11 23:58 ` John MacFarlane @ 2019-04-12 8:41 ` BPJ 1 sibling, 0 replies; 11+ messages in thread From: BPJ @ 2019-04-12 8:41 UTC (permalink / raw) To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw [-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

Possibly each `<outline>` is parsed as a separate piece of Markdown so the parser sees not a list but each as a Paragraph starting with an asterisk. Still a bit strange. I'd expect single-item lists to be permitted.

[-- Attachment #2: Type: text/html, Size: 12110 bytes --]

^ permalink raw reply [flat|nested] 11+ messages in thread
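As a stopgap for the escaping described in the original post (`\*`, `1\.`, `\#`), the backslashes can be stripped from Pandoc's output after conversion. This is a hedged sketch of such a post-processing step, not an option Pandoc provides; the function name `unescape_markdown` and the set of characters handled are assumptions chosen to cover the examples in the thread, and the function will also remove escapes that were intentional.

```python
import re

# Backslash followed by one of a few Markdown punctuation characters.
# The character set is an assumption covering the thread's examples
# (\*, 1\., \#) plus some common neighbours; it is not Pandoc's full list.
ESCAPED = re.compile(r"\\([*#._`<>\[\]])")


def unescape_markdown(text: str) -> str:
    """Remove backslash escapes before selected punctuation characters."""
    return ESCAPED.sub(r"\1", text)


if __name__ == "__main__":
    for line in [r"\* Item 1", r"1\. First", r"\#\#\#\# Title"]:
        print(unescape_markdown(line))
```

Because this operates on the already-converted text, it cannot tell an escape that Pandoc added from one the author typed, so it is best applied only to outlines known to contain Markdown throughout.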