How to programmatically enforcing a pandoc markdown style

public inbox archive for pandoc-discuss@googlegroups.com

* How to programmatically enforcing a pandoc markdown style
@ 2016-10-22  9:09 Kolen Cheung
       [not found] ` <e82e943f-604e-4a5b-a621-4b3dd82e42c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-22  9:09 UTC (permalink / raw)
  To: pandoc-discuss

Hi, all,

pandoc from markdown to markdown

Ever since I read issue #2814 <https://github.com/jgm/pandoc/issues/2814>,
I have found it a very useful trick.

I am now working on a project that, starting next semester, will be opened
up to about 100 GSIs who will collaboratively update a series of workbooks.
I want to incorporate this trick as a cleanup tool to normalize the
source code (in pandoc markdown with minimal raw LaTeX).

Most things work very well, but I have found a few problems. I don’t
know if there’s any way to get around them:

   1. ### Main Goals {-} becomes ### Main Goals {#main-goals .unnumbered}:
   I want to keep using {-} for 2 reasons: it is shorter, and it does not
   depend on the header text (which will get repeated after cat).
   2. 1. abcd... becomes 1.  abcd...: it seems that pandoc enforces 2 spaces
   after the enumerated list/bullet list marker. Is there a way to change
   this behavior? I suppose I could use a regex to transform it back, but
   that seems too error-prone.
   3. inline footnotes: I found that pandoc converts inline footnotes
   to explicit footnotes with [^1], [^2]..., and the use of inline_notes
   cannot be enforced. I opened issue #3172
   <https://github.com/jgm/pandoc/issues/3172>. I suppose I could change the
   source to use explicit footnotes only, but it seems difficult to
   enforce that and to tell people not to use inline footnotes.
   4. &trade; becomes ™: after studying how the trademark symbol should be
   typeset, and considering that I aim at HTML+LaTeX output with no
   non-ASCII characters in the source code, I chose &trade;. But pandoc
   happily converts that to ™ without my consent. I suppose other such HTML
   entities behave similarly. (By the way, &trade; in the markdown input
   comes out as ™ in the TeX output, and pdflatex has no problem with that.
   The resulting PDF looks identical to what I get with \texttrademark.
   Does anyone know why? I thought pdflatex didn’t like Unicode.)
   5. pipe tables become HTML tables: I believe this is a bug, so I opened
   issue #3171 <https://github.com/jgm/pandoc/issues/3171>. Even more
   interestingly, the pipe tables were obtained from a .docx to .md
   conversion.

The command I used to enforce “pandoc style” is:

find . -maxdepth 2 -mindepth 2 -iname "*.md" -exec pandoc -f markdown+abbreviations+autolink_bare_uris+markdown_attribute+mmd_header_identifiers+mmd_link_attributes+mmd_title_block+tex_math_double_backslash-latex_macros -t markdown+raw_tex-native_spans-simple_tables-multiline_tables-grid_tables-latex_macros --normalize -s --wrap=none --atx-headers -o {} {} \;

“pandoc lint” 

By the way, does anyone know how to do some sort of “pandoc lint”?
Currently I check the TeX output with chktex -q and lacheck, which
sometimes give useful typographical hints on what to correct.

And I remember reading somewhere that @jgm mentioned that any random
string should be valid markdown syntax (part of the markdown philosophy,
kind of). In this sense it seems very difficult to enforce a “right”
syntax in markdown.

cat a lot of markdown files into one

Lastly, there’s a very minor issue: if I cat lots of markdown files into
one, then between the end of one file and the beginning of the next, the lack
of enough newlines might produce wrong markdown syntax. (*E.g.* a file
starts with a heading, and some text editors (*e.g.* Atom) normalize the
trailing newlines of the previous file, without my consent, to a single
one. The heading then starts immediately after the last paragraph,
and pandoc will not parse it as a heading.)

I currently get around this problem with a script that normalizes every file
to end with exactly 2 trailing newlines.

I suppose cat-ing markdown files is a very common process. How
do others normally do it?

Thanks in advance,
Kolen

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/e82e943f-604e-4a5b-a621-4b3dd82e42c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: How to programmatically enforcing a pandoc markdown style
       [not found] ` <e82e943f-604e-4a5b-a621-4b3dd82e42c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-22 20:54   ` John MacFarlane
       [not found]     ` <20161022205406.GB83446-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2017-02-07 22:36   ` Kolen Cheung
  1 sibling, 1 reply; 39+ messages in thread
From: John MacFarlane @ 2016-10-22 20:54 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

In order to function as a proper linter, pandoc would have
to store much more detail in the AST than it currently does.
Is a header an ATX or setext style header?  How many spaces
before, and how many after, a list delimiter?  Inline or
regular footnote?  If regular, what's the label for the
marker?  Inline or reference link?  If reference, what's
the label?  Fenced or indented code?  Was a particular
unicode character entered as an entity or verbatim? Etc.

I just didn't design pandoc to be used as a linter; the
underlying design would need to be very different.



* Re: How to programmatically enforcing a pandoc markdown style
       [not found]     ` <20161022205406.GB83446-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2016-10-22 22:41       ` Kolen Cheung
       [not found]         ` <964c8fc2-834a-4f4c-8390-091177a82562-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-24  8:42       ` Kolen Cheung
  2016-10-27  7:05       ` Kolen Cheung
  2 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-22 22:41 UTC (permalink / raw)
  To: pandoc-discuss


Thanks for the info.

So, in a sense, to use pandoc as a linter, the style to enforce would be
the output of the pandoc markdown writer, taken as the *de facto* standard.
In that case, I think point 1 can still cause an issue. Is there any way for
the writer to write a header that is unnumbered but does not have the ID?
If not, maybe div & amsthm can be used to get around the issue.

I also want to discover tricks for using linters. As I said, I currently use
lacheck and chktex to check the style of the TeX output, and the source
(in markdown) actually benefits from it. In a sense, this is a blessing
from pandoc, since I can output to any format pandoc supports and use a
linter there. Maybe someone will point out that I could output to ConTeXt
or docx and find some linter there too.

And lastly, I know that if I throw a bunch of files at pandoc at the same
time, pandoc will cat them together first. How does pandoc handle the
empty-line issue between the end of one file and the beginning of the next?

Thanks again.


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]         ` <964c8fc2-834a-4f4c-8390-091177a82562-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-23  2:36           ` Sergio Correia
       [not found]             ` <e7bd4eba-c43a-4f20-8536-5fa0926b857b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Sergio Correia @ 2016-10-23  2:36 UTC (permalink / raw)
  To: pandoc-discuss


>
> And lastly, I know that if I throw a bunch of files at pandoc at the same
> time, pandoc will cat them together first. How does pandoc handle the
> empty-line issue between the end of one file and the beginning of the next?
>
I also had problems with this. If I don't leave an empty line at the end of 
a file, and the next one starts with a header ("# A Section"), then 
everything will compile but no title will be created... 


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]             ` <e7bd4eba-c43a-4f20-8536-5fa0926b857b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-23 21:39               ` John MacFarlane
  2016-10-24  7:59               ` Kolen Cheung
  1 sibling, 0 replies; 39+ messages in thread
From: John MacFarlane @ 2016-10-23 21:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Sergio Correia [Oct 22 16 19:36 ]:
>   And lastly, I know that if I throw a bunch of files at pandoc at the
>   same time, pandoc will cat them together first. How does pandoc handle
>   the empty-line issue between the end of one file and the beginning of
>   the next?
>
>   I also had problems with this. If I don't leave an empty line at the
>   end of a file, and the next one starts with a header ("# A Section"),
>   then everything will compile but no title will be created...

This is intentional.  For some purposes you might want to be
able to combine things without blank spaces.  If so, just
leave off the newline from the final line of the first file.

Normally text files end with a newline, so pandoc should
in effect insert a blank line.
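
For the plain cat case raised earlier in the thread, here is a minimal shell sketch that always leaves at least one blank line between files, whether or not a file ends with a newline. It assumes bash; the function name join_md and the file names in the usage comment are hypothetical, not anything pandoc provides.

```shell
#!/usr/bin/env bash
# join_md FILE...: print the files in order, each followed by "\n\n".
# The first newline terminates an unterminated last line; the second
# guarantees an empty line before the next file's first line.
join_md() {
    local f
    for f in "$@"; do
        cat -- "$f"
        printf '\n\n'
    done
}

# Hypothetical usage (pandoc reads stdin when no input file is given):
#   join_md intro.md chapter1.md | pandoc -f markdown -o book.html
```

A file that already ends with a newline gets two empty lines after it, which Markdown treats the same as one between blocks.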



* Re: How to programmatically enforcing a pandoc markdown style
       [not found]             ` <e7bd4eba-c43a-4f20-8536-5fa0926b857b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-23 21:39               ` John MacFarlane
@ 2016-10-24  7:59               ` Kolen Cheung
  1 sibling, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2016-10-24  7:59 UTC (permalink / raw)
  To: pandoc-discuss


Currently, my “fix” for the problem is to enforce a style in the source. An
example is in ickc/travis-ci-pandoc-latex-config/makefile
<https://github.com/ickc/travis-ci-pandoc-latex-config/blob/master/makefile#L83-L85>:

# Normalize whitespace:
# 1. add 2 trailing newlines;
# 2. delete all CONSECUTIVE blank lines from the file except the first; delete all blank lines from the top and end of the file; allows 0 blanks at top, 0, 1, or 2 at EOF;
# 3. delete trailing whitespace (spaces, tabs) from the end of each line.
normalize:
	find . -maxdepth 2 -iname "*.md" -exec bash -c 'printf "\n\n" >> "$$0"' {} \; -exec sed -i -e '/./,/^$$/!d' -e 's/[ \t]*$$//' {} \;

where some of the 1-liners are from sed.sourceforge.net/sed1line.txt
<http://sed.sourceforge.net/sed1line.txt>.

The 1-liner's description says it allows only an optional single trailing
newline, but in my tests it actually allows 0-2 trailing newlines. So the
way I enforce 2 trailing newlines is to append 2 more and let sed remove
any beyond 2.

This way, when 2 md files are cat together, the end of the first file's text
and the beginning of the second's are separated by exactly 1 empty line.
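
The same end-of-file normalization can also be sketched as a shell function, assuming bash and a POSIX sed. The name normalize_md is hypothetical and is not part of the makefile linked above. Note that this sketch only strips trailing whitespace and fixes the end of each file; it does not collapse consecutive blank lines in the middle of a file the way the sed range expression does.

```shell
#!/usr/bin/env bash
# normalize_md FILE...: rewrite each file so that no line has trailing
# whitespace and the file ends with exactly two newlines (one empty line).
normalize_md() {
    local f tmp
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # sed strips trailing whitespace from every line; $(...) then drops
        # all trailing newlines, and printf puts back exactly two.
        printf '%s\n\n' "$(sed -e 's/[[:space:]]*$//' "$f")" > "$tmp"
        # Copy back instead of mv so the original file keeps its permissions.
        cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}

# Hypothetical usage, mirroring the find invocation in the makefile rule:
#   find . -maxdepth 2 -iname "*.md" -print0 | while IFS= read -r -d '' f; do
#       normalize_md "$f"
#   done
```

Running it a second time leaves the files unchanged, so it is safe to call from a commit hook or a make target.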


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]     ` <20161022205406.GB83446-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-22 22:41       ` Kolen Cheung
@ 2016-10-24  8:42       ` Kolen Cheung
  2016-10-27  7:05       ` Kolen Cheung
  2 siblings, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2016-10-24  8:42 UTC (permalink / raw)
  To: pandoc-discuss


Excuse me for getting back to the pandoc lint concept. In a sense, the
minimal requirement for it to work is this:

Consider a mapping $x \mapsto y \mapsto z$ where each map is
done by pandoc -t markdown -f markdown. It is not realistic to expect
$x \equiv y$ since, as you pointed out, much information is lost in the
reader (and if it did hold, it would not be a linter either, because it
would not change anything). *But* if $y \equiv z$, then it should be a good
“linter”, *i.e.* if it is *idempotent*. The point is that I could use it to
enforce a style, and repeatedly applying it would not change the file
further, so the styling is fixed.
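
The $y \equiv z$ condition can be checked mechanically. What follows is only a sketch, assuming bash and a formatter that works as a stdin-to-stdout filter; the function name is_idempotent and the file name in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# is_idempotent FILE CMD [ARG...]: run CMD on FILE to get y, run CMD again
# on y to get z, and succeed only if both runs succeed and y equals z
# byte for byte.
is_idempotent() {
    local file=$1 y z status
    shift
    y=$(mktemp) || return 2
    z=$(mktemp) || { rm -f "$y"; return 2; }
    "$@" < "$file" > "$y" && "$@" < "$y" > "$z" && cmp -s "$y" "$z"
    status=$?
    rm -f "$y" "$z"
    return "$status"
}

# Hypothetical usage with the round trip from this thread:
#   is_idempotent chapter.md pandoc -f markdown -t markdown --wrap=none \
#       || echo "chapter.md: second pass still changes the output"
```

A non-zero status does not distinguish between a failing formatter and a genuine difference between the two passes, so a real test harness would probably want to report those cases separately.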

In this sense, most of the problems listed in the original post are not
problems. The one thing I am worried about is the last point about tables,
as in issue #3171 <https://github.com/jgm/pandoc/issues/3171>.

Just brainstorming here: it seems reasonable to expect *all*
reader-writer pairs to be *idempotent*. I don’t know if the tests
check for this. If not, it seems a simple test for detecting potential
errors. Maybe add it as an “allow failure” kind of test?

By the way, I improved on the first problem by using -f
markdown-auto_identifiers.... It still expands {-} to {.unnumbered} though.
Maybe this is actually better, since people who are unfamiliar with
pandoc syntax would still recognize what it means.

On Saturday, October 22, 2016 at 1:54:23 PM UTC-7, John MacFarlane wrote:

In order to function as a proper linter, pandoc would have 
> to store much more detail in the AST than it currently does. 
> Is a header an AST or setext style header?  How many spaces 
> before, and how many after, a list delimiter?  Inline or 
> regular footnote?  If regular, what's the label for the 
> marker?  Inline or reference link?  If reference, what's 
> the label?  Fenced or indented code?  Was a particular 
> unicode character entered as an entity or verbatim? Etc. 
>
> I just didn't design pandoc to be used as a linter; the 
> underlying design would need to be very different. 
>
>
> +++ Kolen Cheung [Oct 22 16 02:09 ]: 
> >   Hi, all, 
> > 
> >pandoc from markdown to markdown 
> > 
> >   Ever since I read [1]issue #2814, I find it a very useful trick. 
> > 
> >   I am now working on a project, that starting from next semester will 
> >   open up to about 100 GSIs to collaboratively update a series of 
> >   workbooks. I want to incorporate the said trick as a cleanup tool to 
> >   normalize the source code (in pandoc markdown with minimal raw LaTeX). 
> > 
> >   Most things works very well, but however I find a few problems. I 
> don’t 
> >   know if there’s any way to get around these? 
> >    1. ### Main Goals {-} becomes ### Main Goals {#main-goals 
> >       .unnumbered}: I want to keep using {-} for 2 reasons: shorter, and 
> >       does not depends on the header (which will gets repeated after 
> >       cat). 
> >    2. 1. abcd... becomes 1. abcd...: it seems that pandoc enforce 2 
> >       spaces after the enumerated list/bullet list. Are there ways to 
> >       change this behavior? I suppose I could use a regex to transform 
> it 
> >       back but it seems to prone to error. 
> >    3. inline footnotes: I found that pandoc would convert inline 
> >       footnotes to explicit footnotes with [^1], [^2].... And the use of 
> >       inline_notes cannot be enforced. I opened an [2]issue in #3172. I 
> >       suppose I can change the source code to use explicit footnotes 
> >       only. But it seems difficult to enforce it and tell people not to 
> >       use inline footnotes. 
> >    4. &trade; becomes ™: after studying how trademark should be typeset, 
> >       considering I aim at HTML+LaTeX output and no non-ascii characters 
> >       in the source code, I chose &trade;. But pandoc would happily 
> >       convert that to ™ without my consent. I suppose other such HTML 
> >       characters might behave similarly. (by the way, input &trade; from 
> >       markdown would output ™ in TeX, and pdflatex has no problem with 
> >       that. The resultant PDF looks identical as if I use 
> \texttrademark. 
> >       Does anyone knows why? I thought pdflatex don’t like unicode.) 
> >    5. pipe tables becomes HTML tables: I believe it is a bug so I opened 
> >       [3]issue #3171. Even more interestingly, the pipe tables were 
> >       obtained by a .docx to .md conversion. 
> > 
> >   The command I used to enforce “pandoc style” is: 
> >find . -maxdepth 2 -mindepth 2 -iname "*.md" -exec pandoc -f 
> markdown+abbreviati 
> >ons+autolink_bare_uris+markdown_attribute+mmd_header_identifiers+mmd_link_attrib 
>
> >utes+mmd_title_block+tex_math_double_backslash-latex_macros -t 
> markdown+raw_tex- 
> >native_spans-simple_tables-multiline_tables-grid_tables-latex_macros 
> --normalize 
> > -s --wrap=none --atx-headers -o {} {} \; 
> > 
> >“pandoc lint” 
> > 
> >   By the way, does anyone know how to do some sort of “pandoc lint”? 
> >   Currently I checked the TeX output by chktex -q and lacheck, which 
> >   sometimes gives useful typographical hints on what to correct. 
> > 
> >   And I remembered I read somewhere @jgm mentioned something about a 
> >   random string should be a valid markdown syntax (part of the markdown 
> >   philosophy kind of thing). In this sense it seems very difficult to 
> >   enforce a “right” syntax in markdown. 
> > 
> >cat a lot of markdown files into one 
> > 
> >   Lastly, there’s a very minor issue: if I cat lots of markdown files
> >   into one, the lack of enough newlines between the end of one file and
> >   the beginning of the next can produce wrong markdown syntax. (E.g. a
> >   file starts with a heading, and some text editors (e.g. Atom)
> >   normalize my trailing newlines to 1 empty line without my consent.
> >   The heading then starts immediately after the last paragraph, so
> >   pandoc will not parse it as a heading.)
> >
> >   I currently get around this problem with a script that normalizes
> >   every file to end with exactly 2 trailing empty lines.
> >
> >   I suppose concatenating markdown files is a very common process. How
> >   do others normally do it?
> > 
> >   Thanks in advance, 
> >   Kolen 
> >    
> > 
> >   -- 
> >   You received this message because you are subscribed to the Google 
> >   Groups "pandoc-discuss" group. 
> >   To unsubscribe from this group and stop receiving emails from it, send
> >   an email to [4]pandoc-discus...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> >   To post to this group, send email to
> >   [5]pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
> >   To view this discussion on the web visit
> >   [6]https://groups.google.com/d/msgid/pandoc-discuss/e82e943f-604e-4a5b-a621-4b3dd82e42c0%40googlegroups.com.
> >   For more options, visit [7]https://groups.google.com/d/optout. 
> > 
> >References 
> > 
> >   1. https://github.com/jgm/pandoc/issues/2814 
> >   2. https://github.com/jgm/pandoc/issues/3172 
> >   3. https://github.com/jgm/pandoc/issues/3171 
> >   4. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> >   5. mailto:pandoc-...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> >   6. https://groups.google.com/d/msgid/pandoc-discuss/e82e943f-604e-4a5b-a621-4b3dd82e42c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org?utm_medium=email&utm_source=footer
> >   7. https://groups.google.com/d/optout 
>
> 

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/5405bad9-caa9-439d-b298-72731a144fbd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #1.2: Type: text/html, Size: 27822 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]     ` <20161022205406.GB83446-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-22 22:41       ` Kolen Cheung
  2016-10-24  8:42       ` Kolen Cheung
@ 2016-10-27  7:05       ` Kolen Cheung
       [not found]         ` <0f0bc668-c454-4119-a62b-307e318553f8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-27  7:05 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

I found that the pandoc markdown writer uses Unicode output whenever possible, not only for cases like the trademark symbol `&trade;` but also for non-breaking spaces: the writer outputs a Unicode non-breaking space character (U+00A0) rather than `\ `. Since the latter is more markdown-ish and is the way of typing a non-breaking space recommended in the manual, it seems the markdown writer should use that instead.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]         ` <0f0bc668-c454-4119-a62b-307e318553f8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-27 19:02           ` John MacFarlane
       [not found]             ` <20161027190242.GD1044-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-27 19:10           ` Jesse Rosenthal
  1 sibling, 1 reply; 39+ messages in thread
From: John MacFarlane @ 2016-10-27 19:02 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Kolen Cheung [Oct 27 16 00:05 ]:
>I found that the pandoc markdown writer will use unicode output if possible. Not only in the case like the trademark symbol `&trade;`, but also when non-breaking space is used. The markdown writer will output a unicode non-breaking space character rather than `\ `. Since the latter is more markdown-ish and is the recommended way of typing non-braking space in the manual, it seems the markdown writer should use that instead.

I don't know if it's more markdown-ish.  The goal of getting
a text that reads naturally without special processing is
better met by using a unicode nonbreaking space.  The \ is
pretty ugly.  I don't think the manual recommends \ as
preferable to a literal nonbreaking space.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]         ` <0f0bc668-c454-4119-a62b-307e318553f8-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-27 19:02           ` John MacFarlane
@ 2016-10-27 19:10           ` Jesse Rosenthal
       [not found]             ` <87bmy5txp1.fsf-4GNroTWusrE@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Jesse Rosenthal @ 2016-10-27 19:10 UTC (permalink / raw)
  To: Kolen Cheung, pandoc-discuss


Kolen Cheung <christian.kolen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I found that the pandoc markdown writer will use unicode output if
> possible. Not only in the case like the trademark symbol `&trade;`,
> but also when non-breaking space is used. The markdown writer will
> output a unicode non-breaking space character rather than `\ `. Since
> the latter is more markdown-ish and is the recommended way of typing
> non-braking space in the manual, it seems the markdown writer should
> use that instead.

I prefer to keep my markdown files with ascii dashes, quotes, etc. So I
have a filter called `dumbDowner` that I run on files that I'm
converting to markdown. It doesn't replace a nbsp with a '\ ', but that
would certainly be possible. See below (it's in haskell, but I hope that
what it does should be pretty clear):

~~~{.haskell}
import Text.Pandoc.JSON


-- convert unicode chars into their dumb versions
dumbDownChar :: Char -> String
dumbDownChar '\160' = " "
dumbDownChar '\8211' = "--"
dumbDownChar '\8212' = "---"
dumbDownChar '\8230' = "..."
dumbDownChar '\8216' = "'"
dumbDownChar '\8217' = "'"
dumbDownChar '\8220' = "\""
dumbDownChar '\8221' = "\""
dumbDownChar c = [c]

-- convert an inline into a list of dumb inlines
dumbDown' :: Inline -> [Inline]
dumbDown' (Str cs) = [Str $ concatMap dumbDownChar cs]
dumbDown' (Quoted SingleQuote ils) = [Str "'"] ++ ils ++ [Str "'"]
dumbDown' (Quoted DoubleQuote ils) = [Str "\""] ++ ils ++ [Str "\""]
dumbDown' il = [il]

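-- do the conversion if it's going out to a lightweight markup format;
-- otherwise (or if pandoc passes no format) leave the inline untouched.
-- (Written for pandoc-types versions in which Str carries a String.)
dumbDown :: Maybe Format -> Inline -> [Inline]
dumbDown (Just (Format f)) il
  | f `elem` ["markdown", "plain", "textile", "org", "rst"] = dumbDown' il
dumbDown _ il = [il]

main :: IO ()
main = toJSONFilter dumbDown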
~~~


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]             ` <20161027190242.GD1044-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2016-10-28  0:58               ` Kolen Cheung
       [not found]                 ` <cbf3c105-241b-45de-8519-8962cadda270-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-28  3:45               ` Kolen Cheung
  2016-10-29  9:33               ` Kolen Cheung
  2 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-28  0:58 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 2795 bytes --]

On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane wrote:

> +++ Kolen Cheung [Oct 27 16 00:05 ]:
> > I found that the pandoc markdown writer will use unicode output if
> > possible. Not only in the case like the trademark symbol &trade;, but also
> > when non-breaking space is used. The markdown writer will output a unicode
> > non-breaking space character rather than `\ `. Since the latter is more
> > markdown-ish and is the recommended way of typing non-breaking space in the
> > manual, it seems the markdown writer should use that instead.
>
> I don’t know if it’s more markdown-ish. The goal of getting a text that
> reads naturally without special processing is better met by using a unicode
> nonbreaking space. The \ is pretty ugly. I don’t think the manual
> recommends \ as preferable to a literal nonbreaking space.

I was referring to this passage in the manual:

> If you just want a regular inline image, just make sure it is not the only
> thing in the paragraph. One way to do this is to insert a nonbreaking space
> after the image:
>
>     ![This image won't be a figure](/url/of/image.png)\

If the Unicode non-breaking space were the recommended form, the example
would instead end with a literal U+00A0 after the closing parenthesis.

I would say a Unicode non-breaking space doesn’t quite “read naturally”: I
have to turn on a feature in my text editor to show invisible characters,
and even then the non-breaking space and the “normal” space look almost the
same, differing only in shade. (It took me quite some time to realize that.
I thought pandoc was converting `\ ` into an ordinary space, but it actually
converts it into a non-breaking space, which probably looks identical here.)

Inspired by Jesse Rosenthal’s reply, and by what --smart does for any
writer, maybe there should be an --unsmart option for the markdown writer
that represents characters in pure ASCII whenever possible.

I have another related problem with the markdown writer: currently, a
string with intraword_underscores becomes intraword\_underscores after
pandoc -f markdown... -t markdown.... Here the writer adds an unnecessary
escape character, which makes the text read less naturally.


[-- Attachment #1.2: Type: text/html, Size: 15159 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]             ` <87bmy5txp1.fsf-4GNroTWusrE@public.gmane.org>
@ 2016-10-28  1:21               ` Kolen Cheung
       [not found]                 ` <7e8b352f-1df9-4a92-81df-10359475f869-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-28  1:21 UTC (permalink / raw)
  To: pandoc-discuss; +Cc: christian.kolen-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 3471 bytes --]



Thanks! I have a similar script in ickc/markdown-variants/unSmartyPants.sh 
<https://github.com/ickc/markdown-variants/blob/master/bin/unSmartyPants.sh>. 
I use it to process files that have already been converted to markdown.

Yours looks very useful as a filter to apply during the conversion to
markdown. Have you hosted it somewhere already? If not, I suggest doing so.
I hope to document the available pandoc filters better, and have started a
collaborative spreadsheet, pandoc-filters - Google Sheets
<https://docs.google.com/spreadsheets/d/1eqMwPyxT0rN3z_tXpsISGBys0QR25W0x-tYDRsFBKAE/edit#gid=0>.
Even very simple filters can serve as both a productivity tip and an
example of how to write filters.

By the way, may I ask why you use the Unicode code point numbers rather
than the characters themselves?

On Thursday, October 27, 2016 at 12:10:36 PM UTC-7, Jesse Rosenthal wrote:


> Kolen Cheung <christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> > I found that the pandoc markdown writer will use unicode output if 
> > possible. Not only in the case like the trademark symbol `&trade;`, 
> > but also when non-breaking space is used. The markdown writer will 
> > output a unicode non-breaking space character rather than `\ `. Since 
> > the latter is more markdown-ish and is the recommended way of typing 
> > non-braking space in the manual, it seems the markdown writer should 
> > use that instead. 
>
> I prefer to keep my markdown files with ascii dashes, quotes, etc. So I 
> have a filter called `dumbDowner` that I run on files that I'm 
> converting to markdown. It doesn't replace a nbsp with a '\ ', but that 
> would certainly be possible. See below (it's in haskell, but I hope that 
> what it does should be pretty clear): 
>
> ~~~{.haskell} 
> import Text.Pandoc.JSON 
>
>
> -- convert unicode chars into their dumb versions 
> dumbDownChar :: Char -> String 
> dumbDownChar '\160' = " " 
> dumbDownChar '\8211' = "--" 
> dumbDownChar '\8212' = "---" 
> dumbDownChar '\8230' = "..." 
> dumbDownChar '\8216' = "'" 
> dumbDownChar '\8217' = "'" 
> dumbDownChar '\8220' = "\"" 
> dumbDownChar '\8221' = "\"" 
> dumbDownChar c = [c] 
>
> -- convert an inline into a list of dumb inlines 
> dumbDown' :: Inline -> [Inline] 
> dumbDown' (Str cs) = [Str $ concatMap dumbDownChar cs] 
> dumbDown' (Quoted SingleQuote ils) = [Str "'"] ++ ils ++ [Str "'"] 
> dumbDown' (Quoted DoubleQuote ils) = [Str "\""] ++ ils ++ [Str "\""] 
> dumbDown' il = [il] 
>
> -- do the conversion if it's going out to a lightweight markup format. 
> dumbDown :: Maybe Format -> Inline -> [Inline] 
> dumbDown (Just fmt) il | Format f <- fmt = 
>   if f `elem` ["markdown", "plain", "textile", "org", "rst"] 
>   then dumbDown' il 
>   else [il] 
>
> main = toJSONFilter dumbDown 
> ~~~ 
>



[-- Attachment #1.2: Type: text/html, Size: 9857 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                 ` <7e8b352f-1df9-4a92-81df-10359475f869-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-28  1:31                   ` Sergio Correia
       [not found]                     ` <4e8bc6eb-5f42-4db1-bf6b-2b2fb44482c1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Sergio Correia @ 2016-10-28  1:31 UTC (permalink / raw)
  To: pandoc-discuss; +Cc: christian.kolen-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 1888 bytes --]

I think a useful filter would be one that efficiently converts Unicode to
HTML entities (and LaTeX, etc.), and vice versa.

If you are interested, this issue
<https://github.com/mmechtley/pandoc-filter-test/issues/1> might be useful,
as well as this XML file
<http://stackoverflow.com/questions/2354067/map-between-latex-commands-and-unicode-points/2356160#2356160>
that maps between Unicode, HTML and LaTeX.



On Thursday, October 27, 2016 at 9:21:58 PM UTC-4, Kolen Cheung wrote:
>
> Thanks! I have a similar script in ickc/markdown-variants/unSmartyPants.sh 
> <https://github.com/ickc/markdown-variants/blob/master/bin/unSmartyPants.sh>. 
> I use it to process files that has already been converted to markdown.
>
> It seems yours will be very useful to add as a filter during the 
> conversion to markdown. Did you host it somewhere already? If not, I 
> suggest hosting it. I hope to better document available pandoc filters, and 
> started a collaborative spread sheet in pandoc-filters - Google Sheets 
> <https://docs.google.com/spreadsheets/d/1eqMwPyxT0rN3z_tXpsISGBys0QR25W0x-tYDRsFBKAE/edit#gid=0>. 
> Even very simple filters that can be serves as both a productivity tip and 
> an example on writing filters.
>
> By the way, may I ask why you use the unicode number but not the unicode 
> character itself?
>


[-- Attachment #1.2: Type: text/html, Size: 3781 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                     ` <4e8bc6eb-5f42-4db1-bf6b-2b2fb44482c1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-28  1:53                       ` Kolen Cheung
  2016-10-28  8:23                       ` John MacFarlane
  1 sibling, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2016-10-28  1:53 UTC (permalink / raw)
  To: pandoc-discuss; +Cc: christian.kolen-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 2897 bytes --]



Interesting. I once wanted to type Greek letters in math directly as Greek
characters, but relying on XeLaTeX and unicode-math seemed too restrictive
(especially when one has no control over which LaTeX engine is used).
LaTeX 3 might be a hope, but only in the far future. If the filter you
referred to matures, it will be very helpful for decoupling the
requirements of the output from those of the source.

FYI, I have a short script that does something like that,
ickc/markdown-variants/unicode-to-math.sh
<https://github.com/ickc/markdown-variants/blob/master/bin/unicode-to-math.sh>.
It is not designed to be general purpose, however, but to deal with some
messy .md files converted from ancient .doc files (which Word first
converted to .docx).

On Thursday, October 27, 2016 at 6:31:44 PM UTC-7, Sergio Correia wrote:

I think an useful filter would be one that efficiently converts unicode to 
> html entities (and latex, etc.) , and viceversa.
>
> If you are interested, this issue 
> <https://www.google.com/url?q=https%3A%2F%2Fgithub.com%2Fmmechtley%2Fpandoc-filter-test%2Fissues%2F1&sa=D&sntz=1&usg=AFQjCNEvfQlMezuFtLk1dmls6jbspSKpXw> 
> might be useful, as well as this XML file 
> <http://stackoverflow.com/questions/2354067/map-between-latex-commands-and-unicode-points/2356160#2356160> 
> that maps unicode-html-latex
>
>
>
> On Thursday, October 27, 2016 at 9:21:58 PM UTC-4, Kolen Cheung wrote:
>>
>> Thanks! I have a similar script in 
>> ickc/markdown-variants/unSmartyPants.sh 
>> <https://github.com/ickc/markdown-variants/blob/master/bin/unSmartyPants.sh>. 
>> I use it to process files that has already been converted to markdown.
>>
>> It seems yours will be very useful to add as a filter during the 
>> conversion to markdown. Did you host it somewhere already? If not, I 
>> suggest hosting it. I hope to better document available pandoc filters, and 
>> started a collaborative spread sheet in pandoc-filters - Google Sheets 
>> <https://docs.google.com/spreadsheets/d/1eqMwPyxT0rN3z_tXpsISGBys0QR25W0x-tYDRsFBKAE/edit#gid=0>. 
>> Even very simple filters that can be serves as both a productivity tip and 
>> an example on writing filters.
>>
>> By the way, may I ask why you use the unicode number but not the unicode 
>> character itself?
>>
> 


[-- Attachment #1.2: Type: text/html, Size: 13391 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]             ` <20161027190242.GD1044-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-28  0:58               ` Kolen Cheung
@ 2016-10-28  3:45               ` Kolen Cheung
       [not found]                 ` <6a504fbe-45c3-4221-ab15-0dc47b4591c7-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-29  9:33               ` Kolen Cheung
  2 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-28  3:45 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1860 bytes --]



Another case I find puzzling in how pandoc handles special characters is
this input:

    illegal
    >block
    >in
    >pandoc

pandoc -s -o test-pandoc.md test.md outputs:

    illegal &gt;block &gt;in &gt;pandoc

The point here is not the illegal block quote, but that > becomes &gt;. It
is the least expected result, considering how pandoc turns &trade; into ™.
Either > or \> seems more reasonable.

On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane wrote:

+++ Kolen Cheung [Oct 27 16 00:05 ]: 
> >I found that the pandoc markdown writer will use unicode output if 
> possible. Not only in the case like the trademark symbol `&trade;`, but 
> also when non-breaking space is used. The markdown writer will output a 
> unicode non-breaking space character rather than `\ `. Since the latter is 
> more markdown-ish and is the recommended way of typing non-braking space in 
> the manual, it seems the markdown writer should use that instead. 
>
> I don't know if it's more markdown-ish.  The goal of getting 
> a text that reads naturally without special processing is 
> better met by using a unicode nonbreaking space.  The \ is 
> pretty ugly.  I don't think the manual recommends \ as 
> preferable to a literal nonbreaking space. 
>
> 


[-- Attachment #1.2: Type: text/html, Size: 9769 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                     ` <4e8bc6eb-5f42-4db1-bf6b-2b2fb44482c1-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-28  1:53                       ` Kolen Cheung
@ 2016-10-28  8:23                       ` John MacFarlane
  1 sibling, 0 replies; 39+ messages in thread
From: John MacFarlane @ 2016-10-28  8:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Sergio Correia [Oct 27 16 18:31 ]:
>   I think an useful filter would be one that efficiently converts unicode
>   to html entities (and latex, etc.) , and viceversa.


Do you know about the --ascii option to pandoc?

% pandoc --ascii
“hi\ there”
^D
<p>&#8220;hi&#160;there&#8221;</p>



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                 ` <6a504fbe-45c3-4221-ab15-0dc47b4591c7-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-28  8:25                   ` John MacFarlane
  0 siblings, 0 replies; 39+ messages in thread
From: John MacFarlane @ 2016-10-28  8:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Kolen Cheung [Oct 27 16 20:45 ]:
>   Another case I found puzzling on how pandoc handle special character is
>illegal
>>block
>>in
>>pandoc
>
>   pandoc -s -o test-pandoc.md test.md will output:
>illegal &gt;block &gt;in &gt;pandoc
>
>   The focus here is not the illegal block quote, but the > becomes $gt;.
>   It is the least expected result, considering how pandoc turn $trade;
>   into ™. It seems either > or \> is more reasonable.

There are a few characters that must always be escaped as
entities in HTML:  <, >, &, ".

For others we use UTF-8.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                 ` <cbf3c105-241b-45de-8519-8962cadda270-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-28 13:15                   ` BP Jonsson
       [not found]                     ` <CAFC_yuSXMWATCa0GFO0Y94H0PpFXcShGMLwEEaHBqutpxuLSiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: BP Jonsson @ 2016-10-28 13:15 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 4247 bytes --]

The way I see it, `\<space>` is preferred for input and U+00A0 for output,
which IMO is perfectly sensible. I have configured Vim to distinguish nbsp,
no-break hyphen, soft hyphen, dashes and a few other things with
highlighting.
If I want to convert U+00A0 into `\<space>` I just do `:%s/\%xa0/\\ /g`. I
also have a perl script which converts non-ASCII characters to entities,
selecting them by regex, thus by codepoint, range, general category, block
or whatever other properties perl regexes support (of which there are many).

Den 28 okt 2016 02:58 skrev "Kolen Cheung" <christian.kolen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:

> On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane wrote:
> +++ Kolen Cheung [Oct 27 16 00:05 ]:
>
> I found that the pandoc markdown writer will use unicode output if
> possible. Not only in the case like the trademark symbol &trade;, but
> also when non-breaking space is used. The markdown writer will output a
> unicode non-breaking space character rather than \. Since the latter is
> more markdown-ish and is the recommended way of typing non-braking space in
> the manual, it seems the markdown writer should use that instead.
>
> I don’t know if it’s more markdown-ish. The goal of getting a text that
> reads naturally without special processing is better met by using a unicode
> nonbreaking space. The \ is pretty ugly. I don’t think the manual
> recommends \ as preferable to a literal nonbreaking space.
>
> I was referring to:
>
> If you just want a regular inline image, just make sure it is not the only
> thing in the paragraph. One way to do this is to insert a nonbreaking space
> after the image:
>
> ![This image won't be a figure](/url/of/image.png)\
>
> If the unicode non-breaking space is recommended, it should reads ![This
> image won't be a figure](/url/of/image.png).
>
> I would say a unicode non-breaking space actually doesn’t quite “read
> naturally”. Since it requires me to turn on features in my text editor to
> show invisible characters, and even then the non-breaking space and
> “normal” space looks almost the same with only a different shade. (It spent
> me quite some time to realize that. I thought pandoc was converting \
> into `, but it actually convert it into `, which looks probably identical
> here.)
>
> Inspired by the reply of Jesse Rosenthal, and what --smart do for any
> writers, may be there should be an --unsmart option for the markdown
> writer that will represent characters in pure ASCII whenever possible.
>
> I actually have another related problem with the markdown writer:
> currently if there’s string with intraword_underscores, after the pandoc
> -t markdown... -f markdown..., it becomes intraword\_underscores. In this
> case it wrote an unnecessary escape character, which makes it reads less
> naturally.
> 
>


[-- Attachment #2: Type: text/html, Size: 16502 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                     ` <CAFC_yuSXMWATCa0GFO0Y94H0PpFXcShGMLwEEaHBqutpxuLSiw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-10-28 20:45                       ` Kolen Cheung
       [not found]                         ` <10bddc3d-e533-44bd-8d8c-5b132e56a57f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-28 20:45 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 5418 bytes --]



I tried to use grep --color='auto' -P -H -n "[^\x00-\x7F]" to highlight all 
the non-ASCII characters. I was puzzled by the output (on hundreds of files) 
for some time, because it printed a lot of lines with nothing visibly 
highlighted. I debugged it for quite a while, until I discovered (in a 
different context: adding a non-breaking space after an image to prevent an 
implicit figure) that those characters are actually Unicode non-breaking spaces.
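
For cases where the terminal highlight is invisible, here is a minimal Python sketch of the same check. It reports each non-ASCII character by position, code point and Unicode name, so a non-breaking space shows up as NO-BREAK SPACE. The function name find_non_ascii is made up for illustration and is not part of any tool mentioned in this thread.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line_no, column, codepoint, name) for every non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are reported too.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, "U+%04X" % ord(ch), name))
    return hits
```

Because the comparison is on code points, the result should not depend on the locale the way a grep character class might.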

Now that this is “discovered”, it will be easy to write a script to regex 
it back. (I put it under the unSmartyPants.sh 
<https://github.com/ickc/markdown-variants/blob/master/bin/unSmartyPants.sh> 
umbrella, although technically it has nothing to do with SmartyPants.)

After @jgm mentioned --ascii, it got me thinking whether this option could 
be extended to cover other output formats (currently HTML output only), 
e.g. markdown (which would use “native” ASCII sequences before falling back 
to HTML entities), or even LaTeX.

On Friday, October 28, 2016 at 6:15:17 AM UTC-7, BP Jonsson wrote:

> The way I see it `\<space>` is preferred for input and U+00a0 for output 
> which IMO is perfectly sensible. I have configured Vim to distinguish nbsp 
> no-break hyphen, soft hyphen, dashes and a few other things with 
> highlighting.
> If I want to convert U+00a0 into `\<space>` I just do `:%s/\%xa0/\\ /g`. I 
> also have a perl script which converts non-ASCII characters to entities, 
> selecting them by regex, thus by codepoint, range, general category, block 
> or whatever properties perl regexes support (which are very many).
>
> Den 28 okt 2016 02:58 skrev "Kolen Cheung" <christi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org 
> <javascript:>>:
>
>> On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane 
>> wrote: +++ Kolen Cheung [Oct 27 16 00:05 ]: 
>>
>> I found that the pandoc markdown writer will use unicode output if 
>> possible. Not only in the case like the trademark symbol &trade;, but 
>> also when non-breaking space is used. The markdown writer will output a 
>> unicode non-breaking space character rather than \. Since the latter is 
>> more markdown-ish and is the recommended way of typing non-braking space in 
>> the manual, it seems the markdown writer should use that instead. 
>>
>> I don’t know if it’s more markdown-ish. The goal of getting a text that 
>> reads naturally without special processing is better met by using a unicode 
>> nonbreaking space. The \ is pretty ugly. I don’t think the manual 
>> recommends \ as preferable to a literal nonbreaking space. 
>>
>> I was referring to:
>>
>> If you just want a regular inline image, just make sure it is not the 
>> only thing in the paragraph. One way to do this is to insert a nonbreaking 
>> space after the image:
>>
>> ![This image won't be a figure](/url/of/image.png)\
>>
>> If the unicode non-breaking space is recommended, it should reads ![This 
>> image won't be a figure](/url/of/image.png).
>>
>> I would say a unicode non-breaking space actually doesn’t quite “read 
>> naturally”. Since it requires me to turn on features in my text editor to 
>> show invisible characters, and even then the non-breaking space and 
>> “normal” space looks almost the same with only a different shade. (It spent 
>> me quite some time to realize that. I thought pandoc was converting \ 
>> into `, but it actually convert it into `, which looks probably 
>> identical here.)
>>
>> Inspired by the reply of Jesse Rosenthal, and what --smart do for any 
>> writers, may be there should be an --unsmart option for the markdown 
>> writer that will represent characters in pure ASCII whenever possible.
>>
>> I actually have another related problem with the markdown writer: 
>> currently if there’s string with intraword_underscores, after the pandoc 
>> -t markdown... -f markdown..., it becomes intraword\_underscores. In 
>> this case it wrote an unnecessary escape character, which makes it reads 
>> less naturally.
>> 
>>
>>
> 


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                         ` <10bddc3d-e533-44bd-8d8c-5b132e56a57f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-29  6:19                           ` John MacFarlane
       [not found]                             ` <20161029061904.GF7496-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: John MacFarlane @ 2016-10-29  6:19 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Kolen Cheung [Oct 28 16 13:45 ]:
>   After @jgm mentioned --ascii, it got me thinking if the function of
>   this option could be expanded to cover other output (currently HTML
>   output only), e.g. markdown (that use “native” ASCII before using HTML
>   ASCII); or even LaTeX.

Supporting --ascii in Markdown would be feasible, since we
can use entities.  (This is ugly, though!)

Supporting it in LaTeX would not in general be possible.
For some western european accented characters there are
TeX control sequences, like \"a, but in general there's
no way to get ascii equivalents of arbitrary unicode
characters.
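
To illustrate the limitation, here is a rough Python sketch that rewrites accented letters as TeX accent commands where a simple decomposition exists and gives up otherwise. The table TEX_ACCENTS and the function to_tex_ascii are hypothetical names for this example, the table covers only a handful of accents, and this is not how pandoc's LaTeX writer works.

```python
import unicodedata

# Combining marks that have a classic TeX accent command (far from complete).
TEX_ACCENTS = {
    "\u0300": "\\`",   # grave
    "\u0301": "\\'",   # acute
    "\u0302": "\\^",   # circumflex
    "\u0303": "\\~",   # tilde
    "\u0308": '\\"',   # diaeresis / umlaut
}


def to_tex_ascii(text):
    """Rewrite accented letters as TeX control sequences; raise if impossible."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        # NFD splits e.g. U+00F6 into "o" followed by a combining diaeresis.
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and ord(parts[0]) < 0x80 and parts[1] in TEX_ACCENTS:
            out.append("%s{%s}" % (TEX_ACCENTS[parts[1]], parts[0]))
        else:
            raise ValueError("no ASCII TeX equivalent known for %r" % ch)
    return "".join(out)
```

Anything without such a decomposition (a trademark sign, CJK text, most symbols) ends in the ValueError branch, which is the general case described above.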


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                             ` <20161029061904.GF7496-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2016-10-29  6:53                               ` Kolen Cheung
       [not found]                                 ` <45d24e60-5523-4bbe-8c9f-a49e53583198-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-29  6:53 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1726 bytes --]



But actually what I want is not exactly “entities” in markdown. 
Essentially I’m talking about “un-SmartyPants”, so that the markdown writer 
uses the character sequence from *before* SmartyPants processing, i.e. a 
non-breaking space is \ (a backslash followed by a space) and an em-dash is 
---. So it would be best described as --unsmart vs. --smart, and is not 
really related to --ascii.
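
As a rough sketch of what such an “un-smart” pass could look like as a post-processing step, here is a small Python mapping. The table UNSMART and the function unsmart are made-up names, the choice of characters is my assumption, and no --unsmart option exists in pandoc as far as this thread establishes.

```python
# Characters typically produced by "smart" typography, mapped back to the
# ASCII sequences one would type in pandoc markdown source.
UNSMART = {
    "\u00a0": "\\ ",   # non-breaking space -> backslash-escaped space
    "\u2014": "---",   # em-dash
    "\u2013": "--",    # en-dash
    "\u2026": "...",   # ellipsis
    "\u201c": '"',     # left double quotation mark
    "\u201d": '"',     # right double quotation mark
    "\u2018": "'",     # left single quotation mark
    "\u2019": "'",     # right single quotation mark / apostrophe
}


def unsmart(text):
    """Replace each character in UNSMART with its ASCII source sequence."""
    return text.translate(str.maketrans(UNSMART))
```

Note that this is blind to context: it would also rewrite these characters inside code blocks, where a backslash-escaped space means something else.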

I also agree that leaving “entities” in the markdown source is ugly. I am 
rethinking the strategy for my project to relax that requirement. Right now 
I’m using [^™[:ascii:]] to check for “illegal” characters. By the way, 
interestingly, grep regards Ampère and Schrödinger as matching [:ascii:]; 
only [^\x00-\x7F] detects them.

On Friday, October 28, 2016 at 11:19:09 PM UTC-7, John MacFarlane wrote:

> Supporting --ascii in Markdown would be feasible, since we can use entities. 
> (This is ugly, though!) 
>
> Supporting it in LaTeX would not in general be possible. For some western 
> european accented characters there are TeX control sequences, like \"a, but 
> in general there's no way to get ascii equivalents of arbitrary unicode 
> characters. 




* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                 ` <45d24e60-5523-4bbe-8c9f-a49e53583198-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-29  7:01                                   ` Kolen Cheung
  0 siblings, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2016-10-29  7:01 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2501 bytes --]



Ah, sorry, I might have mixed up the issues here. I said this before but 
mixed it up again: writing a non-breaking space as a backslash escape has 
nothing to do with SmartyPants. But you get the idea: some special sequences 
stand for special characters in markdown, and whenever a character can be 
written that way, it should be. I think this way of thinking is not really 
about markdown (e.g. how easy it is to read) but about source code 
(enforcing a style). It would be great if such an --unsmart or --ascii 
option were added to the markdown writer, but I’m OK with settling for the 
regex I use, since it is a very simple one-to-one correspondence.

On Friday, October 28, 2016 at 11:53:13 PM UTC-7, Kolen Cheung wrote:

> But actually what I want to do is not exactly “entities” in markdown. 
> Essentially I’m talking about “un-SmartyPants” so that markdown writers 
> will use the character sequence *before* SmartyPaints. i.e. non-breaking 
> space is \, em-dash is ---. So it would be best described by --unsmart 
> vs. --smart but not really related to --ascii.
>
> I also agree leaving the markdown source using “entities” is ugly. I am 
> rethinking my strategy in my project to relax that requirement. Right now 
> I’m using [^™[:ascii:]] to check for “illegal” character. And by the way, 
> interestingly, grep would regard Ampère and Schrödinger as [:ascii:]. 
> Only [^\x00-\x7F] would detect them.
>
> On Friday, October 28, 2016 at 11:19:09 PM UTC-7, John MacFarlane wrote:
>
> Supporting --ascii in Markdown would be feasible, since we can use 
> entities. (This is ugly, though!) 
>
> Supporting it in LaTeX would not in general be possible. For some western 
> european accented characters there are TeX control sequences, like \"a, but 
> in general there's no way to get ascii equivalents of arbitrary unicode 
> characters. 
>
> 
>



* Re: How to programmatically enforcing a pandoc markdown style
       [not found]             ` <20161027190242.GD1044-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-28  0:58               ` Kolen Cheung
  2016-10-28  3:45               ` Kolen Cheung
@ 2016-10-29  9:33               ` Kolen Cheung
       [not found]                 ` <565f0a35-b5d3-45b3-8cde-e0c9dfe0ca3b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-29  9:33 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2550 bytes --]



I just encountered a somewhat related problem:

I’m writing a README for the project, which has a line like this:

End shortform in non-breaking space, like this: `e.g.\ `, `i.e.\ `. The backslash escaped space, `\ `...

However, I found that the space got eaten, and the manual documents this 
behavior:

(The spaces after the opening backticks and before the closing backticks 
will be ignored.)

Is there any way to get around this? E.g. I notice that a trailing space 
in a code block is preserved.

Code used to test:

# printf "%s\n\n" '`e.g.\ `' | pandoc -f markdown -t native
[Para [Code ("",[],[]) "e.g.\\"]]
# printf "%s\n\n" '``e.g.\ ``' | pandoc -f markdown -t native
[Para [Code ("",[],[]) "e.g.\\"]]
# printf "%s\n\n" '```e.g.\ ```' | pandoc -f markdown -t native
[Para [Code ("",[],[]) "e.g.\\"]]
# printf "%s\n\n" '    end with a space \ ' | pandoc -f markdown -t native
[CodeBlock ("",[],[]) "end with a space \\ "]
# printf "%s\n\n" '```' 'end with a space \ ' '```' | pandoc -f markdown -t native
[CodeBlock ("",[],[]) "\nend with a space \\ \n"]

On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane wrote:

> +++ Kolen Cheung [Oct 27 16 00:05 ]: 
> >I found that the pandoc markdown writer will use unicode output if 
> possible. Not only in the case like the trademark symbol `&trade;`, but 
> also when non-breaking space is used. The markdown writer will output a 
> unicode non-breaking space character rather than `\ `. Since the latter is 
> more markdown-ish and is the recommended way of typing non-braking space in 
> the manual, it seems the markdown writer should use that instead. 
>
> I don't know if it's more markdown-ish.  The goal of getting 
> a text that reads naturally without special processing is 
> better met by using a unicode nonbreaking space.  The \ is 
> pretty ugly.  I don't think the manual recommends \ as 
> preferable to a literal nonbreaking space. 
>
> 


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                 ` <565f0a35-b5d3-45b3-8cde-e0c9dfe0ca3b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-29 18:54                   ` John MacFarlane
       [not found]                     ` <20161029185445.GE5364-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-29 18:55                   ` John MacFarlane
  1 sibling, 1 reply; 39+ messages in thread
From: John MacFarlane @ 2016-10-29 18:54 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

This space collapsing is unfortunately part of the Markdown
syntax description.  There's no way around it that I can
think of.

+++ Kolen Cheung [Oct 29 16 02:33 ]:
>   I just encounter a problem kind of related:
>
>   I’m writing a README for the project, having a line like this:
>End shortform in non-breaking space, like this: `e.g.\ `, `i.e.\ `. The backslas
>h escaped space, `\ `...
>
>   However I found the space got eaten, and found a documentation on this
>   behavior in the manual:
>
>     (The spaces after the opening backticks and before the closing
>     backticks will be ignored.)
>
>   Are there any way to get around this? e.g. I notice that ending with
>   space in code block is ok.
>
>   Code used to test:
># printf "%s\n\n" '`e.g.\ `' | pandoc -f markdown -t native
>[Para [Code ("",[],[]) "e.g.\\"]]
># printf "%s\n\n" '``e.g.\ ``' | pandoc -f markdown -t native
>[Para [Code ("",[],[]) "e.g.\\"]]
># printf "%s\n\n" '```e.g.\ ```' | pandoc -f markdown -t native
>[Para [Code ("",[],[]) "e.g.\\"]]
># printf "%s\n\n" '    end with a space \ ' | pandoc -f markdown -t native
>[CodeBlock ("",[],[]) "end with a space \\ "]
># printf "%s\n\n" '```' 'end with a space \ ' '```' | pandoc -f markdown -t nati
>ve
>[CodeBlock ("",[],[]) "\nend with a space \\ \n"]
>
>   On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane
>   wrote:
>
>     +++ Kolen Cheung [Oct 27 16 00:05 ]:
>     >I found that the pandoc markdown writer will use unicode output if
>     possible. Not only in the case like the trademark symbol `&trade;`,
>     but also when non-breaking space is used. The markdown writer will
>     output a unicode non-breaking space character rather than `\ `.
>     Since the latter is more markdown-ish and is the recommended way of
>     typing non-braking space in the manual, it seems the markdown writer
>     should use that instead.
>     I don't know if it's more markdown-ish.  The goal of getting
>     a text that reads naturally without special processing is
>     better met by using a unicode nonbreaking space.  The \ is
>     pretty ugly.  I don't think the manual recommends \ as
>     preferable to a literal nonbreaking space.
>
>   
>


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                 ` <565f0a35-b5d3-45b3-8cde-e0c9dfe0ca3b-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-29 18:54                   ` John MacFarlane
@ 2016-10-29 18:55                   ` John MacFarlane
  1 sibling, 0 replies; 39+ messages in thread
From: John MacFarlane @ 2016-10-29 18:55 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Some relevant discussion here:
https://talk.commonmark.org/t/leading-and-trailing-white-spaces-in-code-blocks/628/3



* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                     ` <20161029185445.GE5364-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2016-10-29 19:27                       ` Kolen Cheung
       [not found]                         ` <af7d7c17-b985-4370-b5c7-872433996afd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-10-29 19:27 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 5318 bytes --]



Thanks. I will just lengthen my example so that it doesn’t end with a space.

Going back to the point of using pandoc from markdown to markdown to 
enforce a style: it is a prerequisite for applying pandoc filters to the 
source. For example, I am updating my project to use 
--top-level-division=part. For this I need to increment the header levels 
by 1. As suggested in Pandoc - Scripting with pandoc 
<http://pandoc.org/scripting.html>, using a regex could mess something else 
up. But in order to write and use a pandoc filter for this task, one needs 
to make sure the read/write cycle doesn’t change anything else.
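
A minimal sketch of such a filter in Python, written against pandoc's JSON representation without any helper library. It assumes that a header appears in the JSON as an object with "t" set to "Header" and a "c" list whose first item is the level; the function name shift_headers and the file name used below are hypothetical.

```python
import json
import sys


def shift_headers(node, delta=1):
    """Recursively add `delta` to the level of every Header element, in place."""
    if isinstance(node, list):
        for item in node:
            shift_headers(item, delta)
    elif isinstance(node, dict):
        if node.get("t") == "Header":
            # Header content is [level, attributes, inlines].
            node["c"][0] += delta
        for value in node.values():
            shift_headers(value, delta)
    return node


# pandoc passes the output format as the first argument to a JSON filter,
# so only act as a filter when that argument is present.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(shift_headers(json.load(sys.stdin)), sys.stdout)
```

Saved as, say, shift_headers.py and made executable, it would be used as pandoc -f markdown -t markdown --filter ./shift_headers.py. The diff then also contains whatever else the markdown writer normalizes, which is the problem described here.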

As I said earlier, expecting the read/write cycle to be an identity is 
unreasonable, but as long as it is idempotent, this use is possible. There 
only needs to be one commit that uses a read/write cycle to enforce the 
style; after that, applying the said hypothetical filter changes the header 
levels only, making the commits a lot cleaner.
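
The idempotence requirement can be checked mechanically. Below is a small Python sketch: is_idempotent tests f(f(x)) == f(x) for any normalizer, and pandoc_roundtrip is one possible normalizer that shells out to pandoc. Both names are made up for this example, and pandoc_roundtrip assumes pandoc is on the PATH and that the plain markdown reader and writer are the ones wanted.

```python
import subprocess


def pandoc_roundtrip(text):
    """Run text through pandoc, markdown to markdown (needs pandoc on PATH)."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "markdown"],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_idempotent(normalize, text):
    """True if normalizing a second time changes nothing: f(f(x)) == f(x)."""
    once = normalize(text)
    return normalize(once) == once
```

Running is_idempotent(pandoc_roundtrip, source) over every file before the style commit would show which files still change on a second pass.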

So, besides serving control freaks like me who want to enforce a style on 
the source, being able to do this also has other real-world use cases.

On Saturday, October 29, 2016 at 11:54:52 AM UTC-7, John MacFarlane wrote:

> This space collapsing is unfortunately part of the Markdown 
> syntax description.  There's no way around it that I can 
> think of. 
>
> +++ Kolen Cheung [Oct 29 16 02:33 ]: 
> >   I just encounter a problem kind of related: 
> > 
> >   I’m writing a README for the project, having a line like this: 
> >End shortform in non-breaking space, like this: `e.g.\ `, `i.e.\ `. The 
> backslas 
> >h escaped space, `\ `... 
> > 
> >   However I found the space got eaten, and found a documentation on this 
> >   behavior in the manual: 
> > 
> >     (The spaces after the opening backticks and before the closing 
> >     backticks will be ignored.) 
> > 
> >   Are there any way to get around this? e.g. I notice that ending with 
> >   space in code block is ok. 
> > 
> >   Code used to test: 
> ># printf "%s\n\n" '`e.g.\ `' | pandoc -f markdown -t native 
> >[Para [Code ("",[],[]) "e.g.\\"]] 
> ># printf "%s\n\n" '``e.g.\ ``' | pandoc -f markdown -t native 
> >[Para [Code ("",[],[]) "e.g.\\"]] 
> ># printf "%s\n\n" '```e.g.\ ```' | pandoc -f markdown -t native 
> >[Para [Code ("",[],[]) "e.g.\\"]] 
> ># printf "%s\n\n" '    end with a space \ ' | pandoc -f markdown -t 
> native 
> >[CodeBlock ("",[],[]) "end with a space \\ "] 
> ># printf "%s\n\n" '```' 'end with a space \ ' '```' | pandoc -f markdown 
> -t nati 
> >ve 
> >[CodeBlock ("",[],[]) "\nend with a space \\ \n"] 
> > 
> >   On Thursday, October 27, 2016 at 12:02:43 PM UTC-7, John MacFarlane 
> >   wrote: 
> > 
> >     +++ Kolen Cheung [Oct 27 16 00:05 ]: 
> >     >I found that the pandoc markdown writer will use unicode output if 
> >     possible. Not only in the case like the trademark symbol `&trade;`, 
> >     but also when non-breaking space is used. The markdown writer will 
> >     output a unicode non-breaking space character rather than `\ `. 
> >     Since the latter is more markdown-ish and is the recommended way of 
> >     typing non-braking space in the manual, it seems the markdown writer 
> >     should use that instead. 
> >     I don't know if it's more markdown-ish.  The goal of getting 
> >     a text that reads naturally without special processing is 
> >     better met by using a unicode nonbreaking space.  The \ is 
> >     pretty ugly.  I don't think the manual recommends \ as 
> >     preferable to a literal nonbreaking space. 
> > 
> >    
> > 
> 


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                         ` <af7d7c17-b985-4370-b5c7-872433996afd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-10-29 21:03                           ` Melroch
       [not found]                             ` <CADAJKhDsYm4yaCen61Q4kKXpgL9oKgToMn=hT71g5UZrwDeWSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2016-10-30  9:42                           ` John MacFarlane
  2016-11-08  7:25                           ` BP Jonsson
  2 siblings, 1 reply; 39+ messages in thread
From: Melroch @ 2016-10-29 21:03 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 6963 bytes --]

Incrementing the level of all headers with a filter would be easy. Note
however that Pandoc apparently is OK with headings of level 7 and higher. I
ran into that when converting a very long and detailed outliner document
from OPML to Markdown. Since that document didn't contain any ambiguous
lines I could use regex -- even a perl oneliner -- to convert too deep
headings to bullet lists, although I managed to break even LaTeX's limit on
list nesting depth in a few places!

You can of course have a commit hook which has pandoc convert your
documents from markdown to markdown, running any number of filters. I will
certainly try it now that I've thought of it (and I hope that the Markdown
writer will soon be able to output bracketed spans! :-)
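
As a minimal sketch of the header-incrementing filter mentioned above (an
illustration only, not code from this thread): it walks pandoc's JSON AST and
bumps every Header level. It assumes Header nodes are encoded as
{"t": "Header", "c": [level, attr, inlines]}; the names shift_headers and
filter_stream are made up for the example.

```python
import json


def shift_headers(node, delta=1):
    """Recursively add `delta` to the level of every Header node, in place."""
    if isinstance(node, list):
        for child in node:
            shift_headers(child, delta)
    elif isinstance(node, dict):
        # Assumption: a Header is {"t": "Header", "c": [level, attr, inlines]}.
        if node.get("t") == "Header":
            node["c"][0] += delta
        for child in node.values():
            shift_headers(child, delta)
    return node


def filter_stream(src, dst, delta=1):
    """Read a JSON AST from `src`, shift the headers, write the result to `dst`."""
    doc = json.load(src)
    json.dump(shift_headers(doc, delta), dst)
```

A script that calls filter_stream(sys.stdin, sys.stdout) could then sit
between `pandoc -f markdown -t json` and `pandoc -f json -t markdown`, which
is the kind of pipeline a commit hook would run.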


[-- Attachment #2: Type: text/html, Size: 21302 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                             ` <CADAJKhDsYm4yaCen61Q4kKXpgL9oKgToMn=hT71g5UZrwDeWSA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2016-10-29 21:44                               ` Kolen Cheung
  0 siblings, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2016-10-29 21:44 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 1186 bytes --]

I'm not talking about the possibility (or ease) of writing such a filter, but about the practicality of writing one (a filter that outputs markdown and overwrites the source) if the markdown source hasn't been "standardized" by the markdown writer. So if we dismiss the usefulness of using pandoc as a markdown styling tool, we also dismiss the usefulness of using pandoc's parser and filter system to act on the source.

The example I gave is supposed to be easy in both cases: regex (on the source) or filter (on the AST). But (part of) the very nature of a filter is to address the shortcoming of not having a parser.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                         ` <af7d7c17-b985-4370-b5c7-872433996afd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-29 21:03                           ` Melroch
@ 2016-10-30  9:42                           ` John MacFarlane
       [not found]                             ` <20161030094223.GH6690-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-11-08  7:25                           ` BP Jonsson
  2 siblings, 1 reply; 39+ messages in thread
From: John MacFarlane @ 2016-10-30  9:42 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Achieving idempotency (write markdown == write markdown ->
read markdown -> write markdown) is actually not so easy.
Partly this is because unless we escape things VERY
aggressively, regular characters can turn into syntax.

In cmark (commonmark library), I worked hard at this and
managed to get idempotence for all the test cases in the
test suite (except for a few special cases).  But
pandoc's writer is not so close.

We can make steps towards this, but a guarantee is going
to be very tough.  I tried adding a QuickCheck property for
this, and each time I ran it, it falsified idempotency on
the first try...
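
A rough sketch of how one might check this property on a single document from
the outside (an illustration, not the QuickCheck property itself): run the
markdown -> markdown cycle twice and compare the results. The helper names
are made up for the example; the only pandoc features assumed are the
`-f markdown -t markdown` options and reading from stdin.

```python
import subprocess


def pandoc_md_to_md(text):
    """Run one markdown -> AST -> markdown cycle through the pandoc CLI."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "markdown"],
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def is_stable(text, convert=pandoc_md_to_md):
    """True if the writer's output survives another read/write cycle unchanged."""
    once = convert(text)   # write markdown
    twice = convert(once)  # read it back and write it again
    return once == twice
```

Passing `convert` as a parameter keeps the check usable with any other
reader/writer pair, or with a stub when pandoc is not installed.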



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                             ` <20161030094223.GH6690-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
@ 2016-10-31  1:50                               ` Kolen Cheung
  2016-11-30  2:46                               ` Kolen Cheung
  1 sibling, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2016-10-31  1:50 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 5487 bytes --]

On Sunday, October 30, 2016 at 2:42:30 AM UTC-7, John MacFarlane wrote:

> Achieving idempotency (write markdown == write markdown -> read markdown ->
> write markdown) is actually not so easy. Partly this is because unless we
> escape things VERY aggressively, regular characters can turn into syntax.
>
> In cmark (commonmark library), I worked hard at this and managed to get
> idempotence for all the test cases in the test suite (except for a few
> special cases). But pandoc’s writer is not so close.
>
> We can make steps towards this, but a guarantee is going to be very tough.
> I tried adding a QuickCheck property for this, and each time I ran it, it
> falsified idempotency on the first try…

Very interesting to know!

After thinking more about it, I can see that it is complicated (@jgm probably
understands all of this and may have thought about it already; this serves as
a note to myself and perhaps to anyone else who is interested):

Note: inline math is used, I see that in Google Groups’ web view it is 
rendered correctly.

Let $A$ be the set of all valid ASTs, $M$ be the set of all valid Markdown
documents, $f: A \rightarrow M$ be the markdown writer, and
$g: M \rightarrow A$ be the markdown reader.

Criterion 1: (write markdown == write markdown -> read markdown -> write
markdown) can be written as $\forall a \in A, f(a) = f\circ g\circ f(a)$,
which is equivalent to $\forall m\in f(A), m=f\circ g(m)$.

Criterion 2: idempotence of $f\circ g$ means
$\forall m \in M, (f\circ g)\circ(f\circ g)(m)=f\circ g(m)$. Since $g$ might
not be surjective, criterion 1 implies criterion 2 and is the stronger of the
two.

Criterion 3: Ideally, $f$ is one-to-one and $g$ is surjective. In a sense
this means the markdown writer does not lose information and $g$ can recover
all of that information. It also means that every feature supported by the
AST has a markdown representation. Given that people write in markdown rather
than in the AST, this is a desirable feature. In this case, the criterion is
$\forall a \in A, a = g\circ f(a)$, i.e. $g\circ f=I$. However, markdown
syntax doesn’t (yet) allow this: e.g. a space at the end of inline verbatim
is eaten, and none of the native table syntaxes supports all
features/properties of the internal AST. The requirement is too strong.

Criterion 4: So, to relax criterion 3 to a practical level:
$\forall a \in g(M), a = g\circ f(a)$, i.e. every AST that could be generated
by the markdown reader fulfills the one-to-one requirement on the markdown
writer. This is equivalent to $\forall m \in M, g(m) = g\circ f \circ g(m)$,
which implies criterion 2 and is the “opposite” of criterion 1.

To summarize, criterion 3 is the strongest (implying all other criteria) but
impossible. Criterion 2 is the weakest (every other criterion implies it); it
is what allows pandoc to be a markdown styling tool and hence, say, lets the
pandoc filter system act on the source (after fixing the styling).
Criterion 1 is important when the AST is obtained from somewhere else,
e.g. the docx reader. Criterion 4 is important to guarantee the correctness
of a markdown writer that uses markdown to represent the information in the
AST (as far as markdown syntax allows). So criteria 1 & 4 should be “the
goal”. To summarize it in one statement (and generalize to any format):
$\forall i,j, \forall b_i \in f_{ji}(B_j), b_i=f_{ji}\circ f_{ij}(b_i)$,
where $i,j$ run through the formats and $f_{ij}$ maps from $i$-format to
$j$-format, i.e. every reader & writer pair forms an identity when
constrained to a subset.

After thinking about all this, the criteria have become more important than
what I originally asked for (a markdown styler): they guarantee that the
output from the AST won’t be misinterpreted. And now I can see why it is
difficult: at least in the case of markdown output, the target format is “not
very well defined”. I remember @jgm mentioned somewhere that any string is
valid markdown. This is probably what lies behind the statement “unless we
escape things VERY aggressively, regular characters can turn into syntax”.

I guess one way to help solve the problem is this (for the time being
concerning the markdown-AST pair only): say we have an experimental
command-line option, --safe. Immediately after the markdown writer writes
the markdown, it reads it back into an AST and calculates the diff. If
there’s a diff, it starts escaping more aggressively until identity is
reached. It might even start calculating the diff on smaller subsets of the
document to hunt down the “trouble-maker”. (It is easier said than done, and
can be much slower in corner cases, hence the “experimental command-line
option”.)


[-- Attachment #1.2: Type: text/html, Size: 17160 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                         ` <af7d7c17-b985-4370-b5c7-872433996afd-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-29 21:03                           ` Melroch
  2016-10-30  9:42                           ` John MacFarlane
@ 2016-11-08  7:25                           ` BP Jonsson
  2 siblings, 0 replies; 39+ messages in thread
From: BP Jonsson @ 2016-11-08  7:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 7927 bytes --]

Sorry for awakening an old thread, but this would actually be a use case
for the Unicode non-breaking space, wouldn't it? Perhaps you could use some
visible character which you don't actually use inside code and have a
filter substitute to nbspace. To fool the Markdown parser I mean.

Thinking of that: when you use a regex to 'unsmart' your source you probably
would like to leave code alone. There is an idiom for that which works at
least in Perl:

````perl
my %unsmart_chars_for = ( '“' => '"', '”' => '"', ... );
$text =~ s{(([\`]+).+?\2)|([“”...])}{ $1 || $unsmart_chars_for{$3} }egs;
````

First you set up an associative array with the 'smart' chars as keys and
their unsmart equivalents as values. The idea is that you use a regex with
an alternation which captures substrings you don't want to change before
the stuff you want to change. If you get a match on a substring you don't
want to change, you just put it back in, while if you get a match on a
substring you want to change/replace, you put in the changed
substring/replacement as usual. Thus the regex captures the opening
backticks of code into $2 aka \2 and the whole code span, up to and
including the closing backticks, into $1 (note the non-greedy quantifier!),
or a smart char into $3. In the replacement: if there was a match for code,
$1 is non-empty/true, so you just put the code back in. If there wasn't a
code match you got a smart char in $3, so you use it as the key into the
associative array to retrieve the unsmart equivalent. Finally, the s modifier
makes dot match newlines too, so that $1 captures code blocks as well. You
should be able to do this in Python by using a replacement function with
re.sub(): http://stackoverflow.com/a/12597709
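
A possible Python rendering of the same idiom, as a sketch rather than a
tested port of the Perl above: the UNSMART table and the function name are
made up for the example, and the two single-quote entries are an addition
that the Perl snippet does not show.

```python
import re

# Hypothetical mapping; extend it with whatever 'smart' characters you use.
UNSMART = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}

# Group 1: a run of backticks (group 2), the shortest text after it, and the
# same run again, i.e. inline code or a fenced block.
# Group 3: one of the smart characters.
PATTERN = re.compile(
    r"((`+).+?\2)|([" + "".join(UNSMART) + r"])",
    re.DOTALL,  # let '.' match newlines so fenced blocks are covered
)


def unsmart(text):
    """Replace smart characters everywhere except inside backtick-delimited code."""
    def replace(match):
        if match.group(1):
            return match.group(1)  # code: put it back unchanged
        return UNSMART[match.group(3)]
    return PATTERN.sub(replace, text)
```

Because the code alternative comes first, a smart character inside backticks
is consumed as part of group 1 and never reaches the lookup. Indented code
blocks have no backticks, so this sketch would not protect them.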


[-- Attachment #2: Type: text/html, Size: 22414 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                             ` <20161030094223.GH6690-jF64zX8BO091tJRe0FUodcM6rOWSkUom@public.gmane.org>
  2016-10-31  1:50                               ` Kolen Cheung
@ 2016-11-30  2:46                               ` Kolen Cheung
       [not found]                                 ` <e29cd3d1-0cfb-42be-8cbe-c3c771efe125-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-11-30  2:46 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 1921 bytes --]

When I was testing some filters I wrote, I found that it is actually not 
that difficult to achieve a looser condition: $P^3 = P^2$. My tests 
aren’t very complicated, though. But something like 
$\exists n, \forall x, \forall m > 0, P^{n+m} x = P^n x$ seems reasonable 
to me, because if you allow enough iterations, things will eventually die 
down. (If it doesn’t converge, then I guess the reader-writer pair should 
be “fixed”. It would be interesting, though, if no single such $n$ exists 
because the required number of iterations can get arbitrarily large. In 
practice I hope $n=2$.)

If it truly works (for some $n$), then the benefits are:

   1. automated tests: no matter what has changed and needs testing, the 
      bottom line is to satisfy the “weaker idempotent” requirement. (To be 
      fancy, imagine discovering bugs by crawling random documents across 
      the internet and feeding them into this test.)
   2. a corollary is that $P^n$ will be idempotent, useful for
      1. pandoc as a “linter”
      2. as discussed in the last post, ideally all reader-writer pairs 
         should be idempotent, so this automatically gives us something 
         like a --safe option that is slower but guaranteed to be 
         idempotent.
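
The weaker condition above can be checked mechanically. Below is a minimal sketch in Python, assuming pandoc is on the PATH and that P is the `pandoc -f markdown -t markdown` round trip discussed in this thread. The helper names `pandoc_roundtrip` and `fixed_point_index` and the `max_iter` cutoff are made up for illustration and are not part of pandoc.

```python
import subprocess
from typing import Callable, Optional


def pandoc_roundtrip(text: str) -> str:
    """One application of P: markdown -> markdown (needs pandoc on PATH)."""
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "markdown"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def fixed_point_index(
    step: Callable[[str], str], text: str, max_iter: int = 10
) -> Optional[int]:
    """Smallest n <= max_iter with step^(n+1)(text) == step^n(text), else None.

    Once one more pass changes nothing, every later pass is also a no-op,
    so this n satisfies P^(n+m) x = P^n x for all m > 0.
    """
    current = text
    for n in range(max_iter + 1):
        nxt = step(current)
        if nxt == current:
            return n
        current = nxt
    return None  # did not settle within max_iter passes
```

With pandoc installed, `fixed_point_index(pandoc_roundtrip, source_text)` would report how many passes a given document needs, and `None` hints at a document that never settles within the cutoff. Passing the step as a callable keeps the check usable for any reader-writer pair.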


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                 ` <e29cd3d1-0cfb-42be-8cbe-c3c771efe125-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-11-30  3:05                                   ` Kolen Cheung
       [not found]                                     ` <1cf2f022-2a64-4a9e-94d3-f2da097709ba-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Kolen Cheung @ 2016-11-30  3:05 UTC (permalink / raw)
  To: pandoc-discuss


Oops… just as I was hopeful, I just found a runaway situation with this 
example:

Both equations give the same result, and you may choose whichever is more convenient for a given problem. *F\~ $\perp$ ~*means\\\\\\\\\\\\ the\\\\\\\\\\\\ component\\\\\\\\\\\\ of\\\\\\\\\\\\ $F$\\\\\\\\\\\\ perpendicular\\\\\\\\\\\\ to\\\\\\\\\\\\ $r$,\\\\\\\\\\\\ while*r~ $\perp$ \~* means the component of $r$ perpendicular to $F_.$

This came from an erroneous conversion from .doc to .docx to .md. But 
basically, if you apply pandoc -f markdown -t markdown to it, the long runs 
of escaped backslashes \\\\... keep getting longer. (The “source” already 
has such long runs of \\\... perhaps because I use pandoc as a linter 
from time to time and didn’t look too carefully at this file.)
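
A runaway like this could be flagged automatically by watching whether the longest run of backslashes keeps growing from one pass to the next. A rough sketch in Python follows; the helper names are hypothetical, and `step` stands in for one `pandoc -f markdown -t markdown` pass (any callable from string to string works).

```python
import re
from typing import Callable, List


def longest_backslash_run(text: str) -> int:
    """Length of the longest run of consecutive backslashes in text."""
    runs = re.findall(r"\\+", text)
    return max((len(r) for r in runs), default=0)


def backslash_growth(
    step: Callable[[str], str], text: str, rounds: int = 3
) -> List[int]:
    """Longest backslash run before any pass and after each of `rounds` passes."""
    history = [longest_backslash_run(text)]
    for _ in range(rounds):
        text = step(text)
        history.append(longest_backslash_run(text))
    return history


def looks_runaway(history: List[int]) -> bool:
    """True if the longest run strictly increased on every pass."""
    return len(history) > 1 and all(b > a for a, b in zip(history, history[1:]))
```

This only measures a symptom (growing escape runs), so it would not replace a proper minimal test case for the writer.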


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                     ` <1cf2f022-2a64-4a9e-94d3-f2da097709ba-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-11-30  4:18                                       ` Sergio Correia
       [not found]                                         ` <a679cf14-0eea-4a14-85b5-2506b61975fe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-11-30 11:52                                       ` John MacFarlane
  1 sibling, 1 reply; 39+ messages in thread
From: Sergio Correia @ 2016-11-30  4:18 UTC (permalink / raw)
  To: pandoc-discuss



TBH it kinda feels like a bug in Pandoc's markdown writer:

*example.md:*

> *a cat*
> ~a cat~
> *~a cat~*
> ~*a cat*~


*pandoc example.md --to=native*

[Para [Emph [Str "a",Space,Str "cat"]]
,Para [Str "~a",Space,Str "cat~"]
,Para [Emph [Str "~a",Space,Str "cat~"]]
,Para [Subscript [Emph [Str "a",Space,Str "cat"]]]]

So if instead of writing ~a cat~ you write ~*a cat*~, pandoc recognizes 
the text as subscript (same as if you just type ~cat~). So far so good.

However, when writing to markdown, somehow two backslashes get added (maybe 
for escape reasons?). This is the part that looks like a bug, because two 
consecutive backslashes mean that a backslash will be produced as output.

This is even easier to see in html:

*example.html:*

> <p><sub><em>a cat</em></sub></p>


*pandoc example.html --to=markdown*


~*a\\ cat*~

Or even worse:

*pandoc example.html --to=markdown | pandoc --to=html*

<p><sub><em>a\ cat</em></sub></p>




On Tuesday, November 29, 2016 at 10:05:06 PM UTC-5, Kolen Cheung wrote:
>
> Oops… just as I was hopeful, I just found a runaway situation with this 
> example:
>
> Both equations give the same result, and you may choose whichever is more convenient for a given problem. *F\~ $\perp$ ~*means\\\\\\\\\\\\ the\\\\\\\\\\\\ component\\\\\\\\\\\\ of\\\\\\\\\\\\ $F$\\\\\\\\\\\\ perpendicular\\\\\\\\\\\\ to\\\\\\\\\\\\ $r$,\\\\\\\\\\\\ while*r~ $\perp$ \~* means the component of $r$ perpendicular to $F_.$
>
> The was from an erroneous conversion from .doc to .docx to .md. But 
> basically if you try to apply pandoc -f markdown -t markdown to it, the 
> long line of escape sequence \\\\... will be getting longer. (The 
> “source” has such long sequence of \\\... perhaps because I use pandoc as 
> linter from time to time and didn’t look too carefully in this file.)
> 
>


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                     ` <1cf2f022-2a64-4a9e-94d3-f2da097709ba-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-11-30  4:18                                       ` Sergio Correia
@ 2016-11-30 11:52                                       ` John MacFarlane
  1 sibling, 0 replies; 39+ messages in thread
From: John MacFarlane @ 2016-11-30 11:52 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Could you come up with a more minimal example to isolate the
problem?

+++ Kolen Cheung [Nov 29 16 19:05 ]:
>   Oops… just as I was hopeful, I just found a runaway situation with this
>   example:
>Both equations give the same result, and you may choose whichever is more conven
>ient for a given problem. *F\~ $\perp$ ~*means\\\\\\\\\\\\ the\\\\\\\\\\\\ compo
>nent\\\\\\\\\\\\ of\\\\\\\\\\\\ $F$\\\\\\\\\\\\ perpendicular\\\\\\\\\\\\ to\\\\
>\\\\\\\\ $r$,\\\\\\\\\\\\ while*r~ $\perp$ \~* means the component of $r$ perpen
>dicular to $F_.$
>
>   The was from an erroneous conversion from .doc to .docx to .md. But
>   basically if you try to apply pandoc -f markdown -t markdown to it, the
>   long line of escape sequence \\\\... will be getting longer. (The
>   “source” has such long sequence of \\\... perhaps because I use pandoc
>   as linter from time to time and didn’t look too carefully in this
>   file.)


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                         ` <a679cf14-0eea-4a14-85b5-2506b61975fe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2016-11-30 11:59                                           ` John MacFarlane
  2016-11-30 11:59                                           ` John MacFarlane
  1 sibling, 0 replies; 39+ messages in thread
From: John MacFarlane @ 2016-11-30 11:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I just tried your example with the current dev version
of pandoc and could not reproduce:

% pandoc -f html -t markdown
<p><sub><em>a cat</em></sub></p>
~*a\ cat*~

But I could reproduce it with the 1.18 release, so this bug
has already been fixed.



+++ Sergio Correia [Nov 29 16 20:18 ]:
>   TBH it kinda feels like a bug in Pandoc's markdown writer:
>   example.md:
>
>     *a cat*
>     ~a cat~
>     *~a cat~*
>     ~*a cat*~
>
>   pandoc example.md --to=native
>   [Para [Emph [Str "a",Space,Str "cat"]]
>   ,Para [Str "~a",Space,Str "cat~"]
>   ,Para [Emph [Str "~a",Space,Str "cat~"]]
>   ,Para [Subscript [Emph [Str "a",Space,Str "cat"]]]]
>   So if instead of writing ~a cat~ you write ~*a cat*~, pandoc recognizes
>   the text as subscript (same as if you just type ~cat~ ). So far so
>   good.
>   However, when writing to markdown, somehow two backslashes get added
>   (maybe for escape reasons?). This is the part that looks like a bug,
>   because two consecutive backslashes mean that a backslash will be
>   produced as output.
>   This is even easier to see in html:
>   example.html:
>
>     <p><sub><em>a cat</em></sub></p>
>
>   pandoc example.html --to=markdown
>   ~*a\\ cat*~
>   Or even worse:
>   pandoc example.html --to=markdown | pandoc --to=html
>   <p><sub><em>a\ cat</em></sub></p>


* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                         ` <a679cf14-0eea-4a14-85b5-2506b61975fe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-11-30 11:59                                           ` John MacFarlane
@ 2016-11-30 11:59                                           ` John MacFarlane
       [not found]                                             ` <20161130115945.GF15143-BKjuZOBx5Kn2N3qrpRCZGbhGAdq7xJNKhPhL2mjWHbk@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: John MacFarlane @ 2016-11-30 11:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

It's bug #3225 by the way.



* Re: How to programmatically enforcing a pandoc markdown style
       [not found]                                             ` <20161130115945.GF15143-BKjuZOBx5Kn2N3qrpRCZGbhGAdq7xJNKhPhL2mjWHbk@public.gmane.org>
@ 2016-11-30 14:29                                               ` Sergio Correia
  0 siblings, 0 replies; 39+ messages in thread
From: Sergio Correia @ 2016-11-30 14:29 UTC (permalink / raw)
  To: pandoc-discuss



Great, thanks!

On Wednesday, November 30, 2016 at 7:00:32 AM UTC-5, John MacFarlane wrote:
>
> It's bug #3225 by the way. 
>
>


* Re: How to programmatically enforcing a pandoc markdown style
       [not found] ` <e82e943f-604e-4a5b-a621-4b3dd82e42c0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2016-10-22 20:54   ` John MacFarlane
@ 2017-02-07 22:36   ` Kolen Cheung
  1 sibling, 0 replies; 39+ messages in thread
From: Kolen Cheung @ 2017-02-07 22:36 UTC (permalink / raw)
  To: pandoc-discuss



For the “unsmartypants” behavior we mentioned earlier, pandoc 2.0 will 
support it. 
See https://github.com/jgm/pandoc/issues/3416#issuecomment-277425921

(Basically, `smart` becomes an extension and can be toggled off.)

