Could you help update zapspace.pl to work with pandoc 2.9.2.1? I have Chinese markdown files that use spaces to separate groups of words, and I would like the spaces between Chinese characters to be ignored before converting to Word. Many thanks!

On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
>
> 2013-07-15 19:51, John MacFarlane wrote:
> > +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
> >> Have found a way to make this feature done.
> >> Just add "\n" at the last of the line
> >
> > This would violate the general rule that backslashes before letters in
> > markdown are just literal backslashes.
> >
> > I think that a better approach would be to provide a markdown
> > extension like the current 'hard_line_breaks': perhaps
> > 'ignore_line_breaks'. 'hard_line_breaks' causes line
> > breaks in a paragraph to be interpreted as hard breaks;
> > 'ignore_line_breaks' would cause them to be ignored entirely.
> > (One of these would have to be designated as taking precedence
> > if both were selected.)
> >
> > John
> >
>
> The attached perl script, when used as a filter on pandoc's
> json output, should enable Bill to get what he wants. I have
> used an earlier version on Tibetan text with satisfactory
> results. Someone who knows Haskell could probably write
> something shorter which interacts with pandoc in a more
> elegant way, but this script works.
>
> The description inside the file reads as follows:
>
>          FILE: zapspace.pl
>
>         USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json
>
>   DESCRIPTION: Takes as input a document in pandoc's json format and
>                removes all "Space" elements inside any list which also
>                contains any {"Str":"..."} element, and outputs a
>                modified json document, which when given as input to
>                pandoc will produce output suitable for languages which
>                don't put spaces between words or sentences, with no spaces
>                inside paragraphs -- unless you insert non-breaking spaces,
>                see below!
>                --, and notably spaces caused by linebreaks
>                in the markdown paragraph will be removed.
>
>                Additionally it does two things which allow you to
>                insert whitespace inside paragraph-like elements:
>
>                1) It replaces any non-breaking space (U+00A0) inside a
>                   "Str" element with ordinary soft spaces (U+0020)
>                   *if* the "Str" element also contains characters other
>                   than non-breaking spaces.
>
>                   This allows you to insert spaces into your markdown
>                   paragraphs as non-breaking spaces (in pandoc notation
>                   a backslash followed by an ordinary space "like\ this")
>                   and get ordinary spaces in your output.
>
>                2) Preserves any "Str" element which only contains one
>                   or more non-breaking spaces as is.
>
>                   This allows you to put non-breaking spaces between
>                   words by inserting ordinary whitespace -- which will
>                   be removed -- on either side of the non-breaking
>                   spaces "like \ this".
>                               ^ ^
>
>                N.B. that this is *not* done by scanning the JSON text
>                with regular expressions! The JSON is loaded into a
>                perl data structure which is modified and then converted
>                back into JSON. Precautions are taken not to modify the
>                structure such that the output will be rejected by
>                pandoc, nor to modify code elements, but I can't guarantee
>                that this will remain true with future versions of pandoc,
>                or that it is true for any input.
>
>       OPTIONS: ---
>  REQUIREMENTS: * A reasonably recent version of perl.
>                * The following CPAN modules:
>
>                  - [JSON::Any](https://metacpan.org/module/JSON::Any)
>                    + A JSON 'backend' module like JSON or JSON::XS.
>                  - [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
>                  - [autovivification](https://metacpan.org/module/autovivification)
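
For anyone following the quoted DESCRIPTION, the same idea can be sketched in Python roughly as below. This is not zapspace.pl and not a port of it. It assumes the JSON layout that pandoc 2.x emits, where each element is an object with a "t" tag and an optional "c" content field (for example {"t":"Str","c":"..."} and {"t":"Space"}), rather than the 2013 layout the description refers to. The names zap, is_elt, SKIP and GAPS are made up for this illustration, and treating SoftBreak the same as Space is an assumption based on the remark about linebreaks in the description.

```python
import json

NBSP = "\u00a0"
# Elements left untouched so code and raw markup are not modified.
# Assumption: these tag names match the pandoc 2.x JSON AST.
SKIP = {"Code", "CodeBlock", "RawInline", "RawBlock", "Math"}
# Elements treated as removable whitespace (SoftBreak is an assumption).
GAPS = {"Space", "SoftBreak"}


def is_elt(x, tags):
    """True if x looks like an AST element whose tag is in tags."""
    return isinstance(x, dict) and x.get("t") in tags


def zap(node):
    """Return a copy of node with whitespace removed as described above."""
    if isinstance(node, list):
        # Only lists that also contain a Str lose their Space elements.
        if any(is_elt(x, {"Str"}) for x in node):
            node = [x for x in node if not is_elt(x, GAPS)]
        return [zap(x) for x in node]
    if isinstance(node, dict):
        if node.get("t") in SKIP:
            return node
        if node.get("t") == "Str":
            text = node["c"]
            if text.strip(NBSP):
                # Mixed content: non-breaking spaces become ordinary spaces.
                return {"t": "Str", "c": text.replace(NBSP, " ")}
            return node  # a Str of only non-breaking spaces is kept as is
        return {k: zap(v) for k, v in node.items()}
    return node


# Hypothetical sample document, hand-written for this illustration.
sample = {
    "pandoc-api-version": [1, 20],
    "meta": {},
    "blocks": [{"t": "Para", "c": [
        {"t": "Str", "c": "你好"}, {"t": "Space"},
        {"t": "Str", "c": "世界"}, {"t": "SoftBreak"},
        {"t": "Str", "c": "like" + NBSP + "this"},
    ]}],
}
print(json.dumps(zap(sample), ensure_ascii=False))
```

To use something like this in the position zapspace.pl occupies in the USAGE line, one would read the document with json.load(sys.stdin), pass it through zap, and write the result with json.dump to sys.stdout. Like the script described, it removes every Space that sits in a list with a Str, not only those between Chinese characters, so spaces that should survive have to be written as non-breaking spaces.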