Wow, that script is really ancient! I'll try to port it to a Lua filter tomorrow. It's 9 PM here now and I have been coding or writing for twelve hours, so I'm quite exhausted.

Just to be clear, the old script removes all spaces which are next to a "string" element, i.e. all "words", digits and punctuation alike, and not just CJK characters. If you are OK with that behavior, porting it to a Lua filter will be trivial, and Lua is built into pandoc. Otherwise I'll have to look into rewriting the Perl script, which may not be quite as trivial.

/BPJ

On Mon 13 Apr 2020 at 20:45, J wrote:

> Could you help to update zapspace.pl to work with pandoc 2.9.2.1? I have
> Chinese markdown files that use spaces to separate groups of words, and
> would like to ignore spaces between Chinese characters before converting to
> Word.
> Many thanks!
>
> On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
>>
>> 2013-07-15 19:51, John MacFarlane wrote:
>> > +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>> >> Have found a way to make this feature done.
>> >> Just add "\n" at the last of the line
>> >
>> > This would violate the general rule that backslashes before letters in
>> > markdown are just literal backslashes.
>> >
>> > I think that a better approach would be to provide a markdown
>> > extension like the current 'hard_line_breaks': perhaps
>> > 'ignore_line_breaks'. 'hard_line_breaks' causes line
>> > breaks in a paragraph to be interpreted as hard breaks;
>> > 'ignore_line_breaks' would cause them to be ignored entirely.
>> > (One of these would have to be designated as taking precedence
>> > if both were selected.)
>> >
>> > John
>> >
>>
>> The attached perl script, when used as a filter on pandoc's
>> json output, should enable Bill to get what he wants. I have
>> used an earlier version on Tibetan text with satisfactory
>> results.
>> Someone who knows Haskell could probably write
>> something shorter which interacts with pandoc in a more
>> elegant way, but this script works.
>>
>> The description inside the file reads as follows:
>>
>> FILE: zapspace.pl
>>
>> USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json
>>
>> DESCRIPTION: Takes as input a document in pandoc's json format and
>>     removes all "Space" elements inside any list which also
>>     contains any {"Str":"..."} element, and outputs a
>>     modified json document, which when given as input to
>>     pandoc will produce output suitable for languages which
>>     don't put spaces between words or sentences, with no spaces
>>     inside paragraphs -- unless you insert non-breaking spaces,
>>     see below! --, and notably spaces caused by linebreaks
>>     in the markdown paragraph will be removed.
>>
>>     Additionally it does two things which allow you to
>>     insert whitespace inside paragraph-like elements:
>>
>>     1) It replaces any non-breaking space (U+00A0) inside a
>>        "Str" element with ordinary soft spaces (U+0020)
>>        *if* the "Str" element also contains characters other
>>        than non-breaking spaces.
>>
>>        This allows you to insert spaces into your markdown
>>        paragraphs as non-breaking spaces (in pandoc notation
>>        a backslash followed by an ordinary space "like\ this")
>>        and get ordinary spaces in your output.
>>
>>     2) Preserves any "Str" element which only contains one
>>        or more non-breaking spaces as is.
>>
>>        This allows you to put non-breaking spaces between
>>        words by inserting ordinary whitespace -- which will
>>        be removed -- on either side of the non-breaking
>>        spaces "like \ this".
>>                     ^ ^
>>
>>     N.B. that this is *not* done by scanning the JSON text
>>     with regular expressions! The JSON is loaded into a
>>     perl data structure which is modified and then converted
>>     back into JSON.
>>     Precautions are taken not to modify the
>>     structure such that the output will be rejected by
>>     pandoc, nor to modify code elements, but I can't guarantee
>>     that this will remain true with future versions of pandoc,
>>     or that it is true for any input.
>>
>> OPTIONS: ---
>> REQUIREMENTS: * A reasonably recent version of perl.
>>     * The following CPAN modules:
>>
>>       - [JSON::Any](https://metacpan.org/module/JSON::Any)
>>         + A JSON 'backend' module like JSON or JSON::XS.
>>       - [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
>>       - [autovivification](https://metacpan.org/module/autovivification)
>>
>> --

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhDMPQveCFfsDYp1-CJKTTA6EMmWf_M_11edGF8uvEcHJg%40mail.gmail.com.
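
As a rough illustration of the behaviour laid out in the quoted DESCRIPTION, here is a minimal Python sketch. It is neither the original zapspace.pl nor the Lua port mentioned above. It assumes the JSON layout used by recent pandoc versions, where inline elements look like {"t":"Str","c":"..."} and {"t":"Space"} rather than the 2013-era {"Str":"..."}. It also assumes that line breaks inside a paragraph now arrive as "SoftBreak" elements and should be dropped like "Space". The function name `zap` and the sample paragraph are made up for the example.

```python
import json

NBSP = "\u00a0"  # non-breaking space, U+00A0


def zap(node):
    """Return a copy of a pandoc JSON AST fragment with spaces zapped.

    Assumption: elements use the {"t": ..., "c": ...} layout.
    """
    if isinstance(node, list):
        # Drop Space/SoftBreak only in lists that also hold a Str element.
        has_str = any(isinstance(x, dict) and x.get("t") == "Str" for x in node)
        out = []
        for x in node:
            if (has_str and isinstance(x, dict)
                    and x.get("t") in ("Space", "SoftBreak")):
                continue
            out.append(zap(x))
        return out
    if isinstance(node, dict):
        tag = node.get("t")
        if tag in ("Code", "CodeBlock"):
            return node  # leave code elements untouched
        if tag == "Str":
            text = node["c"]
            if text.strip(NBSP):
                # Str has other characters too: NBSP becomes a plain space.
                return {"t": "Str", "c": text.replace(NBSP, " ")}
            return node  # Str made only of NBSPs is preserved as is
        return {key: zap(value) for key, value in node.items()}
    return node


# Hypothetical sample paragraph, for illustration only.
para = {"t": "Para", "c": [
    {"t": "Str", "c": "中文"}, {"t": "Space"},
    {"t": "Str", "c": "文本"}, {"t": "SoftBreak"},
    {"t": "Str", "c": "a" + NBSP + "b"}, {"t": "Space"},
    {"t": "Str", "c": NBSP},
]}
print(json.dumps(zap(para), ensure_ascii=False))
```

To try this as a pandoc JSON filter, one would read the document with json.load(sys.stdin), pass it through `zap`, and write the result with json.dump to sys.stdout, in the same pipeline shape as the USAGE line in the quoted description.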