Wow, that script is really ancient! I'll try to port it to a Lua filter
tomorrow. It's 9 PM here now and I have been coding or writing for twelve
hours, so I'm quite exhausted.

Just to be clear, the old script removes all spaces which are next to a
"string" element, i.e. all "words", digits and punctuation alike, and not
just CJK characters. If you are OK with that behavior, porting it to a Lua
filter will be trivial, and Lua is built into pandoc. Otherwise I'll have
to look into rewriting the Perl script, which may not be quite as trivial.
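
To make that behavior concrete, here is a rough sketch in Python. It is
neither the Perl script nor the planned Lua filter, only an illustration of
the rule "drop every space in any list that also contains a string element".
It assumes the JSON layout of pandoc 2.x, where an element looks like
{"t": "Str", "c": "..."}. The names zap_spaces and is_elem are made up for
this example, and treating SoftBreak like Space is my assumption about what
"spaces caused by linebreaks" should mean in that format. The non-breaking
space handling of the old script is left out.

```python
import json

# Element types to drop. SoftBreak is included on the assumption that
# line breaks inside a paragraph should vanish as well.
DROP = {"Space", "SoftBreak"}


def is_elem(node, types):
    """True if node is a pandoc JSON element whose type tag is in `types`."""
    return isinstance(node, dict) and node.get("t") in types


def zap_spaces(node):
    """Recursively drop Space/SoftBreak from any list that also holds a Str."""
    if isinstance(node, list):
        if any(is_elem(x, {"Str"}) for x in node):
            node = [x for x in node if not is_elem(x, DROP)]
        return [zap_spaces(x) for x in node]
    if isinstance(node, dict):
        return {k: zap_spaces(v) for k, v in node.items()}
    return node


# A tiny hand-written document standing in for `pandoc -t json` output.
sample = {
    "pandoc-api-version": [1, 20],
    "meta": {},
    "blocks": [
        {
            "t": "Para",
            "c": [
                {"t": "Str", "c": "abc"},
                {"t": "Space"},
                {"t": "Str", "c": "123"},
                {"t": "SoftBreak"},
                {"t": "Code", "c": [["", [], []], "x y"]},
            ],
        }
    ],
}

print(json.dumps(zap_spaces(sample)))
```

Note that "abc" and "123" get joined even though neither is CJK, which is
exactly the caveat above. The Code element is untouched because its text is
not a Str element. To use something like this as a real filter, one would
read the JSON from stdin and write the result to stdout instead of using the
inline sample.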

/BPJ

On Mon, 13 Apr 2020 at 20:45, J <lixichen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Could you help to update zapspace.pl to work with pandoc 2.9.2.1 ? I have
> Chinese markdown files that use spaces to separate groups of words, and
> would like to ignore spaces between Chinese characters before converting to
> Word.
> Many thanks !
>
> On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
>>
>> 2013-07-15 19:51, John MacFarlane wrote:
>> > +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>> >>     Have found a way to make this feature done.
>> >>     Just add "\n" at the last of the line
>> >
>> > This would violate the general rule that backslashes before letters in
>> > markdown are just literal backslashes.
>> >
>> > I think that a better approach would be to provide a markdown
>> > extension like the current 'hard_line_breaks':  perhaps
>> > 'ignore_line_breaks'.  'hard_line_breaks' causes line
>> > breaks in a paragraph to be interpreted as hard breaks;
>> > 'ignore_line_breaks' would cause them to be ignored entirely.
>> > (One of these would have to be designated as taking precedence
>> > if both were selected.)
>> >
>> > John
>> >
>>
>> The attached perl script, when used as a filter on pandoc's
>> json output, should enable Bill to get what he wants.  I have
>> used an earlier version on Tibetan text with satisfactory
>> results. Someone who knows Haskell could probably write
>> something shorter which interacts with pandoc in a more
>> elegant way, but this script works.
>>
>> The description inside the file reads as follows:
>>
>>         FILE: zapspace.pl
>>
>>        USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json
>>
>>  DESCRIPTION: Takes as input a document in pandoc's json format and
>>               removes all "Space" elements inside any list which also
>>               contains any {"Str":"..."} element, and outputs a
>>               modified json document, which when given as input to
>>               pandoc will produce output suitable for languages which
>>               don't put spaces between words or sentences, with no spaces
>>               inside paragraphs -- unless you insert non-breaking spaces,
>>               see below! --, and notably spaces caused by linebreaks
>>               in the markdown paragraph will be removed.
>>
>>               Additionally it does two things which allow you to
>>               insert whitespace inside paragraph-like elements:
>>
>>               1)  It replaces any non-breaking space (U+00A0) inside a
>>                   "Str" element with ordinary soft spaces (U+0020)
>>                   *if* the "Str" element also contains characters other
>>                   than non-breaking spaces.
>>
>>                   This allows you to insert spaces into your markdown
>>                   paragraphs as non-breaking spaces (in pandoc notation
>>                   a backslash followed by an ordinary space "like\ this")
>>                   and get ordinary spaces in your output.
>>
>>               2)  Preserves any "Str" element which only contains one
>>                   or more non-breaking spaces as is.
>>
>>                   This allows you to put non-breaking spaces between
>>                   words by inserting ordinary whitespace -- which will
>>                   be removed -- on either side of the non-breaking
>>                   spaces "like \  this".
>>                               ^  ^
>>
>>               N.B. that this is *not* done by scanning the JSON text
>>               with regular expressions!  The JSON is loaded into a
>>               perl data structure which is modified and then converted
>>               back into JSON. Precautions are taken not to modify the
>>               structure such that the output will be rejected by
>>               pandoc, nor to modify code elements, but I can't guarantee
>>               that this will remain true with future versions of pandoc,
>>               or that it is true for any input.
>>
>>      OPTIONS: ---
>> REQUIREMENTS: *   A reasonably recent version of perl.
>>               *   The following CPAN modules:
>>
>>                   -   [JSON::Any](https://metacpan.org/module/JSON::Any)
>>                       +   A JSON 'backend' module like JSON or JSON::XS.
>>                   -   [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
>>                   -   [autovivification](https://metacpan.org/module/autovivification)
>>
>>
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pandoc-discuss/35356bdb-9f45-4f0c-bb49-3fb4e2db98a0%40googlegroups.com
> <https://groups.google.com/d/msgid/pandoc-discuss/35356bdb-9f45-4f0c-bb49-3fb4e2db98a0%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
