Wow, that script is really ancient! I'll try to port it to a Lua filter tomorrow. It's 9 PM here now and I have been coding or writing for twelve hours, so I'm quite exhausted.

Just to be clear, the old script removes all spaces which are next to a "string" element, i.e. next to all "words", digits and punctuation alike, and not just next to CJK characters. If you are OK with that behavior, porting it to a Lua filter will be trivial, and Lua is built into Pandoc. Otherwise I'll have to look into rewriting the Perl script, which may not be quite as trivial.
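
To make the difference between the two behaviors concrete, here is a rough Python sketch (neither the Perl script nor the planned Lua port). It assumes the `{"t": ..., "c": ...}` element shape that current pandoc versions emit in JSON, and it treats `SoftBreak` like `Space`, since newer pandoc versions emit `SoftBreak` for line breaks inside a paragraph. The function names and the code point ranges in `is_cjk` are illustrative assumptions only, and the ranges are far from complete.

```python
# Illustrative sketch: all names here are hypothetical, and the element
# shape {"t": ..., "c": ...} is assumed from current pandoc JSON output.

def is_cjk(ch):
    """Rough test covering only a few common CJK blocks (an assumption)."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
            or 0x3000 <= cp <= 0x303F   # CJK Symbols and Punctuation
            or 0xFF00 <= cp <= 0xFFEF)  # Halfwidth and Fullwidth Forms

def is_gap(el):
    return el["t"] in ("Space", "SoftBreak")

def zap_all(inlines):
    """Old behavior: drop every gap if the list contains any Str."""
    if not any(el["t"] == "Str" for el in inlines):
        return inlines
    return [el for el in inlines if not is_gap(el)]

def zap_cjk_only(inlines):
    """Stricter variant: drop a gap only when both neighbors are Str
    elements whose touching characters are CJK."""
    out = []
    for i, el in enumerate(inlines):
        if is_gap(el) and 0 < i < len(inlines) - 1:
            prev, nxt = inlines[i - 1], inlines[i + 1]
            if (prev["t"] == "Str" and nxt["t"] == "Str"
                    and prev["c"] and nxt["c"]
                    and is_cjk(prev["c"][-1]) and is_cjk(nxt["c"][0])):
                continue  # gap sits between two CJK characters: remove it
        out.append(el)
    return out

sample = [
    {"t": "Str", "c": "你好"}, {"t": "Space"},
    {"t": "Str", "c": "世界"}, {"t": "Space"},
    {"t": "Str", "c": "pandoc"}, {"t": "Space"},
    {"t": "Str", "c": "2.9"},
]
print([el.get("c", el["t"]) for el in zap_all(sample)])
# prints ['你好', '世界', 'pandoc', '2.9']
print([el.get("c", el["t"]) for el in zap_cjk_only(sample)])
# prints ['你好', '世界', 'Space', 'pandoc', 'Space', '2.9']
```

With the old behavior "pandoc" and "2.9" run together; with the stricter variant only the space between the Chinese words disappears.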

/BPJ

On Mon, 13 Apr 2020 at 20:45, J <lixichen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
Could you help update zapspace.pl to work with pandoc 2.9.2.1? I have Chinese markdown files that use spaces to separate groups of words, and I would like the spaces between Chinese characters to be ignored before converting to Word.
Many thanks!

On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
2013-07-15 19:51, John MacFarlane wrote:
> +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>>     Have found a way to make this feature done.
>>     Just add "\n" at the last of the line
>
> This would violate the general rule that backslashes before letters in
> markdown are just literal backslashes.
>
> I think that a better approach would be to provide a markdown
> extension like the current 'hard_line_breaks':  perhaps
> 'ignore_line_breaks'.  'hard_line_breaks' causes line
> breaks in a paragraph to be interpreted as hard breaks;
> 'ignore_line_breaks' would cause them to be ignored entirely.
> (One of these would have to be designated as taking precedence
> if both were selected.)
>
> John
>

The attached perl script, when used as a filter on pandoc's
json output, should enable Bill to get what he wants.  I have
used an earlier version on Tibetan text with satisfactory
results. Someone who knows Haskell could probably write
something shorter which interacts with pandoc in a more
elegant way, but this script works.

The description inside the file reads as follows:

        FILE: zapspace.pl

       USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json

 DESCRIPTION: Takes as input a document in pandoc's json format and
              removes all "Space" elements inside any list which also
              contains any {"Str":"..."} element, and outputs a
              modified json document, which when given as input to
              pandoc will produce output suitable for languages which
              don't put spaces between words or sentences, with no spaces
              inside paragraphs -- unless you insert non-breaking spaces,
              see below! --, and notably spaces caused by linebreaks
              in the markdown paragraph will be removed.

              Additionally it does two things which allow you to
              insert whitespace inside paragraph-like elements:

              1)  It replaces any non-breaking space (U+00A0) inside a
                  "Str" element with ordinary soft spaces (U+0020)
                  *if* the "Str" element also contains characters other
                  than non-breaking spaces.

                  This allows you to insert spaces into your markdown
                  paragraphs as non-breaking spaces (in pandoc notation
                  a backslash followed by an ordinary space "like\ this")
                  and get ordinary spaces in your output.

              2)  Preserves any "Str" element which only contains one
                  or more non-breaking spaces as is.

                  This allows you to put non-breaking spaces between
                  words by inserting ordinary whitespace -- which will
                  be removed -- on either side of the non-breaking
                  spaces "like \  this".
                              ^  ^

              N.B. that this is *not* done by scanning the JSON text
              with regular expressions!  The JSON is loaded into a
              perl data structure which is modified and then converted
              back into JSON. Precautions are taken not to modify the
              structure such that the output will be rejected by
              pandoc, nor to modify code elements, but I can't guarantee
              that this will remain true with future versions of pandoc,
              or that it is true for any input.

     OPTIONS: ---
REQUIREMENTS: *   A reasonably recent version of perl.
              *   The following CPAN modules:

                  -   [JSON::Any](https://metacpan.org/module/JSON::Any)
                      +   A JSON 'backend' module like JSON or JSON::XS.
                  -   [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
                  -   [autovivification](https://metacpan.org/module/autovivification)
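
The behavior described above can be sketched in Python roughly as follows. This is not the attached Perl script: it assumes the `{"t": ..., "c": ...}` JSON shape of current pandoc versions rather than the older `{"Str":"..."}` form, treats `SoftBreak` like `Space`, and skips a hypothetical set of code-like element types (`CODE_TYPES`). The names `zap`, `fix_str` and `run_filter` are made up for the illustration. Note that the walk also touches inline lists inside metadata, such as a title.

```python
import json
import sys

NBSP = "\u00a0"
# Element types left untouched (an assumption about what counts as "code").
CODE_TYPES = {"Code", "CodeBlock", "RawInline", "RawBlock"}
GAP_TYPES = {"Space", "SoftBreak"}

def fix_str(el):
    """Apply the two non-breaking-space rules to one Str element."""
    text = el["c"]
    if text.strip(NBSP) == "":
        return el  # rule 2: a Str made only of NBSPs is kept as is
    return {"t": "Str", "c": text.replace(NBSP, " ")}  # rule 1

def zap(node):
    """Walk the parsed JSON and drop gaps from lists that contain a Str."""
    if isinstance(node, dict):
        if node.get("t") in CODE_TYPES:
            return node
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return fix_str(node)
        return {key: zap(value) for key, value in node.items()}
    if isinstance(node, list):
        has_str = any(isinstance(x, dict) and x.get("t") == "Str"
                      for x in node)
        kept = [x for x in node
                if not (has_str and isinstance(x, dict)
                        and x.get("t") in GAP_TYPES)]
        return [zap(x) for x in kept]
    return node  # strings, numbers, booleans, None

def run_filter(src, dst):
    """Read a pandoc JSON document from src and write the result to dst."""
    json.dump(zap(json.load(src)), dst, ensure_ascii=False)
```

Saved as a hypothetical `zapspace.py` with a final line `run_filter(sys.stdin, sys.stdout)`, it would slot into the same kind of pipeline as the Perl script, for example `pandoc -t json some.markdown | python3 zapspace.py | pandoc -f json -o some.docx`. Like the original, it works on the parsed data structure and never on the JSON text itself.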



--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh4Ykp1iOSErHA@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/35356bdb-9f45-4f0c-bb49-3fb4e2db98a0%40googlegroups.com.
