On Mon, 13 Apr 2020 at 20:45, J <lixichen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

Could you help update zapspace.pl to work with pandoc 2.9.2.1? I have Chinese markdown files that use spaces to separate groups of words, and I would like to ignore spaces between Chinese characters before converting to Word.
Many thanks!

On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
2013-07-15 19:51, John MacFarlane wrote:
> +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>> I have found a way to get this feature working.
>> Just add "\n" at the end of the line
>
> This would violate the general rule that backslashes before letters in
> markdown are just literal backslashes.
>
> I think that a better approach would be to provide a markdown
> extension like the current 'hard_line_breaks': perhaps
> 'ignore_line_breaks'. 'hard_line_breaks' causes line
> breaks in a paragraph to be interpreted as hard breaks;
> 'ignore_line_breaks' would cause them to be ignored entirely.
> (One of these would have to be designated as taking precedence
> if both were selected.)
>
> John
>

The attached perl script, when used as a filter on pandoc's
json output, should enable Bill to get what he wants. I have
used an earlier version on Tibetan text with satisfactory
results. Someone who knows Haskell could probably write
something shorter which interacts with pandoc in a more
elegant way, but this script works.

The description inside the file reads as follows:

FILE: zapspace.pl

USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json

DESCRIPTION: Takes as input a document in pandoc's json format and
removes all "Space" elements inside any list which also
contains any {"Str":"..."} element, and outputs a
modified json document, which when given as input to
pandoc will produce output suitable for languages which
don't put spaces between words or sentences, with no spaces
inside paragraphs -- unless you insert non-breaking spaces,
see below! --, and notably spaces caused by linebreaks
in the markdown paragraph will be removed.

Additionally it does two things which allow you to
insert whitespace inside paragraph-like elements:

1) It replaces any non-breaking space (U+00A0) inside a
"Str" element with an ordinary space (U+0020)
*if* the "Str" element also contains characters other
than non-breaking spaces.

This allows you to insert spaces into your markdown
paragraphs as non-breaking spaces (in pandoc notation
a backslash followed by an ordinary space "like\ this")
and get ordinary spaces in your output.

2) It preserves as is any "Str" element which contains
only one or more non-breaking spaces.

This allows you to put non-breaking spaces between
words by inserting ordinary whitespace -- which will
be removed -- on either side of the non-breaking
spaces "like \ this".
^ ^

N.B. that this is *not* done by scanning the JSON text
with regular expressions! The JSON is loaded into a
perl data structure which is modified and then converted
back into JSON. Precautions are taken not to modify the
structure such that the output will be rejected by
pandoc, nor to modify code elements, but I can't guarantee
that this will remain true with future versions of pandoc,
or that it is true for any input.

OPTIONS: ---
REQUIREMENTS:

* A reasonably recent version of perl.
* The following CPAN modules:
    - [JSON::Any](https://metacpan.org/module/JSON::Any)
        + A JSON 'backend' module like JSON or JSON::XS.
    - [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
    - [autovivification](https://metacpan.org/module/autovivification)
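
The rules in the description above can be sketched against the JSON that recent pandoc versions emit. This is not the original zapspace.pl but a rough Python approximation, and it rests on several assumptions: that elements are objects of the form {"t": "Str", "c": "..."} and {"t": "Space"} rather than the older {"Str": "..."} shape, that line breaks inside a paragraph arrive as {"t": "SoftBreak"} and should be dropped along with "Space", and that the listed code, raw and math element types are the ones to leave untouched. The names `zap` and `filter_stream` are made up for illustration.

```python
import json
import sys

NBSP = "\u00a0"
# Assumption: newer pandoc marks in-paragraph line breaks as SoftBreak.
DROPPED = {"Space", "SoftBreak"}
# Assumption: element types whose contents are left exactly as they are.
UNTOUCHED = {"Code", "CodeBlock", "RawInline", "RawBlock", "Math"}


def is_elem(node, tag):
    """True if node is a pandoc element object with the given tag."""
    return isinstance(node, dict) and node.get("t") == tag


def zap(node):
    """Return a copy of node with the whitespace rules applied."""
    if isinstance(node, list):
        # Whitespace is dropped only from lists that also hold a Str.
        has_str = any(is_elem(item, "Str") for item in node)
        return [
            zap(item)
            for item in node
            if not (has_str and isinstance(item, dict)
                    and item.get("t") in DROPPED)
        ]
    if isinstance(node, dict):
        if node.get("t") in UNTOUCHED:
            return node
        if node.get("t") == "Str":
            text = node["c"]
            # Rule 1: mixed content, NBSP becomes an ordinary space.
            # Rule 2: a Str made only of NBSPs is kept as is.
            if text.strip(NBSP):
                text = text.replace(NBSP, " ")
            return {"t": "Str", "c": text}
        return {key: zap(value) for key, value in node.items()}
    return node


def filter_stream(infile, outfile):
    """Read a pandoc JSON document, apply zap, write the result."""
    json.dump(zap(json.load(infile)), outfile, ensure_ascii=False)


# Small demonstration on a hand-written document (values illustrative).
sample = {
    "pandoc-api-version": [1, 20],
    "meta": {},
    "blocks": [{"t": "Para", "c": [
        {"t": "Str", "c": "你好"}, {"t": "Space"},
        {"t": "Str", "c": "世界"}, {"t": "SoftBreak"},
        {"t": "Str", "c": "再见"},
    ]}],
}
print(json.dumps(zap(sample), ensure_ascii=False))
```

To use this as a filter in the pipeline shown under USAGE, the demonstration at the end would be replaced by a call to `filter_stream(sys.stdin, sys.stdout)`. Whether dropping "SoftBreak" is right for a given document is worth checking against the actual output of `pandoc -t json` for the pandoc version in use.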
