A New Feature for Pandoc's Markdown Extension -- No Space with Newline

public inbox archive for pandoc-discuss@googlegroups.com

* A New Feature for Pandoc's Markdown Extension -- No Space with Newline
@ 2013-07-15  3:33 Bill Chen (CHEN, Zhechuan)
       [not found] ` <CAFOcPC+oqesoOPbFkiyo_cjAQFW7qGj4oidMU5gn+BnfpWM2aw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Bill Chen (CHEN, Zhechuan) @ 2013-07-15  3:33 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2196 bytes --]

First, I like Pandoc's extensions to the markdown language **very, very
much**. They are great and useful.
But they could be even better, couldn't they?
So I have a suggestion for a new feature. It may be useful for both Chinese
and Japanese users.

# Background
As a Chinese speaker, I need to write many documents in Chinese.
I use Git to manage my documents, so I break the line after every Chinese
sentence. That makes the diff for each commit easy to read,
even when the sentences belong to one long paragraph.
But Chinese does not put spaces between sentences, and neither does
Japanese.
That means a Chinese paragraph should contain NO spaces, except in a few
special cases.

# Feature Request and Example
reStructuredText, another lightweight markup language, uses "\" at the end
of a line to join the next line to it **without** any space.
For example, this RST code:
    before\
    After
is rendered as:
    beforeAfter
There is no space between the two words, unlike what happens without the "\".
This feature is very important for Chinese users: no unnecessary
spaces appear, and no further step is needed to delete the
spaces between Chinese characters.

However, this character already has a meaning in Pandoc's markdown:
> Extension: escaped_line_breaks
> A backslash followed by a newline is also a hard line break.
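
To make the requested behaviour concrete, here is a minimal Python sketch of joining a hard-wrapped paragraph without inserting spaces at the line breaks. It is only an illustration of the desired result, not how Pandoc or reStructuredText does it; the function name `join_wrapped_lines` is made up for this example.

```python
def join_wrapped_lines(paragraph: str) -> str:
    """Join the lines of one hard-wrapped paragraph without adding spaces.

    Hypothetical helper: whitespace around each line break is stripped and
    the lines are concatenated directly.
    """
    lines = [line.strip() for line in paragraph.splitlines()]
    return "".join(line for line in lines if line)


# One sentence per line in the source, as described in the Background section.
wrapped = "第一句话。\n第二句话。"
print(join_wrapped_lines(wrapped))  # prints 第一句话。第二句话。
```

With ordinary markdown line joining, the same input would come out with a space between the two sentences, which is exactly what should be avoided here.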

# The Question and Discussion
Last but not least, I haven't found any dialect of markdown that supports
this feature, which could be a great help for East Asian users.
So the most important question is: which character should be used for this feature?

Best Regards,
Bill Chen

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAFOcPC%2BoqesoOPbFkiyo_cjAQFW7qGj4oidMU5gn%2BBnfpWM2aw%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found] ` <CAFOcPC+oqesoOPbFkiyo_cjAQFW7qGj4oidMU5gn+BnfpWM2aw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-15  9:16   ` Bill Chen (CHEN, Zhechuan)
       [not found]     ` <CAFOcPCLzrV1dWji3BNjWopayQXLDebAzFpF0WMwfZ_i8x8d63w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Bill Chen (CHEN, Zhechuan) @ 2013-07-15  9:16 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2508 bytes --]

I have found a way to implement this feature:
just add "\n" at the end of the line.

Best Regards,
Bill Chen



* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]     ` <CAFOcPCLzrV1dWji3BNjWopayQXLDebAzFpF0WMwfZ_i8x8d63w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-07-15 17:51       ` John MacFarlane
       [not found]         ` <20130715175101.GA20541-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: John MacFarlane @ 2013-07-15 17:51 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>    Have found a way to make this feature done.
>    Just add "\n" at the last of the line

This would violate the general rule that backslashes before letters in
markdown are just literal backslashes.

I think that a better approach would be to provide a markdown
extension like the current 'hard_line_breaks':  perhaps
'ignore_line_breaks'.  'hard_line_breaks' causes line
breaks in a paragraph to be interpreted as hard breaks;
'ignore_line_breaks' would cause them to be ignored entirely.
(One of these would have to be designated as taking precedence
if both were selected.)
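
The three behaviours discussed here can be sketched in a few lines of Python. This is a toy model, not pandoc's implementation: the function names are hypothetical, the hard break is shown as HTML purely for illustration, and letting `ignore_line_breaks` win when both options are set is an assumption of this sketch, since the precedence is left open above.

```python
def render_line_break(hard_line_breaks: bool, ignore_line_breaks: bool) -> str:
    """Return what a newline inside a paragraph turns into (toy model)."""
    if ignore_line_breaks:
        return ""           # assumption: this option takes precedence
    if hard_line_breaks:
        return "<br />\n"   # newline kept as a hard break (HTML used as example)
    return " "              # default: the newline behaves like a space


def render_paragraph(text, hard_line_breaks=False, ignore_line_breaks=False):
    """Join the source lines of one paragraph with the chosen separator."""
    sep = render_line_break(hard_line_breaks, ignore_line_breaks)
    return sep.join(text.split("\n"))


print(render_paragraph("before\nAfter"))                           # before After
print(render_paragraph("before\nAfter", ignore_line_breaks=True))  # beforeAfter
```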

John


* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]         ` <20130715175101.GA20541-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
@ 2013-07-16 15:34           ` BP Jonsson
       [not found]             ` <51E56808.5000500-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2013-07-17 22:47           ` John MacFarlane
  1 sibling, 1 reply; 14+ messages in thread
From: BP Jonsson @ 2013-07-16 15:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 4409 bytes --]

2013-07-15 19:51, John MacFarlane wrote:
> +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>>     Have found a way to make this feature done.
>>     Just add "\n" at the last of the line
> 
> This would violate the general rule that backslashes before letters in
> markdown are just literal backslashes.
> 
> I think that a better approach would be to provide a markdown
> extension like the current 'hard_line_breaks':  perhaps
> 'ignore_line_breaks'.  'hard_line_breaks' causes line
> breaks in a paragraph to be interpreted as hard breaks;
> 'ignore_line_breaks' would cause them to be ignored entirely.
> (One of these would have to be designated as taking precedence
> if both were selected.)
> 
> John
> 

The attached perl script, when used as a filter on pandoc's
json output, should enable Bill to get what he wants.  I have
used an earlier version on Tibetan text with satisfactory
results. Someone who knows Haskell could probably write
something shorter which interacts with pandoc in a more
elegant way, but this script works.

The description inside the file reads as follows:

        FILE: zapspace.pl

       USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json

 DESCRIPTION: Takes as input a document in pandoc's json format and
              removes all "Space" elements inside any list which also
              contains any {"Str":"..."} element, and outputs a
              modified json document, which when given as input to
              pandoc will produce output suitable for languages which
              don't put spaces between words or sentences, with no spaces
              inside paragraphs -- unless you insert non-breaking spaces,
              see below! --, and notably spaces caused by linebreaks
              in the markdown paragraph will be removed.

              Additionally it does two things which allow you to
              insert whitespace inside paragraph-like elements:

              1)  It replaces any non-breaking space (U+00A0) inside a
                  "Str" element with ordinary soft spaces (U+0020)
                  *if* the "Str" element also contains characters other
                  than non-breaking spaces.

                  This allows you to insert spaces into your markdown
                  paragraphs as non-breaking spaces (in pandoc notation
                  a backslash followed by an ordinary space "like\ this")
                  and get ordinary spaces in your output.

              2)  Preserves any "Str" element which only contains one
                  or more non-breaking spaces as is.

                  This allows you to put non-breaking spaces between
                  words by inserting ordinary whitespace -- which will
                  be removed -- on either side of the non-breaking
                  spaces "like \  this".
                              ^  ^

              N.B. that this is *not* done by scanning the JSON text
              with regular expressions!  The JSON is loaded into a
              perl data structure which is modified and then converted
              back into JSON. Precautions are taken not to modify the
              structure such that the output will be rejected by
              pandoc, nor to modify code elements, but I can't guarantee
              that this will remain true with future versions of pandoc,
              or that it is true for any input.

     OPTIONS: ---
REQUIREMENTS: *   A reasonably recent version of perl.
              *   The following CPAN modules:

                  -   [JSON::Any](https://metacpan.org/module/JSON::Any)
                      +   A JSON 'backend' module like JSON or JSON::XS.
                  -   [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
                  -   [autovivification](https://metacpan.org/module/autovivification)
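
For readers who do not use Perl, the rules in the description can be illustrated with a rough Python rendering that works on a single list of inline elements. It assumes the JSON shapes the script itself checks for (a space is the bare string "Space", a word is an object with a "Str" key); the function name `zap_spaces` is invented for this sketch, and it does not recurse or skip code elements the way the script does.

```python
NBSP = "\u00a0"


def zap_spaces(inlines):
    """Drop "Space" items and apply the two non-breaking-space rules.

    Assumed shapes: "Space" is a bare string, a word is {"Str": "..."}.
    """
    has_str = any(isinstance(x, dict) and "Str" in x for x in inlines)
    result = []
    for item in inlines:
        if item == "Space" and has_str:
            continue  # ordinary spaces, including those from line breaks, vanish
        if isinstance(item, dict) and "Str" in item:
            text = item["Str"]
            if text.strip(NBSP):
                # Rule 1: mixed content, so NBSP becomes an ordinary space.
                item = {"Str": text.replace(NBSP, " ")}
            # Rule 2: a Str made only of NBSPs is passed through unchanged.
        result.append(item)
    return result


sample = [{"Str": "like\u00a0this"}, "Space", {"Str": "\u00a0"}, "Space", {"Str": "end"}]
print(zap_spaces(sample))
```

In the sample, the first element ends up with an ordinary space inside it, the lone non-breaking space survives, and both "Space" items are gone.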

[-- Attachment #2: zapspace.pl --]
[-- Type: text/x-perl, Size: 5114 bytes --]

#!/usr/bin/perl 
#===============================================================================
#
#         FILE: zapspace.pl
#
#        USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r json
#
#  DESCRIPTION: Takes as input a document in pandoc's json format and
#               removes all "Space" elements inside any list which also
#               contains any {"Str":"..."} element, and outputs a
#               modified json document, which when given as input to
#               pandoc will produce output suitable for languages which
#               don't put spaces between words or sentences, with no spaces
#               inside paragraphs -- unless you insert non-breaking spaces,
#               see below! --, and notably spaces caused by linebreaks
#               in the markdown paragraph will be removed.
#
#               Additionally it does two things which allow you to
#               insert whitespace inside paragraph-like elements:
#
#               1)  It replaces any non-breaking space (U+00A0) inside a
#                   "Str" element with ordinary soft spaces (U+0020)
#                   *if* the "Str" element also contains characters other 
#                   than non-breaking spaces.
#
#                   This allows you to insert spaces into your markdown
#                   paragraphs as non-breaking spaces (in pandoc notation
#                   a backslash followed by an ordinary space "like\ this") 
#                   and get ordinary spaces in your output.
#
#               2)  Preserves any "Str" element which only contains one
#                   or more non-breaking spaces as is.
#
#                   This allows you to put non-breaking spaces between
#                   words by inserting ordinary whitespace -- which will
#                   be removed -- on either side of the non-breaking
#                   spaces "like \  this".
#                               ^  ^
#
#               N.B. that this is *not* done by scanning the JSON text
#               with regular expressions!  The JSON is loaded into a
#               perl data structure which is modified and then converted
#               back into JSON. Precautions are taken not to modify the
#               structure such that the output will be rejected by
#               pandoc, nor to modify code elements, but I can't guarantee 
#               that this will remain true with future versions of pandoc,
#               or that it is true for any input.
#
#      OPTIONS: ---
# REQUIREMENTS: *   A reasonably recent version of perl.
#               *   The following CPAN modules:
#                   
#                   -   [JSON::Any](https://metacpan.org/module/JSON::Any)
#                       +   A JSON 'backend' module like JSON or JSON::XS.
#                   -   [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
#                   -   [autovivification](https://metacpan.org/module/autovivification)
#
#         BUGS: None known, but it does what it does rather
#               heavyhandedly, so check your output!
#        NOTES: ---
#       AUTHOR: Benct Philip Jonsson (bpjonsson-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org), 
# ORGANIZATION: None.
#      VERSION: 2.0
#      CREATED: 2013-07-16 14:23:13
#     REVISION: ---
#    LICENSE AND COPYRIGHT: Copyright 2013 Benct Philip Jonsson.
#
#    This program is free software; you can redistribute it and/or modify it
#    under the terms of either: the GNU General Public License as published
#    by the Free Software Foundation; or the Artistic License.
#
#    See http://dev.perl.org/licenses/ for more information.
#===============================================================================

use strict;
use warnings;
use utf8;

use open qw/:utf8 :std/;

use JSON::Any;
use List::MoreUtils qw/any/;

use constant {
    AREF => ref([]),
    HREF => ref({}),
};

sub is_aref { AREF eq ref shift }
sub is_href { HREF eq ref shift }

sub has_key {
    no autovivification;
    my($key, $datum) = @_;
    return unless HREF eq ref $datum;
    return exists $datum->{$key};
}

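# Callback applied to every node; a true return value tells loop()
# not to descend into that node.
my $do_stuff = sub {
    no autovivification;
    my $datum = shift;
    # Leave code elements untouched.
    return 1 if any { has_key($_, $datum) } qw(Code CodeBlock);
    if ( has_key( Str => $datum ) ) {
        # A Str made up solely of non-breaking spaces is kept as is.
        return 1 if $datum->{Str} =~ /\A\x{a0}+\z/;
        # Otherwise turn non-breaking spaces into ordinary spaces.
        $datum->{Str} =~ tr[\x{a0}][ ];
    }
    elsif ( is_aref($datum) ) {
        # Drop bare "Space" strings from any list that also holds a Str.
        @$datum = grep { ref $_ or !/\ASpace\z/ } @$datum
            if any { has_key( Str => $_ ) } @$datum;
    }
    return;
};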

my $json = JSON::Any->new( utf8 => 1 );

my $input = do { local $/; <> };

my $data = $json->decode($input);

loop( $do_stuff, $data );

print $json->encode($data);

sub loop {
    no autovivification;
    my($callback, @data) = @_;
    while ( @data ) {
        my $datum = shift @data;
        next if $callback->($datum);
        if ( is_href($datum) ) {
            push @data, grep { is_href($_) or is_aref($_) } values %$datum;
        }
        elsif ( is_aref($datum) ) {
            push @data, grep { is_href($_) or is_aref($_) } @$datum;
        }
    }
}

__END__


* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]         ` <20130715175101.GA20541-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
  2013-07-16 15:34           ` BP Jonsson
@ 2013-07-17 22:47           ` John MacFarlane
       [not found]             ` <20130717224659.GA23839-9Rnp8PDaXcZ2EAH53EmH34tHsfhOvSUSZkel5v8DVj8@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: John MacFarlane @ 2013-07-17 22:47 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

+++ John MacFarlane [Jul 15 13 10:51 ]:
> +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
> >    Have found a way to make this feature done.
> >    Just add "\n" at the last of the line
> 
> This would violate the general rule that backslashes before letters in
> markdown are just literal backslashes.
> 
> I think that a better approach would be to provide a markdown
> extension like the current 'hard_line_breaks':  perhaps
> 'ignore_line_breaks'.  'hard_line_breaks' causes line
> breaks in a paragraph to be interpreted as hard breaks;
> 'ignore_line_breaks' would cause them to be ignored entirely.
> (One of these would have to be designated as taking precedence
> if both were selected.)

I've implemented the ignore_line_breaks markdown variant.
So, for your purposes you'll be able to do

pandoc -fmarkdown+ignore_line_breaks



* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]             ` <20130717224659.GA23839-9Rnp8PDaXcZ2EAH53EmH34tHsfhOvSUSZkel5v8DVj8@public.gmane.org>
@ 2013-07-18  4:38               ` Bill Chen (CHEN, Zhechuan)
  0 siblings, 0 replies; 14+ messages in thread
From: Bill Chen (CHEN, Zhechuan) @ 2013-07-18  4:38 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 1332 bytes --]

Hi John,

Thank you very much.
Please forgive my late reply; I have been busy writing documents for the past few days.

I never thought this feature would be added so quickly.

On Thu, Jul 18, 2013 at 6:47 AM, John MacFarlane <fiddlosopher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>wrote:

> I've implemented the ignore_line_breaks markdown variant.
> So, for your purposes you'll be able to do
>
> pandoc -fmarkdown+ignore_line_breaks
>

Thank you very much again. It works fine now.

One thing could perhaps be improved, as you said in your first reply:
> (One of these would have to be designated as taking precedence if both
> were selected.)
Maybe pandoc could give users a hint if both options are selected?

Thank you very much again.



Best Regards,
Bill Chen


* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]             ` <51E56808.5000500-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2020-04-13 18:44               ` J
       [not found]                 ` <35356bdb-9f45-4f0c-bb49-3fb4e2db98a0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: J @ 2020-04-13 18:44 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 4806 bytes --]

Could you help update zapspace.pl to work with pandoc 2.9.2.1? I have
Chinese markdown files that use spaces to separate groups of words, and I
would like to remove the spaces between Chinese characters before converting to
Word.
Many thanks!


* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                 ` <35356bdb-9f45-4f0c-bb49-3fb4e2db98a0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-04-13 19:16                   ` BPJ
       [not found]                     ` <CADAJKhDMPQveCFfsDYp1-CJKTTA6EMmWf_M_11edGF8uvEcHJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: BPJ @ 2020-04-13 19:16 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 6186 bytes --]

Wow that script is really ancient! I'll try to port it to a Lua filter
tomorrow. It's 9 PM here now and I have been coding or writing for twelve
hours, so I'm quite exhausted.

Just to be clear, the old script removes all spaces which are next to a
"string" element, i.e. all "words", digits and punctuation alike, and not
just CJK characters. If you are OK with that behavior, porting it to a Lua
filter will be trivial, and Lua is built into Pandoc. Otherwise I'll have
to look into rewriting the Perl script, which may not be quite as trivial.
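
As a point of comparison, the "remove every space next to a string" behaviour could look roughly like this in Python against the JSON that current pandoc versions emit, where every element is an object with a "t" tag and an optional "c" payload, and a newline inside a paragraph arrives as "SoftBreak". This is a sketch under those assumptions, not the Lua filter mentioned above; the function names and the sample document (including its version number) are placeholders.

```python
import json

SKIP = {"Code", "CodeBlock"}    # never touch code elements
GAPS = {"Space", "SoftBreak"}   # inline nodes that normally render as a space


def is_node(x, tag=None):
    """True if x looks like a tagged AST node (optionally with a given tag)."""
    return isinstance(x, dict) and "t" in x and (tag is None or x["t"] == tag)


def zap(node):
    """Recursively remove Space/SoftBreak from any list that contains a Str."""
    if isinstance(node, list):
        if any(is_node(x, "Str") for x in node):
            node[:] = [x for x in node if not (is_node(x) and x["t"] in GAPS)]
        for child in node:
            zap(child)
    elif isinstance(node, dict):
        if node.get("t") in SKIP:
            return
        for child in node.values():
            zap(child)


# Placeholder document; the API version shown is illustrative only.
doc = {"pandoc-api-version": [1, 20], "meta": {}, "blocks": [
    {"t": "Para", "c": [{"t": "Str", "c": "你好"}, {"t": "SoftBreak"},
                        {"t": "Str", "c": "世界"}]}]}
zap(doc)
print(json.dumps(doc["blocks"], ensure_ascii=False))
```

To use something like this as a filter, one would read the document with `json.load(sys.stdin)` and write it back with `json.dump`, in the same pipeline shape as the USAGE line of the Perl script.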

/BPJ

Den mån 13 apr. 2020 20:45J <lixichen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> skrev:

> Could you help to update zapspace.pl to work with pandoc 2.9.2.1 ? I have
> Chinese markdown files that use spaces to separate groups of words, and
> would like to ignore spaces between Chinese characters before converting to
> Word.
> Many thanks !
>
> On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
>>
>> 2013-07-15 19:51, John MacFarlane skrev:
>> > +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>> >>     Have found a way to make this feature done.
>> >>     Just add "\n" at the last of the line
>> >
>> > This would violate the general rule that backslashes before letters in
>> > markdown are just literal backslashes.
>> >
>> > I think that a better approach would be to provide a markdown
>> > extension like the current 'hard_line_breaks':  perhaps
>> > 'ignore_line_breaks'.  'hard_line_breaks' causes line
>> > breaks in a paragraph to be interpreted as hard breaks;
>> > 'ignore_line_breaks' would cause them to be ignored entirely.
>> > (One of these would have to be designated as taking precedence
>> > if both were selected.)
>> >
>> > John
>> >
>>
>> The attached perl script, when used as a filter on pandoc's
>> json output, should enable Bill to get what he wants.  I have
>> used an earlier version on Tibetan text with satisfactory
>> results. Someone who knows Haskell could probably write
>> something shorter which interacts with pandoc in a more
>> elegant way, but this script works.
>>
>> The description inside the file reads as follows:
>>
>>         FILE: zapspace.pl
>>
>>        USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r
>> json
>>
>>  DESCRIPTION: Takes as input a document in pandoc's json format and
>>               removes all "Space" elements inside any list which also
>>               contains any {"Str":"..."} element, and outputs a
>>               modified json document, which when given as input to
>>               pandoc will produce output suitable for languages which
>>               don't put spaces between words or sentences, with no spaces
>>               inside paragraphs -- unless you insert non-breaking spaces,
>>               see below! --, and notably spaces caused by linebreaks
>>               in the markdown paragraph will be removed.
>>
>>               Additionally it does two things which allow you to
>>               insert whitespace inside paragraph-like elements:
>>
>>               1)  It replaces any non-breaking space (U+00A0) inside a
>>                   "Str" element with ordinary soft spaces (U+0020)
>>                   *if* the "Str" element also contains characters other
>>                   than non-breaking spaces.
>>
>>                   This allows you to insert spaces into your markdown
>>                   paragraphs as non-breaking spaces (in pandoc notation
>>                   a backslash followed by an ordinary space "like\ this")
>>                   and get ordinary spaces in your output.
>>
>>               2)  Preserves any "Str" element which only contains one
>>                   or more non-breaking spaces as is.
>>
>>                   This allows you to put non-breaking spaces between
>>                   words by inserting ordinary whitespace -- which will
>>                   be removed -- on either side of the non-breaking
>>                   spaces "like \  this".
>>                               ^  ^
>>
>>               N.B. that this is *not* done by scanning the JSON text
>>               with regular expressions!  The JSON is loaded into a
>>               perl data structure which is modified and then converted
>>               back into JSON. Precautions are taken not to modify the
>>               structure such that the output will be rejected by
>>               pandoc, nor to modify code elements, but I can't guarantee
>>               that this will remain true with future versions of pandoc,
>>               or that it is true for any input.
>>
>>      OPTIONS: ---
>> REQUIREMENTS: *   A reasonably recent version of perl.
>>               *   The following CPAN modules:
>>
>>                   -   [JSON::Any](https://metacpan.org/module/JSON::Any)
>>                       +   A JSON 'backend' module like JSON or JSON::XS.
>>                   -   [List::MoreUtils](
>> https://metacpan.org/module/List::MoreUtils)
>>                   -   [autovivification](
>> https://metacpan.org/module/autovivification)
>>
>>
>>
>> --
>
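
For readers who do not use Perl, the behaviour laid out in the quoted
description can be illustrated with a minimal Python sketch. This is not
zapspace.pl and not a port of it. It assumes the JSON layout of current pandoc
versions, where inline elements look like {"t":"Str","c":"..."} and
{"t":"Space"}, rather than the {"Str":"..."} form mentioned in the description.
The name zap_spaces is made up for this example, and treating SoftBreak like
Space is also an assumption, since the description only mentions Space.

```python
import json

NBSP = "\u00a0"  # non-breaking space
GAP_TAGS = ("Space", "SoftBreak")  # SoftBreak handling is an assumption


def has_tag(node, tags):
    """True if node is a pandoc AST element whose tag is one of tags."""
    return isinstance(node, dict) and node.get("t") in tags


def zap_spaces(node):
    """Return a copy of a decoded pandoc JSON tree with inter-word gaps removed."""
    if isinstance(node, dict):
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            text = node["c"]
            # Rule 1: NBSP mixed with other characters becomes an ordinary space.
            # Rule 2: a Str made only of NBSPs is kept as is.
            if text.strip(NBSP):
                text = text.replace(NBSP, " ")
            return {"t": "Str", "c": text}
        return {key: zap_spaces(value) for key, value in node.items()}
    if isinstance(node, list):
        # Only lists that also contain a Str element lose their gaps.
        contains_str = any(has_tag(item, ("Str",)) for item in node)
        return [
            zap_spaces(item)
            for item in node
            if not (contains_str and has_tag(item, GAP_TAGS))
        ]
    return node


# Small in-memory demonstration: "like\ this", a lone NBSP, and a soft break.
para = {"t": "Para", "c": [
    {"t": "Str", "c": "like" + NBSP + "this"},
    {"t": "Space"},
    {"t": "Str", "c": NBSP},
    {"t": "SoftBreak"},
    {"t": "Str", "c": "end"},
]}
print(json.dumps(zap_spaces(para)))
```

To use something like this in the pipeline from the USAGE line, one would read
the document with json.load(sys.stdin), pass it through zap_spaces, and write
the result back with json.dump(..., sys.stdout).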

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CADAJKhDMPQveCFfsDYp1-CJKTTA6EMmWf_M_11edGF8uvEcHJg%40mail.gmail.com.


* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                     ` <CADAJKhDMPQveCFfsDYp1-CJKTTA6EMmWf_M_11edGF8uvEcHJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-14  0:39                       ` J
       [not found]                         ` <1beb6ec0-19a5-4da7-b785-ebb7d340c865-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: J @ 2020-04-14  0:39 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 6695 bytes --]

Thank you very much for your efforts! I wonder whether the script can keep
the spaces between English words, digits, and punctuation, since my files
also contain short groups of English words and numbers?



* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                         ` <1beb6ec0-19a5-4da7-b785-ebb7d340c865-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-04-14  5:17                           ` BPJ
       [not found]                             ` <CADAJKhDkCQ-GsQ7-G2_U_SZSx-1zheZAdQizRn-Cjb0jaC92Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: BPJ @ 2020-04-14  5:17 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 7656 bytes --]

A Perl filter which removes Space and SoftBreak elements sandwiched between
two Str elements which respectively end and start with a character with
Unicode script property CJK is certainly doable. Will that be OK?
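
To make the idea concrete, here is a minimal Python sketch of it. It is not the
Perl filter itself. Python's standard library has no lookup for Unicode script
properties, so the sketch approximates "CJK" with a hand-picked list of code
point ranges, which is an assumption; CJK_RANGES and zap_cjk_gaps are names
invented for this example. It also assumes pandoc's current JSON layout, with
inline elements such as {"t":"Str","c":"..."}, {"t":"Space"} and
{"t":"SoftBreak"}.

```python
# Assumed approximation of "CJK": a few common Unicode blocks, not a full
# script-property test.
CJK_RANGES = (
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
)


def is_cjk(ch):
    """True if the character falls inside one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def is_str(node):
    """True for a non-empty pandoc Str element."""
    return (isinstance(node, dict) and node.get("t") == "Str"
            and isinstance(node.get("c"), str) and node["c"] != "")


def is_gap(node):
    """True for a Space or SoftBreak element."""
    return isinstance(node, dict) and node.get("t") in ("Space", "SoftBreak")


def zap_cjk_gaps(node):
    """Return a copy of the tree without gaps sandwiched between CJK text."""
    if isinstance(node, dict):
        return {key: zap_cjk_gaps(value) for key, value in node.items()}
    if not isinstance(node, list):
        return node
    kept = []
    for i, item in enumerate(node):
        prev = node[i - 1] if i > 0 else None
        nxt = node[i + 1] if i + 1 < len(node) else None
        # Drop the gap only when the Str before it ends with a CJK character
        # and the Str after it starts with one.
        if (is_gap(item) and is_str(prev) and is_str(nxt)
                and is_cjk(prev["c"][-1]) and is_cjk(nxt["c"][0])):
            continue
        kept.append(zap_cjk_gaps(item))
    return kept


# Small in-memory demonstration: a soft break between Chinese words is dropped,
# the spaces around English words are kept.
para = {"t": "Para", "c": [
    {"t": "Str", "c": "中文"}, {"t": "SoftBreak"}, {"t": "Str", "c": "句子"},
    {"t": "Space"}, {"t": "Str", "c": "hello"},
    {"t": "Space"}, {"t": "Str", "c": "world"},
]}
print([n["t"] for n in zap_cjk_gaps(para)["c"]])
# prints ['Str', 'Str', 'Space', 'Str', 'Space', 'Str']
```

Wrapped with json.load(sys.stdin) and json.dump(..., sys.stdout), a function
like this could sit between "pandoc -t json" and "pandoc -f json" in a pipeline.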

/BPJ




* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                             ` <CADAJKhDkCQ-GsQ7-G2_U_SZSx-1zheZAdQizRn-Cjb0jaC92Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-14 14:13                               ` J
       [not found]                                 ` <b3c84390-28d9-4962-909a-43eceab09108-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: J @ 2020-04-14 14:13 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 8046 bytes --]

That sounds perfect! Many thanks for your efforts!



* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                                 ` <b3c84390-28d9-4962-909a-43eceab09108-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-04-14 16:07                                   ` BPJ
       [not found]                                     ` <CADAJKhC+k=sdZVJV5GMKM9xZsP_L8KFGqny2f5AZQ6FDXngy6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: BPJ @ 2020-04-14 16:07 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1: Type: text/plain, Size: 9087 bytes --]

Are you conversant with Perl and CPAN?
If not, which operating system(s) do you use (Windows/Mac/Linux)?

I ask because if the answer to the first question is no, I may have to guide
you through installing a few things, including Perl itself if the answer to
the second question is Windows.

Den tis 14 apr. 2020 16:13J <lixichen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> skrev:

> Thank sounds perfect ! Many thanks for your efforts !
>
> On Tuesday, April 14, 2020 at 1:18:17 PM UTC+8, BP wrote:
>>
>> A Perl filter which removes Space and SoftBreak elements sandwiched
>> between two Str elements which respectively ends and starts with a
>> character with Unicode script property CJK is certainly doable. Will that
>> be OK?
>>
>> /BPJ
>>
>>
>> Den tis 14 apr. 2020 02:39J <lixi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> skrev:
>>
>>> Thank you for your efforts very much ! I wonder if the script can keep
>>> the spaces inside English words, digits, and punctuation, since my files
>>> also contain short groups of English words and number with digits ?
>>>
>>> On Tuesday, April 14, 2020 at 3:16:40 AM UTC+8, BP wrote:
>>>>
>>>> Wow that script is really ancient! I'll try to port it to a Lua filter
>>>> tomorrow. It's 9 PM here now and I have been coding or writing for twelve
>>>> hours, so I'm quite exhausted.
>>>>
>>>> Just to be clear, the old script removes all spaces which are next to a
>>>> "string" element, i.e. all "words", digits and punctuation alike, and not
>>>> just CJK characters. If you are OK with that behavior porting it to a Lua
>>>> filter will be trivial, and Lua is built-in in Pandoc. Otherwise I'll have
>>>> to look into rewriting the Perl script, which may be not quite as trivial.
>>>>
>>>> /BPJ
>>>>
>>>> Den mån 13 apr. 2020 20:45J <lixi...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> skrev:
>>>>
>>>>> Could you help to update zapspace.pl to work with pandoc 2.9.2.1 ? I
>>>>> have Chinese markdown files that use spaces to separate groups of words,
>>>>> and would like to ignore spaces between Chinese characters before
>>>>> converting to Word.
>>>>> Many thanks !
>>>>>
>>>>> On Tuesday, July 16, 2013 at 11:34:32 PM UTC+8, BP Jonsson wrote:
>>>>>>
>>>>>> 2013-07-15 19:51, John MacFarlane skrev:
>>>>>> > +++ Bill Chen (CHEN, Zhechuan) [Jul 15 13 17:16 ]:
>>>>>> >>     Have found a way to make this feature done.
>>>>>> >>     Just add "\n" at the last of the line
>>>>>> >
>>>>>> > This would violate the general rule that backslashes before letters
>>>>>> in
>>>>>> > markdown are just literal backslashes.
>>>>>> >
>>>>>> > I think that a better approach would be to provide a markdown
>>>>>> > extension like the current 'hard_line_breaks':  perhaps
>>>>>> > 'ignore_line_breaks'.  'hard_line_breaks' causes line
>>>>>> > breaks in a paragraph to be interpreted as hard breaks;
>>>>>> > 'ignore_line_breaks' would cause them to be ignored entirely.
>>>>>> > (One of these would have to be designated as taking precedence
>>>>>> > if both were selected.)
>>>>>> >
>>>>>> > John
>>>>>> >
>>>>>>
>>>>>> The attached perl script, when used as a filter on pandoc's
>>>>>> json output, should enable Bill to get what he wants.  I have
>>>>>> used an earlier version on Tibetan text with satisfactory
>>>>>> results. Someone who knows Haskell could probably write
>>>>>> something shorter which interacts with pandoc in a more
>>>>>> elegant way, but this script works.
>>>>>>
>>>>>> The description inside the file reads as follows:
>>>>>>
>>>>>>         FILE: zapspace.pl
>>>>>>
>>>>>>        USAGE: pandoc -w json some.markdown | zapspace.pl | pandoc -r
>>>>>> json
>>>>>>
>>>>>>  DESCRIPTION: Takes as input a document in pandoc's json format and
>>>>>>               removes all "Space" elements inside any list which also
>>>>>>               contains any {"Str":"..."} element, and outputs a
>>>>>>               modified json document, which when given as input to
>>>>>>               pandoc will produce output suitable for languages which
>>>>>>               don't put spaces between words or sentences, with no
>>>>>> spaces
>>>>>>               inside paragraphs -- unless you insert non-breaking
>>>>>> spaces,
>>>>>>               see below! --, and notably spaces caused by linebreaks
>>>>>>               in the markdown paragraph will be removed.
>>>>>>
>>>>>>               Additionally it does two things which allow you to
>>>>>>               insert whitespace inside paragraph-like elements:
>>>>>>
>>>>>>               1)  It replaces any non-breaking space (U+00A0) inside a
>>>>>>                   "Str" element with ordinary soft spaces (U+0020)
>>>>>>                   *if* the "Str" element also contains characters other
>>>>>>                   than non-breaking spaces.
>>>>>>
>>>>>>                   This allows you to insert spaces into your markdown
>>>>>>                   paragraphs as non-breaking spaces (in pandoc notation
>>>>>>                   a backslash followed by an ordinary space "like\ this")
>>>>>>                   and get ordinary spaces in your output.
>>>>>>
>>>>>>               2)  Preserves any "Str" element which only contains one
>>>>>>                   or more non-breaking spaces as is.
>>>>>>
>>>>>>                   This allows you to put non-breaking spaces between
>>>>>>                   words by inserting ordinary whitespace -- which will
>>>>>>                   be removed -- on either side of the non-breaking
>>>>>>                   spaces "like \  this".
>>>>>>                               ^  ^
>>>>>>
>>>>>>               N.B. that this is *not* done by scanning the JSON text
>>>>>>               with regular expressions!  The JSON is loaded into a
>>>>>>               perl data structure which is modified and then converted
>>>>>>               back into JSON. Precautions are taken not to modify the
>>>>>>               structure such that the output will be rejected by
>>>>>>               pandoc, nor to modify code elements, but I can't guarantee
>>>>>>               that this will remain true with future versions of pandoc,
>>>>>>               or that it is true for any input.
>>>>>>
>>>>>>      OPTIONS: ---
>>>>>> REQUIREMENTS: *   A reasonably recent version of perl.
>>>>>>               *   The following CPAN modules:
>>>>>>
>>>>>>                   -   [JSON::Any](https://metacpan.org/module/JSON::Any)
>>>>>>                       +   A JSON 'backend' module like JSON or JSON::XS.
>>>>>>                   -   [List::MoreUtils](https://metacpan.org/module/List::MoreUtils)
>>>>>>                   -   [autovivification](https://metacpan.org/module/autovivification)
>>>>>>
>>>>>>
>>>>>>


[-- Attachment #2: Type: text/html, Size: 11623 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                                     ` <CADAJKhC+k=sdZVJV5GMKM9xZsP_L8KFGqny2f5AZQ6FDXngy6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-04-15  5:57                                       ` J
       [not found]                                         ` <5fe78fc8-7050-4342-8d5e-1350b9b06794-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: J @ 2020-04-15  5:57 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 9590 bytes --]

Please don't worry about CPAN. Google will help and I am willing to try the 
steps needed. :D

On Wednesday, April 15, 2020 at 12:07:47 AM UTC+8, BPJ wrote:
>
> Are you conversant with perl and CPAN?
> If not what operating system(s) do you use (Windows/Mac/Linux)?
>
> I ask because if the answer to the first question is no I may have to 
> guide you through installing some stuff, including perl itself if the 
> answer to the second question is Windows.


[-- Attachment #1.2: Type: text/html, Size: 15476 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: A New Feature for Pandoc's Markdown Extension -- No Space with Newline
       [not found]                                         ` <5fe78fc8-7050-4342-8d5e-1350b9b06794-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-04-15 15:23                                           ` Benct Philip Jonsson
  0 siblings, 0 replies; 14+ messages in thread
From: Benct Philip Jonsson @ 2020-04-15 15:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, J

[-- Attachment #1: Type: text/plain, Size: 9971 bytes --]

Perl/JSON filter attached.

Take care not to overwrite your original files, as this has barely been
tested: so far only on a single line of text with mixed Hanzi/Latin letters.

Usage instructions and installation hints are in the file (below
$DOCUMENTATION).
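
For readers who cannot open the attachment, the approach discussed in this thread (dropping a Space or SoftBreak that sits between two Str elements whose neighbouring characters are both CJK) might be sketched as below. This is not the attached zapspace-cjk.pl; it is an illustrative Python sketch. It assumes the JSON layout of pandoc 2.x, where inline elements look like {"t": "Str", "c": "..."}, and the names CJK_RANGES, is_cjk, and zap are made up for this example. The code point ranges are only a rough stand-in for a proper Unicode script test.

```python
import json

# Assumption: a rough approximation of "CJK" using a few Unicode blocks.
CJK_RANGES = (
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
)


def is_cjk(ch):
    """True if the character falls in one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def is_str(node):
    """True for a non-empty pandoc Str element."""
    return isinstance(node, dict) and node.get("t") == "Str" and bool(node.get("c"))


def is_gap(node):
    """True for the elements pandoc produces from spaces and line ends."""
    return isinstance(node, dict) and node.get("t") in ("Space", "SoftBreak")


def zap(node):
    """Return a copy of the AST without Space/SoftBreak between CJK text."""
    if isinstance(node, list):
        kept = []
        for i, item in enumerate(node):
            if (is_gap(item) and 0 < i < len(node) - 1
                    and is_str(node[i - 1]) and is_str(node[i + 1])
                    and is_cjk(node[i - 1]["c"][-1])
                    and is_cjk(node[i + 1]["c"][0])):
                continue  # drop the gap between two CJK characters
            kept.append(zap(item))
        return kept
    if isinstance(node, dict):
        return {key: zap(value) for key, value in node.items()}
    return node


# Small demonstration on a hand-written paragraph (not real pandoc output).
doc = {"blocks": [{"t": "Para", "c": [
    {"t": "Str", "c": "中文"}, {"t": "SoftBreak"},
    {"t": "Str", "c": "文本"}, {"t": "Space"},
    {"t": "Str", "c": "with"}, {"t": "Space"},
    {"t": "Str", "c": "English"}, {"t": "Space"},
    {"t": "Str", "c": "词"},
]}]}
result = zap(doc)
print(json.dumps(result, ensure_ascii=False))
```

To try it as a filter between two pandoc runs, the demonstration at the end could be replaced by reading the document with json.load(sys.stdin) and writing zap(...) back with json.dump(..., sys.stdout). Spaces next to Latin letters, digits, or ASCII punctuation are left alone, which is the behaviour asked for earlier in the thread.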

On 2020-04-15 07:57, J wrote:
> Please don't worry about CPAN. Google will help and I am willing to try the
> steps needed. :D


[-- Attachment #2: zapspace-cjk.pl --]
[-- Type: application/x-perl, Size: 2884 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-15 15:23 UTC | newest]

Thread overview: 14+ messages
2013-07-15  3:33 A New Feature for Pandoc's Markdown Extension -- No Space with Newline Bill Chen (CHEN, Zhechuan)
     [not found] ` <CAFOcPC+oqesoOPbFkiyo_cjAQFW7qGj4oidMU5gn+BnfpWM2aw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-15  9:16   ` Bill Chen (CHEN, Zhechuan)
     [not found]     ` <CAFOcPCLzrV1dWji3BNjWopayQXLDebAzFpF0WMwfZ_i8x8d63w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-07-15 17:51       ` John MacFarlane
     [not found]         ` <20130715175101.GA20541-nFAEphtLEs+AA6luYCgp0U1S2cYJDpTV9nwVQlTi/Pw@public.gmane.org>
2013-07-16 15:34           ` BP Jonsson
     [not found]             ` <51E56808.5000500-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2020-04-13 18:44               ` J
     [not found]                 ` <35356bdb-9f45-4f0c-bb49-3fb4e2db98a0-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-04-13 19:16                   ` BPJ
     [not found]                     ` <CADAJKhDMPQveCFfsDYp1-CJKTTA6EMmWf_M_11edGF8uvEcHJg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-14  0:39                       ` J
     [not found]                         ` <1beb6ec0-19a5-4da7-b785-ebb7d340c865-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-04-14  5:17                           ` BPJ
     [not found]                             ` <CADAJKhDkCQ-GsQ7-G2_U_SZSx-1zheZAdQizRn-Cjb0jaC92Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-14 14:13                               ` J
     [not found]                                 ` <b3c84390-28d9-4962-909a-43eceab09108-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-04-14 16:07                                   ` BPJ
     [not found]                                     ` <CADAJKhC+k=sdZVJV5GMKM9xZsP_L8KFGqny2f5AZQ6FDXngy6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-04-15  5:57                                       ` J
     [not found]                                         ` <5fe78fc8-7050-4342-8d5e-1350b9b06794-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2020-04-15 15:23                                           ` Benct Philip Jonsson
2013-07-17 22:47           ` John MacFarlane
     [not found]             ` <20130717224659.GA23839-9Rnp8PDaXcZ2EAH53EmH34tHsfhOvSUSZkel5v8DVj8@public.gmane.org>
2013-07-18  4:38               ` Bill Chen (CHEN, Zhechuan)
