public inbox archive for pandoc-discuss@googlegroups.com
From: Melroch <melroch-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: Writing custom filter in python to remove non-breaking spaces
Date: Fri, 4 Aug 2017 20:58:53 +0200
Message-ID: <CADAJKhBEZ8-BdJTQRJE4M2nettrGKhf1xYzqBYs=pe=_DAodpA@mail.gmail.com>
In-Reply-To: <f0bb6fae-6104-4efc-840d-34fd19b02840-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>


I think the OP might actually want to replace non-breaking spaces, whether
literal characters or entities, with regular spaces. There may well be good
reasons to want that.
A prefilter written in Python might help. The following tries to skip
backtick-delimited code and fenced code blocks so that nbsp entities inside
code are not replaced. It may be thrown off by things like
`\~~~strikeout~~`, but those are unlikely in practice.

````python
import re
import sys

# Read raw bytes and decode explicitly so the locale does not matter (Python 3)
txt = sys.stdin.buffer.read().decode('utf-8')

pat = r"""(?isx)
# Match delimited code or code block
(?P<code>
(?P<backtick> `{1,} ) .*? (?P=backtick)
|
(?P<tilde> ~{3,} ) .*? (?P=tilde)
)
|
# Match any form of nbsp
( & (?: nbsp | [#]160 | [#]xa0 ) ; | \u00a0 )
"""

# Keep matched code unchanged; replace each nbsp with an ordinary space
def rep(m):
    return m.group('code') if m.group('code') else " "

sys.stdout.buffer.write(re.sub(pat, rep, txt).encode('utf-8'))
````
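
For trying the idea out on small strings before piping whole files through it, here is a minimal sketch of the same pattern wrapped in a function, assuming Python 3. The names `NBSP_OR_CODE` and `replace_nbsp` are made up for illustration and are not part of pandoc or any library. Like the script, it only recognises backtick-delimited code and tilde fences, so indented code blocks are not protected.

```python
import re

# Alternative 1 captures code so it can be put back untouched;
# alternative 2 matches an nbsp entity or the literal U+00A0 character.
NBSP_OR_CODE = re.compile(r"""
    (?P<code>
        (?P<backtick> `+ ) .*? (?P=backtick)    # inline code or backtick fence
      |
        (?P<tilde> ~{3,} ) .*? (?P=tilde)       # tilde fence
    )
  |
    (?P<nbsp> & (?: nbsp | [#]160 | [#]xa0 ) ; | \u00a0 )
""", re.IGNORECASE | re.DOTALL | re.VERBOSE)


def replace_nbsp(text):
    """Return text with every nbsp outside code replaced by a plain space."""
    def rep(m):
        # A code match is returned verbatim; anything else is an nbsp.
        return m.group("code") if m.group("code") else " "
    return NBSP_OR_CODE.sub(rep, text)


# Hypothetical sample input: the entity in running text is replaced,
# the one inside the code span is kept.
print(replace_nbsp("price:&nbsp;10 `keep&nbsp;this`"))
# prints: price: 10 `keep&nbsp;this`
```

The stdin-to-stdout script, saved under a hypothetical name such as `nbsp_prefilter.py`, would then run ahead of pandoc, for example `python3 nbsp_prefilter.py < input.md | pandoc -f markdown -o output.html`.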

On Fri, 4 Aug 2017 at 00:30, Kolen Cheung <christian.kolen-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> I'd actually like to know why you would want to remove them in the first
> place. It seems you would only want to do that if the source has bugs.

