From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.text.pandoc/17985
Path: news.gmane.org!.POSTED!not-for-mail
From: Karim Mohammadi <s.karim.mohammadi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.text.pandoc
Subject: Writing custom filter in python to remove non-breaking spaces
Date: Wed, 2 Aug 2017 02:05:08 -0700 (PDT)
Message-ID: <3be5ee09-90dc-41ad-a368-9298b965dfaa@googlegroups.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_6027_1665163991.1501664708846"
X-Trace: blaine.gmane.org 1501664713 19719 195.159.176.226 (2 Aug 2017 09:05:13 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 2 Aug 2017 09:05:13 +0000 (UTC)
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Original-X-From: pandoc-discuss+bncBDHLBDVL2MIBBRNLQ3GAKGQEQ56DKJI-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org Wed Aug 02 11:05:08 2017
Return-path: <pandoc-discuss+bncBDHLBDVL2MIBBRNLQ3GAKGQEQ56DKJI-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Envelope-to: gtp-pandoc-discuss@m.gmane.org
Original-Received: from mail-yw0-f186.google.com ([209.85.161.186])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <pandoc-discuss+bncBDHLBDVL2MIBBRNLQ3GAKGQEQ56DKJI-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>)
	id 1dcpaR-0004hh-Vh
	for gtp-pandoc-discuss@m.gmane.org; Wed, 02 Aug 2017 11:05:04 +0200
Original-Received: by mail-yw0-f186.google.com with SMTP id s143sf1492844ywg.0
        for <gtp-pandoc-discuss@m.gmane.org>; Wed, 02 Aug 2017 02:05:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=googlegroups.com; s=20161025;
        h=sender:date:from:to:message-id:subject:mime-version
         :x-original-sender:reply-to:precedence:mailing-list:list-id
         :list-post:list-help:list-archive:list-subscribe:list-unsubscribe;
        bh=7EM/JjsLB68GHoe2L16A9LMyNb1RUkwoyPxrV1PI4hU=;
        b=BH0jcYwkBG5H2N7zSmPSCFoSaEwZRVK9rBglJUsT8n/wKi2t+gG0a4sfwFMbjzz+8m
         /CV+RptqMMypjIFeFYcmUcP6aiDk1WxmxXGK+C+PLS3PdZE9lqyBWZs/qfI2L7sNyyzo
         cZf0rJ9Pc00YFcV6x8roxbC5nDdh82jzoKwp5OVZGHPN6rGNh6PVS/sh+qssFw+VKypE
         t/vS3qPtVOsCdxKuRkM3i/2XVEyyFyRDp4gBRO2jZ5R9cYZo9rhER+jnEam/dEANhlyl
         zq/M27J23fg1EX733SM0ly5tTblTyDvRanVJo70f+KkH1VTw7KDhLOKrglbkD/Wpts4S
         UtUA==
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=date:from:to:message-id:subject:mime-version:x-original-sender
         :reply-to:precedence:mailing-list:list-id:list-post:list-help
         :list-archive:list-subscribe:list-unsubscribe;
        bh=7EM/JjsLB68GHoe2L16A9LMyNb1RUkwoyPxrV1PI4hU=;
        b=R9c9E/7EWh4ONy1NPJePo1GuQYqT6Z3FRY+yGgC01mfK68wjw+TJCr57DKAmtnFsXv
         N42xD2mrvF851HWLvLPb2fZoOB+2CQ5RfaRwsJHuojtTrvcxRYKtAJr1n22j4N7RTFRo
         X76SsM30rcOljwwTPGEO3mItPpMVxvUxFE9MvjmpdLUAKdwjiV9UfbhVZxDISkU6c3Hz
         jkdOVzr4Fxn4+Dtuh214l0EFRsZldKPrALqIyea0tKXQEArTHMiVmxO3NP6jK1ZrQliX
         mVEQVvxOmJptrS9FiObyFmASnfMvaKQS4DOMeV+YLEDciYK4FbDak/l2+ea1rdBJYPA7
         uMaQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=sender:x-gm-message-state:date:from:to:message-id:subject
         :mime-version:x-original-sender:reply-to:precedence:mailing-list
         :list-id:x-spam-checked-in-group:list-post:list-help:list-archive
         :list-subscribe:list-unsubscribe;
        bh=7EM/JjsLB68GHoe2L16A9LMyNb1RUkwoyPxrV1PI4hU=;
        b=GxlcU8xn612ROL09VXEcj3/oCC+S+/0/sz4xoHzeU2TrI4jZ4CFLKDwZhcWyjf/Z/X
         WSRiUoFGrMB8Tk9UK6uDg9XKssNvMpcU2300+cs2jpj4QyQavUgl9rPK1m+KTG2uWA2i
         RIXF3zl9HFdteXelZumooLjfUoCUehUEX+p3BJB3xQEpdbrIP+Jf9FHe59fo7pf2zIoE
         aq4349MUOjHXeo/MqH7Z8AZc9VqeEgSTLvjYYUVw6xICldA+DqHfzT8nMjOU6JcEZN9/
         PJYJwkaBMviWXIfQM8p6NNM4ZWQ+aX0swLe2A1uqkvb7yIYDg6yD3sQpjjforP0e5mbJ
         ZfEQ==
Original-Sender: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
X-Gm-Message-State: AIVw113uzPX+EQNXB4IuGiF5TniiUtbKvy4RqmE3OxJeyz+3Q6GUs6YD
	7y7xakdItnIarQ==
X-Received: by 10.36.53.8 with SMTP id k8mr199480ita.12.1501664709991;
        Wed, 02 Aug 2017 02:05:09 -0700 (PDT)
X-BeenThere: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Original-Received: by 10.107.155.9 with SMTP id d9ls11839777ioe.40.gmail; Wed, 02 Aug
 2017 02:05:09 -0700 (PDT)
X-Received: by 10.36.53.8 with SMTP id k8mr199475ita.12.1501664709275;
        Wed, 02 Aug 2017 02:05:09 -0700 (PDT)
X-Original-Sender: s.karim.mohammadi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Precedence: list
Mailing-list: list pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org; contact pandoc-discuss+owners-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
List-ID: <pandoc-discuss.googlegroups.com>
X-Google-Group-Id: 1007024079513
List-Post: <https://groups.google.com/group/pandoc-discuss/post>, <mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Help: <https://groups.google.com/support/>, <mailto:pandoc-discuss+help-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Archive: <https://groups.google.com/group/pandoc-discuss>
List-Subscribe: <https://groups.google.com/group/pandoc-discuss/subscribe>, <mailto:pandoc-discuss+subscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Unsubscribe: <mailto:googlegroups-manage+1007024079513+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>,
 <https://groups.google.com/group/pandoc-discuss/subscribe>
Xref: news.gmane.org gmane.text.pandoc:17985
Archived-At: <http://permalink.gmane.org/gmane.text.pandoc/17985>

------=_Part_6027_1665163991.1501664708846
Content-Type: multipart/alternative; 
	boundary="----=_Part_6028_814728258.1501664708846"

------=_Part_6028_814728258.1501664708846
Content-Type: text/plain; charset="UTF-8"

Hi. I want to convert the HTML file *file1.html* to *file1.tex*. The HTML 
file contains *&nbsp;* entities.

How can I write a Python script as a pandoc filter to remove the 
non-breaking spaces (&nbsp;)?

This is my code:

    #!/usr/bin/env python
    
    """
    Pandoc filter to remove the &nbsp; string from the text
    """
    
    from pandocfilters import toJSONFilter, Para
    
    def debug(content):
        file = open('debug.txt', 'w')
        for item in content:
            file.write("%s\n" % item)
    
    def nbsp(key, value, format, meta):
        uniString = unicode(value, "UTF-8")
        uniString = value.replace("&nbsp;", " ")
    
        return uniString
    
    if __name__ == "__main__":
        toJSONFilter(nbsp)

But running the command:

    pandoc file1.html --filter ./nbsp.py -o file1.tex

gives me some errors.
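
For reference, here is a minimal sketch of a filter for this task that uses only the Python standard library. It rests on an assumption: that pandoc's HTML reader has already decoded `&nbsp;` into the Unicode character U+00A0 inside the `Str` elements of the JSON AST, so the sketch looks for that character rather than the literal entity text. The names `strip_nbsp` and `run_filter` are made up for illustration.

```python
import json
import sys

# Assumption: pandoc hands the filter U+00A0, not the literal "&nbsp;" text.
NBSP = u"\u00a0"


def strip_nbsp(node):
    """Return a copy of the AST with U+00A0 replaced by a space in Str nodes."""
    if isinstance(node, list):
        return [strip_nbsp(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            # A Str node carries its text in the "c" field.
            return {"t": "Str", "c": node["c"].replace(NBSP, u" ")}
        return {key: strip_nbsp(child) for key, child in node.items()}
    # Numbers, plain strings, booleans and None pass through unchanged.
    return node


def run_filter(in_stream, out_stream):
    """Read a pandoc JSON document, clean it, and write it back out."""
    doc = json.load(in_stream)
    json.dump(strip_nbsp(doc), out_stream)
```

To use this as a filter, the script would end with a call such as `run_filter(sys.stdin, sys.stdout)` and be passed to pandoc with `--filter`, as in the command above.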

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/3be5ee09-90dc-41ad-a368-9298b965dfaa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

------=_Part_6028_814728258.1501664708846--

------=_Part_6027_1665163991.1501664708846--