From: Karim Mohammadi
Newsgroups: gmane.text.pandoc
Subject: Writing custom filter in python to remove non-breaking spaces
Date: Wed, 2 Aug 2017 02:05:08 -0700 (PDT)
Message-ID: <3be5ee09-90dc-41ad-a368-9298b965dfaa@googlegroups.com>
To: pandoc-discuss

Hi. I want to convert the HTML file *file1.html* to *file1.tex*. The HTML file contains *&nbsp;*.

How can I write a Python script as a filter to remove the non-breaking space (&nbsp;)?

This is my code:

    #!/usr/bin/env python

    """
    Pandoc filter to removing &nbsp; string from the text
    """

    from pandocfilters import toJSONFilter, Para

    def debug(content):
        file = open('debug.txt', 'w')
        for item in content:
            file.write("%s\n" % item)

    def nbsp(key, value, format, meta):
        uniString = unicode(value, "UTF-8")
        uniString = value.replace("&nbsp;", " ")

        return uniString

    if __name__ == "__main__":
        toJSONFilter(nbsp)

But calling the command:

    pandoc file1.html --filter ./nbsp.py -o file1.tex

gives me some errors.
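
For illustration, here is a minimal sketch of a filter that might do what the question asks. It rests on an assumption not stated in the message: that pandoc's HTML reader has already decoded &nbsp; into the character U+00A0 inside `Str` elements, so the filter looks for that character rather than the literal text "&nbsp;". It is written for Python 3 and walks the JSON AST with the standard library only, instead of going through pandocfilters; the names `replace_nbsp` and `NBSP` are made up for this example. The original `nbsp` function probably fails because it is called for every element of the document, where `value` is often a list or dict rather than a string, because `unicode()` exists only on Python 2, and because it returns a bare string instead of an AST element.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter that turns U+00A0 into a plain space."""
import json
import sys

# Assumption: pandoc has already decoded &nbsp; into this character.
NBSP = "\u00a0"


def replace_nbsp(node):
    """Return a copy of the JSON AST with U+00A0 replaced in every Str element."""
    if isinstance(node, list):
        return [replace_nbsp(child) for child in node]
    if isinstance(node, dict):
        # Inline text is encoded as {"t": "Str", "c": "some text"}.
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {"t": "Str", "c": node["c"].replace(NBSP, " ")}
        return {key: replace_nbsp(child) for key, child in node.items()}
    # Numbers, booleans, None and bare strings are left untouched.
    return node


def main():
    # pandoc sends the document as JSON on stdin and reads the result from stdout.
    doc = json.load(sys.stdin)
    json.dump(replace_nbsp(doc), sys.stdout)


# pandoc passes the output format (e.g. "latex") as the first argument,
# so the script only acts as a filter when such an argument is present.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as nbsp.py and made executable, this would be called exactly as in the question: pandoc file1.html --filter ./nbsp.py -o file1.tex. Only `Str` elements are rewritten, so non-breaking spaces inside code spans or raw blocks are left alone.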