From: Karim Mohammadi
Newsgroups: gmane.text.pandoc
Subject: Re: Writing custom filter in python to remove non-breaking spaces
Date: Wed, 2 Aug 2017 21:55:35 -0700 (PDT)
To: pandoc-discuss

Thanks, but I'm on Windows.

Can you guide me through writing the desired custom filter?
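
A minimal sketch of such a filter follows, under assumptions that do not come from this thread: that pandoc's HTML reader has already turned each `&nbsp;` into the character U+00A0 inside `Str` elements, and that the filter runs under Python 3. It walks the JSON document with the standard `json` module instead of `pandocfilters`, and the name `replace_nbsp` is made up for illustration. One likely reason the script quoted below fails is that its action is called for every element, while `value` is a plain string only for `Str` elements.

```python
#!/usr/bin/env python
"""Sketch of a pandoc JSON filter that turns non-breaking spaces into plain spaces."""

import io
import json
import sys

# Assumption: the HTML reader stores &nbsp; as this character inside Str elements.
NBSP = "\u00a0"


def replace_nbsp(node):
    """Return a copy of a pandoc JSON AST with U+00A0 replaced in every Str element."""
    if isinstance(node, list):
        return [replace_nbsp(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            # For Str elements the content "c" is the text itself.
            return {"t": "Str", "c": node["c"].replace(NBSP, " ")}
        return {key: replace_nbsp(value) for key, value in node.items()}
    # Plain strings, numbers, booleans and None are left untouched.
    return node


def main():
    # pandoc sends the document as UTF-8 JSON on stdin and reads JSON back from stdout.
    doc = json.load(io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8"))
    # json.dumps escapes non-ASCII characters by default, so the console
    # encoding on Windows does not affect the output.
    sys.stdout.write(json.dumps(replace_nbsp(doc)))


# pandoc passes the target format (for example "latex") as the first argument,
# so the filter only runs when it is called that way.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as nbsp.py, this would be called as in the original command: pandoc file1.html --filter ./nbsp.py -o file1.tex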

On Wednesday, August 2, 2017 at 6:39:32 PM UTC+4:30, John MacFarlane wrote:
Probably easier just to preprocess:

sed -e 's/&nbsp;/ /g' input.html | pandoc -f html -t latex
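
Since sed is not normally installed on Windows, the same preprocessing step could be sketched in Python. This is only an illustration: the script name strip_nbsp.py, the function `strip_nbsp_entities`, and the intermediate file clean.html are hypothetical, and the input is assumed to be UTF-8 encoded.

```python
"""Sketch: replace &nbsp; entities in an HTML file before handing it to pandoc."""

import io
import sys


def strip_nbsp_entities(html):
    """Replace every literal &nbsp; entity with an ordinary space."""
    return html.replace("&nbsp;", " ")


def main(src, dst):
    # Assumption: the HTML file is UTF-8 encoded.
    with io.open(src, encoding="utf-8") as infile:
        html = infile.read()
    with io.open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(strip_nbsp_entities(html))


# Expects two arguments: the input file and the output file.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

It could be run as: python strip_nbsp.py file1.html clean.html, followed by: pandoc clean.html -f html -t latex -o file1.tex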

+++ Karim Mohammadi [Aug 02 17 02:05 ]:
>   Hi. I want to convert the HTML file file1.html to file1.tex. The HTML file
>   contains &nbsp;.
>   How can I write a Python script as a filter to remove the non-breaking
>   space (&nbsp;)?
>   This is my code:
>       #!/usr/bin/env python
>
>       """
>       Pandoc filter to removing &nbsp; string from the text
>       """
>
>       from pandocfilters import toJSONFilter, Para
>
>       def debug(content):
>           file = open('debug.txt', 'w')
>           for item in content:
>               file.write("%s\n" % item)
>
>       def nbsp(key, value, format, meta):
>           uniString = unicode(value, "UTF-8")
>           uniString = value.replace("&nbsp;", " ")
>
>           return uniString
>
>       if __name__ == "__main__":
>           toJSONFilter(nbsp)
>   but calling the command:
>   pandoc file1.html --filter ./nbsp.py -o file1.tex
>   gives me some errors.
>
>   --
>   You received this message because you are subscribed to the Google
>   Groups "pandoc-discuss" group.
>   To unsubscribe from this group and stop receiving emails from it, send
>   an email to [1]pandoc-discus...@googlegroups.com.
>   To post to this group, send email to
>   [2]pandoc-...@googlegroups.com.
>   To view this discussion on the web visit
>   [3]https://groups.google.com/d/msgid/pandoc-discuss/3be5ee09-90dc-41ad-a368-9298b965dfaa%40googlegroups.com.
>   For more options, visit [4]https://groups.google.com/d/optout.
>
>References
>
>   1. mailto:pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
>   2. mailto:pandoc-...@googlegroups.com
>   3. https://groups.google.com/d/msgid/pandoc-discuss/3be5ee09-90dc-41ad-a368-9298b965dfaa@googlegroups.com?utm_medium=email&utm_source=footer
>   4. https://groups.google.com/d/optout

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/cc13bbea-06e5-422b-bcdd-cd9ba1c4cf95%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.