From: John MacFarlane
Newsgroups: gmane.text.pandoc
Subject: Re: Writing custom filter in python to remove non-breaking spaces
Date: Wed, 2 Aug 2017 07:09:16 -0700
Message-ID: <20170802140916.GF38349@Johns-MacBook-Pro.local>
References: <3be5ee09-90dc-41ad-a368-9298b965dfaa@googlegroups.com>

Probably easier just to preprocess:

    sed -e 's/&nbsp;/ /g' input.html | pandoc -f html -t latex

+++ Karim Mohammadi [Aug 02 17 02:05 ]:
> Hi. I want to convert the HTML file file1.html to file1.tex. The HTML
> file contains &nbsp;.
> How can I write a Python script as a filter to remove non-breaking
> spaces (&nbsp;)?
> This is my code:
> #!/usr/bin/env python
>
> """
> Pandoc filter to remove the &nbsp; string from the text
> """
>
> from pandocfilters import toJSONFilter, Para
>
> def debug(content):
>     file = open('debug.txt', 'w')
>     for item in content:
>         file.write("%s\n" % item)
>
> def nbsp(key, value, format, meta):
>     uniString = unicode(value, "UTF-8")
>     uniString = value.replace("&nbsp;", " ")
>
>     return uniString
>
> if __name__ == "__main__":
>     toJSONFilter(nbsp)
> but calling the command:
> pandoc file1.html --filter ./nbsp.py -o file1.tex
> gives me some errors.
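
If a filter is still wanted, note that the quoted script calls unicode() and .replace() on every node of the document, although most node values are lists rather than strings, and it returns a bare string where pandoc expects an AST element. A minimal sketch of an alternative follows. It uses only the standard library instead of pandocfilters, and it assumes that pandoc's HTML reader hands the entity over as the character U+00A0 inside Str elements rather than as the literal text "&nbsp;". The names strip_nbsp and main are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter that turns U+00A0 into a plain space."""
import json
import sys

NBSP = "\u00a0"  # assumption: this is how the reader represents &nbsp;


def strip_nbsp(node):
    """Walk a pandoc JSON AST and rewrite the text of Str elements in place."""
    if isinstance(node, dict):
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            # Only Str elements are touched; code and raw blocks keep their text.
            node["c"] = node["c"].replace(NBSP, " ")
        else:
            for child in node.values():
                strip_nbsp(child)
    elif isinstance(node, list):
        for child in node:
            strip_nbsp(child)
    return node


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing was piped in, so there is nothing to filter
    json.dump(strip_nbsp(json.loads(raw)), sys.stdout)


if __name__ == "__main__":
    main()
```

Saved as nbsp.py and made executable, it would be called as in the quoted command: pandoc file1.html --filter ./nbsp.py -o file1.tex. Replacing the character inside the Str keeps the sketch short; splitting the Str around a Space element would be closer to what pandoc produces for an ordinary space.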