From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luis Fernado Silva Castro de Araújo
Newsgroups: gmane.text.pandoc
Subject: Re: Scripting with Haskell to achieve a filter for acronyms in pandoc
Date: Tue, 27 Dec 2016 19:14:29 -0800 (PST)
References: <6c9b4eec-0ea9-46bf-8da3-51998b63c902@googlegroups.com>
 <20160919182539.GA9066@Johns-MBP.home>
 <20160923083514.GI86115@Administrateurs-iMac-3.local>
 <92666292-e593-4249-8ec2-ad37ceba79d2@googlegroups.com>
 <719e577c-e900-4951-afb2-0423e71ff601@googlegroups.com>
In-Reply-To: <719e577c-e900-4951-afb2-0423e71ff601-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To: pandoc-discuss
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_4584_736532682.1482894870038"

------=_Part_4584_736532682.1482894870038
Content-Type: multipart/alternative; boundary="----=_Part_4585_1277849852.1482894870038"

------=_Part_4585_1277849852.1482894870038
Content-Type: text/plain; charset=UTF-8

Thanks for the comment, Sergio. Here is a first attempt to adapt it to panflute. Note that I am a total beginner in Python, but I am not looking for code; I just need pointers.

Could you please point out where the problem arises? It gets stuck in a loop in the Python interactive shell, and it stops with the error 'TypeError: lookForAcronyms() takes 1 positional argument but 2 were given' when I try to run it as a filter.
Here is the code:

# Based on https://github.com/cflewis/Pacrodoc/blob/master/pacrodoc.py

import panflute as pf
import sys
import json
import re
import urllib

acronyms = {}

def processAcronym(linkData):

    # Links look like this:
    # [[{u'Str': u'Link Name'}], [u'Link URL', 'Link Title']]
    acronym = linkData[0][0]['Str']
    acronymText = linkData[1][0]

    # First we check if there is an acronym being defined
    if re.search('^acro:', linkData[1][0]):

        # An acronym is being defined, so strip off the acro:
        # prefix and unencode the text
        acronyms[acronym] = {'text': urllib.unquote(acronymText[5:]), 'used': False}

        # Strip out this link
        return {'Str': ''}

    # Now we check if its referring to an acronym instead
    if not acronymText and acronym in acronyms:
        if not acronyms[acronym]['used']:
            acronyms[acronym]['used'] = True
            return {'Str': '%s (%s)' % (acronyms[acronym]['text'], acronym)}
        else:
            return {'Str': acronym}

    # It was just a normal link, so return it unchanged
    return {'Link': linkData}

def lookForAcronyms(jsonData):
    if isinstance(jsonData, list):
        return [lookForAcronyms(value) for value in jsonData]

    if isinstance(jsonData, dict):

        if 'Link' in jsonData:
            return processAcronym(jsonData['Link'])
        else:
            return {k: lookForAcronyms(v) for k, v in jsonData.items()}

    return jsonData

def main(doc=None):

    return pf.run_filter(lookForAcronyms, doc=doc)

if __name__ == '__main__':
    main()

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f979c153-7a62-41f9-a782-64532d1cee6b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
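
For reference, a minimal sketch of the acronym bookkeeping described in the code comments, written without panflute so that it runs on its own. Several things here are assumptions and not part of the original post: `FakeLink` is a hypothetical stand-in for a link element, the action returns a plain string (or `None` for "leave unchanged") where a real filter would return pandoc elements, and the sample acronym is made up. Two details may bear on the reported error: panflute's `run_filter` calls the action with two arguments, `(elem, doc)`, and in Python 3 `unquote` lives in `urllib.parse`.

```python
from collections import namedtuple
from urllib.parse import unquote  # Python 3 location of unquote

# Hypothetical stand-in for a link element: visible text plus target URL.
FakeLink = namedtuple("FakeLink", ["text", "url"])

acronyms = {}


def look_for_acronyms(elem, doc=None):
    """Action taking (elem, doc), the call shape used by panflute's run_filter."""
    if not isinstance(elem, FakeLink):
        return None  # not a link: leave it unchanged

    if elem.url.startswith("acro:"):
        # Definition: strip the "acro:" prefix, decode, and remember it.
        acronyms[elem.text] = {"text": unquote(elem.url[5:]), "used": False}
        return ""  # the defining link itself is stripped from the output

    if not elem.url and elem.text in acronyms:
        entry = acronyms[elem.text]
        if not entry["used"]:
            # First reference: expand to "long form (ACRONYM)".
            entry["used"] = True
            return "%s (%s)" % (entry["text"], elem.text)
        return elem.text  # later references: acronym only

    return None  # an ordinary link: leave it unchanged


# Made-up sample input, for illustration only.
stream = [
    FakeLink("NASA", "acro:National%20Aeronautics%20and%20Space%20Administration"),
    FakeLink("NASA", ""),
    FakeLink("NASA", ""),
    FakeLink("pandoc", "https://pandoc.org"),
]
results = [look_for_acronyms(elem, None) for elem in stream]
```

With this sample input, the definition link yields an empty string, the first reference expands to the long form followed by the acronym in parentheses, the second reference yields the bare acronym, and the ordinary link yields `None`.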
------=_Part_4585_1277849852.1482894870038--

------=_Part_4584_736532682.1482894870038--