From: Elliott Slaughter
Newsgroups: gmane.text.pandoc
Subject: Towards (better) Python filters for Pandoc with fluent queries
Date: Thu, 1 Jan 2015 22:58:14 -0800
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org

I like being able to script Pandoc via filters in Python, but one of the major drawbacks of the approach as it currently stands is that Python has no pattern matching to speak of. As a result, code that needs to run queries over the structure of Pandoc documents quickly turns into a nightmare, especially if that code needs to check nested structures.

Consider the following partial function in Haskell, which matches a BlockQuote containing a single Para whose first element is the word "Chapter" in small caps:

    import Text.Pandoc.Definition

    filter :: Block -> Block
    filter (BlockQuote [Para (SmallCaps [Str "Chapter"] : _)]) = ...
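For comparison, here is roughly what the same check looks like when written by hand against the JSON that Pandoc passes to filters. This is a sketch, assuming the JSON AST encoding in which each element is an object with a `t` field naming the constructor and a `c` field holding its contents; `is_chapter_quote` is a hypothetical name.

```python
def is_chapter_quote(block):
    """Hand-written equivalent of the Haskell pattern
    BlockQuote [Para (SmallCaps [Str "Chapter"] : _)]
    over Pandoc's JSON AST (assumed: each node is {"t": tag, "c": contents})."""
    if not isinstance(block, dict) or block.get("t") != "BlockQuote":
        return False
    blocks = block.get("c")
    # The BlockQuote must hold exactly one block, and it must be a Para.
    if not isinstance(blocks, list) or len(blocks) != 1:
        return False
    para = blocks[0]
    if not isinstance(para, dict) or para.get("t") != "Para":
        return False
    inlines = para.get("c")
    # The Para may have any length, but its first inline must be SmallCaps.
    if not isinstance(inlines, list) or not inlines:
        return False
    caps = inlines[0]
    if not isinstance(caps, dict) or caps.get("t") != "SmallCaps":
        return False
    words = caps.get("c")
    # SmallCaps must wrap exactly one Str whose text is "Chapter".
    return (isinstance(words, list) and len(words) == 1
            and isinstance(words[0], dict)
            and words[0].get("t") == "Str"
            and words[0].get("c") == "Chapter")


quote = {"t": "BlockQuote", "c": [
    {"t": "Para", "c": [
        {"t": "SmallCaps", "c": [{"t": "Str", "c": "Chapter"}]},
        {"t": "Space"},
        {"t": "Str", "c": "One"},
    ]},
]}
print(is_chapter_quote(quote))                   # True
print(is_chapter_quote({"t": "Para", "c": []}))  # False
```

Every level of nesting needs its own type and shape checks, which is the boilerplate a pattern or query language would absorb.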

Without pattern matching, the equivalent code in Python is painful to write, opaque, and quite brittle, and there is no way to express the pattern directly. Instead, I propose a fluent interface that provides a query language of sorts for Python. For example, the same query might look like this:

    m = (Matcher(block)
         .BlockQuote(length=1)[0]    # BlockQuote [ ... ]
         .Para(length=-1)[0]         # Para (... : _)
         .SmallCaps(length=1)[0]     # SmallCaps [ ... ]
         .Str(content='Chapter'))    # Str "Chapter"
    if m.matches():
        ...

The code is less dense than the Haskell version because I've split it across lines for legibility, but it can be condensed onto a single line if desired. In any case, it is a massive improvement over hand-written queries against the JSON structure of the document.
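To make the idea concrete, here is a minimal sketch of how such a fluent matcher can be built. It is not the pandocpatterns implementation: `MiniMatcher` and its behaviour are hypothetical. It assumes the same `t`/`c` JSON encoding, treats `length=-1` as "any length" (an assumption), and makes each step either move to a child node or switch the matcher into a failed state that every later step passes through unchanged.

```python
class MiniMatcher:
    """Hypothetical fluent matcher over Pandoc's JSON AST.

    Not the pandocpatterns API: a sketch of the technique only.
    """

    def __init__(self, node, ok=True):
        self._node = node  # current position in the AST
        self._ok = ok      # False once any check has failed

    def __getattr__(self, tag):
        # Only called for names not found normally, so BlockQuote, Para,
        # Str, ... all become type checks; lowercase/private names stay errors.
        if not tag[:1].isupper():
            raise AttributeError(tag)

        def check(length=-1, content=None):
            node = self._node
            if not self._ok or not isinstance(node, dict) or node.get("t") != tag:
                return MiniMatcher(None, ok=False)
            contents = node.get("c")
            # Assumed convention: length=-1 means "any number of children".
            if length >= 0 and (not isinstance(contents, list)
                                or len(contents) != length):
                return MiniMatcher(None, ok=False)
            if content is not None and contents != content:
                return MiniMatcher(None, ok=False)
            return self

        return check

    def __getitem__(self, index):
        # Descend into the index-th child; a failed matcher stays failed.
        contents = None
        if self._ok and isinstance(self._node, dict):
            contents = self._node.get("c")
        if not isinstance(contents, list) or not 0 <= index < len(contents):
            return MiniMatcher(None, ok=False)
        return MiniMatcher(contents[index])

    def matches(self):
        return self._ok


block = {"t": "BlockQuote", "c": [{"t": "Para", "c": [
    {"t": "SmallCaps", "c": [{"t": "Str", "c": "Chapter"}]},
    {"t": "Space"}, {"t": "Str", "c": "One"}]}]}

m = (MiniMatcher(block)
     .BlockQuote(length=1)[0]
     .Para(length=-1)[0]
     .SmallCaps(length=1)[0]
     .Str(content="Chapter"))
print(m.matches())  # True
```

Returning a failed matcher instead of raising keeps the whole query a single expression, so one `matches()` call at the end answers it. A BlockQuote holding two paragraphs, for instance, fails at `BlockQuote(length=1)` and every later step is a no-op.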

A proof-of-concept library is available today and has been demonstrated on the query above, as well as on other queries I have needed in my own projects. It currently covers around 50% of the Pandoc API. The code is available under the MIT license:

https://bitbucket.org/elliottslaughter/pandocpatterns

I would greatly appreciate any thoughts or feedback on the concept, design, or implementation. Please feel free to take the code out for a test drive and kick the tires. If there is interest, I would be willing to invest the effort to improve the library and make it more robust and useful.

Thank you for your time.


--
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAJ9X%3Dkb9W0_Jd4ufPcRiZSSZ%2B5Bpftg4hZ82zCuBLb-moadnSQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.