From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.text.pandoc/11652
Path: news.gmane.org!not-for-mail
From: Elliott Slaughter <elliottslaughter-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.text.pandoc
Subject: Towards (better) Python filters for Pandoc with fluent queries
Date: Thu, 1 Jan 2015 22:58:14 -0800
Message-ID: <CAJ9X=kb9W0_Jd4ufPcRiZSSZ+5Bpftg4hZ82zCuBLb-moadnSQ@mail.gmail.com>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11c2bc1c5c7929050ba5dd58
X-Trace: ger.gmane.org 1420181900 16765 80.91.229.3 (2 Jan 2015 06:58:20 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 2 Jan 2015 06:58:20 +0000 (UTC)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Precedence: list
Mailing-list: list pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org; contact pandoc-discuss+owners-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
List-ID: <pandoc-discuss.googlegroups.com>
X-Google-Group-Id: 1007024079513
List-Post: <http://groups.google.com/group/pandoc-discuss/post>, <mailto:pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Help: <http://groups.google.com/support/>, <mailto:pandoc-discuss+help-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Archive: <http://groups.google.com/group/pandoc-discuss
Original-Sender: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
List-Subscribe: <http://groups.google.com/group/pandoc-discuss/subscribe>, <mailto:pandoc-discuss+subscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
List-Unsubscribe: <mailto:googlegroups-manage+1007024079513+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>,
 <http://groups.google.com/group/pandoc-discuss/subscribe>
Xref: news.gmane.org gmane.text.pandoc:11652
Archived-At: <http://permalink.gmane.org/gmane.text.pandoc/11652>

--001a11c2bc1c5c7929050ba5dd58
Content-Type: text/plain; charset=UTF-8

I like being able to script Pandoc via filters in Python, but a major
drawback of the current approach is that Python has no pattern matching to
speak of. As a result, code that queries the structure of a Pandoc
document quickly turns into a nightmare, especially when it has to check
nested structures.

Consider the following partial function in Haskell, which matches a
BlockQuote whose only child is a Para that begins with the word "Chapter"
in small caps:

    filter :: Block -> Block
    filter (BlockQuote [Para (SmallCaps [Str "Chapter"] : _)]) = ...

Without pattern matching, the equivalent code in Python is painful to
write, opaque, and quite brittle, and the language offers no direct
analogue. Instead, I propose a fluent interface
<https://en.wikipedia.org/wiki/Fluent_interface> as a way to provide a
query language of sorts for Python. For example, the same query might
look like:

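    # Parentheses let the chain span several lines; a trailing "." alone
    # at the end of a line is a syntax error in Python.
    m = (Matcher(block)
            .BlockQuote(length=1)[0]
            .Para(length=-1)[0]
            .SmallCaps(length=1)[0]
            .Str(content='Chapter'))
    if m.matches():
        ...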

The code is not quite as dense as the Haskell because I've split it across
lines for legibility, but it can be condensed onto a single line if
desired. At any rate, it is a massive improvement over hand-written
queries against the JSON structure of the document.
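
For comparison, here is roughly what the hand-written version of the query
above looks like. This is only a sketch: it assumes the {"t": tag, "c":
contents} node encoding that Pandoc hands to JSON filters, and the names
is_chapter_quote and quote are made up for illustration.

```python
def is_chapter_quote(block):
    """True for a BlockQuote whose only child is a Para that starts
    with the word "Chapter" set in small caps."""
    # Assumed encoding: every node is a dict {'t': tag, 'c': contents}.
    if not isinstance(block, dict) or block.get('t') != 'BlockQuote':
        return False
    children = block.get('c')
    if not isinstance(children, list) or len(children) != 1:
        return False
    para = children[0]
    if not isinstance(para, dict) or para.get('t') != 'Para':
        return False
    inlines = para.get('c')
    if not isinstance(inlines, list) or not inlines:
        return False
    first = inlines[0]
    if not isinstance(first, dict) or first.get('t') != 'SmallCaps':
        return False
    words = first.get('c')
    if not isinstance(words, list) or len(words) != 1:
        return False
    return words[0] == {'t': 'Str', 'c': 'Chapter'}


# Hypothetical sample node, written out by hand for the example.
quote = {'t': 'BlockQuote', 'c': [
    {'t': 'Para', 'c': [
        {'t': 'SmallCaps', 'c': [{'t': 'Str', 'c': 'Chapter'}]},
        {'t': 'Space', 'c': []},
        {'t': 'Str', 'c': '1'}]}]}
```

Every level of nesting costs a type check, a shape check, and an early
return, which is exactly the noise the fluent form hides.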

A proof-of-concept library is available today. I have exercised it with
the query above as well as other queries I have needed in my own
projects. It currently covers around 50% of the Pandoc API. The code is
made available under an MIT license:

https://bitbucket.org/elliottslaughter/pandocpatterns
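
To give a feel for how little machinery the fluent style needs, here is a
minimal sketch of such a matcher. It is not the code in the repository
above. The class body, the {"t": tag, "c": contents} node encoding, and in
particular the reading of length=-1 as "any number of children" are
assumptions made for illustration.

```python
class Matcher(object):
    """Toy fluent matcher over nodes of the form {'t': tag, 'c': contents}.

    Illustrative only: the meaning given to `length` and `content` here
    is an assumption, not the behaviour of the pandocpatterns library.
    """

    def __init__(self, node, ok=True):
        self.node = node
        self.ok = ok

    def __getattr__(self, tag):
        # Only reached for names that are not real attributes, so
        # m.BlockQuote, m.Para, ... each become a test on the node's tag.
        if tag.startswith('_'):
            raise AttributeError(tag)

        def check(length=None, content=None):
            node = self.node
            ok = self.ok and isinstance(node, dict) and node.get('t') == tag
            # Assumed: a negative length means "any number of children".
            if ok and length is not None and length >= 0:
                kids = node.get('c')
                ok = isinstance(kids, list) and len(kids) == length
            if ok and content is not None:
                ok = node.get('c') == content
            return Matcher(node, ok)
        return check

    def __getitem__(self, index):
        # Descend into a child; a failed or out-of-range step stays failed.
        kids = None
        if self.ok and isinstance(self.node, dict):
            kids = self.node.get('c')
        if isinstance(kids, list) and -len(kids) <= index < len(kids):
            return Matcher(kids[index], True)
        return Matcher(None, False)

    def matches(self):
        return self.ok


# Hypothetical sample node, written out by hand for the example.
sample = {'t': 'BlockQuote', 'c': [
    {'t': 'Para', 'c': [
        {'t': 'SmallCaps', 'c': [{'t': 'Str', 'c': 'Chapter'}]},
        {'t': 'Space', 'c': []},
        {'t': 'Str', 'c': '1'}]}]}

m = (Matcher(sample)
     .BlockQuote(length=1)[0]
     .Para(length=-1)[0]
     .SmallCaps(length=1)[0]
     .Str(content='Chapter'))
```

The design choice worth noting is that a failed step returns another
(failed) matcher instead of raising, so a chain can always be written out
in full and checked once at the end with matches().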

I would greatly appreciate any thoughts or feedback on the concept, design,
or implementation. Please feel free to take the code out for a test drive
and kick the tires. If there is interest, I would be willing to invest the
effort to improve the library and make it more robust and useful.

Thank you for your time.


-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay


--001a11c2bc1c5c7929050ba5dd58--