public inbox archive for pandoc-discuss@googlegroups.com
From: Elliott Slaughter <elliottslaughter-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Towards (better) Python filters for Pandoc with fluent queries
Date: Thu, 1 Jan 2015 22:58:14 -0800
Message-ID: <CAJ9X=kb9W0_Jd4ufPcRiZSSZ+5Bpftg4hZ82zCuBLb-moadnSQ@mail.gmail.com>

I like being able to script Pandoc via filters in Python, but one major
drawback of the current approach is that Python has no pattern matching to
speak of. As a result, code that queries the structure of Pandoc documents
quickly turns into a nightmare, especially when it has to check nested
structures.

Consider the following partial function in Haskell, which matches against a
BlockQuote containing a Para where the first word is "Chapter" in small
caps:

    -- A BlockQuote holding one Para whose first inline is "Chapter" in small caps.
    filter :: Block -> Block
    filter (BlockQuote [Para (SmallCaps [Str "Chapter"] : _)]) = ...

Without pattern matching, the equivalent code in Python is painful to
write, opaque, and quite brittle, and there is no possibility of a direct
analogue. Instead, I propose a fluent interface
<https://en.wikipedia.org/wiki/Fluent_interface> as a way to provide a
query language of sorts for Python. For example, the same query might
look like:

    # Parentheses let the method chain span several lines.
    m = (Matcher(block)
         .BlockQuote(length=1)[0]
         .Para(length=-1)[0]
         .SmallCaps(length=1)[0]
         .Str(content='Chapter'))
    if m.matches():
        ...

The code is not quite as dense as the Haskell because I've split it across
lines for legibility, but it can be condensed onto a single line if
desired. At any rate, it is a massive improvement over hand-written queries
against the JSON structure of the document.
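
For comparison, a hand-written version of the same check might look roughly
like the sketch below. It assumes the JSON AST layout in which every element
is an object with a type tag 't' and contents 'c'; the function name
is_chapter_quote and the sample block are made up for illustration.

```python
def is_chapter_quote(block):
    """Hand-written check over pandoc JSON nodes ('t' = type, 'c' = contents)."""
    # The outer node must be a BlockQuote with exactly one child block.
    if not isinstance(block, dict) or block.get('t') != 'BlockQuote':
        return False
    children = block.get('c')
    if not isinstance(children, list) or len(children) != 1:
        return False
    # That child must be a non-empty Para.
    para = children[0]
    if not isinstance(para, dict) or para.get('t') != 'Para' or not para.get('c'):
        return False
    # The first inline must be SmallCaps wrapping a single inline.
    first = para['c'][0]
    if not isinstance(first, dict) or first.get('t') != 'SmallCaps':
        return False
    inner = first.get('c')
    if not isinstance(inner, list) or len(inner) != 1:
        return False
    # The wrapped inline must be the string "Chapter".
    word = inner[0]
    return isinstance(word, dict) and word.get('t') == 'Str' and word.get('c') == 'Chapter'


# Illustrative input: a block quote whose paragraph starts with "Chapter" in small caps.
sample_block = {
    't': 'BlockQuote',
    'c': [{'t': 'Para',
           'c': [{'t': 'SmallCaps', 'c': [{'t': 'Str', 'c': 'Chapter'}]},
                 {'t': 'Space', 'c': []},
                 {'t': 'Str', 'c': '1'}]}],
}
print(is_chapter_quote(sample_block))
```

Every level of nesting costs another type check and another length check,
which is where the brittleness comes from.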

A proof-of-concept library is available today. It has been demonstrated
with the query above as well as other queries I have needed in my own
projects. Current coverage of the Pandoc API is around 50%. The code is
made available under the MIT license:

https://bitbucket.org/elliottslaughter/pandocpatterns
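
To give a feel for the idea without reading the repository, here is a
minimal sketch of how a fluent matcher of this kind could be built. It is
not the pandocpatterns code: the class name SketchMatcher and its internals
are hypothetical, it assumes JSON nodes with a type tag 't' and contents
'c', and it treats an omitted length as "any length" rather than
reproducing the library's length = -1 convention.

```python
class SketchMatcher:
    """Hypothetical fluent matcher over pandoc JSON nodes (illustration only)."""

    def __init__(self, node, ok=True):
        self._node = node   # current JSON node, or None after a failed step
        self._ok = ok       # False once any step in the chain has failed

    def _fail(self):
        # A failed matcher absorbs every later step and never matches.
        return SketchMatcher(None, ok=False)

    def __getattr__(self, type_name):
        # Called only for names not found normally, e.g. .Para or .Str.
        if type_name.startswith('_'):
            raise AttributeError(type_name)

        def check(length=None, content=None):
            node = self._node
            if not self._ok or not isinstance(node, dict) or node.get('t') != type_name:
                return self._fail()
            c = node.get('c')
            if length is not None and (not isinstance(c, list) or len(c) != length):
                return self._fail()
            if content is not None and c != content:
                return self._fail()
            return self

        return check

    def __getitem__(self, index):
        # Descend into the index-th child of the current node's contents.
        c = self._node.get('c') if self._ok and isinstance(self._node, dict) else None
        if not isinstance(c, list) or not (-len(c) <= index < len(c)):
            return self._fail()
        return SketchMatcher(c[index])

    def matches(self):
        return self._ok


# Illustrative input: a block quote whose paragraph starts with "Chapter" in small caps.
block = {
    't': 'BlockQuote',
    'c': [{'t': 'Para',
           'c': [{'t': 'SmallCaps', 'c': [{'t': 'Str', 'c': 'Chapter'}]},
                 {'t': 'Str', 'c': '1'}]}],
}
m = (SketchMatcher(block)
     .BlockQuote(length=1)[0]
     .Para()[0]
     .SmallCaps(length=1)[0]
     .Str(content='Chapter'))
print(m.matches())
```

The design choice worth noting is that a failed step returns a "dead"
matcher instead of raising, so the chain can always be written out in full
and checked once at the end.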

I would greatly appreciate any thoughts or feedback on the concept, design,
or implementation. Please feel free to take the code out for a test drive
and kick the tires. If there is interest, I would be willing to invest the
effort to improve the library and make it more robust and useful.

Thank you for your time.


-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay
