Towards (better) Python filters for Pandoc with fluent queries

* Towards (better) Python filters for Pandoc with fluent queries
From: Elliott Slaughter @ 2015-01-02  6:58 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I like being able to script Pandoc via filters in Python, but one of the
major drawbacks of the approach as it currently stands is that Python has
no pattern matching to speak of. As a result, code that needs to run
queries of the structure of Pandoc documents quickly turns into a
nightmare, especially if that code needs to check nested structures.

Consider the following partial function in Haskell, which matches against a
BlockQuote containing a Para where the first word is "Chapter" in small
caps:

    filter :: Block -> Block
    filter (BlockQuote [Para (SmallCaps [Str "Chapter"] : _)]) = ...

Without pattern matching, the equivalent code in Python is painful to
write, opaque, and quite brittle. And since the language itself offers
no such support, a direct analogue of the Haskell version is not
possible. Instead, I propose a fluent interface
<https://en.wikipedia.org/wiki/Fluent_interface> as a way to provide a
query language of sorts for Python. For example, the same query might
look like:

    m = (Matcher(block)
         .BlockQuote(length=1)[0]
         .Para(length=-1)[0]
         .SmallCaps(length=1)[0]
         .Str(content='Chapter'))
    if m.matches():
        ...

The code is not quite as dense because I've split it out for legibility,
but it can be condensed to fit on a single line if desired. At any rate,
it is a massive improvement over hand-written queries over the JSON
structure of the document.
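
For comparison, here is a minimal sketch of what such a hand-written
query can look like. It assumes pandoc's JSON encoding, in which each
element is an object with a type tag "t" and contents "c"; the function
name is_chapter_quote and the sample data are made up for illustration.

```python
# Hand-written check over pandoc's JSON AST (illustrative sketch only).
# Assumption: every element is a dict with a type tag "t" and contents "c".

def is_chapter_quote(block):
    """True for a BlockQuote holding one Para that starts with SmallCaps "Chapter"."""
    if not isinstance(block, dict) or block.get("t") != "BlockQuote":
        return False
    blocks = block.get("c")
    if not isinstance(blocks, list) or len(blocks) != 1:
        return False
    para = blocks[0]
    if not isinstance(para, dict) or para.get("t") != "Para":
        return False
    inlines = para.get("c")
    if not isinstance(inlines, list) or not inlines:
        return False
    first = inlines[0]
    if not isinstance(first, dict) or first.get("t") != "SmallCaps":
        return False
    words = first.get("c")
    if not isinstance(words, list) or len(words) != 1:
        return False
    word = words[0]
    return (isinstance(word, dict)
            and word.get("t") == "Str"
            and word.get("c") == "Chapter")


# Made-up sample: a block quote whose paragraph reads "Chapter 1".
sample = {"t": "BlockQuote", "c": [
    {"t": "Para", "c": [
        {"t": "SmallCaps", "c": [{"t": "Str", "c": "Chapter"}]},
        {"t": "Space"},
        {"t": "Str", "c": "1"},
    ]},
]}
print(is_chapter_quote(sample))
```

Every level of nesting costs another type check and another bounds
check, which is the part the fluent form is meant to hide.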

A proof of concept library is available today, and has been demonstrated
with the query above as well as other queries I have needed in my own
projects. Current coverage of the Pandoc API is at around 50%. The code is
made available under an MIT license:

https://bitbucket.org/elliottslaughter/pandocpatterns

I would greatly appreciate any thoughts or feedback on the concept, design,
or implementation. Please feel free to take the code out for a test drive
and kick the tires. If there is interest, I would be willing to invest the
effort to improve the library and make it more robust and useful.

Thank you for your time.

-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/CAJ9X%3Dkb9W0_Jd4ufPcRiZSSZ%2B5Bpftg4hZ82zCuBLb-moadnSQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Mark Szepieniec @ 2015-01-02 10:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Just from a first glance, wouldn't it be handy to be able to do

m = Matcher()
    ...

and then

if m.matches(block):
    ...

This way you could reuse Matchers easily?
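
One way a reusable matcher of this kind could work is to record the
steps of the chain and only look at a block when matches() is called.
The following is a toy sketch under that assumption, not the
pandocpatterns implementation: the class body, the tuple encoding of
the steps, and the choice that an omitted length means "any length" are
all illustrative, and it assumes the "t"/"c" JSON encoding of elements.

```python
class Matcher:
    """Toy reusable matcher: the chain records steps; matches() replays them."""

    def __init__(self, steps=()):
        self._steps = tuple(steps)

    def __getattr__(self, tag):
        # Capitalised names (BlockQuote, Para, ...) become element-type tests.
        if not tag[:1].isupper():
            raise AttributeError(tag)

        def add_test(length=None, content=None):
            step = ("test", tag, length, content)
            return Matcher(self._steps + (step,))

        return add_test

    def __getitem__(self, index):
        # m[i] descends into the i-th child of the current element.
        return Matcher(self._steps + (("index", index),))

    def matches(self, node):
        for step in self._steps:
            if not isinstance(node, dict):
                return False
            if step[0] == "index":
                children = node.get("c")
                if not isinstance(children, list):
                    return False
                if not 0 <= step[1] < len(children):
                    return False
                node = children[step[1]]
            else:
                _, tag, length, content = step
                if node.get("t") != tag:
                    return False
                if length is not None and len(node.get("c", [])) != length:
                    return False
                if content is not None and node.get("c") != content:
                    return False
        return True


# Build the query once, then apply it to as many blocks as needed.
chapter = (Matcher()
           .BlockQuote(length=1)[0]
           .Para()[0]
           .SmallCaps(length=1)[0]
           .Str(content="Chapter"))

quote = {"t": "BlockQuote", "c": [{"t": "Para", "c": [
    {"t": "SmallCaps", "c": [{"t": "Str", "c": "Chapter"}]},
    {"t": "Space"}, {"t": "Str", "c": "1"}]}]}
plain = {"t": "Para", "c": [{"t": "Str", "c": "Chapter"}]}

print([chapter.matches(b) for b in (quote, plain)])
```

Because every chained call returns a new Matcher, a partial query can
be kept around and extended in several directions without one use
affecting another.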

More generally, this does seem like a significant improvement relative to
the current python workflow.


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Wagner Macedo @ 2015-01-02 15:09 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Very, very good work!

I have written, and still write, a lot of pandoc filters in Python, and I
agree: it's often a nightmare. I will look at your work with care!

--
Wagner Macedo


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: John MacFarlane @ 2015-01-02 16:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Nice.  When it stabilizes it would be good to add a note about it
in the documentation for pandocfilters.  (Or maybe even integrate
it into pandocfilters.)


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Elliott Slaughter @ 2015-01-03  5:46 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Jan 2, 2015 at 2:34 AM, Mark Szepieniec <mszepien-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Just from a first glance, wouldn't it be handy to be able to do
>
> m = Matcher()
>     ...
>
> and then
>
> if m.matches(block):
>     ...
>
> This way you could reuse Matchers easily?
>

Thanks; that's the sort of feedback I'm looking for. And yes, I think this
makes sense. To take this a step further, I could imagine supporting
different matching functions for different use cases:

m.match(block)   # matches only if the match starts at the root element
m.search(block)  # matches if any element at or below the root matches
# replaces up to count occurrences of the pattern with the result of
# calling replacement_func
m.replace(block, replacement_func, count)

Would that be useful? My immediate need is for the first only, but I could
see cases where I would want the other two.
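
To make the intended difference between the three concrete, here is a
rough sketch over the raw JSON tree. It is not the library's code: the
pattern is stood in for by a plain predicate function, the names walk,
match, search and replace are hypothetical free functions, and elements
are assumed to be dicts carrying a "t" type tag.

```python
def walk(node):
    """Yield node and every element nested below it, in document order."""
    if isinstance(node, dict):
        if "t" in node:
            yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)


def match(pred, node):
    """The pattern must hold at the root element itself."""
    return pred(node)


def search(pred, node):
    """Collect every element at or below the root that satisfies pred."""
    return [n for n in walk(node) if pred(n)]


def replace(pred, node, func, count=None):
    """Return a copy of the tree with up to count matches replaced by func(match)."""
    remaining = count

    def rebuild(n):
        nonlocal remaining
        if isinstance(n, list):
            return [rebuild(child) for child in n]
        if isinstance(n, dict):
            if "t" in n and pred(n) and (remaining is None or remaining > 0):
                if remaining is not None:
                    remaining -= 1
                return func(n)  # replaced subtrees are not searched again
            return {key: rebuild(value) for key, value in n.items()}
        return n

    return rebuild(node)


def is_emph(n):
    return n.get("t") == "Emph"


def to_strong(n):
    return {"t": "Strong", "c": n["c"]}


para = {"t": "Para", "c": [
    {"t": "Emph", "c": [{"t": "Str", "c": "a"}]},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "b"}]},
]}

print(match(is_emph, para))
print(len(search(is_emph, para)))
print([n["t"] for n in replace(is_emph, para, to_strong, count=1)["c"]])
```

In this sketch replace builds a new tree rather than mutating the
input, which keeps the count bookkeeping simple; whether the real API
should copy or mutate is a separate design question.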

More generally, I think it makes sense to start thinking of this as more of
a full query language rather than just a way to do pattern matching. Are
there any major features missing that people would want from such a tool?

For example, one might imagine a feature allowing arbitrary subelements to
match. (This is essentially the // operator from XPath.) One potential API
for doing this might look like:

# matches Emphs nested arbitrarily deep inside a Div
Matcher().Div().contains(Matcher().Emph())

This use case is notable because it is not automatically
backwards-compatible with my current API (it requires the same sort of API
rearrangement as Mark's suggestion above). Are there any other cases like
this? I don't necessarily care if the API is not feature complete on day 1,
but I would still prefer to design things to be forwards-compatible, if
possible. So if you have features you'd like to see, please let me know.
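
As a rough illustration of the semantics I have in mind for contains,
here is a sketch written with plain predicate functions instead of
Matcher objects. The helper names has_type, descendants and contains
are hypothetical, and the sample Div assumes the JSON encoding in which
a Div's contents are an attribute triple followed by a list of blocks.

```python
def has_type(tag):
    """Predicate: the node is an element with the given type tag."""
    return lambda node: isinstance(node, dict) and node.get("t") == tag


def descendants(node):
    """Yield every element strictly below node, at any depth."""
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = ()
    for child in children:
        if isinstance(child, dict) and "t" in child:
            yield child
        if isinstance(child, (dict, list)):
            yield from descendants(child)


def contains(outer, inner):
    """Predicate: node satisfies outer and some descendant satisfies inner."""
    return lambda node: outer(node) and any(inner(d) for d in descendants(node))


div_with_emph = contains(has_type("Div"), has_type("Emph"))

# Made-up sample: the Emph sits two levels below the Div (Para, then Strong).
div = {"t": "Div", "c": [["", [], []], [
    {"t": "Para", "c": [
        {"t": "Strong", "c": [
            {"t": "Emph", "c": [{"t": "Str", "c": "deep"}]},
        ]},
    ]},
]]}
bare_emph = {"t": "Emph", "c": [{"t": "Str", "c": "alone"}]}

print(div_with_emph(div), div_with_emph(bare_emph))
```

The inner query is a complete query in its own right, which is why this
kind of operator needs matchers that exist independently of any
particular block.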

Comments appreciated.


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Elliott Slaughter @ 2015-01-03  5:50 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Fri, Jan 2, 2015 at 8:50 AM, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

> Nice.  When it stabilizes it would be good to add a note about it
> in the documentation for pandocfilters.  (Or maybe even integrate
> it into pandocfilters.)
>

Yes, I would be happy to submit this for inclusion when it's ready.


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Caleb McDaniel @ 2015-01-03 10:17 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks, this looks great! And seems like it would be helpful for a filter I use to turn endnotes into inline notes in Markdown. "Brittle" is a good word for my current implementation of the query. 

https://gist.github.com/wcaleb/17ca606788f9b4b9a36b

If I understand you correctly, the current Matcher only works for Blocks, not Inlines? (Currently writing on low sleep, so that may not even be a sensible question.) And could it match, say, any block that has any block other than a paragraph nested inside it? (Ditto to previous parenthetical.)

Caleb McDaniel


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Elliott Slaughter @ 2015-01-04  8:23 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On Sat, Jan 3, 2015 at 2:17 AM, Caleb McDaniel <calebmcd-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Thanks, this looks great! And seems like it would be helpful for a filter
> I use to turn endnotes into inline notes in Markdown. "Brittle" is a good
> word for my current implementation of the query.
>
> https://gist.github.com/wcaleb/17ca606788f9b4b9a36b
>
> If I understand you correctly, the current Matcher only works for Blocks,
> not Inlines? (Currently writing on low sleep, so that may not even be a
> sensible question.)


Well, the API was a bit sparse on Inlines. I've gone ahead and added
queries for the rest of the Inline elements, so API support is now up to
90% or so. All the Inlines are there, but there are a couple of instances
of attributes I can't query yet.

> And could it match, say, any block that has any block other than a
> paragraph nested inside it? (Ditto to previous parenthetical.)
>

So right now, there's no negation, and no quantification other than "there
exists". I added support for a search operator, so you can say:

Query().Para().search(node)

and it will go find all paragraphs at node or below. But it can't prove
e.g. that there aren't any non-Paragraph nodes. I could imagine adding a
"forall" operator that would allow something like that. Maybe something
like:

Query().Note().forall(Query().Para())

But I have to spend some more time thinking about the semantics of that.
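
One candidate semantics, sketched with plain predicates rather than
Query objects: forall holds when the outer element matches and every
direct child element matches the inner query. The helper names
has_type, children and forall are hypothetical, and the sketch assumes
the "t"/"c" JSON encoding, with a Note's contents being a list of
blocks.

```python
def has_type(tag):
    """Predicate: the node is an element with the given type tag."""
    return lambda node: isinstance(node, dict) and node.get("t") == tag


def children(node):
    """Direct child elements of node (non-element contents are skipped)."""
    content = node.get("c") if isinstance(node, dict) else None
    if not isinstance(content, list):
        return []
    return [c for c in content if isinstance(c, dict) and "t" in c]


def forall(outer, inner):
    """Predicate: node satisfies outer and all its direct children satisfy inner."""
    return lambda node: outer(node) and all(inner(c) for c in children(node))


only_paras = forall(has_type("Note"), has_type("Para"))


def note_with_other_block(node):
    """A Note holding at least one block that is not a Para."""
    return has_type("Note")(node) and not only_paras(node)


simple_note = {"t": "Note", "c": [
    {"t": "Para", "c": [{"t": "Str", "c": "one"}]},
    {"t": "Para", "c": [{"t": "Str", "c": "two"}]},
]}
mixed_note = {"t": "Note", "c": [
    {"t": "Para", "c": [{"t": "Str", "c": "text"}]},
    {"t": "CodeBlock", "c": [["", [], []], "x = 1"]},
]}

print(only_paras(simple_note), only_paras(mixed_note))
print(note_with_other_block(mixed_note))
```

Two things this reading leaves open: an empty Note satisfies forall
vacuously, and only direct children are examined, so a non-Para block
nested further down would go unnoticed.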


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Elliott Slaughter @ 2015-01-04  8:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

By the way, I also added a test suite, so if you want to take a look at
what the queries actually look like, head over to:

https://bitbucket.org/elliottslaughter/pandocpatterns/src/714e309c/tests.py


* Re: Towards (better) Python filters for Pandoc with fluent queries
From: Elliott Slaughter @ 2015-01-22  5:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

John,

How do you want to proceed?

I'm reasonably happy with where the API is now, but I am also aware that
it hasn't received much testing outside of my own use (and test suite). I
would be happy to submit this for inclusion in pandocfilters, but it might
make sense to publish it as a separate library first, to let the API air
out more and so that people can get their hands dirty and kick the tires.
I'm always a bit nervous committing to an API that hasn't seen much real
use yet, even if people have glanced at it and said it looks reasonable.

Thoughts?

On Fri, Jan 2, 2015 at 8:50 AM, John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:

> Nice.  When it stabilizes it would be good to add a note about it
> in the documentation for pandocfilters.  (Or maybe even integrate
> it into pandocfilters.)
>
> +++ Elliott Slaughter [Jan 01 15 22:58 ]:
>
>> I like being able to script Pandoc via filters in Python, but one of the
>> major drawbacks of the approach as it currently stands is that Python has
>> no pattern matching to speak of. As a result, code that needs to run
>> queries of the structure of Pandoc documents quickly turns into a
>> nightmare, especially if that code needs to check nested structures.
>>
>> Consider the following partial function in Haskell, which matches against
>> a BlockQuote containing a Para where the first word is "Chapter" in small
>> caps:
>>
>>    filter :: Block -> Block
>>    filter (BlockQuote [Para (SmallCaps [Str "Chapter"] : _)]) = ...
>>
>> Without pattern matching, the equivalent code in Python is painful to
>> write, opaque, and quite brittle. Unfortunately, without support for
>> pattern matching, there is no possibility of a direct analogue in Python.
>> Instead, I propose a fluent interface
>> <https://en.wikipedia.org/wiki/Fluent_interface> as a way to provide a
>> query language of sorts for Python. So for example, the same query might
>> look like:
>>
>>    m = (Matcher(block)
>>            .BlockQuote(length = 1)[0]
>>            .Para(length = -1)[0]
>>            .SmallCaps(length = 1)[0]
>>            .Str(content = 'Chapter'))
>>    if m.matches():
>>        ...
>>
>> The code is not quite as dense because I've split it out for legibility,
>> but can be condensed better to fit on a single line if desired. It is at
>> any rate a massive improvement over hand-written queries over the JSON
>> structure of the document.
>>
>> A proof of concept library is available today, and has been demonstrated
>> with the query above as well as other queries I have needed in my own
>> projects. Current coverage of the Pandoc API is at around 50%. The code is
>> made available under an MIT license:
>>
>> https://bitbucket.org/elliottslaughter/pandocpatterns
>>
>> I would greatly appreciate any thoughts or feedback on the concept,
>> design, or implementation. Please feel free to take the code out for a
>> test drive and kick the tires. If there is interest, I would be willing to
>> invest the effort to improve the library and make it more robust and useful.
>>
>> Thank you for your time.
>>



-- 
Elliott Slaughter

"Don't worry about what anybody else is going to do. The best way to
predict the future is to invent it." - Alan Kay



end of thread

Thread overview: 10+ messages
2015-01-02  6:58 Towards (better) Python filters for Pandoc with fluent queries Elliott Slaughter
2015-01-02 10:34   ` Mark Szepieniec
2015-01-03  5:46       ` Elliott Slaughter
2015-01-02 15:09   ` Wagner Macedo
2015-01-02 16:50   ` John MacFarlane
2015-01-03  5:50       ` Elliott Slaughter
2015-01-03 10:17           ` Caleb McDaniel
2015-01-04  8:23               ` Elliott Slaughter
2015-01-04  8:25                   ` Elliott Slaughter
2015-01-22  5:25       ` Elliott Slaughter
