Pandoc Document Model in Python

public inbox archive for pandoc-discuss@googlegroups.com

* Pandoc Document Model in Python
@ 2021-12-04 13:20   ` Sébastien Boisgérault
       [not found]     ` <f224cd2c-7d68-40b4-a855-7d4d0d7aa442n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sébastien Boisgérault @ 2021-12-04 13:20 UTC (permalink / raw)
  To: pandoc-discuss



Hi everyone,

I have just released version 2.0 of a pandoc Python library, which 
exposes the Pandoc document model to Python:

    https://github.com/boisgera/pandoc#-overview

The main goal is to enable all kinds of analysis, generation and 
transformation of documents with Python (roughly speaking, an alternative 
to pandoc filters); it is *not* to convert from one format to another (it 
can do it, but so can the standard pandoc command-line tool). 

Feedback welcome!

Cheers,

Sébastien


-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f224cd2c-7d68-40b4-a855-7d4d0d7aa442n%40googlegroups.com.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* AW: Pandoc Document Model in Python
       [not found]     ` <f224cd2c-7d68-40b4-a855-7d4d0d7aa442n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-12-04 14:06       ` denis.maier-NSENcxR/0n0
       [not found]         ` <3b5d75fe4e2a45e38ab45a820d110faf-NSENcxR/0n0@public.gmane.org>
  2021-12-22 18:05       ` Joseph Reagle
  1 sibling, 1 reply; 12+ messages in thread
From: denis.maier-NSENcxR/0n0 @ 2021-12-04 14:06 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi Sébastien,
Looks interesting, but I don't understand how you'd use that as an alternative to a filter. Can you give an example?
Denis


* Re: Pandoc Document Model in Python
       [not found]         ` <3b5d75fe4e2a45e38ab45a820d110faf-NSENcxR/0n0@public.gmane.org>
@ 2021-12-04 14:43           ` Sébastien Boisgérault
       [not found]             ` <de1fd005-0d0d-49a2-86cc-5a72c764835dn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sébastien Boisgérault @ 2021-12-04 14:43 UTC (permalink / raw)
  To: pandoc-discuss



(Oops, I probably answered Denis only and not the group. Here we go again!)

AFAICT, filters are AST-to-AST transformations of a document. In this Python 
library, docs (instances of Pandoc) represent this AST, so a typical 
in-place AST transform would be:

import pandoc
from pandoc.types import *

def uppercase(doc): 
    for elt in pandoc.iter(doc):
        if isinstance(elt, Str):
            elt[0] = elt[0].upper() # elt: Str(Text)

If you need a markdown-to-markdown transformation instead, you read the 
input markdown, transform the resulting document, then write it back:

>>> markdown = "Hello world!"
>>> doc = pandoc.read(markdown)
>>> uppercase(doc)
>>> markdown = pandoc.write(doc)
>>> print(markdown)
HELLO WORLD!
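
For readers without the library at hand, the same idea can be sketched with toy classes. This is only an illustration under stated assumptions: `Node`, `walk` and the three element classes below are made-up stand-ins, not the library's own types or its `pandoc.iter` implementation.

```python
# Toy stand-ins for AST elements; NOT the pandoc library's classes.
class Node:
    """An element whose arguments live in a mutable list."""

    def __init__(self, *args):
        self.args = list(args)

    def __getitem__(self, index):
        return self.args[index]

    def __setitem__(self, index, value):
        self.args[index] = value

    def __repr__(self):
        return f"{type(self).__name__}({', '.join(map(repr, self.args))})"


class Para(Node): pass
class Str(Node): pass
class Space(Node): pass


def walk(elt):
    """Yield elt, then its descendants, top-down and in document order."""
    yield elt
    if isinstance(elt, Node):
        children = elt.args
    elif isinstance(elt, list):
        children = elt
    else:
        children = []  # strings and other leaves have no children
    for child in children:
        yield from walk(child)


def uppercase(root):
    """In-place transform: upper-case the text of every Str element."""
    for elt in walk(root):
        if isinstance(elt, Str):
            elt[0] = elt[0].upper()


para = Para([Str("Hello"), Space(), Str("world!")])
uppercase(para)
print(para)  # Para([Str('HELLO'), Space(), Str('WORLD!')])
```

Because the elements are mutable and the iterator hands out the elements themselves rather than copies, the transform needs no return value.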

There are more examples here:

    https://boisgera.github.io/pandoc/examples/

and documentation of common patterns here:

    https://boisgera.github.io/pandoc/cookbook/

Cheers, 

Sébastien


* AW: Pandoc Document Model in Python
       [not found]             ` <de1fd005-0d0d-49a2-86cc-5a72c764835dn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-12-04 14:58               ` denis.maier-NSENcxR/0n0
       [not found]                 ` <fafa9cffd5e4437c865e71875b2f58a2-NSENcxR/0n0@public.gmane.org>
  2021-12-04 15:30               ` Joseph Reagle
  1 sibling, 1 reply; 12+ messages in thread
From: denis.maier-NSENcxR/0n0 @ 2021-12-04 14:58 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Thanks, Sébastien.

So, if I understand correctly, you'd use that in a script that reads your markdown file and outputs another. Correct? (I just mean that usually your doc will be in your filesystem, not in a Python variable.)

Best
Denis


* Re: Pandoc Document Model in Python
       [not found]             ` <de1fd005-0d0d-49a2-86cc-5a72c764835dn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2021-12-04 14:58               ` AW: " denis.maier-NSENcxR/0n0
@ 2021-12-04 15:30               ` Joseph Reagle
       [not found]                 ` <fe2b314b-863d-f8c0-8dfc-1104422fbf52-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Joseph Reagle @ 2021-12-04 15:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Can you speak to what the advantages of this would be over panflute?

http://scorreia.com/software/panflute/

On 21-12-04 09:43, Sébastien Boisgérault wrote:
> AFAICT filters are document AST to AST transformations. In this Python library, docs (Pandoc instances) represent this AST, so a typical AST (in-place) transform would be:


* Re: Pandoc Document Model in Python
       [not found]                 ` <fafa9cffd5e4437c865e71875b2f58a2-NSENcxR/0n0@public.gmane.org>
@ 2021-12-04 15:35                   ` Sébastien Boisgérault
  0 siblings, 0 replies; 12+ messages in thread
From: Sébastien Boisgérault @ 2021-12-04 15:35 UTC (permalink / raw)
  To: pandoc-discuss



You can also read and write files instead of Python strings. For 
example, if you have a `notebookify` function that takes a doc (pandoc AST) 
and returns a Jupyter notebook (as a Python JSON-like dictionary), your 
wrapper script could be something like:

# file: notebookify.py

import json
import sys
from pathlib import Path

import pandoc

# here: notebookify definition, etc.
# ...

def main():
    filename = sys.argv[1]
    doc = pandoc.read(file=filename)

    notebook = notebookify(doc)

    ipynb = Path(filename).with_suffix(".ipynb")
    with open(ipynb, "w", encoding="utf-8") as output:
        json.dump(notebook, output, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    main()

you would invoke it with

$ python notebookify.py my_document.md

and it would produce a my_document.ipynb Jupyter notebook.
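
The output half of such a wrapper depends only on the standard library, so it can be tried on its own. The sketch below assumes a hypothetical `write_notebook` helper and a placeholder notebook dictionary; neither comes from the library, and a real `notebookify` would return something richer.

```python
import json
import tempfile
from pathlib import Path


def write_notebook(notebook, source_filename):
    """Write `notebook` as JSON next to the source file, with an .ipynb suffix."""
    target = Path(source_filename).with_suffix(".ipynb")
    with open(target, "w", encoding="utf-8") as output:
        json.dump(notebook, output, ensure_ascii=False, indent=2)
    return target


# Placeholder standing in for the result of a notebookify call (assumption).
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "my_document.md"
    target = write_notebook(notebook, source)
    result_name = target.name
    roundtrip = json.loads(target.read_text(encoding="utf-8"))

print(result_name)  # my_document.ipynb
```

`Path.with_suffix` replaces only the last suffix, so `my_document.md` maps to `my_document.ipynb` in the same directory.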

This example is developed in more detail here:

https://boisgera.github.io/pandoc/examples/#jupyter-notebooks

Cheers,

SB


* Re: Pandoc Document Model in Python
       [not found]                 ` <fe2b314b-863d-f8c0-8dfc-1104422fbf52-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
@ 2021-12-04 16:17                   ` Sébastien Boisgérault
       [not found]                     ` <1e952a20-a77f-4987-9e7f-bac963ba4385n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sébastien Boisgérault @ 2021-12-04 16:17 UTC (permalink / raw)
  To: pandoc-discuss



To be honest, I have not studied panflute in great detail (I started my own 
project before panflute 1.0 was released, when my use case was not covered 
yet).
I guess that there is a lot of common ground here but with different 
elements of style, so if you are a happy user of panflute, I don't advise 
you to change. 
I'd say that panflute probably feels higher-level (and maybe terser for the 
common use cases?) and pandoc (Python) more minimalistic and slightly more 
Haskell-ish.

Some misc. things I can think of:

  - Pandoc (Python) adapts the AST model it uses to the pandoc version 
    discovered on your system (or to the one you specify), so you will 
    probably only need a single version of it, even if you deal with 
    several pandoc binaries.

  - Our hierarchy of classes is automatically generated (from the 
    registered Haskell type info). So if you know the original data model 
    <https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html>, 
    then you're already covered. If you don't know it, it is documented 
    <https://boisgera.github.io/pandoc/api/#pandoctypes>, but you can also 
    discover it interactively in the Python interpreter (which I like very 
    much ❤️):

>>> from pandoc.types import *
>>> Pandoc
Pandoc(Meta, [Block])
>>> Block
Block = Plain([Inline])
      | Para([Inline])
      | LineBlock([[Inline]])
     ...
      | Div(Attr, [Block])
      | Null()
>>> Para
Para([Inline])
>>> Inline
Inline = Str(String)
       | Emph([Inline])
       | Strong([Inline])
       | Strikeout([Inline])
       ...
       | Note([Block])
       | Span(Attr, [Inline])

    I don't know how the panflute type hierarchy is generated (manually?) 
    or exactly what the trade-offs are in each case.
    For example, we don't use named arguments for the pandoc element 
    arguments, since there is no such thing in Haskell.
    There are differences such as Para(Str('eggs')) for panflute but 
    Para([Str('eggs')]) for pandoc (Python) (compare with the Haskell type info 
    <https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block>).
    The panflute types also have more methods, while ours are rather "dumb" 
    (mere algebraic data types), which is a matter of taste I guess (?).
    But overall the (AST) type hierarchy feels very similar.

  - I think that apart from the AST, the core of panflute is probably 
    run_filter, while our core tool is pandoc.iter, a (top-down, 
    document-order) iterator.

  - The dynamic type checking of panflute is very nice 👍. I don't have an 
    equivalent (yet), and that can honestly be a pain sometimes.
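
To make the "generated, positional-only, dumb data types" point concrete, here is a rough sketch of how classes could be produced from a table of signatures. It is an assumption-laden toy: `SIGNATURES`, `Constructor` and `make_types` are hypothetical names, and the table is written by hand here, whereas the library derives its classes from the Haskell type definitions.

```python
# Hypothetical, hand-written signature table (assumption for illustration).
SIGNATURES = {
    "Para": ["[Inline]"],
    "Emph": ["[Inline]"],
    "Str": ["String"],
    "Space": [],
}


class Constructor:
    """Base class: positional arguments only, stored in a mutable list."""

    signature = []

    def __init__(self, *args):
        if len(args) != len(self.signature):
            raise TypeError(
                f"{type(self).__name__} expects {len(self.signature)} "
                f"argument(s), got {len(args)}"
            )
        self._args = list(args)

    def __getitem__(self, index):
        return self._args[index]

    def __setitem__(self, index, value):
        self._args[index] = value

    def __eq__(self, other):
        return type(self) is type(other) and self._args == other._args

    def __repr__(self):
        return f"{type(self).__name__}({', '.join(map(repr, self._args))})"


def make_types(signatures):
    """Create one class per entry of the signature table."""
    return {
        name: type(name, (Constructor,), {"signature": signature})
        for name, signature in signatures.items()
    }


types = make_types(SIGNATURES)
Para, Str, Space = types["Para"], types["Str"], types["Space"]

para = Para([Str("eggs")])
print(para)        # Para([Str('eggs')])
print(para[0][0])  # Str('eggs')
```

The classes carry no behavior beyond construction, indexing, equality and display, which is roughly what "mere algebraic data types" means here; only the arity is checked, not the argument types.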

If you are a regular panflute user and can have a quick look at our docs 
<https://boisgera.github.io/pandoc/>, especially the code examples, I'd 
appreciate some feedback about what you like and don't like!

Cheers,

Sébastien


* Re: Pandoc Document Model in Python
       [not found]                     ` <1e952a20-a77f-4987-9e7f-bac963ba4385n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-12-04 16:48                       ` Sébastien Boisgérault
  2021-12-04 17:30                       ` John MacFarlane
  1 sibling, 0 replies; 12+ messages in thread
From: Sébastien Boisgérault @ 2021-12-04 16:48 UTC (permalink / raw)
  To: pandoc-discuss




I also don't know if panflute supports pattern matching (I have not seen any 
examples in the docs). For example, is there a pattern like this one?

>>> doc = pandoc.read("Hello world!")
>>> match doc:
...     case Pandoc(Meta(meta), [Para(inlines)]):
...         assert meta == {}
...         print(inlines)
[Str('Hello'), Space(), Str('world!')]

Cheers,

Sébastien

* Re: Pandoc Document Model in Python
       [not found]                     ` <1e952a20-a77f-4987-9e7f-bac963ba4385n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2021-12-04 16:48                       ` Sébastien Boisgérault
@ 2021-12-04 17:30                       ` John MacFarlane
  1 sibling, 0 replies; 12+ messages in thread
From: John MacFarlane @ 2021-12-04 17:30 UTC (permalink / raw)
  To: Sébastien Boisgérault, pandoc-discuss

Sébastien Boisgérault <sebastien.boisgerault-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

>   - Our hierarchy of classes is automatically generated (from the 
> registered Haskell type info).

That is nifty!  So is the pattern-matching.


* Re: Pandoc Document Model in Python
       [not found]     ` <f224cd2c-7d68-40b4-a855-7d4d0d7aa442n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2021-12-04 14:06       ` AW: " denis.maier-NSENcxR/0n0
@ 2021-12-22 18:05       ` Joseph Reagle
       [not found]         ` <c0c49e25-898d-c72c-3303-69005985ea01-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Joseph Reagle @ 2021-12-22 18:05 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I finally had a chance to read through the documentation: wow, impressive! And by that I mean not only the library but the documentation itself. It's rare to see such comprehensive and clear documentation of a new project. For folks who don't already read Haskell, it's a great way to learn about Pandoc.

BTW: It's probably too late to change, but I wonder if you should've given it a novel name? I wonder if it'll be easy for folks to find. (I'm also not sure why GitHub labels it as a predominantly JavaScript project.)

Also, I wonder if there will ever be a higher-level way of searching/transforming markdown in Python? Panflute is a bit higher-level and more Python-idiomatic, and your examples [1] are fantastic, but I crave the intuitive XML-based selectors (e.g., eTree, BeautifulSoup, and CSS). Your API, like most, requires me to be familiar with the pandoc AST to do anything (e.g., meta is the first item, `doc[0]`, in the document structure).

[1]: https://boisgera.github.io/pandoc/cookbook/

In the examples below, I exercise the three options for pandoc and Python. I kind of like using pandoc to convert the document to HTML, using those selectors, and then converting back if need be... It'd be great if panflute (or pandoc) had high-level selectors.

```python
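import pandoc
import panflute as pf
from bs4 import BeautifulSoup

# COMMONMARK_SPEC is not defined in this snippet; it is assumed to hold the
# markdown source of a document that has a `date` metadata field.

# 1. Using pandoc API to print date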
# Requires I remember pandoc data model via list indices
# No find/select; lots of iteration

doc = pandoc.read(COMMONMARK_SPEC)
meta = doc[0]  # doc: Pandoc(Meta, [Block])
meta_dict = meta[0]  # meta: Meta({Text: MetaValue})
date = meta_dict["date"]
date_inlines = date[0]  # date: MetaInlines([Inline])
print("pandoc:" + pandoc.write(date_inlines).strip())

# 2. Using panflute to print date
# Data-model is a bit more intuitive.
# No find/select

doc = pf.convert_text(COMMONMARK_SPEC, standalone=True)
print("panflute" + doc.get_metadata()["date"])

# 3. Using pandoc + BeautifulSoup to print date
# Requires me to remember HTML model, but I'm more familiar.
# Can use BeautifulSoup or CSS selectors

doc = pandoc.read(COMMONMARK_SPEC)
html = pandoc.write(doc, format="html", options=["--standalone"])
soup = BeautifulSoup(html, "html5lib")
date = soup.find("meta", {"name": "dcterms.date"})["content"]
print("BS native selector:" + date)
# CSS selector
date = soup.select("""meta[name="dcterms.date"]""")[0]["content"]  # CSS
print("BS/CSS selector:" + date)

```


* Re: Pandoc Document Model in Python
       [not found]         ` <c0c49e25-898d-c72c-3303-69005985ea01-T1oY19WcHSwdnm+yROfE0A@public.gmane.org>
@ 2021-12-23 10:56           ` Sébastien Boisgérault
       [not found]             ` <e45c083b-fff3-46ac-8af5-b416c60f6a97n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Sébastien Boisgérault @ 2021-12-23 10:56 UTC (permalink / raw)
  To: pandoc-discuss




Hi Joseph,

On Wednesday, 22 December 2021 at 19:05:31 UTC+1, Joseph wrote:

> I finally had a chance to read through the documentation: wow, impressive! 
> And by that I mean not only the library but the documentation itself. It's 
> rare to see such comprehensive and clear documentation of a new project. 
> For folks who don't already read haskell, it's a great way to learn about 
> Pandoc. 
>

Thank you for the kind words!

> BTW: It's probably too late to change, but I wonder if you should've given 
> it a novel name? I wonder if it'll be easy for folks to find? (I'm not sure 
> on github why it's labeled as a predominantly JavaScript project?) 
>

I think that many people for whom this library could be useful would
search for "Pandoc" and "Python" in a search engine; for me that returns
"Scripting with Pandoc" (which is totally appropriate), this project, and
then pypandoc (whose scope is different: it is a wrapper for the
command line and does not expose the document model). So I guess it's easy
to find at this stage; easier than panflute, for example (which I can't see
in the results), despite the fact that panflute is more mature and
obviously a totally appropriate result.

I guess that the JavaScript component detected by GitHub is the file that
stores the pandoc type hierarchies (one per version) as JSON:

    https://github.com/boisgera/pandoc/blob/master/src/pandoc/pandoc-types.js


>
> Also, I wonder if there will ever be a higher level way of 
> searching/transforming markdown in Python? Panflute is a bit higher-level 
> and more python-idiomatic, and your examples [1] are fantastic, but I crave 
> the intuitive XML-based selectors (e.g., eTree, BeautifulSoup, and CSS). 
> Your API, like most, requires me to be familiar with the pandoc AST to do 
> anything (e.g., meta is the first -- `doc[0]` -- item in the document 
> structure). 
>

Yes, you're 100% right:

  - The library is low-level (at this stage), so you're "expected" to
    build your own helpers on top of it if you want a higher-level API.
    There is a finder example in the documentation
    (https://boisgera.github.io/pandoc/cookbook/#finder), but this is not
    part of the official API (yet). After using the low-level API myself
    for a long time, I still wonder what kind of high-level API I'd like
    to have that would at least cover my own use cases... I rather
    dislike selector languages (XPath, CSS selectors, regexps, etc.), but
    there are other great sources of inspiration (xml.etree and
    BeautifulSoup finders, chained queries of RethinkDB, etc.).

  - In the current state, you need to be familiar with the pandoc AST to
    do anything. I don't know to what extent we can avoid that (in any
    library) for advanced use cases. I tried to flatten the learning
    curve a bit (you can discover the type hierarchy interactively in the
    interpreter: https://boisgera.github.io/pandoc/document/#types), but
    I agree that it's likely to be a show-stopper for many people.
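
To make "build your own helpers" a bit more concrete, here is a minimal
sketch of a BeautifulSoup-style finder over a document tree. It is an
illustration only: Str, Emph, Para and Header below are stand-in
dataclasses defined in the snippet, not the library's types, and
iter_tree, find_all and find are hypothetical names rather than part of
any official API.

```python
from dataclasses import dataclass, fields, is_dataclass


# Stand-in node types, for illustration only (not the library's classes).
@dataclass
class Str:
    text: str


@dataclass
class Emph:
    inlines: list


@dataclass
class Para:
    inlines: list


@dataclass
class Header:
    level: int
    inlines: list


def iter_tree(elt):
    """Yield elt and all of its descendants, depth-first, in document order."""
    yield elt
    if is_dataclass(elt):
        children = [getattr(elt, f.name) for f in fields(elt)]
    elif isinstance(elt, (list, tuple)):
        children = elt
    elif isinstance(elt, dict):
        children = elt.values()
    else:
        children = []  # atoms (str, int, ...) have no children
    for child in children:
        yield from iter_tree(child)


def find_all(root, condition):
    """Return every node matching a type, a tuple of types, or a predicate."""
    if isinstance(condition, (type, tuple)):
        types = condition
        condition = lambda elt: isinstance(elt, types)
    return [elt for elt in iter_tree(root) if condition(elt)]


def find(root, condition):
    """Return the first matching node, or None if there is none."""
    matches = find_all(root, condition)
    return matches[0] if matches else None


doc = [
    Header(1, [Str("Title")]),
    Para([Str("Hello"), Emph([Str("world")])]),
]

print([s.text for s in find_all(doc, Str)])
print(find(doc, Header))
```

Accepting either a type or a predicate keeps the common case short
(find(doc, Header)) while still allowing arbitrary conditions, without
introducing a separate selector language.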

>
> [1]: https://boisgera.github.io/pandoc/cookbook/ 
>
> In the examples below, I exercise the three options for pandoc and python. 
> I kind of like using pandoc to convert it to HTML, use those selectors, and 
> then convert back if need be... It'd be great if panflute (or pandoc) had 
> high-level selectors. 
>
>
> ```python 
> # 1. Using pandoc API to print date 
> # Requires I remember pandoc data model via list indices 
> # No find/select; lots of iteration 
>
> doc = pandoc.read(COMMONMARK_SPEC) 
> meta = doc[0] # doc: Pandoc(Meta, [Block]) 
> meta_dict = meta[0] # meta: Meta({Text: MetaValue}) 
> date = meta_dict["date"] 
> date_inlines = date[0] # date: MetaInlines([Inline]) 
> print("pandoc:" + pandoc.write(date_inlines).strip()) 
>
> # 2. Using panflute to print date 
> # Data-model is a bit more intuitive. 
> # No find/select 
>
> doc = pf.convert_text(COMMONMARK_SPEC, standalone=True) 
> print("panflute:" + doc.get_metadata()["date"]) 
>
> # 3. Using pandoc + BeautifulSoup to print date 
> # Requires me to remember HTML model, but I'm more familiar. 
> # Can use BeautifulSoup or CSS selectors 
>
> doc = pandoc.read(COMMONMARK_SPEC) 
> html = pandoc.write(doc, format="html", options=["--standalone"]) 
> soup = BeautifulSoup(html, "html5lib") 
> date = soup.find("meta", {"name": "dcterms.date"})["content"] 
> print("BS native selector:" + date) 
> # CSS selector 
> date = soup.select("""meta[name="dcterms.date"]""")[0]["content"] # CSS 
> print("BS/CSS selector:" + date) 
>
> ``` 
>

Very useful example! Thank you for taking the time to do this at this
level of detail.

I'll have a deeper look at BeautifulSoup find/find_all methods to start
with and will experiment a bit; I'll report back here.

Cheers,

SB

P.S.: the metadata is especially hard to use right now, since it is
littered with Haskell-like wrapper types (MetaInlines, MetaBlocks, etc.)
and 90% of the time you'd just like to have the result as a "regular Python
dictionary". I'll also work a bit on this; I have not done it so far
because I know that any such conversion will lose some type info (how do
you distinguish an empty list of blocks from an empty list of inlines, for
example?) and therefore cannot be used reliably for round-tripping. So the
metadata handling so far is "correct" but not at all convenient.
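
The round-tripping problem can be shown with a small sketch. The Meta*
classes below are stand-in dataclasses that only borrow the names of the
pandoc metadata wrappers; they are not the library's classes, and to_plain
is a hypothetical helper, not an existing function. Inlines and blocks are
represented by plain strings to keep the example short.

```python
from dataclasses import dataclass


# Stand-in wrapper types, for illustration only (not the library's classes).
@dataclass
class MetaString:
    value: str


@dataclass
class MetaBool:
    value: bool


@dataclass
class MetaInlines:
    inlines: list  # plain words standing in for Inline nodes


@dataclass
class MetaBlocks:
    blocks: list  # plain paragraphs standing in for Block nodes


@dataclass
class MetaList:
    items: list


@dataclass
class MetaMap:
    mapping: dict


def to_plain(value):
    """Recursively strip the Meta* wrappers. This conversion is lossy."""
    if isinstance(value, (MetaString, MetaBool)):
        return value.value
    if isinstance(value, MetaInlines):
        return list(value.inlines)
    if isinstance(value, MetaBlocks):
        return list(value.blocks)
    if isinstance(value, MetaList):
        return [to_plain(item) for item in value.items]
    if isinstance(value, MetaMap):
        return {key: to_plain(item) for key, item in value.mapping.items()}
    raise TypeError(f"unexpected metadata value: {value!r}")


meta = MetaMap({
    "title": MetaInlines(["Pandoc", "in", "Python"]),
    "draft": MetaBool(True),
    "abstract": MetaBlocks([]),   # empty list of blocks
    "subtitle": MetaInlines([]),  # empty list of inlines
})

plain = to_plain(meta)
print(plain)

# The two empty values were distinct before the conversion...
print(meta.mapping["abstract"] != meta.mapping["subtitle"])
# ...but both are [] afterwards, so the wrapper type cannot be recovered.
print(plain["abstract"] == plain["subtitle"])
```

A convenient dictionary view like this is fine for reading metadata, but
writing it back would need the wrapper types kept somewhere on the side.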
 


* Re: Pandoc Document Model in Python
       [not found]             ` <e45c083b-fff3-46ac-8af5-b416c60f6a97n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-12-23 14:25               ` Joseph Reagle
  0 siblings, 0 replies; 12+ messages in thread
From: Joseph Reagle @ 2021-12-23 14:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Any of that would be fantastic! XPath/CSS are widely known but overkill since this is within a programming language itself, which has objects/attributes and dictionaries.


On 21-12-23 05:56, Sébastien Boisgérault wrote:
>      I rather dislike selector language (xpath, CSS selectors, regexp, etc.), but there are other great sources of inspiration (xml.etree and beautifulsoup finders, chained queries of rethinkdb, etc.).

