python filter to include files in original format

public inbox archive for pandoc-discuss@googlegroups.com

* python filter to include files in original format
@ 2015-03-11  5:13 news-WPTjrydoUPgeaOpM6FAJmQkbCANdLtlA
       [not found] ` <a320b195-1cd6-4cf5-a0d6-678fa137b569-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: news-WPTjrydoUPgeaOpM6FAJmQkbCANdLtlA @ 2015-03-11  5:13 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



Hi there

I have started a Python pandoc filter which replaces a code block like

~~~~ {include=inc.md}
~~~~

with the content of the file inc.md. My version of the filter looks like 
this:

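#!/usr/bin/env python
import pandocfilters as pf

def include(key, val, fmt, meta):
    # Only code blocks can carry the include=FILE attribute.
    if key == 'CodeBlock':
        # "ident" rather than "id", so the builtin is not shadowed.
        [[ident, classes, keyvals], code] = val
        for kv in keyvals:
            if kv[0] == 'include':
                with open(kv[1], 'rt') as f:
                    # Each line becomes a plain Str, so markup is not parsed.
                    inline = [ pf.Str(l) for l in f.readlines() ]
                return pf.Para(inline)
    # Returning None leaves the element unchanged.
    return None

if __name__ == "__main__":
    pf.toJSONFilter(include)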


However, the included file is inserted as if it were plain text; e.g. if 
the included file contains "*important*" then the output will just contain 
"*important*" and not the emphasized version of "important" in the output 
format.

Instead I want the included file to be processed like the "parent" 
file (normally markdown).

How can I achieve this? I think the problem is that I construct a list of 
pandoc Str objects, but what should I construct instead?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/a320b195-1cd6-4cf5-a0d6-678fa137b569%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: python filter to include files in original format
       [not found] ` <a320b195-1cd6-4cf5-a0d6-678fa137b569-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-03-11  8:19   ` Aaron O'Leary
       [not found]     ` <CAHzsXVX2kEwuVwO+6bDSow25+eRLhWSbPJjbrVau3hgeRP2Skw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Aaron O'Leary @ 2015-03-11  8:19 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hi,

Neat filter idea!

The problem is that pandoc works like: Reader -> AST -> filter -> AST -> Writer

The Reader (e.g. the markdown reader) knows that '*' means 'emphasis'
and represents that in the AST.

When we get to the filter the reader is out of the picture, so any
text your filter reads from a file and wraps in Str stays literal
text. The writer won't inspect the contents of Str to render your
'*important*'.

What you'd need to do is call the markdown reader. The way to do this
in a Python filter is to call pandoc from python (using subprocess or
similar) and then inject the resulting json.
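
A rough sketch of that suggestion, not a tested implementation: the
include() function below would replace the one in the original script,
and the pf.toJSONFilter(include) entry point stays as it was. The helper
names blocks_from_doc and read_markdown_blocks are made up for this
example. It assumes a pandoc executable on the PATH and that the included
file is markdown. The layout of pandoc's JSON output differs between
releases (older ones emit a [meta, blocks] list, newer ones an object
with a "blocks" key), so the sketch accepts both.

```python
import json
import subprocess


def blocks_from_doc(doc):
    """Return the list of blocks from a decoded pandoc JSON document."""
    if isinstance(doc, dict):
        # Newer pandoc: {"pandoc-api-version": ..., "meta": ..., "blocks": [...]}
        return doc["blocks"]
    # Older pandoc: [meta, blocks]
    return doc[1]


def read_markdown_blocks(path):
    """Run the markdown reader on path and return the parsed blocks."""
    out = subprocess.check_output(
        ["pandoc", "-f", "markdown", "-t", "json", path])
    return blocks_from_doc(json.loads(out.decode("utf-8")))


def include(key, val, fmt, meta):
    if key == 'CodeBlock':
        [[ident, classes, keyvals], code] = val
        for k, v in keyvals:
            if k == 'include':
                # A returned list is spliced in place of the code block.
                return read_markdown_blocks(v)
    # None leaves every other element unchanged.
    return None
```

Returning the list of blocks directly, rather than wrapping it in a
Para, keeps headers, lists and other block-level elements from the
included file intact.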


* Re: python filter to include files in original format
       [not found]     ` <CAHzsXVX2kEwuVwO+6bDSow25+eRLhWSbPJjbrVau3hgeRP2Skw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-03-11  9:36       ` news-WPTjrydoUPgeaOpM6FAJmQkbCANdLtlA
       [not found]         ` <559514e2-10a0-45f5-989a-f54e3b070b35-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: news-WPTjrydoUPgeaOpM6FAJmQkbCANdLtlA @ 2015-03-11  9:36 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



Thanks for the explanation. Makes sense!

Calling pandoc inside the filter won't necessarily have the desired effect 
when the included file uses global structures/references like headers or 
footnotes.

I guess the only way to do this properly is a preprocessor which inserts 
the file before pandoc sees it.
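
A minimal sketch of such a preprocessor, under some assumptions: the
include marker is exactly the empty fenced block from my first message,
included files are not themselves scanned for further includes, and the
names INCLUDE_RE, expand_includes and read_path are invented for this
example.

```python
import re
import sys

# Matches an empty fenced block such as:
#   ~~~~ {include=inc.md}
#   ~~~~
INCLUDE_RE = re.compile(
    r'^~~~+[ \t]*\{include=([^}\s]+)\}[ \t]*\n~~~+[ \t]*$',
    re.MULTILINE)


def expand_includes(text, read_file):
    """Replace each include block with the text returned by read_file(path)."""
    # A function replacement avoids backslash processing of the file text.
    return INCLUDE_RE.sub(
        lambda m: read_file(m.group(1)).rstrip('\n'), text)


def read_path(path):
    with open(path) as f:
        return f.read()


# Only runs when a file name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(expand_includes(read_path(sys.argv[1]), read_path))
```

Saved as, say, expand.py (a hypothetical name), it could be used as
"python expand.py main.md | pandoc -o out.html", so pandoc parses the
parent and the included text as one document.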


* Re: python filter to include files in original format
       [not found]         ` <559514e2-10a0-45f5-989a-f54e3b070b35-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2015-03-11 12:59           ` BP Jonsson
       [not found]             ` <55003C4B.4000002-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: BP Jonsson @ 2015-03-11 12:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 2015-03-11 10:36, news-WPTjrydoUPgeaOpM6FAJmQkbCANdLtlA@public.gmane.org wrote:
> Thanks for the explanation. Makes sense!
>
> Calling pandoc inside the filter won't necessarily have the desired effect
> when the included file uses global structures/references like headers or
> footnotes.
>
> I guess the only way to do this properly is a preprocessor which inserts
> the file before pandoc sees it.

I use Template::Toolkit[^1] to achieve this, and sometimes other 
things too.  I wrote my own command-line tool which invokes TT on 
specified files, optionally taking specified data from YAML files, 
and writes output to stdout, from where I pipe it to pandoc.  Probably 
overkill, but it works.  I guess any general-purpose templating 
engine which can include files would do.

I got the idea from a blog post written by a list member about 
using gpp for the same purpose[^2].  The advantage of using TT 
is that I can pick any two strings[^3] as tag delimiters, as long as 
they don't otherwise occur in the file, and then don't need to escape 
anything in the literal text.  I can also embed Perl code if I need 
to, or write my own plugins, which I prefer.

[^1]: <https://metacpan.org/pod/distribution/Template-Toolkit>
[^2]: <https://randomdeterminism.wordpress.com/2012/06/01/how-i-stopped-worring-and-started-using-markdown-like-tex/>

[^3]: Usually I use ``` ``{{ ... }}`` ``` which allows me to 
render/print the file without processing it as a template, with 
the template stuff visible as 'code'.  If I need an actual code 
span beginning/ending with two curlies I just put a space between 
the curlies and the backticks.  To pandoc it's the same but to TT 
it makes a big difference!
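
The delimiter trick in [^3] does not depend on TT itself. Here is a
rough Python sketch of the same idea, not of TT: the "include" directive
and the names START_TAG, END_TAG, DIRECTIVE_RE and render are all made
up for illustration.

```python
import re

# Assumed tag strings, following the footnote: two backticks plus two curlies.
START_TAG = "``{{"
END_TAG = "}}``"

DIRECTIVE_RE = re.compile(
    re.escape(START_TAG) + r"\s*include\s+(\S+?)\s*" + re.escape(END_TAG))


def render(text, read_file):
    """Expand include directives between the tags; leave all other text alone."""
    return DIRECTIVE_RE.sub(lambda m: read_file(m.group(1)), text)
```

With these tags, ``{{ include inc.md }}`` would be replaced by the file
contents, while a code span written with a space between the backticks
and the curlies does not match the start tag and passes through to
pandoc unchanged.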


* Re: python filter to include files in original format
       [not found]             ` <55003C4B.4000002-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-03-12 10:34               ` 'Jason Seeley' via pandoc-discuss
  0 siblings, 0 replies; 5+ messages in thread
From: 'Jason Seeley' via pandoc-discuss @ 2015-03-12 10:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: bpj-J3H7GcXPSITLoDKTGw+V6w



FWIW, I use filepp for the same purpose. By default, it just uses C-style 
"#include <file.inc>", but the hash character can be changed if you're 
worried about clashing with headers. (This also lets me have simpler 
comments, and regexes to add my own markup for, say, divs.) 
http://freecode.com/projects/filepp

