Help w/ pandoc filter

public inbox archive for pandoc-discuss@googlegroups.com

* Help w/ pandoc filter
From: Jeff Larkin @ 2015-07-06 20:25 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Thanks to those who replied to my earlier email, I've been able to add a 
custom latex block to my markdown for highlighting best practices in my 
document. I'd like to take it to the next step and make a filter that will 
output either this new latex element or an HTML div if encountered. 
Currently I start each best practice paragraph with ***Best Practice:***, 
so I'm thinking I can key on that and add what I need around that 
paragraph. I'm struggling a bit to understand the pandoc filters system. 
I'm working from the pandoc-filters python examples and source. I *think* I 
need to key on the Para type, rather than RawBlock, but I'm not certain and 
can't really find much documentation about the types. The filter below 
prints to stderr that it's found the string I'm searching for, but when I 
try to modify the text and return the results I'm not getting the expected 
results. To simplify things, I've boiled it all down to a filter that finds 
"Best Practice" and changes it to "Worst Practice." If I can get that 
working, I think I can get the rest.

== Filter source ==
from pandocfilters import toJSONFilter, Str, Para, RawBlock, stringify
import re
import sys

def bestpractice(key, value, format, meta):
  global inblock
  if key == 'Para':
    s = stringify(value)
    if format == "latex":
      if re.search("Best Practice:", s):
        sys.stderr.write("FOUND")
        s.replace("Best", "Worst")
        inblock = True
      return Para([Str(s)])
if __name__ == "__main__":
  toJSONFilter(bestpractice)

== Markdown ==
***Best Practice:*** This is a best practice. It's limited to a single 
paragraph, although I could add something to end the range if that'll 
simplify things.

I see in the output that the token is being found, but "Worst" doesn't 
appear in the resulting document. I'm really not sure what to return the 
edited string as (Para([Str(s)]) is just what I happened to get to run 
without error). These types don't seem very well documented and I've dug 
through both the python and haskell sources, but haven't figured them out. 
Is there a better option than using stringify on the paragraph and then 
trying to convert it back into a Para object? Is anyone aware of additional 
filter examples that may include something that munges paragraphs like this?

Thanks for the help.

-Jeff

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7ed6f5ef-dd3d-40f5-a096-31a430c16ddc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


* Re: Help w/ pandoc filter
From: Jeff Larkin @ 2015-07-06 20:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw



Well, shoot, I found the python mistake preventing it from actually editing 
the string. It's doing what I expected now, although I'm still curious if 
my call to stringify() followed by converting back into a Para object is 
the right way to do this. 

Thanks.
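
The thread does not say which Python mistake was found. One candidate in the filter as posted: Python strings are immutable, so `s.replace("Best", "Worst")` returns a new string and leaves `s` unchanged unless the result is assigned back (`s = s.replace(...)`). On the open question about stringify(): rebuilding the paragraph as `Para([Str(s)])` works, but it drops the bold-italic markup and any other inline formatting. Below is a minimal sketch of an alternative that edits the `Str` nodes where they sit. It uses only the standard library and plain dicts in pandoc's JSON shape; `plain_text` and `rename_best` are hypothetical helper names, not part of pandocfilters.

```python
def plain_text(node):
    """Rough stand-in for stringify(): join Str text, treating spaces as ' '."""
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return node["c"]
        if node.get("t") in ("Space", "SoftBreak"):
            return " "
        return plain_text(node.get("c", []))
    return ""  # bare strings/numbers inside attributes are ignored


def rename_best(node):
    """Return a copy of the fragment with "Best" replaced inside every Str."""
    if isinstance(node, list):
        return [rename_best(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            # str.replace returns a NEW string, so the result has to be kept.
            return {"t": "Str", "c": node["c"].replace("Best", "Worst")}
        return {k: rename_best(v) for k, v in node.items()}
    return node


def bestpractice(key, value, format, meta):
    """Filter action: rewrite only LaTeX paragraphs that start with the marker."""
    if key == "Para" and format == "latex":
        if plain_text(value).startswith("Best Practice:"):
            # Strong/Emph wrappers around the marker are preserved.
            return {"t": "Para", "c": rename_best(value)}
    return None  # None leaves the element unchanged
```

Because the function has the `(key, value, format, meta)` signature, it should be usable as the action passed to `toJSONFilter`, as in the original script.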



* Re: Help w/ pandoc filter
From: BP Jonsson @ 2015-07-07 11:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I have done some iterations of filters for getting custom 
LaTeX/HTML from the same Markdown source and in my experience you 
are better off using a (native) div or span in the markdown and 
then prepending and appending raw LaTeX 
(`\begin{foo}`--`\end{foo}` or `\foo{`--`}`) at the beginning and 
end of the Div/Span content list (as RawBlock or RawInline as 
appropriate). You can leave the Div/Span in place; the only 
consequences are that if there is an id on the div pandoc will insert 
a `\hyperdef{}{ID}{\label{ID}}` before the contents of the Div and 
that Span contents will be wrapped in a pair of braces. You don't 
even need the `\begin`--`\end` aliasing trick I showed the other 
day; pandoc's latex *writer* will happily insert the fragments of 
raw LaTeX and still output the rest of the contents properly 
formatted. Just don't reformat your Markdown with such a filter 
active! :-)

If you can read Perl I will write some documentation for my latest 
iteration and upload it later today.
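
As a rough illustration of the Span case described above, here is a minimal Python sketch (not the Perl filter mentioned in this message). It assumes a span carrying the class `foo` and a LaTeX command `\foo` defined elsewhere; both names are placeholders taken from the example text. The function name `wrap_span` is hypothetical, and the `format == "latex"` check is an addition so that other output formats are left alone.

```python
def wrap_span(key, value, format, meta):
    """Filter action: surround the contents of Span.foo with \\foo{ ... }."""
    if key == "Span" and format == "latex":
        attr, inlines = value            # Span content is [attr, [inlines]]
        ident, classes, keyvals = attr   # attr is [id, [classes], [[key, val], ...]]
        if "foo" in classes:
            opening = {"t": "RawInline", "c": ["tex", "\\foo{"]}
            closing = {"t": "RawInline", "c": ["tex", "}"]}
            # Keep the Span itself and only extend its content list.
            return {"t": "Span", "c": [attr, [opening] + inlines + [closing]]}
    return None  # leave everything else unchanged
```

The action takes the same `(key, value, format, meta)` arguments that pandocfilters passes, so it could be handed to `toJSONFilter` in place of the earlier `bestpractice` function.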


* Re: Help w/ pandoc filter
From: Jeff Larkin @ 2015-07-07 13:37 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: bpj-J3H7GcXPSITLoDKTGw+V6w



Thanks for sharing your experience. Are you suggesting just embedding both 
the latex and <div> in the markdown and letting pandoc deal with it?

I haven't read much Perl in a few years, but I can probably make sense of 
it. Thanks for sharing.


* Re: Help w/ pandoc filter
From: BP Jonsson @ 2015-07-09 11:58 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Sorry for the delay in answering.  Real life demanded my attention
elsewhere the last two days.

On 2015-07-07 15:37, Jeff Larkin wrote:
> Thanks for sharing your experience. Are you suggesting just embedding both
> the latex and <div> in the markdown and letting pandoc deal with it?

No, you should just use a `<div>` with an appropriate class and 
then have the filter look up divs with that class and embed the 
LaTeX at the beginning and end of the Div object's content list. 
In JSON terms (with added line breaks here for clarity) the filter 
should replace

     {"t":"Div","c":[["",["foo"],[]],[
         {"t":"Plain","c":[{"t":"Str","c":"..."}]}
     ]]}

with

     {"t":"Div","c":[["",["foo"],[]],[
         {"t":"RawBlock","c":["tex","\\begin{foo}"]},
         {"t":"Plain","c":[{"t":"Str","c":"..."}]},
         {"t":"RawBlock","c":["tex","\\end{foo}"]}
     ]]}

and you will get correct LaTeX 'automatically' because of the way 
pandoc handles divs when generating LaTeX. In my experience you can 
even construct the `{"t":"RawBlock","c":["tex","\\begin{foo}"]}` and 
`{"t":"RawBlock","c":["tex","\\end{foo}"]}` objects just once, outside 
your traversal loop, and insert them multiple times, and you will 
still get correct JSON, though the behavior of different JSON encoders 
(and perhaps of Perl and Python!) may differ in that regard.
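
The replacement described above could look roughly like this in Python. It is a sketch under stated assumptions, not the Perl filter promised in this thread: the class and environment name `foo` come from the JSON example, while `wrap_div`, `encode`, `BEGIN` and `END` are hypothetical names, and the `format == "latex"` check is an addition so that HTML output keeps the plain div.

```python
import json

# Built once, outside the traversal, and reused for every matching Div.
BEGIN = {"t": "RawBlock", "c": ["tex", "\\begin{foo}"]}
END = {"t": "RawBlock", "c": ["tex", "\\end{foo}"]}


def wrap_div(key, value, format, meta):
    """Filter action: put \\begin{foo} / \\end{foo} around the blocks of Div.foo."""
    if key == "Div" and format == "latex":
        attr, blocks = value      # Div content is [attr, [blocks]]
        if "foo" in attr[1]:      # attr[1] is the list of classes
            return {"t": "Div", "c": [attr, [BEGIN] + blocks + [END]]}
    return None  # leave everything else unchanged


def encode(doc):
    """Serialize an AST fragment back to JSON text."""
    return json.dumps(doc)
```

With Python's `json` module, the same dict referenced from several places is simply written out once per use, so sharing `BEGIN` and `END` is safe there; only circular references raise an error.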

>
> I haven't read much Perl in a few years, but I can probably make sense of
> it. Thanks for sharing.

I hope to deliver on that later today.
