* Help w/ pandoc filter
@ 2015-07-06 20:25 Jeff Larkin
From: Jeff Larkin @ 2015-07-06 20:25 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Thanks to those who replied to my earlier email, I've been able to add a
custom latex block to my markdown for highlighting best practices in my
document. I'd like to take it to the next step and make a filter that will
output either this new latex element or an HTML div if encountered.
Currently I start each best practice paragraph with ***Best Practice:***,
so I'm thinking I can key on that and add what I need around that
paragraph. I'm struggling a bit to understand the pandoc filters system.
I'm working from the pandoc-filters python examples and source. I *think* I
need to key on the Para type, rather than RawBlock, but I'm not certain and
can't really find much documentation about the types. The filter below
prints to stderr that it's found the string I'm searching for, but when I
try to modify the text and return the results I'm not getting the expected
results. To simplify things, I've boiled it all down to a filter that finds
"Best Practice" and changes it to "Worst Practice." If I can get that
working, I think I can get the rest.
== Filter source ==
from pandocfilters import toJSONFilter, Str, Para, RawBlock, stringify
import re
import sys

def bestpractice(key, value, format, meta):
    global inblock
    if key == 'Para':
        s = stringify(value)
        if format == "latex":
            if re.search("Best Practice:", s):
                sys.stderr.write("FOUND")
                # str.replace returns a new string; this result is discarded
                s.replace("Best", "Worst")
                inblock = True
                return Para([Str(s)])

if __name__ == "__main__":
    toJSONFilter(bestpractice)
== Markdown ==
***Best Practice:*** This is a best practice. It's limited to a single
paragraph, although I could add something to end the range if that'll
simplify things.
I see in the output that the token is being found, but "Worst" doesn't
appear in the resulting document. I'm really not sure what to return the
edited string as (Para([Str(s)]) is just what I happened to get to run
without error). These types don't seem very well documented and I've dug
through both the python and haskell sources, but haven't figured them out.
Is there a better option than using stringify on the paragraph and then
trying to convert it back into a Para object? Is anyone aware of additional
filter examples that may include something that munges paragraphs like this?
Thanks for the help.
-Jeff
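A minimal sketch of an alternative to the stringify-and-rebuild approach, assuming the pandocfilters action signature `(key, value, format, meta)` and pandoc's JSON element shape (a dict with `t` and `c` keys). The plain text is used only to *detect* a best-practice paragraph; the original inline list is returned untouched, bracketed by raw LaTeX, so the bold/italic markup survives. The helper `inline_text` is a hypothetical, simplified stand-in for pandocfilters' `stringify` (used here so the sketch runs without the package), and the environment name `bestpractice` is an assumption.

```python
def inline_text(inlines):
    """Rough plain text of a list of pandoc inline elements.

    Hypothetical stand-in for pandocfilters.stringify that only knows
    a few inline types; enough to test a paragraph's first words.
    """
    parts = []
    for el in inlines:
        kind = el.get('t')
        if kind == 'Str':
            parts.append(el['c'])
        elif kind in ('Space', 'SoftBreak', 'LineBreak'):
            parts.append(' ')
        elif kind in ('Emph', 'Strong'):
            parts.append(inline_text(el['c']))
    return ''.join(parts)


def raw_latex(text):
    # Same dict shape that pandocfilters' RawBlock('latex', text) builds
    return {'t': 'RawBlock', 'c': ['latex', text]}


def bestpractice(key, value, format, meta):
    """pandocfilters-style action: wrap 'Best Practice:' paragraphs."""
    if key == 'Para' and format == 'latex':
        if inline_text(value).startswith('Best Practice:'):
            # Reuse the original inline list, so emphasis and other
            # formatting survive; no string-to-Para round trip.
            para = {'t': 'Para', 'c': value}
            # A returned list replaces the Para with these three blocks.
            return [raw_latex('\\begin{bestpractice}'),
                    para,
                    raw_latex('\\end{bestpractice}')]
    # None means "leave this element unchanged".
    return None
```

To run it as a filter, the script would end with `from pandocfilters import toJSONFilter` and `toJSONFilter(bestpractice)`, as the original filter does; with pandocfilters installed, `stringify(value)` can replace `inline_text(value)`. The LaTeX environment `bestpractice` still has to be defined in the template or header.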
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/7ed6f5ef-dd3d-40f5-a096-31a430c16ddc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Help w/ pandoc filter
@ 2015-07-06 20:27 Jeff Larkin
From: Jeff Larkin @ 2015-07-06 20:27 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Well, shoot, I found the python mistake preventing it from actually editing
the string. It's doing what I expected now, although I'm still curious if
my call to stringify() followed by converting back into a Para object is
the right way to do this.
Thanks.
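The message doesn't say which mistake it was. In the posted code, `s.replace("Best", "Worst")` discards its result (Python strings are immutable, so `str.replace` returns a new string), which would explain the missing "Worst". On the stringify question: for a pure text substitution like this test case, a sketch that avoids `stringify()` altogether keys on `Str` inlines instead of `Para`, so the markup around "Best Practice:" is kept. It assumes the pandocfilters action signature and pandoc's JSON element shape (`{'t': 'Str', 'c': text}`, which is what pandocfilters' `Str()` builds); the function name is hypothetical.

```python
def best_to_worst(key, value, format, meta):
    """pandocfilters-style action: rewrite 'Best' to 'Worst' in Str inlines."""
    if key == 'Str' and format == 'latex' and 'Best' in value:
        # str.replace returns a new string; the result must be used
        # (the posted filter called s.replace(...) and dropped it).
        return {'t': 'Str', 'c': value.replace('Best', 'Worst')}
    # Every other element (Emph, Strong, Space, ...) is returned as None,
    # so the walker leaves it as-is and descends into its children.
    return None
```

Passed to `toJSONFilter(best_to_worst)`, this touches only the text and never rebuilds the paragraph.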
* Re: Help w/ pandoc filter
@ 2015-07-07 11:31 BP Jonsson
From: BP Jonsson @ 2015-07-07 11:31 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
I have done some iterations of filters for getting custom
LaTeX/HTML from the same Markdown source and in my experience you
are better off using a (native) div or span in the markdown and
then prepending and appending raw LaTeX
(`\begin{foo}`--`\end{foo}` or `\foo{`--`}`) at the beginning and
end of the Div/Span content list (as RawBlock or RawInline as
appropriate). You can leave the Div/Span in place; the only
consequences are that if the Div has an id, pandoc will insert
a `\hyperdef{}{ID}{\label{ID}}` before the contents of the Div, and
that Span contents will be wrapped in a pair of braces. You don't
even need the `\begin`--`\end` aliasing trick I showed the other
day; pandoc's latex *writer* will happily insert the fragments of
raw LaTeX and still output the rest of the contents properly
formatted. Just don't reformat your Markdown with such a filter
active! :-)
If you can read Perl I will write some documentation for my latest
iteration and upload it later today.
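A minimal sketch of the Span half of this advice, written as a pandocfilters-style action in Python over pandoc's JSON AST rather than in Perl. The class `foo` and the command `\foo` are placeholders, as in the message; the Div half works the same way with `RawBlock` elements and `\begin{foo}`/`\end{foo}`.

```python
def raw_inline(text):
    # Same dict shape that pandocfilters' RawInline('latex', text) builds
    return {'t': 'RawInline', 'c': ['latex', text]}


def wrap_span(key, value, format, meta):
    """For LaTeX output, bracket the inlines of a span with class 'foo'
    with the raw fragments '\\foo{' and '}'. Other formats are untouched."""
    if key == 'Span' and format == 'latex':
        attr, inlines = value          # attr = [identifier, classes, key-value pairs]
        if 'foo' in attr[1]:
            new_inlines = [raw_inline('\\foo{')] + inlines + [raw_inline('}')]
            # Keep the Span itself; only its content list changes.
            return {'t': 'Span', 'c': [attr, new_inlines]}
    return None
```

Per the message, pandoc's LaTeX writer also wraps Span contents in braces, so the output should come out roughly as `{\foo{...}}`.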
* Re: Help w/ pandoc filter
@ 2015-07-07 13:37 Jeff Larkin
From: Jeff Larkin @ 2015-07-07 13:37 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw; +Cc: bpj-J3H7GcXPSITLoDKTGw+V6w
Thanks for sharing your experience. Are you suggesting just embedding both
the latex and <div> in the markdown and letting pandoc deal with it?
I haven't read much Perl in a few years, but I can probably make sense of
it. Thanks for sharing.
* Re: Help w/ pandoc filter
@ 2015-07-09 11:58 BP Jonsson
From: BP Jonsson @ 2015-07-09 11:58 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Sorry for the delay in answering. Real life demanded my attention
elsewhere the last two days.
On 2015-07-07 15:37, Jeff Larkin wrote:
> Thanks for sharing your experience. Are you suggesting just embedding both
> the latex and <div> in the markdown and letting pandoc deal with it?
No, you should just use a `<div>` with an appropriate class and
then have the filter look up divs with that class and embed the
LaTeX at the beginning and end of the Div object's content list.
In JSON terms (with added line breaks here for clarity) the filter
should replace

    {"t":"Div","c":[["",["foo"],[]],[
      {"t":"Plain","c":[{"t":"Str","c":"..."}]}
    ]]}

with

    {"t":"Div","c":[["",["foo"],[]],[
      {"t":"RawBlock","c":["tex","\\begin{foo}"]},
      {"t":"Plain","c":[{"t":"Str","c":"..."}]},
      {"t":"RawBlock","c":["tex","\\end{foo}"]}
    ]]}

and you will get correct LaTeX 'automatically' because of the way
pandoc handles divs when generating LaTeX. You can in my
experience even construct the
`{"t":"RawBlock","c":["tex","\\begin{foo}"]},` and
`..."\\end{foo}"]},` objects just once outside your traversal loop
and insert them multiple times, and you will get correct JSON,
though the behavior of different JSON encoders (and perhaps of
Perl and Python!) may differ in that regard.
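The transformation above, sketched as a pandocfilters-style action in Python (the language of the original filter). It assumes the JSON element shape shown in the message and keeps the placeholder class/environment `foo`; a real filter would substitute its own class and environment names. Only LaTeX output is changed, on the assumption that for HTML pandoc's writer already emits the Div as a `<div>` carrying its class. Fresh RawBlock dicts are built per Div here, which avoids relying on how shared objects get encoded.

```python
def raw_tex(text):
    # A RawBlock in the shape shown above: {"t": "RawBlock", "c": ["tex", text]}
    return {'t': 'RawBlock', 'c': ['tex', text]}


def wrap_div(key, value, format, meta):
    """For LaTeX output, put begin/end foo raw blocks around the contents
    of a Div with class 'foo'. Other formats keep the plain Div."""
    if key == 'Div' and format == 'latex':
        attr, blocks = value           # attr = [identifier, classes, key-value pairs]
        if 'foo' in attr[1]:
            new_blocks = ([raw_tex('\\begin{foo}')]
                          + blocks
                          + [raw_tex('\\end{foo}')])
            # The Div stays in the tree; the LaTeX writer prints its contents.
            return {'t': 'Div', 'c': [attr, new_blocks]}
    return None
```

With pandocfilters installed, ending the script with `toJSONFilter(wrap_div)` and running something like `pandoc --filter ./wrap_div.py input.md -o output.tex` (file names hypothetical) would apply it.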
>
> I haven't read much Perl in a few years, but I can probably make sense of
> it. Thanks for sharing.
I hope to deliver on that later today.