* Lua filters with ms output [Was: Re: Typesetting Markdown - Part 8]
From: T. Kurt Bond @ 2020-04-30 2:56 UTC (permalink / raw)
To: Dave Jarvis, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Dave Jarvis <dave.jarvis-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Bored? Feeling socially isolated? Need some fun times with Pandoc?
>
> How about typesetting a 100-year-old poem? Or converting epubs to
> Markdown,
> then Markdown into PDF documents?
>
> https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdown-part-8/
>
> I hope you find the post useful.
I wanted to try something similar to the Lua filter classify.lua in
that blog, but for ms output instead of context. Here's what I came
up with:
===== classify-ms.lua ======================================
-- from: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdown-part-8/
function Div( element )
  local annotation = element.classes:find_if( matches )
  if annotation then
    annotation = annotation:gsub( "[^%w]*", "" )
    return {
      ms( ".start", annotation ),
      element,
      ms( ".stop", annotation )
    }
  end
end

function Span(element)
  local annotation = element.classes:find_if(matches)
  if annotation then
    annotation = annotation:gsub("[^%w]*", "")
    return {
      ms_inline("\\*[start", annotation, "]"),
      element,
      ms_inline("\\*[stop", annotation, "]")
    }
  end
end

function matches( s )
  return s:match( "^%a+" )
end

function ms( macro, annotation )
  return pandoc.RawBlock( "ms", macro .. annotation )
end

function ms_inline (macro, annotation, stop)
  return pandoc.RawInline ("ms", macro .. annotation .. stop)
end
============================================================
I changed the Div function to use groff syntax. That worked fine.
(And I was glad to see that in the groff output these didn't have
extra blank lines like they did in the context output. Why were there
blank lines in the context output, BTW?)
And I added a Span function that did something similar to Div, but
instead of using RawBlock it used RawInline. This way it could handle
ReStructuredText interpreted text roles (like :program:`pandoc`) or
pandoc markdown spans with classes (like [pandoc]{.program}). That
worked fine.
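To make the naming convention concrete, here is a small sketch in plain
Lua (no pandoc needed) of how a class name turns into the groff names
the filter emits. The helper names clean_class, block_macro_names and
inline_string_names are made up for this illustration; they are not
part of the filter or of pandoc.

```lua
-- Hypothetical helper: strip non-alphanumeric characters from a class
-- name, the same way the filter does with gsub.
function clean_class (class)
  -- The parentheses drop gsub's second return value (the match count).
  return (class:gsub ("[^%w]*", ""))
end

-- Hypothetical helper: names of the macros wrapped around a Div.
function block_macro_names (class)
  local name = clean_class (class)
  return ".start" .. name, ".stop" .. name
end

-- Hypothetical helper: names of the strings wrapped around a Span.
function inline_string_names (class)
  local name = clean_class (class)
  return "\\*[start" .. name .. "]", "\\*[stop" .. name .. "]"
end

print (block_macro_names ("poem"))
print (inline_string_names ("program"))
```

So a Div with class poem is wrapped in .startpoem and .stoppoem, and a
Span with class program is wrapped in \*[startprogram] and
\*[stopprogram]. Those names have to be defined somewhere in the ms
source for the output to format correctly.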
Here's the markdown input file:
===== poem-plus.md =========================================
<!-- From: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdown-part-8/ -->
``` {=ms}
.ds startprogram \\f[CW]\\m[red]
.ds stopprogram \\m[]\\fP
.de startpoem
.DS
..
.de stoppoem
.DE
..
```
This is a sentence. This sentence talks about [pandoc]{.program}. This is
another sentence.
::: poem
Some say the world will end in fire,
Some say in ice.
From what I've tasted of desire
I hold with those who favor fire.
But if it had to perish twice,
I think I know enough of hate
To say that for destruction ice
Is also great,
And would suffice.
:::
This is a final sentence.
<!--
Local Variables:
compile-command: "pandoc -f markdown -t ms --lua-filter classify-ms.lua --wrap=preserve poem-plus.md"
End:
-->
============================================================
And here is the ms output from poem-plus.md using
--lua-filter=classify-ms.lua:
===== poem-plus.ms =========================================
.ds startprogram \\f[CW]\\m[red]
.ds stopprogram \\m[]\\fP
.de startpoem
.DS
..
.de stoppoem
.DE
..
.LP
This is a sentence.
This sentence talks about \*[startprogram]pandoc\*[stopprogram].
This is
another sentence.
.startpoem
.LP
Some say the world will end in fire,
Some say in ice.
From what I\[cq]ve tasted of desire
I hold with those who favor fire.
But if it had to perish twice,
I think I know enough of hate
To say that for destruction ice
Is also great,
And would suffice.
.stoppoem
.LP
This is a final sentence.
============================================================
That looks as I expected, and the inline markup works as I expected
too: the word pandoc shows up surrounded by \*[startprogram] and
\*[stopprogram], which makes it constant width and red in the PDF
output.
And the .startpoem and .stoppoem commands showed up as I expected.
Note that they are defined to start a -ms display and end a -ms
display. Displays don't fill lines, so the intent is that the lines
of the poem will each be a separate line in the output.
Unfortunately, there is a problem. See that .LP right after the
.startpoem in the output? It turns out .LP is not allowed in a
display, so the .LP cancels the display and the lines show up
filled in the output.
I tried defining the .startpoem and .stoppoem macros so that they just
use the raw groff commands ".nf" and ".fi", but there is a problem
with that, too. It turns out that .LP explicitly resets the fill mode
to on, so the lines are filled in the output. (It also resets a bunch
of other things as well, including the font and the font family.
There goes my hope of being able to set poems in EBGaramond instead of
the default family.)
I'm not sure that there is any good fix to this. Do Str elements in
the internal representation have to be in a paragraph? If not, is
there a way that I could write the function Div in the Lua filter to
walk across the contents of a Div with the annotation "poem" to get
rid of the Para elements and replace them with just a list of Str
elements?
Any ideas would be appreciated.
* Re: Lua filters with ms output
From: T. Kurt Bond @ 2020-05-03 1:56 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Dave Jarvis <dave.jarvis-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> started it with:
> How about typesetting a 100-year-old poem? Or converting epubs to
> Markdown,
> then Markdown into PDF documents?
>
> https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdown-part-8/
T. Kurt Bond <tkurtbond-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> made some changes to the filter
from that page and then noted:
> Unfortunately, there is a problem. See that .LP right after the
> .startpoem in the output? It turns out .LP is not allowed in a
> display, so the .LP cancels the display and the lines show up
> filled in the output.
John MacFarlane <jgm-TVLZxgkOlNX2fBVCVOL8/A@public.gmane.org> wrote:
> Just have your lua filter change the Para element inside the
> container into a Plain element.
That worked very well.
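The suggested change can be sketched roughly as follows. This is a
minimal sketch, not necessarily the exact code used: it assumes the Div
carries the class poem, hard-codes the macro names, and the helper name
para_to_plain is made up for the example. pandoc.walk_block,
pandoc.Plain and pandoc.RawBlock are the same pandoc functions the full
filter below uses.

```lua
-- Hypothetical helper: turn a Para into a Plain with the same inlines,
-- so that the ms writer does not start it with .LP.
function para_to_plain (el)
  return pandoc.Plain (el.content)
end

function Div (element)
  if element.classes:includes ("poem") then
    -- Rewrite every Para inside the poem Div.
    element = pandoc.walk_block (element, { Para = para_to_plain })
    return {
      pandoc.RawBlock ("ms", ".startpoem"),
      element,
      pandoc.RawBlock ("ms", ".stoppoem")
    }
  end
end
```

Divs without the poem class fall through and are left unchanged,
because a filter function that returns nothing keeps the element as it
is.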
I work with ReStructuredText documents a lot, and wanted to try
something like this with one of them.
This filter wraps each span that has a class (for example, one
produced by an interpreted text role such as :program:`pandoc` defined
in the source ReST) in calls to the user-defined groff strings
\*[start<class>] and \*[stop<class>]. Those strings are defined in the
source ReST as a raw block for ms output. They contain groff escapes
that change the font and the glyph color and then change back to the
previous font and glyph color.
It also wraps each div that has a class in calls to the user-defined
groff macros .start<class> and .stop<class> (also defined in the
source ReST as a raw block for ms output).
For divs with the poem class, it converts any contained LineBlock
elements into a list of Plain elements containing its contents,
avoiding the ms output for the LineBlock starting with .LP, which
would cancel the .DS (start display) macro we want to use in the
.startpoem macro definition. The .LP would also reset the font family
in use to the default, another reason to avoid it.
It also converts the empty element that occurs in the line block
as a result of a blank line in the line block input into a RawBlock
that creates a blank line in the ms output, to show the division into
stanzas of the poem.
Interestingly, the first Str element in each line of the line block's
content preserves the leading spaces from the input as Unicode
NO-BREAK SPACE characters, which preserves the indentation of lines in
the line block. Unfortunately, the width of those spaces alone is not
enough to create a visually distinct indentation, so this filter
changes those Str elements into a RawInline that outputs a groff
horizontal movement whose width is based on the number of leading
NO-BREAK SPACE characters, and follows it with a new Str element that
has the leading NO-BREAK SPACE characters removed.
Here is the lua filter:
===== classify-rst-ms.lua ==================================
onig = require ("rex_onig") -- Need a regex package that understands UTF8.
-- Text in a LineBlock preserves leading spaces as Unicode NO-BREAK SPACE.
leading_nobreakspace_rx = onig.new ("^(\u{a0}+)(.*)$", nil, "UTF8")

function Div( element )
  local annotation = element.classes:find_if( matches )
  if annotation then
    annotation = annotation:gsub( "[^%w]*", "" )
    if annotation == "poem" then
      element = pandoc.walk_block (
        element, {
          -- Replace LineBlock element with a list of Plain elements
          -- containing the LineBlock's subelements.
          LineBlock = function (el)
            local l = {}
            for _, subel in ipairs (el.content) do
              if #subel == 0 then
                -- If subel is an empty table, output a raw empty line
                table.insert (l, pandoc.RawBlock ("ms", "\n\n"))
              else
                -- Check for leading NO-BREAK SPACE characters, but only
                -- when the line starts with a Str (other inlines have
                -- no text field to match against).
                local m1, m2
                if subel[1].t == "Str" then
                  m1, m2 = onig.match (subel[1].text,
                                       leading_nobreakspace_rx)
                end
                if m1 then
                  -- Replace the NO-BREAK SPACE characters with a raw
                  -- groff horizontal movement, because the
                  -- NO-BREAK SPACE characters are too narrow.
                  table.insert (subel, 1,
                                pandoc.RawInline ("ms",
                                  string.format ("\\h'%dn'", utf8.len (m1))))
                  -- Modify what used to be the first item to just
                  -- include the trailing characters of the match.
                  subel[2] = pandoc.Str (m2)
                end
                -- Put the (possibly modified) subel in a Plain element.
                table.insert (l, pandoc.Plain (subel))
              end
            end
            return l
          end })
    end
    return {
      ms( ".start", annotation ),
      element,
      ms( ".stop", annotation )
    }
  end
end

function Span(element)
  local annotation = element.classes:find_if(matches)
  if annotation then
    annotation = annotation:gsub("[^%w]*", "")
    return {
      ms_inline("\\*[start", annotation, "]"),
      element,
      ms_inline("\\*[stop", annotation, "]")
    }
  end
end

function matches( s )
  return s:match( "^%a+" )
end

function ms( macro, annotation )
  return pandoc.RawBlock( "ms", macro .. annotation )
end

function ms_inline (macro, annotation, stop)
  return pandoc.RawInline ("ms", macro .. annotation .. stop)
end
============================================================
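If depending on rex_onig is inconvenient, the leading NO-BREAK SPACE
split can probably be done with stock Lua string functions, since
U+00A0 is always encoded as the two bytes C2 A0 in UTF-8. The following
is only a sketch under that assumption and has not been wired into the
filter above; the names NBSP, split_leading_nbsp and indent_escape are
made up for illustration.

```lua
NBSP = "\194\160" -- UTF-8 bytes of U+00A0 NO-BREAK SPACE

-- Hypothetical helper: count the leading NO-BREAK SPACE characters in
-- text and return that count plus the remainder of the string.
function split_leading_nbsp (text)
  local count, pos = 0, 1
  while text:sub (pos, pos + 1) == NBSP do
    count = count + 1
    pos = pos + 2
  end
  return count, text:sub (pos)
end

-- Hypothetical helper: the groff horizontal movement for count spaces,
-- in the same \h'<count>n' form the filter emits.
function indent_escape (count)
  return string.format ("\\h'%dn'", count)
end

local n, rest = split_leading_nbsp (NBSP .. NBSP .. NBSP .. "Some")
print (indent_escape (n) .. rest)
```

In the filter, the count and remainder would take the place of
utf8.len (m1) and m2, and a count of zero would mean the line has no
indentation to replace.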
Here is the ReST source of the document:
===== poem-plus.rst ========================================
Lua Filters For Massaging ``ms`` Output
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

.. raw:: ms

   .ds startprogram \\f[CW]\\m[red]
   .ds stopprogram \\m[]\\fP
   .de startpoem
   .ds OLDFAM \\*[FAM]
   .ds FAM BM
   .DS I 3
   ..
   .de stoppoem
   .DE
   .ds FAM \\*[OLDFAM]
   ..

.. role:: program

This is a sentence. This sentence talks about :program:`pandoc`.
This is
another sentence.

.. class:: poem

| Some say the world will end in fire,
|    Some say in ice.
| From what I've tasted of desire
|    I hold with those who favor fire.
| But if it had to perish twice,
|    I think I know enough of hate
|    To say that for destruction ice
|    Is also great,
| And would suffice.
|
| And another line,
|    And an indented line.

This is a final sentence.
============================================================
And here is the ms output:
===== poem-plus-rst.ms =====================================
.SH 1
Lua Filters For Massaging \f[CB]ms\f[B] Output
.pdfhref O 1 "Lua Filters For Massaging ms Output"
.pdfhref M "lua-filters-for-massaging-ms-output"
.ds startprogram \\f[CW]\\m[red]
.ds stopprogram \\m[]\\fP
.de startpoem
.ds OLDFAM \\*[FAM]
.ds FAM BM
.DS I 3
..
.de stoppoem
.DE
.ds FAM \\*[OLDFAM]
..
.LP
This is a sentence.
This sentence talks about \*[startprogram]pandoc\*[stopprogram].
This is
another sentence.
.startpoem
Some say the world will end in fire,
\h'3n'Some say in ice.
From what I\[aq]ve tasted of desire
\h'3n'I hold with those who favor fire.
But if it had to perish twice,
\h'3n'I think I know enough of hate
\h'3n'To say that for destruction ice
\h'3n'Is also great,
And would suffice.
And another line,
\h'3n'And an indented line.
.stoppoem
.LP
This is a final sentence.
============================================================
Being able to rewrite the tree and insert RawBlocks and RawInlines is
really powerful when it comes to customizing output for particular
output formats.
I hope this example is useful for others like me just learning to use
Lua filters.