public inbox archive for pandoc-discuss@googlegroups.com
* Thoughts trying to write a Lua filter
@ 2018-08-17 17:42 Samuele Pilleri
       [not found] ` <6e09bd71-ee77-4de7-a002-953a62325234-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Samuele Pilleri @ 2018-08-17 17:42 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 3505 bytes --]

Hello everyone.

I've decided to write a Lua filter for Pandoc on my own, something similar 
to scotthartley/pandoc-wrapfig 
<https://github.com/scotthartley/pandoc-wrapfig> with a slightly different 
syntax: it uses classes and attributes instead of modifying caption format.

![caption with **bold**](image.png "Image title"){#img .wrap width=3}

So far so good, but diving deeper in the details I've had some problems.

*Width attribute*

Quoting the manual (extension: link attributes 
<https://pandoc.org/MANUAL.html#extension-link_attributes>):

> The width and height attributes on images are treated specially. When used 
> without a unit, the unit is assumed to be pixels.

and

> Dimensions are converted to inches for output in page-based formats like 
> LaTeX. [...] Use the --dpi option to specify the number of pixels per 
> inch. The default is 96dpi.

However, PANDOC_READER_OPTIONS doesn't provide the --dpi value. I think 
it would be cool if Pandoc could handle the conversion itself *before* calling 
the writers, or at least provide a conversion function within the API.

*Image with attributes and formatted caption*

Such syntax

![[caption]{.underline}](image.png)

produces the following output:

$ pandoc demo.md -t native
[Para [Image ("",[],[]) [Span ("",["underline"],[]) [Str "caption"]] 
("image.png","fig:")]]

$ pandoc demo.md -t latex
\begin{figure}
\centering
\includegraphics{image.png}
\caption{{caption}}
\end{figure}

$ pandoc demo.md -t html5
<figure>
<img src="image.png" alt="caption" /><figcaption><span class="underline">
caption</span></figcaption>
</figure>

I suppose it could be a bug, or maybe LaTeX doesn't support underline 
within the context of a caption. However, I think some clarification is 
needed, and the AST passed to Lua should be revised as well, since it's 
quite a mess:

![[caption]{.underline}](image.png)
1:
  c:
    1:
      1:
      2:
        1: underline
      3:
    2:
      1:
        c: caption

![caption](image.png)
1:
  c: caption

![*caption*](image.png)
1:
  c:
    1:
      c: caption

(*): these tables refer to el.c[2] only

Also, I couldn't find any good documentation on the fields of the element 
passed to the filter function (at least for images), and it took me a couple 
of hours to understand them. I would add something like this to the docs:

-- Creates an image identical to the one given
function Image (el)
    local id, classes, attrs = unpack(el.c[1])
    local caption = el.c[2]
    local src, title = unpack(el.c[3])

    return pandoc.Image(caption, src, title, {id, classes, attrs})
end

Finally, this is the first filter I've written (and my first experience with 
Lua), so I would really appreciate it if the community could give me some 
feedback to check whether I got everything right: the current version is 
attached. Thanks in advance!

I would really like to take this opportunity to thank John and all the devs 
for creating Pandoc: it's a unique piece of software, and I couldn't wish for 
anything better!

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/6e09bd71-ee77-4de7-a002-953a62325234%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[-- Attachment #2: pandoc-wrapfig.lua --]
[-- Type: application/octet-stream, Size: 1538 bytes --]

-- See https://tex.stackexchange.com/q/56176

-- Checks if an array-like table contains a value
-- (the parameter is named `list` so it does not shadow Lua's `table` library)
local function contains (list, val)
	for i=1,#list do
		if list[i] == val then
			return true
		end
	end
	return false
end

-- Searches an array of tuples for a specific key
-- Returns the value if found, nil otherwise
local function search_tuple_value (tuples, key)
	for i=1,#tuples do
		if tuples[i][1] == key then
			return tuples[i][2]
		end
	end
end

function Image (el)
	local id, classes, attrs = unpack(el.c[1])
	local caption = el.c[2]
	local src, title = unpack(el.c[3])

	-- tprint(caption)  -- debug output; tprint is not defined in this file

	if FORMAT == "latex" and contains(classes, "wrap") then
		local side
		if contains(classes, "wrap-float") then
			if contains(classes, "wrap-left") then
				side = "L"
			else
				side = "R"
			end
		else
			if contains(classes, "wrap-left") then
				side = "l"
			else
				side = "r"
			end
		end

		local size = search_tuple_value(attrs, "width") or 0

		local latex_head = [[\begin{wrapfigure}{]] .. side .. '}{' .. size .. 'in}'

		if #caption > 0 then
			local latex_body = [[\centering\includegraphics{]] .. src .. [[}\caption]]
			local latex_tail = [[\end{wrapfigure}]]
			return { pandoc.RawInline(FORMAT, latex_head .. latex_body),
					 pandoc.Span(caption), pandoc.RawInline(FORMAT, latex_tail) }
			--  Should this ^ really be a Span?
		else
			local latex_body = [[\centering\includegraphics{]] .. src .. '}'
			local latex_tail = [[\end{wrapfigure}]]
			return pandoc.RawInline(FORMAT, latex_head .. latex_body .. latex_tail)
		end
	end
end

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Thoughts trying to write a Lua filter
       [not found] ` <6e09bd71-ee77-4de7-a002-953a62325234-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2018-08-19 18:40   ` John MacFarlane
       [not found]     ` <m2wosm5heu.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2018-08-19 18:40 UTC (permalink / raw)
  To: Samuele Pilleri, pandoc-discuss

Samuele Pilleri <pillerisamuele-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> However, PANDOC_READER_OPTIONS doesn't provide the --dpi value. I think 
> it would be cool if Pandoc could handle the conversion itself *before* calling 
> the writers, or at least provide a conversion function within the API.

You don't say why this would be useful in your case --
could you elaborate?

> I suppose it could be a bug, or maybe LaTeX doesn't support underline 
> within the context of a caption.

The LaTeX writer doesn't currently support underline spans
at all.  In fact, only the docx writer has special
support for them.  (In HTML, you get the span, which
you could style yourself.)

We could think about adding underline support more
widely, but I'm reluctant to do that unless we add Underline
as a proper Inline constructor.  Some discussion here:
https://groups.google.com/d/topic/pandoc-discuss/o98bsCVZ-2w/discussion

> However, I think some clarification is 
> needed, and the AST passed to Lua should be revised as well, since it's 
> quite a mess:
>
> ![[caption]{.underline}](image.png)
> 1:
>   c:
>     1:
>       1:
>       2:
>         1: underline
>       3:
>     2:
>       1:
>         c: caption
>
> ![caption](image.png)
> 1:
>   c: caption
>
> ![*caption*](image.png)
> 1:
>   c:
>     1:
>       c: caption
>
> (*): these tables refer to el.c[2] only

If the question is why the Lua table serialization is
not more "human readable," Albert Krewinkel might be able
to say more. But it's basically isomorphic to the JSON
representation.

Note: You shouldn't need to deal with this manually; use the
functions provided by pandoc to retrieve attributes
and such.

> Also, I couldn't find any good documentation on parameter's fields (at 
> least for images) and it took me a couple of hours to understand it. I 
> would add something like this to the docs:
>
> -- Creates an image identical to the one given
> function Image (el)
>     local id, classes, attrs = unpack(el.c[1])
>     local caption = el.c[2]
>     local src, title = unpack(el.c[3])

Here you can just do:

   local id = el.identifier
   local classes = el.classes
   local attrs = el.attributes
   local caption = el.caption

But I don't think these shortcuts have been properly
documented. I only know about them from looking at
examples. (@tarleb, can you add something to the docs
on this?)

I can imagine it was difficult to figure this out
without knowing about these shortcuts!
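
For reference, here is a minimal sketch of how the wrapfigure logic from the attached filter might look when it reads the element through named fields rather than `el.c`. It assumes that `el.src` is available alongside the fields listed above (it is documented for current pandoc versions; whether it existed in the release discussed here is an assumption) and that `el.attributes` supports lookup by key. The helper names `contains` and `wrapfigure_head` are made up for illustration, and caption handling is left out to keep the example short.

```lua
-- Returns true if the array-like table `list` contains `val`.
local function contains (list, val)
  for i = 1, #list do
    if list[i] == val then
      return true
    end
  end
  return false
end

-- Builds the opening of a wrapfigure environment from an Image element,
-- using named fields (el.classes, el.attributes) instead of el.c.
-- The width is treated as inches, as in the attached filter.
local function wrapfigure_head (el)
  local side = contains(el.classes, "wrap-left") and "l" or "r"
  if contains(el.classes, "wrap-float") then
    side = side:upper()
  end
  local width = el.attributes["width"] or "0"
  return "\\begin{wrapfigure}{" .. side .. "}{" .. width .. "in}"
end

-- Filter function: replaces images carrying the class "wrap" with raw LaTeX.
-- Assumption: el.src holds the image path. The caption is ignored here.
function Image (el)
  if FORMAT == "latex" and contains(el.classes, "wrap") then
    local latex = wrapfigure_head(el)
      .. "\\centering\\includegraphics{" .. el.src .. "}"
      .. "\\end{wrapfigure}"
    return pandoc.RawInline("latex", latex)
  end
  -- Returning nothing leaves the element unchanged.
end
```

Returning nothing from `Image` leaves all other images untouched, so the filter only affects elements that opt in with the `wrap` class.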


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Thoughts trying to write a Lua filter
       [not found]     ` <m2wosm5heu.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2018-08-20 19:47       ` Samuele Pilleri
       [not found]         ` <9da35dc5-3eec-446b-8464-889ec1eb26f4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: Samuele Pilleri @ 2018-08-20 19:47 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 2529 bytes --]

Oh I see. Indeed, I couldn't find any documentation on that and only looked 
at a couple of examples, my bad. Can you please tell me where to find 
those? I could only find this one 
<https://github.com/pandoc/lua-filters/blob/master/short-captions/short-captions.lua>, 
which however doesn't cover all cases (for example, is there a shortcut for 
the image path as well?).

> You don't say why this would be useful in your case -- could you elaborate?

I'm writing a filter to handle wrapfigure in LaTeX (and maybe other formats 
in the future, it would be cool if it could target HTML and DOCX/ODT as 
well). This is the syntax which I have in mind:

![Image caption](/path/to/image.png){.wrap width=3}

As I've previously pointed out, the manual defines a specific behaviour for 
width and height attributes and states that when no unit is given, pixels 
are assumed. However, to convert pixels to inches for LaTeX output (without 
introducing behaviour that contradicts the manual), the filter needs to know 
the DPI value, which can be overridden from the command line. Knowing that 
value would allow me to write a function that handles the different units 
passed (or not passed) as part of the width attribute, complying with this 
section <https://pandoc.org/MANUAL.html#extension-link_attributes> of the 
manual. Still, given the complexity of the task, I think it would be better 
to let Pandoc handle this once and for all, converting according to the 
output format:

> Dimensions are converted to inches for output in page-based formats like 
> LaTeX. Dimensions are converted to pixels for output in HTML-like formats.

Hope I made myself clear.
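
To make the kind of function described above concrete, here is a rough sketch in plain Lua. It is not part of pandoc's API: the name `to_inches` and the table `INCHES_PER_UNIT` are made up for illustration, the dpi is passed in by the caller (falling back to the 96 quoted from the manual), and percentages are simply reported as not convertible.

```lua
-- Conversion factors from each supported absolute unit to inches.
local INCHES_PER_UNIT = { ["in"] = 1, inch = 1, cm = 1 / 2.54, mm = 1 / 25.4 }

-- Converts a width/height attribute value such as "3", "120px" or "5cm"
-- to inches. `dpi` is the number of pixels per inch (defaults to 96).
-- Returns nil for values that cannot be converted, e.g. "50%".
local function to_inches (value, dpi)
  dpi = dpi or 96
  local digits, unit = tostring(value):match("^%s*([%d.]+)%s*(%a*)%s*$")
  if not digits then
    return nil
  end
  local number = tonumber(digits)
  if not number then
    return nil
  end
  -- No unit means pixels, as stated in the manual.
  if unit == "" or unit == "px" then
    return number / dpi
  end
  local factor = INCHES_PER_UNIT[unit]
  if not factor then
    return nil
  end
  return number * factor
end

print(to_inches("120px"))     -- pixels at the default 96 dpi
print(to_inches("600", 300))  -- unitless value with an explicit dpi
print(to_inches("5cm"))       -- absolute unit, dpi is irrelevant
```

The missing piece is still the dpi itself: the function only works as the manual describes if the caller can obtain the value given on the command line.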

> I'm reluctant to do that unless we add Underline as a proper Inline 
> constructor

I agree with you.
More generally speaking, can we rely on classes and attributes as part of 
the semantics? If not, I think I have to redesign the syntax for this 
particular filter.
 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Thoughts trying to write a Lua filter
       [not found]         ` <9da35dc5-3eec-446b-8464-889ec1eb26f4-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-09-17 16:08           ` EBkysko
  0 siblings, 0 replies; 4+ messages in thread
From: EBkysko @ 2019-09-17 16:08 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 1290 bytes --]


I know this is borderline necroposting, but I was just faced with this 
problem: dpi doesn't seem to be accessible in Lua filters (not in 
`PANDOC_READER_OPTIONS` anyway, which holds reader options). So in case 
anyone wants a hint, here goes...

Since I usually invoke pandoc from a script (a Windows batch file in my 
case), I just store the dpi in a variable (say 'dpi') and then pass it to 
both `--dpi` and `-M` on the pandoc command line.

So in a Windows batch file that would be, for example:

```
...
set dpi=300
...
pandoc %1 %otheroptions% -M dpi=%dpi% --dpi=%dpi% -o %output%
```

Then in the Lua file, the first filter in the returned list would be a 
`Meta(meta)` filter that retrieves the dpi through meta["dpi"] and stores 
it in a Lua variable to be used later.

Yes, it's redundant, but I don't see another way (apart from setting an 
environment variable, but that's the same principle).
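
A sketch of what such a Lua file might look like, assuming the metadata key is `dpi` as in the batch example. The function names `read_dpi` and `convert_width` are made up for illustration, and rewriting the width attribute into inches is only one possible use of the value.

```lua
-- Fallback when no dpi metadata is given (the default stated in the manual).
local dpi = 96

-- First filter: read the value passed on the command line with -M dpi=...
local function read_dpi (meta)
  local value = meta["dpi"]
  -- Depending on the pandoc version the value may be a plain string or a
  -- metadata element; only the latter needs stringify.
  if type(value) == "table" or type(value) == "userdata" then
    value = pandoc.utils.stringify(value)
  end
  dpi = tonumber(value) or dpi
end

-- Second filter: turn unitless (pixel) widths into inches using that value.
local function convert_width (el)
  local px = tonumber(el.attributes["width"])
  if px then
    el.attributes["width"] = string.format("%.2fin", px / dpi)
    return el
  end
  -- Widths that already carry a unit, or missing widths, are left alone.
end

-- The filters in the returned list run one after another, so the Meta
-- filter has already set `dpi` by the time the images are visited.
return { { Meta = read_dpi }, { Image = convert_width } }
```

The two functions go into separate entries of the returned list because, within a single filter, the `Meta` function would only be called after the images had already been processed.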


^ permalink raw reply	[flat|nested] 4+ messages in thread
