* Finding the identifier of the “nearest” heading in a Lua filter
@ 2020-11-24 22:15 Daniel Grady
From: Daniel Grady @ 2020-11-24 22:15 UTC (permalink / raw)
To: pandoc-discuss
Hi, I’d like to write a Pandoc filter that creates a list of all of the URL
targets in a document, along with the identifier of their parent section
header. I think I understand how to do this with a Python filter, but I
haven’t figured out how to do it in a Lua filter, and I think it might not
be straightforward.
Here’s an example Markdown document:
```markdown
# Breakfast
# Lunch
[Spam](/spam) is an important source of vitamins.
```
I’d like my filter to be able to build a list or map that records the fact
that the link “/spam” appeared in the section “lunch”. With a Python
filter, we can use a global variable to keep track of the most recently
seen header, like this:
```python
import json
import sys

from pandocfilters import toJSONFilters

links = []
most_recent_id = None


def collect(key, value, format, meta):
    global most_recent_id
    if key == "Link":
        # Link value is [attr, inlines, [url, title]]
        links.append((value[2][0], most_recent_id))
    elif key == "Header":
        # Header value is [level, [identifier, classes, attributes], inlines]
        most_recent_id = value[1][0]


if __name__ == "__main__":
    toJSONFilters([collect])
    print(json.dumps(links), file=sys.stderr)
```
It seems like a comparable approach in Lua is not possible, since different
types of elements are processed at different times. I tried translating the
Python approach more or less directly (included below), but it seems like
the global variable is never updated until after all Links have already
been processed.
How would you go about solving this with a Lua filter? Thanks in advance
for any suggestions!
— Daniel
My attempt at a Lua filter that does not work:
```lua
links = {}
most_recent_id = "BEGIN"

return {
  {
    Link = function (link)
      links[link.target] = most_recent_id
    end,
    Header = function (h)
      most_recent_id = h.identifier
    end,
    Meta = function (m)
      m.links = links
      return m
    end,
    Pandoc = function (doc)
      doc.blocks = pandoc.List({})
      return doc
    end
  }
}
```
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/43ef8b5b-70ab-4a80-861c-459cf4cc3bfdn%40googlegroups.com.
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-25 12:32 UTC (permalink / raw)
To: pandoc-discuss
Inline elements (like `Link`) are normally processed before block elements
(like `Header`) unless you force a different order.
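A toy Python model may make the ordering problem concrete. This is only a sketch and not pandoc's actual traversal code: the flat `doc` list and both function names are made up for illustration.

```python
# Toy model only: a document is a flat list of (kind, value) pairs
# in document order. This is not how pandoc represents documents.
doc = [
    ("Header", "breakfast"),
    ("Header", "lunch"),
    ("Link", "/spam"),
]


def run_typewise(doc):
    """Visit every Link first, then every Header (inlines before blocks)."""
    most_recent_id = "BEGIN"
    links = {}
    for kind, value in doc:
        if kind == "Link":
            links[value] = most_recent_id
    for kind, value in doc:
        if kind == "Header":
            most_recent_id = value
    return links


def run_in_document_order(doc):
    """Visit elements in the order in which they appear."""
    most_recent_id = "BEGIN"
    links = {}
    for kind, value in doc:
        if kind == "Header":
            most_recent_id = value
        elif kind == "Link":
            links[value] = most_recent_id
    return links


typewise_result = run_typewise(doc)
ordered_result = run_in_document_order(doc)
```

In the type-wise run the link is recorded while the variable still holds its initial value, which matches the behaviour described in the original question.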
If you are absolutely certain that *all* your Headers are in the root of
the document, and not in other Blocks (e.g. Divs, for whatever reason),
i.e. that they are all in the immediate list of blocks of the document,
then you could do something like this:
```lua
local links = {}
local most_recent_id = ''

local function Pandoc(doc)
  for _, block in ipairs(doc.blocks) do
    if block.t == 'Header' then
      -- assuming auto_identifiers is used if no id is given
      most_recent_id = block.identifier
    else
      -- collect the links inside this top-level block
      pandoc.walk_block(block, {
        Link = function(el)
          links[el.target] = most_recent_id
        end
      })
    end
  end
end

-- a local function is not picked up automatically, so return it explicitly
return {{Pandoc = Pandoc}}
```
You can then do whatever you want with the `links` table. If you want to
put it in `Meta`, remember that `Meta` is processed before `Pandoc` unless
you force a different order. Alternatively, you could create a
`doc.meta.links` table and insert the values directly into it above rather
than into `links`, but then you have to return the modified document, e.g.
`pandoc.Pandoc(doc.blocks, doc.meta)`.
For a more complex document, the functions `make_sections` and `Blocks`
might be required.
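For comparison, here is a rough Python sketch that walks pandoc's JSON AST in document order using only the standard library, so headers nested inside other blocks (such as a `Div`) are also seen. The function name `collect_links`, the `state` dictionary and the hand-written sample document are assumptions made for illustration.

```python
def collect_links(node, state, links):
    """Walk a pandoc JSON AST fragment depth-first, in document order."""
    if isinstance(node, list):
        for child in node:
            collect_links(child, state, links)
    elif isinstance(node, dict):
        tag = node.get("t")
        if tag == "Header":
            # Header contents: [level, [identifier, classes, attributes], inlines]
            state["most_recent_id"] = node["c"][1][0]
        elif tag == "Link":
            # Link contents: [attr, inlines, [url, title]]
            links.append((node["c"][2][0], state["most_recent_id"]))
        # Recurse into the element's contents; plain strings and numbers are ignored.
        for value in node.values():
            collect_links(value, state, links)


# Hand-written sample: the "Lunch" header and its paragraph sit inside a Div.
doc = {
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [
        {"t": "Header", "c": [1, ["breakfast", [], []], [{"t": "Str", "c": "Breakfast"}]]},
        {"t": "Div", "c": [["", [], []], [
            {"t": "Header", "c": [1, ["lunch", [], []], [{"t": "Str", "c": "Lunch"}]]},
            {"t": "Para", "c": [
                {"t": "Link", "c": [["", [], []], [{"t": "Str", "c": "Spam"}], ["/spam", ""]]},
                {"t": "Space"},
                {"t": "Str", "c": "is"},
            ]},
        ]]},
    ],
}

links = []
collect_links(doc["blocks"], {"most_recent_id": None}, links)
```

With real input, the same function could presumably be fed `json.load(sys.stdin)["blocks"]` from the output of `pandoc -t json`.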
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-25 12:37 UTC (permalink / raw)
To: pandoc-discuss
(Note also that if the `el.target` comes up later, then the header id is
overwritten... so maybe create the `links[el.target]` themselves as tables,
and insert successive header ids)
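A small Python illustration of the difference, in case it helps. The `occurrences` list is made-up data standing in for (target, header id) pairs in document order, where the same target comes up again later.

```python
from collections import defaultdict

# Hypothetical (target, header id) pairs in document order.
occurrences = [
    ("/spam", "breakfast"),
    ("/eggs", "breakfast"),
    ("/spam", "lunch"),
]

overwriting = {}                  # one header id per target: later wins
accumulating = defaultdict(list)  # every header id per target is kept

for target, header_id in occurrences:
    overwriting[target] = header_id
    accumulating[target].append(header_id)
```

Here `overwriting["/spam"]` only remembers the last section, while `accumulating["/spam"]` keeps both.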
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-25 12:39 UTC (permalink / raw)
To: pandoc-discuss
(Correction: "comes up *again* later")
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: Daniel Grady @ 2020-11-30 5:16 UTC (permalink / raw)
To: pandoc-discuss
Thanks for the suggestions!
On Wednesday, November 25, 2020 at 4:39:50 AM UTC-8 EBkysko wrote:
> (Correction: "comes up *again* later")
>
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: Albert Krewinkel @ 2020-11-30 6:27 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Daniel Grady writes:
> Hi, I’d like to write a Pandoc filter that creates a list of all of the URL
> targets in a document, along with the identifier of their parent section
> header. I think I understand how to do this with a Python filter, but I
> haven’t figured out how to do it in a Lua filter, and I think it might not
> be straightforward.
Indeed, this is one area where panflute is much better than Lua filters.
Relative information between elements is difficult to process with Lua
filters.
Here is a short description of how I'd approach this with Lua filters:
- First run a `Pandoc` filter on the whole doc and save the doc for
later use. Then run `make_sections` to force a hierarchical
representation.
- Filter on Headers, checking for Links by running `walk_block` on the
header element. The filter used with walk_block should collect the
links. Each processed Header would have to be deleted so as to prevent
double counting of links in nested sections.
- Filter on `Pandoc` again, restoring the original document.
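
The hierarchical idea behind these steps can be sketched in Python on toy data. This is only an analogue, not pandoc's `make_sections`: the helper names `nest_sections` and `links_by_section` and the simplified block dictionaries are invented for illustration. Keeping each section's own blocks apart from its nested subsections is what avoids the double counting.

```python
def nest_sections(blocks):
    """Nest a flat block list into sections by header level (toy analogue)."""
    root = {"id": None, "level": 0, "blocks": [], "children": []}
    stack = [root]
    for block in blocks:
        if block["t"] == "Header":
            # Close every open section at the same or a deeper level.
            while stack[-1]["level"] >= block["level"]:
                stack.pop()
            section = {"id": block["id"], "level": block["level"],
                       "blocks": [], "children": []}
            stack[-1]["children"].append(section)
            stack.append(section)
        else:
            stack[-1]["blocks"].append(block)
    return root


def links_by_section(section, out):
    """Attribute each link to its innermost section only."""
    for block in section["blocks"]:
        for target in block.get("links", []):
            out.append((target, section["id"]))
    for child in section["children"]:
        links_by_section(child, out)
    return out


# Simplified, made-up blocks: each paragraph just lists its link targets.
blocks = [
    {"t": "Header", "level": 1, "id": "breakfast"},
    {"t": "Para", "links": ["/eggs"]},
    {"t": "Header", "level": 2, "id": "drinks"},
    {"t": "Para", "links": ["/coffee"]},
    {"t": "Header", "level": 1, "id": "lunch"},
    {"t": "Para", "links": ["/spam"]},
]

result = links_by_section(nest_sections(blocks), [])
```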
HTH,
--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-30 15:21 UTC (permalink / raw)
To: pandoc-discuss
Don't you mean filtering on the section `Div`s?
(I also should have said section `Div`s for more complex docs above, and
not `Blocks`)
On Monday, November 30, 2020 at 1:27:58 AM UTC-5 Albert Krewinkel wrote:
> - Filter on Headers, checking for Links by running `walk_block` on the
> header element.
>
* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: Albert Krewinkel @ 2020-11-30 15:39 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
EBkysko writes:
> Don't you mean filtering on the section `Div`s?
> (I also should have said section `Div`s for more complex docs above, and
> not `Blocks`)
Ah yes! Thanks for pointing that out.
--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124