public inbox archive for pandoc-discuss@googlegroups.com
* Finding the identifier of the “nearest” heading in a Lua filter
From: Daniel Grady @ 2020-11-24 22:15 UTC (permalink / raw)
  To: pandoc-discuss



Hi, I’d like to write a Pandoc filter that creates a list of all of the URL 
targets in a document, along with the identifier of their parent section 
header. I think I understand how to do this with a Python filter, but I 
haven’t figured out how to do it in a Lua filter, and I think it might not 
be straightforward.

Here’s an example Markdown document:

```markdown
# Breakfast

# Lunch

[Spam](/spam) is an important source of vitamins.
```

I’d like my filter to be able to build a list or map that records the fact 
that the link “/spam” appeared in the section “lunch”. With a Python 
filter, we can use a global variable to keep track of the most recently 
seen header, like this:

```python
import json
import sys

from pandocfilters import toJSONFilters

links = []
most_recent_id = None

def collect(key, value, format, meta):
    global most_recent_id
    if key == "Link":
        # Link value is [attr, inlines, [url, title]].
        links.append((value[2][0], most_recent_id))
    elif key == "Header":
        # Header value is [level, [identifier, classes, attributes], inlines].
        most_recent_id = value[1][0]

if __name__ == "__main__":
    toJSONFilters([collect])
    print(json.dumps(links), file=sys.stderr)
```

It seems like a comparable approach in Lua is not possible, since different 
types of elements are processed at different times. I tried translating the 
Python approach more or less directly (included below), but it seems like 
the global variable is never updated until after all Links have already 
been processed.

How would you go about solving this with a Lua filter? Thanks in advance 
for any suggestions!

— Daniel




My attempt at a Lua filter that does not work:

```lua
links = {}
most_recent_id = "BEGIN"

return {
    {
        Link = function (link)
            links[link.target] = most_recent_id
        end,
        Header = function (h)
            most_recent_id = h.identifier
        end,
        Meta = function (m)
            m.links = links
            return m
        end,
        Pandoc = function (doc)
            doc.blocks = pandoc.List({})
            return doc
        end
    }
}
```

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/43ef8b5b-70ab-4a80-861c-459cf4cc3bfdn%40googlegroups.com.


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-25 12:32 UTC (permalink / raw)
  To: pandoc-discuss



Within a single filter, inlines (like `Link`) are normally processed before
blocks (like `Header`) unless you force a different order.

If you are absolutely certain that *all* your Headers are at the root of
the document, i.e. in the document's immediate list of blocks and not
nested in other Blocks (e.g. Divs, for whatever reason), then you could do
something like this:

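```lua
local links = {}
local most_recent_id = ''

local function Pandoc(doc)
  for _, block in ipairs(doc.blocks) do
    if block.t == 'Header' then
      -- assumes auto_identifiers supplies an id when none is given
      most_recent_id = block.identifier
    else
      -- collect every link inside this top-level block
      pandoc.walk_block(block, {
          Link = function(el)
            links[el.target] = most_recent_id
          end
      })
    end
  end
end

-- a local function is not picked up automatically, so return it explicitly
return { { Pandoc = Pandoc } }
```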

You can then do what you want with the `links` table. If you want to put it
in `Meta`, remember that `Meta` is processed before `Pandoc` unless you
force a different order. Alternatively, create a `doc.meta.links` table and
insert the values directly into it above rather than into `links`; in that
case the function has to return the changed document, e.g. one built with
the `pandoc.Pandoc` constructor.

For a more complex document, the functions `make_sections` and `Blocks` 
might be required.
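
For comparison, the document-order traversal that the Python filter in the
original post relies on handles nested headers directly, because a heading
inside a `Div` is still visited before the links that follow it. A minimal
sketch in plain Python over pandoc's JSON AST, with no `pandocfilters`
dependency; `links_by_heading` and the sample data are hypothetical names
made up for illustration:

```python
def links_by_heading(blocks):
    """Return (url, heading_id) pairs in document order.

    heading_id is None for links that appear before the first heading.
    """
    pairs = []
    current = None  # identifier of the most recently seen Header

    def walk(node):
        nonlocal current
        if isinstance(node, list):
            for item in node:
                walk(item)
        elif isinstance(node, dict):
            tag = node.get("t")
            if tag == "Header":
                # Header content is [level, [id, classes, attrs], inlines].
                current = node["c"][1][0]
            elif tag == "Link":
                # Link content is [attr, inlines, [url, title]].
                pairs.append((node["c"][2][0], current))
            # Elements keep their children under "c"; other dicts
            # (e.g. citations) are walked value by value.
            walk(node.get("c") if "t" in node else list(node.values()))

    walk(blocks)
    return pairs


example = [
    {"t": "Header", "c": [1, ["breakfast", [], []], [{"t": "Str", "c": "Breakfast"}]]},
    {"t": "Header", "c": [1, ["lunch", [], []], [{"t": "Str", "c": "Lunch"}]]},
    {"t": "Para", "c": [
        {"t": "Link", "c": [["", [], []], [{"t": "Str", "c": "Spam"}], ["/spam", ""]]},
    ]},
]
print(links_by_heading(example))  # [('/spam', 'lunch')]
```

On a real document, the `blocks` list would come from the `"blocks"` key of
the JSON produced by `pandoc -t json`.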


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-25 12:37 UTC (permalink / raw)
  To: pandoc-discuss



(Note also that if the `el.target` comes up later, then the header id is 
overwritten... so maybe create the `links[el.target]` themselves as tables, 
and insert successive header ids)
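
One way to picture that suggestion, sketched in Python rather than Lua: each
target maps to a list of heading ids instead of a single id, so a repeated
target adds to its list rather than replacing the earlier entry.
`group_by_target` and the sample pairs are hypothetical names used for
illustration only.

```python
from collections import defaultdict


def group_by_target(pairs):
    """Map each URL to the heading ids it appears under, in order, without repeats."""
    grouped = defaultdict(list)
    for url, heading_id in pairs:
        if heading_id not in grouped[url]:
            grouped[url].append(heading_id)
    return dict(grouped)


pairs = [("/spam", "breakfast"), ("/eggs", "breakfast"), ("/spam", "lunch")]
print(group_by_target(pairs))
# {'/spam': ['breakfast', 'lunch'], '/eggs': ['breakfast']}
```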


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-25 12:39 UTC (permalink / raw)
  To: pandoc-discuss



(Correction: "comes up *again* later")


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: Daniel Grady @ 2020-11-30  5:16 UTC (permalink / raw)
  To: pandoc-discuss



Thanks for the suggestions!

On Wednesday, November 25, 2020 at 4:39:50 AM UTC-8 EBkysko wrote:

> (Correction: "comes up *again* later")
>


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: Albert Krewinkel @ 2020-11-30  6:27 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Daniel Grady writes:

> Hi, I’d like to write a Pandoc filter that creates a list of all of the URL
> targets in a document, along with the identifier of their parent section
> header. I think I understand how to do this with a Python filter, but I
> haven’t figured out how to do it in a Lua filter, and I think it might not
> be straightforward.

Indeed, this is one area where panflute is much better than Lua filters.
Relative information between elements is difficult to process with Lua
filters.

Here is a short description of how I'd approach this with Lua filters:

- First run a `Pandoc` filter on the whole doc and save the doc for
  later use. Then run `make_sections` to force a hierarchical
  representation.
- Filter on Headers, checking for Links by running `walk_block` on the
  header element. The filter used with walk_block should collect the
  links. Each processed Header would have to be deleted so as to prevent
  double counting of links in nested sections.
- Filter on `Pandoc` again, restoring the original document.
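
The idea behind these steps, attributing each link to the innermost section
that contains it and counting it only once, can be sketched in Python over
pandoc's JSON AST. This is an illustration under stated assumptions, not a
Lua filter: `nest_sections` is a simplified stand-in written for this
example (it is not pandoc's `make_sections` and uses plain dicts rather than
section Divs), and `urls_in` and `links_per_section` are likewise
hypothetical helpers.

```python
def nest_sections(blocks):
    """Nest a flat list of blocks into sections according to header level."""
    root = {"id": None, "level": 0, "content": [], "children": []}
    stack = [root]
    for block in blocks:
        if block.get("t") == "Header":
            level, attr, _inlines = block["c"]
            # Close every open section at the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            section = {"id": attr[0], "level": level,
                       "content": [block], "children": []}
            stack[-1]["children"].append(section)
            stack.append(section)
        else:
            stack[-1]["content"].append(block)
    return root


def urls_in(node):
    """Collect the URLs of every Link below a JSON AST node."""
    if isinstance(node, list):
        return [url for item in node for url in urls_in(item)]
    if isinstance(node, dict):
        found = [node["c"][2][0]] if node.get("t") == "Link" else []
        return found + urls_in(node.get("c"))
    return []


def links_per_section(section, result=None):
    """Record each section's own links; subsections are handled separately,
    so a link is never counted for both a section and its parent."""
    if result is None:
        result = {}
    result[section["id"]] = urls_in(section["content"])
    for child in section["children"]:
        links_per_section(child, result)
    return result


def para_with_link(url):
    """Build a Para containing a single Link (sample data only)."""
    return {"t": "Para", "c": [{"t": "Link", "c": [["", [], []], [], [url, ""]]}]}


example = [
    {"t": "Header", "c": [1, ["meals", [], []], [{"t": "Str", "c": "Meals"}]]},
    para_with_link("/menu"),
    {"t": "Header", "c": [2, ["lunch", [], []], [{"t": "Str", "c": "Lunch"}]]},
    para_with_link("/spam"),
]
print(links_per_section(nest_sections(example)))
# {None: [], 'meals': ['/menu'], 'lunch': ['/spam']}
```

The `None` key holds links that appear before the first heading. Keying by
identifier assumes the identifiers are unique within the document.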

HTH,

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: EBkysko @ 2020-11-30 15:21 UTC (permalink / raw)
  To: pandoc-discuss



Don't you mean filtering on the section `Div`s?
(I also should have said section `Div`s for more complex docs above, and 
not `Blocks`)

On Monday, November 30, 2020 at 1:27:58 AM UTC-5 Albert Krewinkel wrote:

> - Filter on Headers, checking for Links by running `walk_block` on the 
> header element.
>


* Re: Finding the identifier of the “nearest” heading in a Lua filter
From: Albert Krewinkel @ 2020-11-30 15:39 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

EBkysko writes:

> Don't you mean filtering on the section `Div`s?
> (I also should have said section `Div`s for more complex docs above, and 
> not `Blocks`)

Ah yes! Thanks for pointing that out.


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


