How to access Span elements with lua filter based on their content

public inbox archive for pandoc-discuss@googlegroups.com

* How to access Span elements with lua filter based on their content
@ 2020-11-05 17:01 krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found] ` <fa68cec8-4ff1-4bbe-95fa-65d36c28bda7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-11-05 17:01 UTC (permalink / raw)
  To: pandoc-discuss

I am using pandoc to convert `org-agenda` list of todos to `docx` and `pdf` 
for my coworkers. File exported from `emacs org-agenda` can look like that 
(simplified):

`tasks.org`
```
* TODO Feed the cat
```

Pandoc native output of this file parsing is:

```
[Header 1 ("feed-the-cat",[],[]) [Span ("",["todo","TODO"],[]) [Str 
"TODO"],Space,Str "Feed",Space,Str "the",Space,Str "cat"]]
```

Now if I convert this to any output format, I get spurious "TODO" pandoc 
strings (that are present from `org-mode`). How can I get rid of this 
"TODO" string (preferably also with surrounding spaces)?

My first attempt was to use lua filter. I can simply do:

`deleteSpans.lua`
```
function Span(el)
     return pandoc.Str('')
end
```

but this removes every `Span`, which could be bad (although it does not matter in
my current specific case).

I have tried to do better like this:

`removeTODO.lua`
```
function Span(el)
  if el.text == 'TODO' then
      return pandoc.Str('')
    else
      return nil
  end
end
```

But this does not have any effect. When I look at the `lua-filter` docs and at the
`Span` constructor and try to reverse-engineer it, I am very confused.
There are `Attr`, `attributes`, and such, and none of them worked.

So, how can I access, or match, `pandoc Span` elements based on their 
content? Where can I read more about this?

If it is possible to achieve this from the `emacs` or `org-agenda` side, I would be
very interested in that option too.

So far, I have been a little hesitant to go through Hackage and `pandoc.types`; but
if that is the place to go (in the future), then I will give it my best shot.

Thank you very much for any help with this.
Regards, Tomas

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fa68cec8-4ff1-4bbe-95fa-65d36c28bda7n%40googlegroups.com.


* Re: How to access Span elements with lua filter based on their content
       [not found] ` <fa68cec8-4ff1-4bbe-95fa-65d36c28bda7n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-11-05 21:30   ` Albert Krewinkel
       [not found]     ` <871rh7trtg.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Albert Krewinkel @ 2020-11-05 21:30 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org writes:

> I am using pandoc to convert `org-agenda` list of todos to `docx` and `pdf`
> for my coworkers. File exported from `emacs org-agenda` can look like that
> (simplified):
>
> `tasks.org`
> ```
> * TODO Feed the cat
> ```
>
> Pandoc native output of this file parsing is:
>
> ```
> [Header 1 ("feed-the-cat",[],[]) [Span ("",["todo","TODO"],[]) [Str
> "TODO"],Space,Str "Feed",Space,Str "the",Space,Str "cat"]]
> ```
>
> Now if I convert this to any output format, I get spurious "TODO" pandoc
> strings (that are present from `org-mode`). How can I get rid of this
> "TODO" string (preferably also with surrounding spaces)?

Two options:

1. The org reader recognizes most org export options. So adding the
   following line to your input file should be enough:

       #+OPTIONS: todo:nil

   See: https://orgmode.org/manual/Export-Settings.html

2. With a Lua filter you'll want

      function Span (span)
        if span.classes:includes 'todo' then
          return {}   -- delete this element
        end
      end
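
The filter above deletes the span but leaves the Space that followed it. If
that Space should go as well, one way is to filter the parent Header instead
of the Span. A minimal sketch, assuming the todo span sits directly in the
header's content as in the native output quoted above; the names `kept` and
`skip_space` are only illustrative:

```lua
-- Drop org TODO keyword spans from headers, together with the
-- single Space that directly follows each of them.
function Header (header)
  local kept = {}
  local skip_space = false
  for _, inline in ipairs(header.content) do
    if inline.t == 'Span' and inline.classes:includes 'todo' then
      skip_space = true            -- drop the span, then drop one Space
    elseif skip_space and inline.t == 'Space' then
      skip_space = false           -- this is the Space right after the span
    else
      skip_space = false
      kept[#kept + 1] = inline     -- everything else is kept unchanged
    end
  end
  header.content = kept
  return header
end
```

Spans nested deeper (inside emphasis, for example) are not touched by this
sketch; the plain Span filter handles those.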

> So, how can I access, or match, `pandoc Span` elements based on their
> content? Where can I read more about this?

_Just_ on their content is difficult for various reasons, but you can
compare AST elements using the normal `==` Lua operator. The comparison
of elements happens in Haskell, where elements don't have identity.

So `pandoc.Span {pandoc.Str 'hi'} == pandoc.Span {pandoc.Str 'hi'}`
would be true, but `{pandoc.Str 'hi'} == {pandoc.Str 'hi'}` would be
false, as lists are not treated as AST elements. We might change that
at some point.
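
Given that, a content-only match can compare the inline elements one by one
rather than comparing the list. A sketch, assuming the span holds exactly one
`Str`; the commented-out line shows a looser variant based on
`pandoc.utils.stringify`, which compares the plain-text rendering instead:

```lua
-- Delete spans whose entire content is the single string "TODO".
function Span (span)
  local content = span.content
  -- Compare element with element; comparing the content list itself
  -- against a freshly built table is the case described as false above.
  if #content == 1 and content[1] == pandoc.Str 'TODO' then
    return {}   -- delete this element
  end
  -- Looser variant, ignores how the text is split into inlines:
  -- if pandoc.utils.stringify(span) == 'TODO' then return {} end
end
```

Matching on the class, as in option 2, stays the more robust choice, since a
heading that merely contains the word TODO in a span would also be hit here.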

HTH,

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: How to access Span elements with lua filter based on their content
       [not found]     ` <871rh7trtg.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2020-11-06 15:50       ` krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found]         ` <84ffd932-2be5-4900-b115-58220e691dcbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-11-06 15:50 UTC (permalink / raw)
  To: pandoc-discuss


Hello Mr. Krewinkel,
thank you for your help. The filter works great! The export option has made
no difference for me, but I might be using it wrong (as always with me; I am
still learning to work with Emacs, so I am probably doing something the
way I shouldn't :D).
The second part is difficult for me. Could you elaborate a little more
on how you identified those `'todo'` or `'TODO'` in the
TODO-Span as classes, and not attributes? This might be a silly question, and I
guess this is somehow inspired by HTML, but it would be really helpful for
me to know how this element is represented in the pandoc AST.
And the element [Str "TODO"] is a one-element list in the pandoc AST, and
therefore it cannot be checked as-is? Did I get the last part correctly?
Regards, Tomas


* Re: How to access Span elements with lua filter based on their content
       [not found]         ` <84ffd932-2be5-4900-b115-58220e691dcbn-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2020-11-06 19:31           ` Albert Krewinkel
       [not found]             ` <87tuu2s2mw.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Albert Krewinkel @ 2020-11-06 19:31 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hey Tomas,

krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org writes:

> thank you for your help. The filter works great! The export option has made
> no difference for me, but I might be using it wrong (as always with me; I am
> still learning to work with Emacs, so I am probably doing something the
> way I shouldn't :D).

Feel free to send me an example file and I can take a quick look.

> The second part is difficult for me. Could you elaborate a little more
> on how you identified those `'todo'` or `'TODO'` in the
> TODO-Span as classes, and not attributes?

That's a good question. A Span consists of two parts, the Attr and the
inline contents. Attr values are triples consisting of the element's id,
classes, and key-value attributes, in that order.
https://pandoc.org/lua-filters.html#type-attr

If we look at

>> > [Header 1 ("feed-the-cat",[],[]) [Span ("",["todo","TODO"],[]) [Str
>> > "TODO"],Space,Str "Feed",Space,Str "the",Space,Str "cat"]]

we see that `["todo", "TODO"]` is the second element of the Span's triple, so
these are classes. We can now check the Lua filter docs for the Span
type to see how we can access the info:
https://pandoc.org/lua-filters.html#type-span
We find that Span elements have a `classes` field; the rest should be
discoverable by clicking and scrolling through the docs.
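
As a quick way to see this mapping in practice, the parts of the triple can be
logged from a filter. A small sketch that relies only on the `identifier` and
`classes` fields of Span; the helper name `describe_attr` is made up for
illustration:

```lua
-- Render the id and classes of an element's Attr as one line of text.
local function describe_attr (el)
  return string.format('id=%q classes=[%s]',
    el.identifier, table.concat(el.classes, ', '))
end

-- Log every span's Attr to stderr; returning nothing leaves the
-- element unchanged.
function Span (span)
  io.stderr:write(describe_attr(span) .. '\n')
  -- Key-value attributes (third part of the triple) are looked up
  -- by key, e.g. span.attributes['some-key'].
end
```

For the TODO span from the native output above this should log
`id="" classes=[todo, TODO]`.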

The most difficult part is to know that the triples in the native output
are Attr values. I'm actually not sure if this is documented anywhere
but the Haskell source. Any confusion about this is very understandable.

> And the element [Str "TODO"] is a one-element list in the pandoc AST, and
> therefore it cannot be checked as-is? Did I get the last part correctly?

Yes. Haskell lists are translated into plain Lua tables; the latter
follow the usual Lua comparison rules. Pandoc AST elements are special,
in that they have `__eq` metamethods which use Haskell's comparison
mechanism under the hood. Hence the difference in behavior.
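
The difference can be reproduced in stock Lua without pandoc. The following is
only an analogy, not pandoc's actual implementation: the `by_value` metatable
is a made-up stand-in for what the AST elements get from the Haskell side.

```lua
-- Plain tables compare by identity: two separate tables are never equal,
-- even when they hold the same values.
local a = {'hi'}
local b = {'hi'}
print(a == b)   -- false

-- A shared __eq metamethod makes tables compare by value instead.
local by_value = {
  __eq = function (x, y)
    if #x ~= #y then return false end
    for i = 1, #x do
      if x[i] ~= y[i] then return false end
    end
    return true
  end
}
local c = setmetatable({'hi'}, by_value)
local d = setmetatable({'hi'}, by_value)
print(c == d)   -- true
```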

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: How to access Span elements with lua filter based on their content
       [not found]             ` <87tuu2s2mw.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2020-11-08 18:20               ` krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  0 siblings, 0 replies; 5+ messages in thread
From: krulis....-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2020-11-08 18:20 UTC (permalink / raw)
  To: pandoc-discuss


Thank you for offering help, Mr. Krewinkel, but I should really first learn
Org-mode's features and functions. I am still looking at the quick-start
guide and learning what I actually **can** do with org-mode. I should
think about hacking and modifying Emacs afterwards.
About spans, I believe their contents are not documented as concisely
as you have written now (at least not in the user guide or in any
filter documentation). Your explanation is perfectly understandable and
readable to me. Would it make sense to add this to the Lua filter
documentation?

