public inbox archive for pandoc-discuss@googlegroups.com
* Pointers on modifying Plain objects(?)
@ 2022-12-22  8:53 balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found] ` <8af6876b-72cc-448e-9f5e-7d12ccdf2ad8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2022-12-22  8:53 UTC (permalink / raw)
  To: pandoc-discuss



Hello,

After many fruitless attempts via Google Search, GitHub Search and indeed 
this mailing list search, I was wondering if someone could point me to some 
examples of how to modify "Plain" elements? Unfortunately I cannot even 
seem to figure out how one loops through this so it feels like I'm just 
stumbling in the dark. 

The specific scenario I'm looking at is a Markdown file such as this:

### Todo
- [ ] Foo
- [X] Quux Qux


The native representation of the Bullet List item with a "[X]" appears to 
be:

[ Plain
          [ Str "[X]" , Space , Str "Quux" , Space , Str "Qux" ]
      ]

I would like to apply a strikethrough to lines where the "[X]" marker is 
present, which leads me to the problem of trying to manipulate a "Plain" 
object.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/8af6876b-72cc-448e-9f5e-7d12ccdf2ad8n%40googlegroups.com.


* Re: Pointers on modifying Plain objects(?)
       [not found] ` <8af6876b-72cc-448e-9f5e-7d12ccdf2ad8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-12-22 12:05   ` Albert Krewinkel
       [not found]     ` <878riz8wf4.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Albert Krewinkel @ 2022-12-22 12:05 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

"balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <balaji.dutt-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> The specific scenario I'm looking at is a Markdown file such as this:
>
> ### Todo
> - [ ] Foo
> - [X] Quux Qux

This is an interesting case because it is more complex than it seems.
The reason is pandoc's `task_list` extension that causes pandoc to
handle these checkboxes specially, converting them to [Str "☐", Space]
and [Str "☒", Space]. So we'll have to match on that in our filter.

A good approach would be to write a filter for Plain, like so:

``` lua
function Plain (plain)
  -- modify the object here
  return plain
end
```

Pandoc will then do all necessary document traversals automatically;
the function gets applied to every `Plain` element in the document.
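
For anyone unsure how to loop through a `Plain` element, here is a minimal
debugging sketch. It is not part of the original reply; it assumes a
reasonably recent pandoc where each inline exposes its type in the `t`
field, and it only prints to stderr without changing the document.

``` lua
-- Debugging sketch: list the inlines of every Plain element on stderr.
function Plain (plain)
  for i, inline in ipairs(plain.content) do
    -- inline.t is the element type, e.g. "Str" or "Space";
    -- pandoc.utils.stringify gives the plain-text content.
    io.stderr:write(i, ': ', inline.t, ' "',
                    pandoc.utils.stringify(inline), '"\n')
  end
  -- Returning nothing leaves the element unchanged.
end
```

Running pandoc with this filter on the Todo example should make it visible
that every word and every space is a separate element.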

To check for the prefix, we'd do something like

``` lua
local done_marker = pandoc.List{pandoc.Str '☒', pandoc.Space()}
local prefix = pandoc.List{plain.content[1], plain.content[2]}
if prefix == done_marker then
  -- modify content
end
```
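
As an alternative sketch, the same check can be spelled out field by field
instead of comparing two lists. The helper name `starts_with_done_marker`
is made up for this illustration, and the code assumes that `Str` elements
expose their string in the `text` field.

``` lua
-- Sketch: test the first two inlines directly.
-- `starts_with_done_marker` is a hypothetical helper name.
local function starts_with_done_marker (inlines)
  local first, second = inlines[1], inlines[2]
  return first ~= nil and second ~= nil
    and first.t == 'Str' and first.text == '\u{2612}'  -- U+2612 is the checked box
    and second.t == 'Space'
end

function Plain (plain)
  if starts_with_done_marker(plain.content) then
    -- modify content
  end
  return plain
end
```

The explicit nil checks mean that short or empty `Plain` elements are
simply skipped.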

I hope that's enough to get you started. Happy hacking!


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Pointers on modifying Plain objects(?)
       [not found]     ` <878riz8wf4.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-12-24  8:37       ` balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found]         ` <8f0e8d81-7f0b-49a7-b9b5-d78b19a0b1ban-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2022-12-24  8:37 UTC (permalink / raw)
  To: pandoc-discuss



Thanks for the pointers Albert! It did help me get started. Unfortunately 
when I started looping through the Plain object, I realized that the 
individual strings were represented as separate elements so there did not 
seem to be an easy way to apply a strikethrough formatting for the entire 
sentence. The best I would be able to do was apply the strikethrough 
word-by-word but with that approach, the final HTML did not look very 
pleasing.

In the end, I wound up writing a small Python script that would modify a 
file with the pandoc native format directly (outside of pandoc) and then 
feed the modified native format file back into pandoc. After a couple of 
false starts with the regex and then the native output becoming invalid, 
I've got it working fairly well for my purposes.



* Re: Pointers on modifying Plain objects(?)
       [not found]         ` <8f0e8d81-7f0b-49a7-b9b5-d78b19a0b1ban-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-12-24 10:26           ` Albert Krewinkel
       [not found]             ` <C927BB76-A05B-48E2-8277-0DED656D13CA-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Albert Krewinkel @ 2022-12-24 10:26 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


We could do this by passing the full content to the strikeout constructor. We'd remove, then re-add the checkbox later:

plain.content:remove(2) -- remove space
plain.content:remove(1) -- remove checkbox
plain.content = done_marker .. 
  pandoc.Strikeout(plain.content)




* Re: Pointers on modifying Plain objects(?)
       [not found]             ` <C927BB76-A05B-48E2-8277-0DED656D13CA-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-12-25  9:15               ` balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
       [not found]                 ` <5257c49e-968d-40bf-a398-ae104a53c5c8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2022-12-25  9:15 UTC (permalink / raw)
  To: pandoc-discuss



Hi Albert,

Merry Christmas to you!

Your suggestion seemed really promising, but when I try to create a filter 
using the couple of bits of code you'd suggested, I get a compilation 
error. Here's the filter that I came up with:

function Plain (plain)

local done_marker = pandoc.List{pandoc.Str '[X]', pandoc.Space()}
local prefix = pandoc.List{plain.content[1], plain.content[2]}

      if prefix == done_marker then
        plain.content:remove(2) -- remove space
        plain.content:remove(1) -- remove checkbox
        pandoc.Strikeout(plain.content)
        plain.content = done_marker .. pandoc.Strikeout(plain.content)
      end

return plain
end

Unfortunately when I try to run pandoc with this filter I get the following 
error:

Error running filter 
C:\Users\Balaji\AppData\Roaming\pandoc\filters\strikethrough.lua:
...\Balaji\AppData\Roaming\pandoc\filters\strikethrough.lua:10: bad 
argument #2 to 'concat' (table expected, got Inline)

As you might be able to guess from the fact that I switched to Python when 
trying this earlier, I have really no expertise in Lua. I tried to use 
table.insert in place of concat, and now the filter does not throw an error 
but it does not seem to do anything either.

Any suggestions?



* Re: Pointers on modifying Plain objects(?)
       [not found]                 ` <5257c49e-968d-40bf-a398-ae104a53c5c8n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-12-25  9:55                   ` Albert Krewinkel
       [not found]                     ` <877cyf7qb2.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Albert Krewinkel @ 2022-12-25  9:55 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


"balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org" <balaji.dutt-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Merry Christmas to you!

Merry Christmas!

> Your suggestion seemed really promising, but when I try to create a
> filter using the couple of bits of code you'd suggested, I get a
> compilation error. Here's the filter that I came up with:

My bad, I forgot to put the Strikeout element in a table, i.e., to
wrap it in curly braces.

``` lua
function Plain (plain)

  local done_marker = pandoc.List{pandoc.Str '\u{2612}', pandoc.Space()}
  local prefix = pandoc.List{plain.content[1], plain.content[2]}

  if prefix == done_marker then
    plain.content:remove(2) -- remove space
    plain.content:remove(1) -- remove checkbox
    -- wrap the remaining inlines in one Strikeout, then re-add the checkbox
    plain.content = done_marker .. {pandoc.Strikeout(plain.content)}
  end

  return plain
end
```
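
A possible variant, offered only as a sketch: in loose lists (items
separated by blank lines) the item text should arrive as `Para` rather than
`Plain`, so one handler can be registered for both. The name `strike_done`
is made up for this illustration, and the handler builds a new list
instead of removing elements in place.

``` lua
-- Sketch: strike out completed task items in both tight and loose lists.
local done_marker = pandoc.List{pandoc.Str '\u{2612}', pandoc.Space()}

-- `strike_done` is a hypothetical name; it works on any block that has
-- a list of inlines in its `content` field.
local function strike_done (block)
  local content = block.content
  local prefix = pandoc.List{content[1], content[2]}
  if prefix ~= done_marker then
    return nil  -- not a completed task: leave the block unchanged
  end
  -- Collect everything after the checkbox and the space.
  local rest = pandoc.List{}
  for i = 3, #content do
    rest:insert(content[i])
  end
  block.content = done_marker .. pandoc.List{pandoc.Strikeout(rest)}
  return block
end

return {{Plain = strike_done, Para = strike_done}}
```

Saved as its own file, this would be used in place of the filter above
rather than alongside it.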


-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



* Re: Pointers on modifying Plain objects(?)
       [not found]                     ` <877cyf7qb2.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-12-27  4:42                       ` balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
  0 siblings, 0 replies; 7+ messages in thread
From: balaj...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org @ 2022-12-27  4:42 UTC (permalink / raw)
  To: pandoc-discuss



On Sunday, 25 December 2022 at 18:05:18 UTC+8 Albert Krewinkel wrote:

>
> My bad, I forgot to put the Strikeout element in a table, i.e., to 
> wrap it in curly braces. 
>
>
Thank you for the fix! You also helped me understand table syntax for Lua 
which helped with another filter I was using :-)

After all this effort, I discovered that the Outlook desktop client (which 
is the target for the HTML generated by pandoc) ignores text inside the <del> 
tag, probably because Outlook uses Word for HTML rendering, and hiding 
deleted text is how Word's Track Changes view works. I had to add an extra 
filter to use the <s> tag instead. Amusingly enough, the answer for how to do 
that was another post by you :-), but this time on StackOverflow: 
https://stackoverflow.com/a/74700448
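
For readers who cannot follow the link, here is a rough sketch of what such
an extra filter could look like. It is not copied from the linked answer;
it assumes the `FORMAT` global and `pandoc.RawInline` behave as described
in the Lua filter documentation, and it only touches HTML output.

``` lua
-- Sketch: emit <s>...</s> instead of the default <del>...</del> in HTML.
function Strikeout (el)
  -- FORMAT is a global set by pandoc to the name of the output format.
  if not FORMAT:match 'html' then
    return nil  -- leave other output formats untouched
  end
  local result = pandoc.List{pandoc.RawInline('html', '<s>')}
  result:extend(el.content)
  result:insert(pandoc.RawInline('html', '</s>'))
  return result
end
```

As far as I understand the traversal order, this is best kept in its own
file and passed as a second `--lua-filter` after the strikethrough filter,
so that it sees the `Strikeout` elements created there.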
 

