public inbox archive for pandoc-discuss@googlegroups.com
* Filter for automatic md > HTML block level element ID creation?
@ 2022-07-30 11:35 Martin Post
       [not found] ` <fe63819e-1816-4948-a675-a8fe85510e18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Post @ 2022-07-30 11:35 UTC (permalink / raw)
  To: pandoc-discuss



Dear Filter Maestros in this group,

I couldn’t code myself out of a paper bag, so I have to ask for support 
here.

I have a md > HTML use case where I need several types of block-level 
elements (paragraphs, list items, figcaptions etc.) to have 
(auto-generated) identifiers, for example derived from an MD5 or CRC-32 
checksum. Actually, I wouldn’t mind every block element having an id.

(The idea is to make these IDs available as bookmarkable link targets, as 
seen in the headings of the Pandoc manual and many wikis.)

1.  Can this be done using a (Lua) filter (that would have to skip 
user-defined ids)?

2.  Does one exist that I missed?

3.  If not, would someone be interested in writing it? Because I’d be happy 
to support that (financially).

I understand that this would break HTML validation for a document with 
multiple identical block elements / IDs, but that’s something I can live 
with.

Thank you.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/fe63819e-1816-4948-a675-a8fe85510e18n%40googlegroups.com.


* Re: Filter for automatic md > HTML block level element ID creation?
       [not found] ` <fe63819e-1816-4948-a675-a8fe85510e18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-07-30 11:40   ` Jiří Wolker
  2022-07-30 11:41   ` Jiří Wolker
  2022-07-30 14:34   ` Jiří Wolker
  2 siblings, 0 replies; 7+ messages in thread
From: Jiří Wolker @ 2022-07-30 11:40 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

> I couldn’t code myself out of a paper bag, so I have to ask for support
> here.
> 
> I have a md > HTML use case where I need several types of block-level
> elements (paragraphs, list items, figcaptions etc.) to have
> (auto-generated) identifiers, for example derived from MD5 or CRC-32
> checksum. Actually, I wouldn’t mind every block element having an id.
>
> […]
>
> I understand that this would break HTML validation for a document with
> multiple identical block elements / IDs, but that’s something I can live
> with.

What if you change the block text? That would make the stored bookmark 
invalid. If the document is versioned¹, it would not be a problem.

I would prefer IDs in a format like “Heading_Foo-1” (for the 
first paragraph after the heading “Heading Foo”).

I can code it for you. Can I write it in Haskell, or must it be Lua?


¹ I mean that the document (webpage) URL points to a version that will 
never change, and if the document changes, a new URL is assigned.

Jiří.


* Re: Filter for automatic md > HTML block level element ID creation?
       [not found] ` <fe63819e-1816-4948-a675-a8fe85510e18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2022-07-30 11:40   ` Jiří Wolker
@ 2022-07-30 11:41   ` Jiří Wolker
       [not found]     ` <982ef1cb-79ef-8d82-f215-90b725e51613-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2022-07-30 14:34   ` Jiří Wolker
  2 siblings, 1 reply; 7+ messages in thread
From: Jiří Wolker @ 2022-07-30 11:41 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

I forgot to say this:

When the paragraphs are numbered (with consecutive numbers rather than 
hashes), modifications that do not change a paragraph's number do not 
break links to it.
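
The trade-off between numbered and checksum-based IDs can be illustrated 
with a small sketch. It assumes Lua 5.3 or later (for the bitwise 
operators); the names crc32, hashed_id and numbered_id are made up for 
illustration and are not part of Pandoc.

```lua
-- CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit.
-- Needs Lua 5.3+ for the integer bitwise operators.
function crc32(s)
  local crc = 0xFFFFFFFF
  for i = 1, #s do
    crc = crc ~ s:byte(i)
    for _ = 1, 8 do
      if (crc & 1) == 1 then
        crc = (crc >> 1) ~ 0xEDB88320
      else
        crc = crc >> 1
      end
    end
  end
  return crc ~ 0xFFFFFFFF
end

-- Hypothetical ID schemes, named here for illustration only.
function hashed_id(text)
  return string.format("p-%08x", crc32(text))
end

function numbered_id(position)
  return "para-" .. position
end

local before = { "Intro.", "Details." }
local after  = { "New opening.", "Intro.", "Details." }  -- one paragraph inserted

-- "Details." keeps its hashed ID although its position changed from 2 to 3,
-- so a numbered link to it would now point at a different paragraph.
print(hashed_id(before[2]) == hashed_id(after[3]))   -- true
print(numbered_id(2), numbered_id(3))
```

Conversely, rewording a paragraph changes its hashed ID but not its 
number, so which scheme breaks fewer links depends on the kind of edits 
expected.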


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Filter for automatic md > HTML block level element ID creation?
       [not found]     ` <982ef1cb-79ef-8d82-f215-90b725e51613-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-07-30 12:08       ` Martin Post
       [not found]         ` <e9b07f81-df21-423d-bd01-0121b9428ddan-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Post @ 2022-07-30 12:08 UTC (permalink / raw)
  To: pandoc-discuss



Thanks for the swift reply, Jiří.

Regarding content changes > changing IDs > broken links: Not a problem 
(here), as changes would indicate that the target is new/different, so 
that’s a feature, not a bug. ;)

> Can I write it in Haskell, or must it be Lua?

Would that require installing Haskell on macOS? As far as I’m concerned, 
the fewer dependencies, the better…

In general, it would be great for every block element carrying content 
(paragraph, list item, table row…) to have an ID.

Thanks.
On Saturday, July 30, 2022 at 1:42:00 PM UTC+2 wol...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org wrote:

> I forgot to say this:
>
> When the paragraphs are numbered (by consecutive numbers and not 
> hashes), modifications that do not change paragraph number would not 
> break links to the paragraphs.
>


* Re: Filter for automatic md > HTML block level element ID creation?
       [not found]         ` <e9b07f81-df21-423d-bd01-0121b9428ddan-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-07-30 12:10           ` Jiří Wolker
  0 siblings, 0 replies; 7+ messages in thread
From: Jiří Wolker @ 2022-07-30 12:10 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

> Regarding content changes > changing IDs > broken links: Not a problem
> (here), as changes would indicate that the target is new/different, so
> that’s a feature, not a bug. ;)
> 
>> Can I write it in Haskell, or must it be Lua?
> 
> Would that require installing Haskell on macOS? As far as I’m concerned,
> the fewer dependencies, the better…

Yes, it would. So I'll try to learn to write filters in Lua. Haskell's 
pattern matching is a useful feature when writing filters.

> In general, it would be great for every block element carrying content
> (paragraph, list item, table row…) to have an ID.

Okay.


* Re: Filter for automatic md > HTML block level element ID creation?
       [not found] ` <fe63819e-1816-4948-a675-a8fe85510e18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2022-07-30 11:40   ` Jiří Wolker
  2022-07-30 11:41   ` Jiří Wolker
@ 2022-07-30 14:34   ` Jiří Wolker
       [not found]     ` <15c136b0-7748-e24d-65a5-1072d34b7a04-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2 siblings, 1 reply; 7+ messages in thread
From: Jiří Wolker @ 2022-07-30 14:34 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

(I accidentally sent this message as a private reply rather than to the 
mailing list, so I am sending it again to the list.)

Below is the source code of a Lua filter that adds anchors to all 
paragraphs, divs and headings. (List items get anchors when they contain 
paragraphs, which is the case in loose lists; items of tight lists are 
Plain blocks in Pandoc, which the filter does not match.) The anchors 
are numbered (#para-0, #para-1 etc.).

If you really need

A link to the paragraph is appended to the end of every paragraph (or 
other block).

Both links and anchors are given a class name. You can style them. I 
recommend this CSS:

.para-link {
   font-size: .7em;
   text-decoration: none;
   color: inherit;
   opacity: .7;
}
.para-link:hover, .para-link:focus {
   text-decoration: underline;
   opacity: 1;
}


Usage of the filter:

pandoc inputfile.md -o outputfile.html --lua-filter=blockids.lua

(Here blockids.lua is the filter file. Save it in the ‘filters/’ 
subdirectory of the Pandoc data directory or in the current working 
directory.)

The code follows.


-- Adds IDs to every block of the document.

-- MIT License
-- 
-- Copyright (c) 2022 Jiří Wolker
-- 
-- Permission is hereby granted, free of charge, to any person obtaining a copy
-- of this software and associated documentation files (the "Software"), to deal
-- in the Software without restriction, including without limitation the rights
-- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-- copies of the Software, and to permit persons to whom the Software is
-- furnished to do so, subject to the following conditions:
--
-- The above copyright notice and this permission notice shall be included in
-- all copies or substantial portions of the Software.
--
-- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-- SOFTWARE.

-- Running counter used to build the IDs para-0, para-1, ...
local paragraph_number = 0

function Block(elem)
     -- Block types that receive an anchor and a trailing link.
     local blocks_to_modify = {
         ["Para"]   = true,
         ["Div"]    = true,
         ["Header"] = true,
     }
     if blocks_to_modify[elem.t] then
         local id = "para-"..paragraph_number
         paragraph_number = paragraph_number + 1

         -- Empty link that carries the ID and serves as the jump target.
         local anchor = pandoc.Link({ }, "#"..id)
         anchor.classes:insert("para-anchor")
         anchor.identifier = id

         -- Visible link that points at the anchor.
         local link = pandoc.Link("link", "#"..id)
         link.classes:insert("para-link")

         -- At the start of the block:
         elem.content:insert(1, anchor)

         -- At the end of the block:
         elem.content:insert(pandoc.Space())
         elem.content:insert(link)

         -- Return the modified block.
         return elem
     end
end
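
For the checksum-based IDs asked for at the start of the thread, a 
variant along the following lines might work. It is only a sketch: it 
assumes a Pandoc version that provides pandoc.utils.sha1 and 
pandoc.utils.stringify, the helper name short_hash, the "p-" and "d-" 
prefixes and the truncation to 8 characters are arbitrary choices, and 
identical blocks end up with identical IDs.

```lua
-- Sketch only: content-derived IDs instead of running numbers.
-- short_hash, the prefixes and the 8-character truncation are
-- illustrative choices, not part of the filter above.
local function short_hash(elem)
  return pandoc.utils.sha1(pandoc.utils.stringify(elem)):sub(1, 8)
end

-- A Para has no attributes of its own, so the ID goes on an empty Span
-- placed at the start of the paragraph.
function Para(elem)
  local attr = pandoc.Attr("p-" .. short_hash(elem), { "para-anchor" })
  elem.content:insert(1, pandoc.Span({}, attr))
  return elem
end

-- A Div carries its own identifier; a user-defined one is left alone.
function Div(elem)
  if elem.identifier == "" then
    elem.identifier = "d-" .. short_hash(elem)
  end
  return elem
end
```

Because the hash is computed from the block's text, editing a block 
changes its ID, which matches the behaviour described as desirable 
earlier in the thread.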


* Re: Filter for automatic md > HTML block level element ID creation?
       [not found]     ` <15c136b0-7748-e24d-65a5-1072d34b7a04-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-08-01 20:30       ` Martin Post
  0 siblings, 0 replies; 7+ messages in thread
From: Martin Post @ 2022-08-01 20:30 UTC (permalink / raw)
  To: pandoc-discuss



Hi Jiří,

thank you (again :) for this filter. I can confirm it works. The only 
bug I have found so far is that stand-alone images are converted to 
regular ID’d paragraphs instead of HTML figures.
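
A possible explanation, offered as an assumption rather than a confirmed 
diagnosis: in Pandoc versions that represent an implicit figure as a 
paragraph containing nothing but an image, adding an anchor and a link 
to that paragraph means it no longer qualifies as a figure. Under that 
assumption, a guard like the following could leave such paragraphs 
untouched. The helper name is_lone_image is hypothetical, and only the 
anchor part of the filter is shown.

```lua
-- Sketch: skip paragraphs that consist of a single image, so that they
-- can still be rendered as figures. The helper name is made up.
local function is_lone_image(elem)
  return elem.t == "Para"
    and #elem.content == 1
    and elem.content[1].t == "Image"
end

local paragraph_number = 0

function Para(elem)
  if is_lone_image(elem) then
    return nil   -- returning nil leaves the element unchanged
  end
  local id = "para-" .. paragraph_number
  paragraph_number = paragraph_number + 1

  -- Same empty anchor link as in the filter quoted in this thread.
  local anchor = pandoc.Link({}, "#" .. id)
  anchor.classes:insert("para-anchor")
  anchor.identifier = id
  elem.content:insert(1, anchor)
  return elem
end
```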



Thread overview: 7+ messages
2022-07-30 11:35 Filter for automatic md > HTML block level element ID creation? Martin Post
     [not found] ` <fe63819e-1816-4948-a675-a8fe85510e18n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2022-07-30 11:40   ` Jiří Wolker
2022-07-30 11:41   ` Jiří Wolker
     [not found]     ` <982ef1cb-79ef-8d82-f215-90b725e51613-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-07-30 12:08       ` Martin Post
     [not found]         ` <e9b07f81-df21-423d-bd01-0121b9428ddan-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2022-07-30 12:10           ` Jiří Wolker
2022-07-30 14:34   ` Jiří Wolker
     [not found]     ` <15c136b0-7748-e24d-65a5-1072d34b7a04-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2022-08-01 20:30       ` Martin Post
