HTML – Building a linked index page

public inbox archive for pandoc-discuss@googlegroups.com

* HTML – Building a linked index page
@ 2022-08-07  9:20 Martin Post
       [not found] ` <45b1415d-1ab3-44c2-8199-d45095873e62n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Post @ 2022-08-07  9:20 UTC (permalink / raw)
  To: pandoc-discuss


Hello,

in a Markdown > HTML workflow, I am looking for a way to create “index” 
pages for a page set that will contain linked lists of block elements (e.g. 
all headings or figures) in that set.

So I’ll have 1.md, 2.md etc. converted to HTML (1.htm), and I’d like to 
generate e.g. a “h1_index.htm” with links to ID’d H1 headings:

<li>1<ul>
<li><a href="1.htm#first-heading">First Heading</a></li>
<li><a href="1.htm#second-heading">Second Heading</a></li>
</ul></li>
<li>2<ul>
<li><a href="2.htm#first-heading">First Heading</a></li>
<li><a href="2.htm#second-heading">Second Heading</a></li>
</ul></li>

Same for other block elements such as <figcaption> to generate a linked 
list of illustrations.

Can someone suggest how to approach this using only Pandoc (filters) 
instead of using a separate tool?

I believe the Python library Beautiful Soup 
(https://www.crummy.com/software/BeautifulSoup/) is often used for this 
kind of processing, but it would be nice to do it all with Pandoc.

Maybe a second (HTML > HTML) pass for concatenation and a filter that only 
leaves the respective elements?

Thank you.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/45b1415d-1ab3-44c2-8199-d45095873e62n%40googlegroups.com.


[parent not found: <45b1415d-1ab3-44c2-8199-d45095873e62n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>]

* Re: HTML – Building a linked index page
       [not found] ` <45b1415d-1ab3-44c2-8199-d45095873e62n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2022-08-07 13:56   ` Jiří Wolker
       [not found]     ` <9ee132fd-6023-0bf6-02f5-5f72e56392fb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2022-08-07 14:02   ` Albert Krewinkel
  1 sibling, 1 reply; 5+ messages in thread
From: Jiří Wolker @ 2022-08-07 13:56 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Hello,

> in a Markdown > HTML workflow, I am looking for a way to create “index”
> pages for a page set that will contain linked lists of block elements (e.g.
> all headings or figures) in that set.
> 
> So I’ll have 1.md, 2.md etc. converted to HTML (1.htm), and I’d like to
> generate e.g. a “h1_index.htm” with links to ID’d H1 headings:
> 
> <li>1<ul>
> <li><a href="1.htm#first-heading">First Heading</a></li>
> <li><a href="1.htm#second-heading">Second Heading</a></li>
> </ul></li>
> <li>2<ul>
> <li><a href="2.htm#first-heading">First Heading</a></li>
> <li><a href="2.htm#second-heading">Second Heading</a></li>
> </ul></li>
> 
> Same for other block elements such as <figcaption> to generate a linked
> list of illustrations.
> 
> Can someone suggest how to approach this using only Pandoc (filters)
> instead of using a separate tool?

I would not restrict myself to using only Pandoc for everything. 
Even Pandoc relies on external software, LaTeX by default, to generate PDFs.

I would create a script (or rather a set of scripts) that processes the 
files, extracts heading and figure information and creates input file(s) 
for Pandoc with the indices.
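
A minimal sketch of that script approach, in Python and using only the standard library. It assumes the pages have already been converted and that each heading carries its `id` directly on the `<h1>` tag (Pandoc's HTML output without `--section-divs`). The names `HeadingCollector`, `collect_headings` and `build_index` are made up for this illustration and are not part of Pandoc.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (id, text) pairs for every <tag id="..."> in one HTML page."""

    def __init__(self, tag="h1"):
        super().__init__()
        self.tag = tag
        self.headings = []        # finished (id, text) pairs
        self._current_id = None   # id of the element we are inside, if any
        self._text = []           # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            ident = dict(attrs).get("id")
            if ident:             # elements without an id cannot be linked
                self._current_id = ident
                self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._current_id is not None:
            text = "".join(self._text).strip()
            self.headings.append((self._current_id, text))
            self._current_id = None


def collect_headings(html, tag="h1"):
    """Return the (id, text) pairs found in one HTML string."""
    parser = HeadingCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.headings


def build_index(pages, tag="h1"):
    """Build a nested Markdown list from {file name: HTML text}.

    File names are sorted as plain strings, so '10.htm' sorts before '2.htm'.
    Heading text is not escaped for Markdown (assumption: no brackets in it).
    """
    lines = []
    for name in sorted(pages):
        stem = name.rsplit(".", 1)[0]
        lines.append(f"- {stem}")
        for ident, text in collect_headings(pages[name], tag):
            lines.append(f"    - [{text}]({name}#{ident})")
    return "\n".join(lines) + "\n"
```

Writing the result to a file such as `h1_index.md` and converting it with `pandoc h1_index.md -o h1_index.htm` would give the linked list. The same collector can be pointed at another element through the `tag` parameter, provided those elements carry an `id`.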

You can also use a custom Lua writer to produce files with the indices. 
Another option is to write a filter, but that seems to me a slightly 
odd way of doing it.

Let me (or the list) know which solution you would prefer.

Jiří.


[parent not found: <9ee132fd-6023-0bf6-02f5-5f72e56392fb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]

* Re: HTML – Building a linked index page
       [not found]     ` <9ee132fd-6023-0bf6-02f5-5f72e56392fb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2022-08-07 12:12       ` William Lupton
  0 siblings, 0 replies; 5+ messages in thread
From: William Lupton @ 2022-08-07 12:12 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw


Is it possible, when running a command like 'pandoc 1.md 2.md', for a Lua
filter to know from which input file a given AST element came (note: an
element might come from multiple input docs)? I rather thought that this
wasn't possible (but have half a feeling that I've seen something about
this recently).

A trick like this might work for you, combined with a Lua filter that looks
for divs with class 'new-file':

% for file in files/*.md; do echo "::: {.new-file file=$file} :::"; cat "$file"; echo ":::"; echo; done | pandoc
<div class="new-file" data-file="files/1.md">
<p>This is 1.md.</p>
</div>
<div class="new-file" data-file="files/2.md">
<p>And this is 2.md</p>
</div>
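
One way to consume that wrapped output is a JSON-filter-style script rather than a Lua filter. The sketch below is written in Python against Pandoc's JSON AST (`pandoc -t json`), where a Div carries `[attr, blocks]` and a Header carries `[level, attr, inlines]`. `headings_by_file` and `stringify` are hypothetical helper names, and only headers that are direct children of a `new-file` div are considered.

```python
def stringify(inlines):
    """Flatten a list of Pandoc inline elements into plain text (simplified)."""
    out = []
    for el in inlines:
        t, c = el.get("t"), el.get("c")
        if t == "Str":
            out.append(c)
        elif t in ("Space", "SoftBreak"):
            out.append(" ")
        elif t == "Code":
            out.append(c[1])                 # c = [attr, text]
        elif t in ("Emph", "Strong", "Strikeout", "Underline",
                   "Superscript", "Subscript", "SmallCaps"):
            out.append(stringify(c))         # c = inlines
        elif t in ("Link", "Image", "Span", "Quoted"):
            out.append(stringify(c[1]))      # inlines are the second field
    return "".join(out)


def headings_by_file(doc, level=1):
    """Map each 'file' attribute to the (identifier, text) pairs of its headers.

    `doc` is the parsed JSON document; only top-level divs with the class
    'new-file' are inspected, and nested headers are ignored.
    """
    index = {}
    for block in doc["blocks"]:
        if block["t"] != "Div":
            continue
        (_ident, classes, keyvals), children = block["c"]
        if "new-file" not in classes:
            continue
        source = dict(keyvals).get("file")
        found = index.setdefault(source, [])
        for child in children:
            if child["t"] == "Header" and child["c"][0] == level:
                _level, (hid, _classes, _kv), inlines = child["c"]
                found.append((hid, stringify(inlines)))
    return index
```

Fed with `json.load(sys.stdin)` at the end of the pipeline above (with `pandoc -t json` as the last command), the returned mapping holds what is needed to print the nested list of links.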



* Re: HTML – Building a linked index page
       [not found] ` <45b1415d-1ab3-44c2-8199-d45095873e62n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2022-08-07 13:56   ` Jiří Wolker
@ 2022-08-07 14:02   ` Albert Krewinkel
       [not found]     ` <87edxs401h.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  1 sibling, 1 reply; 5+ messages in thread
From: Albert Krewinkel @ 2022-08-07 14:02 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

> Can someone suggest how to approach this using only Pandoc (filters)
> instead of using a separate tool?

The information of which header originated in which file is lost during
the conversion, so we can't use a filter. However, we can use a custom
reader, as those have access to both the filenames and the file's
contents. We just need to do the parsing ourselves with `pandoc.read`:

``` lua
function Reader (sources, opts)
  local items = pandoc.List{}
  for i, source in ipairs(sources) do
    local headers = pandoc.read(source, 'markdown', opts).blocks:walk{
      Block = function (blk)
        return blk.t == 'Header'
          and blk  -- keep Header elements
          or {}    -- discard everything else
      end
    }
    local current_filename = source.name
    -- TODO: convert headers to list items, add links, append to `items`
    -- ...
  end
  return pandoc.Pandoc{pandoc.BulletList(items)}
end
```

This needs some modification, of course, but I hope the general idea
becomes clear from this.

HTH

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124



[parent not found: <87edxs401h.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>]

* Re: HTML – Building a linked index page
       [not found]     ` <87edxs401h.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2022-08-07 15:37       ` Martin Post
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Post @ 2022-08-07 15:37 UTC (permalink / raw)
  To: pandoc-discuss




Thank you, Albert. That approach looks promising. As I'm not a developer, I 
cannot fill in the blanks here, but I hope someone else can. It would be great 
if such a reader could be used to create index files for whatever one might 
look for in a large document set: “meta” docs (aka sitemaps), table / image 
/ definition lists…



