* HTML – Building a linked index page
From: Martin Post @ 2022-08-07 9:20 UTC
To: pandoc-discuss

Hello,

in a Markdown > HTML workflow, I am looking for a way to create “index” pages for a page set that will contain linked lists of block elements (e.g. all headings or figures) in that set.

So I’ll have 1.md, 2.md etc. converted to HTML (1.htm), and I’d like to generate e.g. a “h1_index.htm” with links to ID’d H1 headings:

<li>1<ul>
<li><a href="1.htm#first-heading">First Heading</a></li>
<li><a href="1.htm#second-heading">Second Heading</a></li>
</ul></li>
<li>2<ul>
<li><a href="2.htm#first-heading">First Heading</a></li>
<li><a href="2.htm#second-heading">Second Heading</a></li>
</ul></li>

Same for other block elements such as <figcaption> to generate a linked list of illustrations.

Can someone suggest how to approach this using only Pandoc (filters) instead of using a separate tool?

I believe the Python library Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/) is often used for this kind of processing, but it would be nice to do it all with Pandoc. Maybe a second (HTML > HTML) pass for concatenation and a filter that only leaves the respective elements?

Thank you.

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/45b1415d-1ab3-44c2-8199-d45095873e62n%40googlegroups.com.

* Re: HTML – Building a linked index page
From: Jiří Wolker @ 2022-08-07 13:56 UTC
To: pandoc-discuss

Hello,

> Can someone suggest how to approach this using only Pandoc (filters)
> instead of using a separate tool?

I would avoid constraining myself to using only Pandoc for everything. Even Pandoc relies on an external software package, LaTeX, to generate PDFs.

I would create a script (or rather a set of scripts) that processes the files, extracts heading and figure information and creates input file(s) for Pandoc with the indices.

You can also use a custom Lua writer to produce files with indices. Another option is creating a filter, but that seems to me a slightly odd way of doing it.

Let me (or the list) know which solution you would prefer.

Jiří.

* Re: HTML – Building a linked index page
From: William Lupton @ 2022-08-07 12:12 UTC
To: pandoc-discuss

Is it possible, when running a command like 'pandoc 1.md 2.md', for a Lua filter to know from which input file a given AST element came (note: an element might come from multiple input docs)? I rather thought that this wasn't possible (but have half a feeling that I've seen something about this recently).

A trick like this might work for you (with a Lua filter that looks for divs with class 'new-file')?

% for file in files/*.md; do echo "::: {.new-file file=$file} :::"; cat $file; echo ":::"; echo; done | pandoc
<div class="new-file" data-file="files/1.md">
<p>This is 1.md.</p>
</div>
<div class="new-file" data-file="files/2.md">
<p>And this is 2.md</p>
</div>

On Sun, 7 Aug 2022 at 12:58, Jiří Wolker wrote:

> I would avoid constraining myself to using only Pandoc for everything.
> Even Pandoc relies on an external software package, LaTeX, to generate PDFs.
>
> I would create a script (or rather a set of scripts) that processes the
> files, extracts heading and figure information and creates input file(s)
> for Pandoc with the indices.
>
> You can also use a custom Lua writer to produce files with indices.
> Another option is creating a filter, but that seems to me a slightly
> odd way of doing it.

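The wrapped-div trick above could be paired with a filter along the following lines. This is only a minimal sketch under several assumptions: pandoc 2.17 or later (for the `walk` method on block lists), inputs named like `files/1.md` that were converted to `files/1.htm`, and interest in level-1 headings only. The file name `index-filter.lua` and all local names are hypothetical.

``` lua
-- index-filter.lua (hypothetical name): replace the document body with a
-- nested list of links to the level-1 headings found in each wrapper div.
function Pandoc (doc)
  local items = pandoc.List{}
  for _, blk in ipairs(doc.blocks) do
    -- Only top-level divs carrying the 'new-file' class are considered.
    if blk.t == 'Div' and blk.classes:includes('new-file') then
      local file = blk.attributes['file'] or ''
      -- Assumption: files/1.md was converted to files/1.htm.
      local target = file:gsub('%.md$', '.htm')
      local links = pandoc.List{}
      blk.content:walk{
        Header = function (h)
          if h.level == 1 then
            local link = pandoc.Link(h.content, target .. '#' .. h.identifier)
            links:insert{pandoc.Plain{link}}
          end
        end
      }
      -- One outer item per file: the file name, then its heading links.
      items:insert{pandoc.Plain{pandoc.Str(file)}, pandoc.BulletList(links)}
    end
  end
  return pandoc.Pandoc({pandoc.BulletList(items)}, doc.meta)
end
```

It would be used by appending `--lua-filter index-filter.lua` to the `pandoc` call at the end of the shell loop. One caveat: because all files are parsed as a single document here, pandoc makes repeated heading identifiers unique (a second `first-heading` becomes `first-heading-1`), so the generated anchors can differ from those in the individually converted pages.
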
* Re: HTML – Building a linked index page
From: Albert Krewinkel @ 2022-08-07 14:02 UTC
To: pandoc-discuss

> Can someone suggest how to approach this using only Pandoc (filters)
> instead of using a separate tool?

The information of which header originated in which file is lost during the conversion, so we can't use a filter. However, we can use a custom reader, as those have access to both the filenames and the file's contents. We just need to do the parsing ourselves with `pandoc.read`:

``` lua
function Reader (sources, opts)
  local items = pandoc.List{}
  for i, source in ipairs(sources) do
    local headers = pandoc.read(source, 'markdown', opts).blocks:walk{
      Block = function (blk)
        return blk.t == 'Header'
          and blk -- keep Header elements
          or {}   -- discard everything else
      end
    }
    local current_filename = source.name
    -- TODO: convert headers to list items, add links, append to `items`
    -- ...
  end
  return pandoc.Pandoc{pandoc.BulletList(items)}
end
```

This needs some modification, of course, but I hope the general idea becomes clear from this.

HTH

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124

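One way the TODO in the reader above might be filled in, as a minimal sketch and not a tested implementation. It assumes pandoc 2.17 or later (for the `walk` method on block lists), that `source.name` holds the path given on the command line, that an input such as `1.md` has already been converted to `1.htm`, and that only level-1 headings are wanted. The file name `index-reader.lua` and the local names are hypothetical. It also passes `source.text`, which is a plain string, to `pandoc.read`.

``` lua
-- index-reader.lua (hypothetical name): custom reader that builds a
-- nested list of links to the level-1 headings of every input file.
function Reader (sources, opts)
  local items = pandoc.List{}
  for _, source in ipairs(sources) do
    -- Parse each input on its own, so heading identifiers are generated
    -- per file, as they are when each file is converted separately.
    local doc = pandoc.read(source.text, 'markdown', opts)
    -- Assumption: 1.md was converted to 1.htm in the same directory.
    local base = source.name:gsub('%.md$', '')
    local target = base .. '.htm'
    local links = pandoc.List{}
    doc.blocks:walk{
      Header = function (h)
        if h.level == 1 then
          local link = pandoc.Link(h.content, target .. '#' .. h.identifier)
          links:insert{pandoc.Plain{link}}
        end
      end
    }
    -- One outer item per file: its base name, then its heading links.
    items:insert{pandoc.Plain{pandoc.Str(base)}, pandoc.BulletList(links)}
  end
  return pandoc.Pandoc{pandoc.BulletList(items)}
end
```

A call such as `pandoc --from index-reader.lua --to html 1.md 2.md --output h1_index.htm` would then write the index. Changing the `h.level == 1` test is the obvious place to adapt the sketch to other heading levels.
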
* Re: HTML – Building a linked index page
From: Martin Post @ 2022-08-07 15:37 UTC
To: pandoc-discuss

Thank you, Albert. That approach looks promising. As I'm not a developer, I cannot fill in the blanks here, but I hope someone else can.

It would be great if such a reader could be used to create index files for whatever one might look for in a large document set: “meta” docs (aka sitemaps), table / image / definition lists…

On Sunday, August 7, 2022 at 4:12:07 PM UTC+2 Albert Krewinkel wrote:

> The information of which header originated in which file is lost during
> the conversion, so we can't use a filter. However, we can use a custom
> reader, as those have access to both the filenames and the file's
> contents. We just need to do the parsing ourselves with `pandoc.read`:
