Multiple HTML files (split at H1 or H2 level) from one source?

public inbox archive for pandoc-discuss@googlegroups.com

* Multiple HTML files (split at H1 or H2 level) from one source?
@ 2019-04-13 22:24 Martin Post
       [not found] ` <caeb876c-6393-45d6-92b5-e7f3bee17f49-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Post @ 2019-04-13 22:24 UTC (permalink / raw)
  To: pandoc-discuss

Hello.

I’d like to write a long-form, structured document using (Pandoc’s) 
Markdown and then create chapter-sized HTML files from it, with the file 
names derived from h1 or h2 heading IDs.

I understand that Pandoc will always emit _one_ target document, so I 
guess I’d either need to split my Markdown master file first or the target 
HTML file using some third-party tool.

And there’s the bonus problem of breaking in-document links: 
[link](#anchor_2) in chapter 1 would have to become 
[link](chapter_2.htm#anchor_2).

I’d be grateful for suggestions on how to do this. Thanks.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/caeb876c-6393-45d6-92b5-e7f3bee17f49%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found] ` <caeb876c-6393-45d6-92b5-e7f3bee17f49-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-04-14  9:59   ` Albert Krewinkel
       [not found]     ` <87o958c3fm.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
  2019-04-15 15:44   ` John MacFarlane
  1 sibling, 1 reply; 13+ messages in thread
From: Albert Krewinkel @ 2019-04-14  9:59 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Martin Post writes:

> I’d like to write a long-form, structured document using (Pandoc’s)
> Markdown and then create chapter-sized HTML files from it, with the file
> names derived from h1 or h2 heading IDs.
>
> I understand that Pandoc will always emit _one_ target document, so I
> guess I’d either need to split my Markdown master file first or the target
> HTML file using some third-party tool.

I cannot think of a clean solution, so here is a slightly hacky
one, based on Lua filters. It uses `pandoc.utils.hierarchicalize`
to create new subdocuments, which are then each sent to separate
json files.

    local utils = require 'pandoc.utils'
    local List = require 'pandoc.List'

    -- Turn the nested `Sec` elements produced by `hierarchicalize`
    -- back into a flat list of blocks.
    function flatten (elements)
      local result = List:new {}
      for i, el in ipairs(elements) do
        if el.t == 'Sec' then
          local header = pandoc.Header(el.level, el.label, el.attr)
          table.insert(result, header)
          result:extend(flatten(el.contents))
        else
          table.insert(result, el)
        end
      end
      return result
    end

    function Pandoc (doc)
      local elements = utils.hierarchicalize(doc.blocks)
      for i, sec in ipairs(elements) do
        -- blocks before the first heading are not `Sec` elements; skip them
        if sec.t == 'Sec' then
          -- create new metadata. Copy from `doc.meta` as necessary.
          local new_meta = {
            title = sec.label
          }
          local new_doc = pandoc.Pandoc(flatten(sec.contents), new_meta)
          local filename = sec.attr.identifier .. '.json'
          -- `tee` writes the document's JSON to the file and echoes it back
          utils.run_json_filter(new_doc, 'tee', {filename})
        end
      end
      return pandoc.Pandoc {}
    end

This will give you all sections as `.json` files, which you can
then process further with pandoc in the ways you see fit.

> And there’s the bonus problem of breaking in-document links:
> [link](#anchor_2) in chapter 1 would have to become
> [link](chapter_2.htm#anchor_2).

That's more difficult, but should still be possible using a filter.
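
One way such a link-fixing filter might look is sketched below. It is only an illustration under stated assumptions, not code from this thread: it assumes chapters start at level-1 headings, that each chapter file is named after its heading identifier, and that only headings at the top level of the document are link targets. The helper `rewrite_anchor` and its parameters are hypothetical names. The filter would have to run before the document is split.

```lua
-- Hypothetical helper: point a '#id' link at the chapter file that holds the id.
-- `chapter_of` maps heading ids to the id of their enclosing level-1 heading.
local function rewrite_anchor(target, current_chapter, chapter_of, ext)
  local id = target:match('^#(.+)$')
  local chapter = id and chapter_of[id]
  if chapter and chapter ~= current_chapter then
    return chapter .. (ext or '.html') .. target
  end
  return target  -- same chapter, unknown id, or not an in-document link
end

function Pandoc(doc)
  -- Pass 1: record which chapter every top-level heading belongs to.
  local chapter_of, current = {}, nil
  for _, blk in ipairs(doc.blocks) do
    if blk.t == 'Header' then
      if blk.level == 1 then current = blk.identifier end
      if current then chapter_of[blk.identifier] = current end
    end
  end
  -- Pass 2: rewrite links block by block, tracking the current chapter.
  current = nil
  for i, blk in ipairs(doc.blocks) do
    if blk.t == 'Header' and blk.level == 1 then current = blk.identifier end
    local chapter = current
    doc.blocks[i] = pandoc.walk_block(blk, {
      Link = function(link)
        link.target = rewrite_anchor(link.target, chapter, chapter_of)
        return link
      end,
    })
  end
  return doc
end
```

With `'.htm'` passed as the last argument, `[link](#anchor_2)` inside a chapter with the id `chapter_1` would become `[link](chapter_2.htm#anchor_2)`, provided `anchor_2` is a heading inside `chapter_2`.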

HTH

--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe  e836 388d c0b2 1f63 1124


* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]     ` <87o958c3fm.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
@ 2019-04-14 19:54       ` Martin Post
  0 siblings, 0 replies; 13+ messages in thread
From: Martin Post @ 2019-04-14 19:54 UTC (permalink / raw)
  To: pandoc-discuss


Thanks a lot for looking into this, Albert. I have never written (or used, 
for that matter) a Lua filter, but I will give this a try.



* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found] ` <caeb876c-6393-45d6-92b5-e7f3bee17f49-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2019-04-14  9:59   ` Albert Krewinkel
@ 2019-04-15 15:44   ` John MacFarlane
       [not found]     ` <m21s232ryh.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2019-04-15 15:44 UTC (permalink / raw)
  To: Martin Post, pandoc-discuss


This is not an infrequent request.  Maybe we should
consider adding an "html chapters" output mode, which
produces a zip file.

After all, we already have code for the EPUB writer
that splits content into chapters and fixes all the
internal references. This could be factored out.


* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]     ` <m21s232ryh.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-04-15 15:55       ` Alexander Krotov
       [not found]         ` <319987c6-f761-841f-3c98-47f56b0993cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Krotov @ 2019-04-15 15:55 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

On 15/04/2019 18:44, John MacFarlane wrote:
> 
> This is not an infrequent request.  Maybe we should
> consider adding an "html chapters" output mode, which
> produces a zip file.

I don't think it needs to be implemented in HTML writer.

Someone may want to split the document into multiple markdown, RST,
ODT... documents.



* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]         ` <319987c6-f761-841f-3c98-47f56b0993cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-04-15 15:58           ` John MacFarlane
       [not found]             ` <m2pnpn1crg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2019-04-15 15:58 UTC (permalink / raw)
  To: Alexander Krotov, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

Alexander Krotov <ilabdsf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On 15/04/2019 18:44, John MacFarlane wrote:
>> 
>> This is not an infrequent request.  Maybe we should
>> consider adding an "html chapters" output mode, which
>> produces a zip file.
>
> I don't think it needs to be implemented in HTML writer.
>
> Someone may want to split the document into multiple markdown, RST,
> ODT... documents.

Good point.  Perhaps we could have some generic
feature like "chapters" that applies this
transformation and generates a zip, including
any media, for any output format...



* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]             ` <m2pnpn1crg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-04-16 20:44               ` Martin Post
       [not found]                 ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Post @ 2019-04-16 20:44 UTC (permalink / raw)
  To: pandoc-discuss


I think a generic chapters feature would be great. I am writing and 
translating product manuals. Being able to convert the same source file to 
a printable PDF and web-friendly short chapters (or knowledge base 
articles) would make Pandoc the perfect single source publishing solution 
for the rest of us.

(And while I’m dreaming: a separate TOC file and “Previous” / “Next” links 
at the end of each chapter would complete the picture. :)

Thank you for considering this.



* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]                 ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2019-04-17  4:39                   ` John MacFarlane
       [not found]                     ` <m28sw91bzj.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
  2019-04-18 15:54                   ` BP Jonsson
  2019-04-19 20:05                   ` John Gabriele
  2 siblings, 1 reply; 13+ messages in thread
From: John MacFarlane @ 2019-04-17  4:39 UTC (permalink / raw)
  To: Martin Post, pandoc-discuss

Martin Post <martinpostberlin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> I think a generic chapters features would be great. I am writing and 
> translating product manuals. Being able to convert the same source file to 
> a printable PDF and web-friendly short chapters (or knowledge base 
> articles) would make Pandoc the perfect single source publishing solution 
> for the rest of us.
>
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links 
> at the end of each chapter would complete the picture. :)

Have you considered using pandoc to produce docbook or
texinfo, then using docbook or texinfo tools to
produce chaptered HTML?

Here's an example from pandoc's online demos, going
pandoc -> texinfo -> chunked HTML:

http://pandoc.org/demo/example19/

Commands:

pandoc MANUAL.txt -s -o example19.texi
makeinfo --no-validate --force example19.texi --html -o example19


* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]                     ` <m28sw91bzj.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
@ 2019-04-17  8:28                       ` Martin Post
  0 siblings, 0 replies; 13+ messages in thread
From: Martin Post @ 2019-04-17  8:28 UTC (permalink / raw)
  To: pandoc-discuss


Hello John,

while I have looked into DocBook and admire its feature set (conditional 
content, semantic markup etc.), I’m looking for something simpler that 
is easy to replicate – preferably a single-tool, cross-platform solution. I 
am working with other technical writers and translators who are unhappy 
about everything that doesn’t look & feel like MS Word or Madcap Flare. I 
might just get them to install Pandoc, but surely not XSL processors etc. 
Markdown > HTML > PDF (via Prince) covers a lot of bases with an acceptable 
learning curve and investment. But I realise this is a fairly specific use 
case.



* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]                 ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2019-04-17  4:39                   ` John MacFarlane
@ 2019-04-18 15:54                   ` BP Jonsson
       [not found]                     ` <a5eac512-0951-2b0c-c2fc-a9239bc3c553-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2019-04-19 20:05                   ` John Gabriele
  2 siblings, 1 reply; 13+ messages in thread
From: BP Jonsson @ 2019-04-18 15:54 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Martin Post

On 2019-04-16 at 22:44, Martin Post wrote:
> I think a generic chapters features would be great. I am writing and
> translating product manuals. Being able to convert the same source file to
> a printable PDF and web-friendly short chapters (or knowledge base
> articles) would make Pandoc the perfect single source publishing solution
> for the rest of us.

Have you considered inverting the algorithm and having each chapter 
in a separate source file? Since Pandoc will concatenate multiple 
input files given on the command line you can just do something 
like this when generating a single-file version:

````sh
pandoc -o book.pdf chap-*.md
````

The age-old trick for getting the globbed input files in the right 
order is to include a zero-padded "serial number" in the name of 
each input file:

```
chap-001-introduction.md
chap-002-background.md
...
chap-010-something.md
chap-011-whatever.md
```

The point of the zero-padding is that each serial number has the 
same number of characters and the "alphabetical" (actually 
ASCII-betical) sort order will be correct.  Another trick is to 
leave some space "between" the chapters in the numbering so that 
you can insert new chapters in between without renaming all the 
existing ones:

```
chap-0010-introduction.md
chap-0020-background.md
...
chap-0100-something.md
chap-0101-inserted-chapter.md
chap-0110-whatever.md
```
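
The effect of the padding can be checked with a few lines of plain Lua. This is only an illustrative sketch; the helper `chapter_filename` is a hypothetical name, and the slugs are the ones from the listing above.

```lua
-- Hypothetical helper: build a chapter file name with a four-digit serial.
local function chapter_filename(serial, slug)
  return string.format('chap-%04d-%s.md', serial, slug)
end

local names = {
  chapter_filename(110, 'whatever'),
  chapter_filename(10, 'introduction'),
  chapter_filename(101, 'inserted-chapter'),
  chapter_filename(20, 'background'),
}
table.sort(names)  -- plain string comparison, as for a sorted glob
for _, name in ipairs(names) do
  print(name)
end

-- Without padding, string order and numeric order disagree:
local unpadded = { 'chap-2.md', 'chap-10.md' }
table.sort(unpadded)  -- 'chap-10.md' now comes before 'chap-2.md'
```

The padded names come out in chapter order, with the inserted chapter 0101 between 0020 and 0110, while the unpadded pair sorts chapter 10 before chapter 2.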

Another important thing to address whether you concatenate several 
files to get a single-file version of a text or split a single 
file to get a multi-file version is that hyperlinks between the 
chapters, if there are any, must be converted so that they are 
internal links in a single-file version and external links to the 
appropriate section in the appropriate file in a multi-file 
version.  My solution for these "variable URLs" is to define the 
links with the appropriate external URL, with a prefix to identify 
it as a variable URL, and let a filter remove everything before 
the fragment when generating a single-file version, but just 
remove the prefix when generating a multi-file version. 
Additionally the filter supports using a dummy file extension 
'.XXX' in variable URLs, which gets replaced with the appropriate 
extension for the file type you are generating for a multi-file 
version.

A Lua version of that filter is attached.

You tell the filter which kind of URLs you want by setting the 
metadata field `internal_urls` to a true value (i.e. anything 
except `false` or null/`nil`) when you want internal links.
The default value is `false`, i.e. output external URLs.

You tell the filter what to replace the dummy file extension 
`.XXX` with by setting the value of the metadata field 
`external_url_ext` to the wanted extension --- a string including 
the leading dot.  The default value is `.html`.

The prefix for variable URLs is `+`, so that a link with a 
variable URL should look something like this:

````markdown
[link text](+path/to/chap-001.XXX#chap-001-foo)
````
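
The rewriting rule described above can be sketched as a small string function. This is a simplified illustration and not the code of the attached filter: the name `resolve_var_url` is hypothetical, and the two settings are hard-coded here instead of being read from the `internal_urls` and `external_url_ext` metadata fields.

```lua
-- Hypothetical helper illustrating the variable-URL rule.
local function resolve_var_url(url, internal_urls, ext)
  -- Only URLs carrying the '+' prefix are variable URLs.
  local rest = url:match('^%+(.*)$')
  if not rest then return url end
  -- Split into the part before the fragment and the fragment itself.
  local path, fragment = rest:match('^([^#]*)(.*)$')
  if internal_urls then
    -- Single-file output: keep only the fragment.
    return fragment ~= '' and fragment or rest
  end
  -- Multi-file output: drop the prefix and swap in the real extension.
  if path:sub(-4) == '.XXX' then
    path = path:sub(1, -5) .. (ext or '.html')
  end
  return path .. fragment
end

-- Stand-ins for the metadata settings (assumed values).
local internal_urls = false
local external_url_ext = '.html'

function Link(link)
  link.target = resolve_var_url(link.target, internal_urls, external_url_ext)
  return link
end
```

For the link above, the single-file case yields `#chap-001-foo` and the multi-file case yields `path/to/chap-001.html#chap-001-foo`; URLs without the `+` prefix pass through unchanged.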

Note that the fragment (the "id" part from `#` onwards) is 
required so that links work as they should in a single-file 
version.  You will probably also want to include a part unique for 
each source file/main section in every target id to ensure that 
ids remain unique in a single-file version.  This means that you 
probably want to specify an id explicitly for each section like this:

````markdown
### Foo {#chap-001-foo}
````

Thus to produce a single-file PDF version you would say:

````sh
pandoc --lua-filter=var-urls.lua -M internal_urls -o book.pdf chap-*.md
````

A Makefile might look something like this:

````makefile
MD   = $(wildcard markdown/*.md)
HTML = $(patsubst markdown/%.md,html/%.html,$(MD))
PDF  = $(patsubst markdown/%.md,pdf/%.pdf,$(MD))

all: single.html single.pdf $(HTML) $(PDF)
.PHONY: all

single.html: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w html -so $@ $^

single.pdf: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w latex -so $@ $^

$(HTML): html/%.html: markdown/%.md
	mkdir -p html
	pandoc --lua-filter=var-urls.lua -w html -so $@ $<

$(PDF): pdf/%.pdf: markdown/%.md
	mkdir -p pdf
	pandoc --lua-filter=var-urls.lua -M external_url_ext='.pdf' -w latex -so $@ $<
````

where each file in the `markdown` directory looks something like this:

````markdown
## Chapter 1 {#chap-001}

Illum sapiente non rerum.

### Overview {#chap-001-overview}

See [chapter 3](+chap-003.XXX#chap-003).
````

> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
> at the end of each chapter would complete the picture. :)

Both may be handled with custom templates (and a filter in the 
case of the TOC).

If you create a file `toc-template.md` which contains just the 
single line

````md
$table-of-contents$
````

You will get a Markdown file which contains just the TOC (in 
Markdown format) with

````sh
pandoc --toc --template=./toc-template.md -w markdown -so toc.md source.md
````

This will give you a file which looks something like this:

````markdown
-   [Chapter 1](#chap-001)
     -   [Abstract](#chap-001-abstract)
-   [Chapter 2](#chap-002)
     -   [Abstract](#chap-002-abstract)
-   [Chapter 3](#chap-003)
     -   [Abstract](#chap-003-abstract)
````

Building on my multi-source model from above you would create the 
TOC markdown with something like

````sh
pandoc --toc --template=./toc-template.md -w markdown -so toc.md markdown/*.md
````

which you then can edit as you wish (or put the relevant content 
in the template --- just make sure to double any dollar signs 
which are part of the text!), and then convert to HTML.

However, since the TOC Markdown is created in single-file mode, so 
to speak, its links need to be corrected with a filter so that 
they point to the different chapter files.
Assuming all heading ids are of the form `#chap-NUM-OPTIONAL-PART`, 
where `chap-NUM` corresponds to the base of the name of the file 
which will contain the chapter, i.e. the chapter files have names 
like `chap-001.html`, you will get a working separate `toc.html` by 
running this command:

````sh
pandoc --lua-filter=multi-toc-links.lua -w html -so html/toc.html toc.md
````

where `multi-toc-links.lua` contains just

````lua
function Link (link)
     link.target = link.target:gsub('^%#(chap%-%d+)','%1.html%0')
     return link
end
````

If the URL of a link starts with `#chap-NUM`, this filter changes 
it to start with `chap-NUM.html#chap-NUM`.  The filter does not 
allow you to specify the file extension on the command line, but 
could easily be modified to do so. The replacement (the second 
argument to string.gsub()) is essentially a template: `%1` is 
replaced with the `chap-NUM` part of the matched text, and `%0` is 
replaced with the entire matched text, i.e. `#chap-NUM`. You could 
therefore even specify that template via metadata.
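
A variant that reads the template from metadata might look like the following sketch. It is an untested illustration rather than part of the original filter: the metadata field name `toc_link_template` and the helper `rewrite_toc_target` are assumptions made up for this example. A `Pandoc` function is used so that the metadata is available before any link is rewritten.

```lua
-- Default template: '#chap-NUM' becomes 'chap-NUM.html#chap-NUM'.
local DEFAULT_TEMPLATE = '%1.html%0'

-- Hypothetical helper: rewrite one link target with a gsub template.
local function rewrite_toc_target(target, template)
  -- The parentheses drop gsub's second return value (the match count).
  return (target:gsub('^#(chap%-%d+)', template or DEFAULT_TEMPLATE))
end

function Pandoc(doc)
  -- `toc_link_template` is an assumed metadata field name.
  local field = doc.meta.toc_link_template
  local template = DEFAULT_TEMPLATE
  if type(field) == 'string' then
    template = field
  elseif field ~= nil then
    template = pandoc.utils.stringify(field)
  end
  -- Wrap the blocks in a Div so that a single walk covers all of them.
  local wrapped = pandoc.walk_block(pandoc.Div(doc.blocks), {
    Link = function(link)
      link.target = rewrite_toc_target(link.target, template)
      return link
    end,
  })
  return pandoc.Pandoc(wrapped.content, doc.meta)
end
```

Passing something like `-M toc_link_template='%1.xhtml%0'` on the command line would then produce links of the form `chap-001.xhtml#chap-001`; without the field the behaviour matches the filter above.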

Getting the Previous and Next links is easy if you have each 
chapter in a separate Markdown file and use a custom HTML template.

To create the custom template you first need to get a copy of the 
standard template:

````sh
pandoc -D html >chapter-template.html
````

Now edit `chapter-template.html` and above the `</body>` line 
insert something along the lines of

````html
<table id="prev-next-nav">
<tbody>
<tr class="odd">
<td>
$if(prev-chapter)$
<a href="chap-$prev-chapter$.html">Previous</a>
$endif$
</td>
<td><a href="toc.html">ToC</a></td>
<td>
$if(next-chapter)$
<a href="chap-$next-chapter$.html">Next</a>
$endif$
</td>
</tr>
</tbody>
</table>
````

Now to get this to work you just need to add one or two lines to 
the metadata of each file:

````yaml
prev-chapter: "001"
next-chapter: "003"
````

The values of those two fields can of course be full filenames as 
well; just change the `href` attribute value in the custom 
template Previous and Next links. In the first chapter you just 
omit the `prev-chapter` field from the metadata and in the last 
chapter you omit the `next-chapter` field and the corresponding 
link won't exist.

To style the whole thing just add some CSS for `table#prev-next-nav`.

[-- Attachment #2: var-urls.lua --]
[-- Type: text/x-lua, Size: 7350 bytes --]

--[=============================================================================[
var-urls.lua - handle URLs when (not) concatenating multiple sources with Pandoc

An important thing to address when you concatenate several files to get a
single-file version of a text is that hyperlinks between the chapters, if
there are any, must be converted so that they are internal links in a single-
file version and external links to the appropriate section in the
appropriate file in a multi-file version. My solution for these "variable
URLs" is to define the links with the appropriate external URL, with a
prefix to identify it as a variable URL, and let a filter remove everything
before the fragment when generating a single-file version, but just remove
the prefix when generating a multi-file version. Additionally the filter
supports using a dummy file extension '.XXX' in variable URLs, which gets
replaced with the appropriate extension for the file type you are
generating for a multi-file version.

A Lua version of that filter is attached.

You tell the filter which kind of URLs you want by setting the metadata
field `internal_urls` to a true value (i.e. anything except `false` or
null/`nil`) when you want internal links. The default value is `false`,
i.e. output external URLs.

You tell the filter what to replace the dummy file extension `.XXX` with
by setting the value of the metadata field `external_url_ext` to the
wanted extension --- a string including the leading dot. The default
value is `.html`.

The prefix for variable URLs is `+`, so that a link with a variable URL
should look something like this:

````markdown
[link text](+path/to/chap-001.XXX#chap-001-foo)
````

Note that the fragment (the "id" part from `#` onwards) is required so
that links work as they should in a single-file version. You will probably
also want to include a part unique for each source file/main section in
every target id to ensure that ids remain unique in a single-file version.
This means that you probably want to specify an id explicitly for each
section like this:

````markdown
### Foo {#chap-001-foo}
````

Thus to produce a single-file PDF version you would say:

````sh
pandoc --lua-filter=var-urls.lua -M internal_urls -o book.pdf chap-*.md
````

A Makefile might look something like this:

````makefile
MD   = $(wildcard markdown/*.md)
HTML = $(patsubst markdown/%.md,html/%.html,$(MD))
PDF  = $(patsubst markdown/%.md,pdf/%.pdf,$(MD))

all: single.html single.pdf $(HTML) $(PDF)
.PHONY: all

single.html: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w html -so $@ $^

single.pdf: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w latex -so $@ $^

$(HTML): html/%.html: markdown/%.md
	mkdir -p html
	pandoc --lua-filter=var-urls.lua -w html -so $@ $<

$(PDF): pdf/%.pdf: markdown/%.md
	mkdir -p pdf
	pandoc --lua-filter=var-urls.lua -M external_url_ext='.pdf' -w latex -so $@ $<
````

where each file in the `markdown` directory looks something like this:

````markdown
## Chapter 1 {#chap-001}

Illum sapiente non rerum.

### Overview {#chap-001-overview}

See [chapter 3](+chap-003.XXX#chap-003).
````

The age-old trick for getting the globbed input files in the right order is
to include a zero-padded "serial number" in the name of each input file:

```
chap-001-introduction.md
chap-002-background.md
...
chap-010-something.md
chap-011-whatever.md
```

The point of the zero-padding is that every serial number has the same
number of characters, so that the "alphabetical" (actually ASCII-betical)
sort order matches the numeric order. Another trick is to leave some space
"between" the chapters in the numbering so that you can insert new chapters
in between without renaming all the existing ones:

```
chap-0010-introduction.md
chap-0020-background.md
...
chap-0100-something.md
chap-0101-inserted-chapter.md
chap-0110-whatever.md
```
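
The effect of the padding can be checked with a small Lua sketch that sorts
two sets of hypothetical file names by plain string comparison (in Lua's
default "C" locale this is a byte-wise comparison):

````lua
-- Sketch: string sorting with and without zero-padded serial numbers.
-- The file names are made up for illustration.
local unpadded = { 'chap-10.md', 'chap-2.md', 'chap-1.md' }
local padded   = { 'chap-010.md', 'chap-002.md', 'chap-001.md' }

table.sort(unpadded)
table.sort(padded)

print(table.concat(unpadded, ' ')) --> chap-1.md chap-10.md chap-2.md
print(table.concat(padded, ' '))   --> chap-001.md chap-002.md chap-010.md
````

Without padding, chapter 10 sorts before chapter 2; with padding the string
order and the numeric order agree.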

| This software is Copyright (c) 2019 by Benct Philip Jonsson.
| 
| This is free software, licensed under:
| 
|   The MIT (X11) License
| See <http://www.opensource.org/licenses/mit-license.php>.

--]=============================================================================]

local stringify = pandoc.utils and pandoc.utils.stringify or require"pandoc.utils".stringify
assert(stringify, "Couldn't get the pandoc.utils.stringify function")

local config = {
    internal_urls = false,
    external_url_ext = '.html',
}

-- patterns and replacements for predefined substitutions
local url_subst_dispatch = {
    var_url      = {
        -- A leading '+' identifies a "variable URL" as such.
        -- To use another prefix just modify this pattern!
        pat = '^%+',
        -- we simply remove the leading character
        repl = "",
    },
    internal_url = {
        -- When we want internal URLs we just remove
        -- everything before the fragment!
        pat = '^[^#]+',
        repl = "",
    },
    external_url = {
        -- When we want external URLs we replace the
        -- dummy extension '.XXX' if any with the
        -- appropriate extension for the current output format
        pat = '(%.XXX)',
        -- This must be a function so that it returns the current
        -- value of config.external_url_ext at the time when
        -- the substitution is made!
        repl = function (match) return config.external_url_ext or match end,
    },
}

-- make a predefined substitution
local function url_subst (str, key)
    str = tostring(str) -- just in case!
    key = tostring(key) -- just in case!
    local disp = assert(url_subst_dispatch[key], "No such dispatch: " .. key)
    -- parentheses drop gsub's second return value (the match count)
    return (str:gsub(disp.pat, disp.repl))
end

-- turn a meta value/tree into a "plain" value or "plain" data tree
local function meta2data (meta)
    if 'table' == type(meta) then
        -- find out if the table is an element object,
        -- a meta list/map object or some other table
        if nil == meta.t or 'MetaList' == meta.t or 'MetaMap' == meta.t then
            -- probably container rather than element object, so clone it
            -- (we have to assume that an object without a .t is the metadata
            -- root, because there is no safe way to check for it!)
            local data = {}
            for k,v in pairs(meta) do
                data[k] = meta2data(v) -- yes process recursively!
            end
            return data
        elseif meta.t then
            -- probably an element so stringify it
            return stringify(meta)
        else
            -- something else so just return it
            return meta
        end
    else
        -- not a table so just return it
        return meta
    end
end

local function get_meta_config (meta)
    for k in pairs(config) do
        -- get meta value and turn it into a "plain" value
        local v = meta2data(meta[k])
        if nil ~= v then -- if defined use it
            -- TODO: type check and/or otherwise validate v
            config[k] = v
        end
    end
    return nil
end

local function fix_var_url (link)
    local url = url_subst(link.target, 'var_url')
    if url == link.target then
        return nil
    elseif config.internal_urls then
        link.target = url_subst(url, 'internal_url')
    else
        link.target = url_subst(url, 'external_url')
    end
    return link
end

return {
    { Meta = get_meta_config },
    { Link = fix_var_url },
}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]                 ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2019-04-17  4:39                   ` John MacFarlane
  2019-04-18 15:54                   ` BP Jonsson
@ 2019-04-19 20:05                   ` John Gabriele
       [not found]                     ` <3bcc67bf-59a4-49ad-9a0a-ec5e6a39f7b3-jFIJ+Wc5/Vo7lZ9V/NTDHw@public.gmane.org>
  2 siblings, 1 reply; 13+ messages in thread
From: John Gabriele @ 2019-04-19 20:05 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

[-- Attachment #1: Type: text/plain, Size: 2979 bytes --]

Hi Martin,

Do you prefer working with one long file instead of multiple smaller ones?

If you're willing to break the one big file into chapter-sized files, I wrote a little program that uses Pandoc to process them into ordered linked-up html output: http://www.unexpected-vortices.com/sw/rippledoc/index.html , including prev/next links.

-- John


On Tue, Apr 16, 2019, at 4:44 PM, Martin Post wrote:
> I think a generic chapters feature would be great. I am writing and translating product manuals. Being able to convert the same source file to a printable PDF and web-friendly short chapters (or knowledge base articles) would make Pandoc the perfect single source publishing solution for the rest of us.
> 
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links at the end of each chapter would complete the picture. :)
> 
> Thank you for considering this.
> 
> 
> On Monday, April 15, 2019 at 5:58:42 PM UTC+2, John MacFarlane wrote:
>> Alexander Krotov <ila...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes: 
>> 
>> > On 15/04/2019 18:44, John MacFarlane wrote: 
>> >> 
>> >> This is not an infrequent request. Maybe we should 
>> >> consider adding an "html chapters" output mode, which 
>> >> produces a zip file. 
>> > 
>> > I don't think it needs to be implemented in HTML writer. 
>> > 
>> > Someone may want to split the document into multiple markdown, RST, 
>> > ODT... documents. 
>> 
>> Good point. Perhaps we could have some generic 
>> feature like "chapters" that applies this 
>> transformation and generates a zip, including 
>> any media, for any output format... 
> 



[-- Attachment #2: Type: text/html, Size: 5582 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]                     ` <a5eac512-0951-2b0c-c2fc-a9239bc3c553-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-04-20  9:18                       ` Martin Post
  0 siblings, 0 replies; 13+ messages in thread
From: Martin Post @ 2019-04-20  9:18 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 1434 bytes --]

Hello BP Jonsson,

thank you so much for taking the time to outline your approach and 
providing both your Lua filter and a ready-to-use Makefile. And yes, I have 
not only considered the chapters-to-long-document approach, I am actually 
doing this – both by concatenating chapters in Pandoc and by using file 
transclusion in a custom LML I am using. Both approaches (slicing long docs 
vs. building them from small chapters) have obvious advantages. This is one 
of several problems I am trying to solve right now with regard to writing 
technical documentation, but I have saved your comments and the filter to 
study them over the next few days.

Cheers,

Martin

On Thursday, April 18, 2019 at 5:54:45 PM UTC+2, BP Jonsson wrote:

> Have you considered inverting the algorithm and have each chapter 
> in a separate source file?


[-- Attachment #1.2: Type: text/html, Size: 2082 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Multiple HTML files (split at H1 or H2 level) from one source?
       [not found]                     ` <3bcc67bf-59a4-49ad-9a0a-ec5e6a39f7b3-jFIJ+Wc5/Vo7lZ9V/NTDHw@public.gmane.org>
@ 2019-04-20  9:46                       ` Martin Post
  0 siblings, 0 replies; 13+ messages in thread
From: Martin Post @ 2019-04-20  9:46 UTC (permalink / raw)
  To: pandoc-discuss

[-- Attachment #1.1: Type: text/plain, Size: 1732 bytes --]

On Friday, April 19, 2019 at 10:06:27 PM UTC+2, jgabriele wrote:
>
> If you're willing to break the one big file into chapter-sized files, I 
> wrote a little program that uses Pandoc to process them into ordered 
> linked-up html output: 
> http://www.unexpected-vortices.com/sw/rippledoc/index.html , including 
> prev/next links.
>
>
Hello John,

This thread just keeps on giving… :)

I just realised I came across Rippledoc a while ago, but obviously had 
forgotten about it when researching this topic (or the “no OS other than 
GNU/Linux” line had scared off my inner macOS fanboy).

I downloaded it, and it worked out of the box. Beautiful results (and 
obviously easy to customize once it has created CSS and TOC files).

So while it doesn’t help with slicing long documents, it does provide 
navigation and will allow me to use folders to structure more complex 
projects effectively.

I also like that it is (as you say on your site) “talkative” (most CLI 
tools are a bit too tight-lipped for my taste), and your own documentation 
is excellent, too. 

So - another big thank you.

Cheers,

Martin


[-- Attachment #1.2: Type: text/html, Size: 2689 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-04-20  9:46 UTC | newest]

Thread overview: 13+ messages
2019-04-13 22:24 Multiple HTML files (split at H1 or H2 level) from one source? Martin Post
     [not found] ` <caeb876c-6393-45d6-92b5-e7f3bee17f49-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-04-14  9:59   ` Albert Krewinkel
     [not found]     ` <87o958c3fm.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2019-04-14 19:54       ` Martin Post
2019-04-15 15:44   ` John MacFarlane
     [not found]     ` <m21s232ryh.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2019-04-15 15:55       ` Alexander Krotov
     [not found]         ` <319987c6-f761-841f-3c98-47f56b0993cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-15 15:58           ` John MacFarlane
     [not found]             ` <m2pnpn1crg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2019-04-16 20:44               ` Martin Post
     [not found]                 ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-04-17  4:39                   ` John MacFarlane
     [not found]                     ` <m28sw91bzj.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2019-04-17  8:28                       ` Martin Post
2019-04-18 15:54                   ` BP Jonsson
     [not found]                     ` <a5eac512-0951-2b0c-c2fc-a9239bc3c553-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-20  9:18                       ` Martin Post
2019-04-19 20:05                   ` John Gabriele
     [not found]                     ` <3bcc67bf-59a4-49ad-9a0a-ec5e6a39f7b3-jFIJ+Wc5/Vo7lZ9V/NTDHw@public.gmane.org>
2019-04-20  9:46                       ` Martin Post
