* Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-13 22:24 UTC
To: pandoc-discuss
Hello.
I’d like to write a long-form, structured document using (Pandoc’s)
Markdown and then create chapter-sized HTML files from it, with the file
names derived from h1 or h2 heading IDs.
I understand that Pandoc will always emit _one_ target document, so I
guess I’d either need to split my Markdown master file first or the target
HTML file using some third-party tool.
And there’s the bonus problem of breaking in-document links:
[link](#anchor_2) in chapter 1 would have to become
[link](chapter_2.htm#anchor_2).
I’d be grateful for suggestions on how to do this. Thanks.
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/caeb876c-6393-45d6-92b5-e7f3bee17f49%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Albert Krewinkel @ 2019-04-14 9:59 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Martin Post writes:
> I’d like to write a long-form, structured document using (Pandoc’s)
> Markdown and then create chapter-sized HTML files from it, with the file
> names derived from h1 or h2 heading IDs.
>
> I understand that Pandoc will always emit _one_ target document, so I
> guess I’d either need to split my Markdown master file first or the target
> HTML file using some third-party tool.
I cannot think of a clean solution, so here is a slightly hacky
one, based on Lua filters. It uses `pandoc.utils.hierarchicalize`
to create new subdocuments, which are then each written to separate
JSON files.
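    -- Requires pandoc 2.x before 2.8; later versions replaced
    -- `hierarchicalize` with `pandoc.utils.make_sections`.
    local utils = require 'pandoc.utils'
    local List = require 'pandoc.List'

    -- Turn nested `Sec` elements back into a flat list of blocks,
    -- re-creating the header of each section.
    function flatten (elements)
      local result = List:new {}
      for i, el in ipairs(elements) do
        if el.t == 'Sec' then
          local header = pandoc.Header(el.level, el.label, el.attr)
          table.insert(result, header)
          result:extend(flatten(el.contents))
        else
          table.insert(result, el)
        end
      end
      return result
    end

    function Pandoc (doc)
      local elements = utils.hierarchicalize(doc.blocks)
      for i, sec in ipairs(elements) do
        -- Blocks before the first header are not sections; skip them.
        if sec.t == 'Sec' then
          -- create new metadata. Copy from `doc.meta` as necessary.
          local new_meta = {
            title = sec.label
          }
          local new_doc = pandoc.Pandoc(flatten(sec.contents), new_meta)
          local filename = sec.attr.identifier .. '.json'
          -- `tee` writes the JSON it receives to `filename`
          utils.run_json_filter(new_doc, 'tee', {filename})
        end
      end
      -- The main output is intentionally empty.
      return pandoc.Pandoc {}
    end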
This will give you all sections as `.json` files, which you can
then process further with pandoc in the ways you see fit.
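For pandoc versions that no longer provide `hierarchicalize`, the core idea (cut the block list wherever a header of the chosen level starts) can be illustrated without pandoc at all. This is a minimal plain-Lua sketch, assuming blocks are tables with a `t` field and that headers carry `level` and `identifier` fields; that layout is a simplified stand-in for pandoc's AST, and `split_at_level` is a hypothetical helper, not a pandoc function.

```lua
-- Split a flat list of blocks into chunks, starting a new chunk at each
-- header of the given level. Blocks are simplified stand-ins for pandoc
-- AST elements, e.g. { t = 'Header', level = 1, identifier = 'intro' }
-- (hypothetical data layout, not pandoc's real API).
local function split_at_level(blocks, level)
  local chunks = {}
  local current = { id = nil, blocks = {} }  -- content before the first header
  for _, block in ipairs(blocks) do
    if block.t == 'Header' and block.level == level then
      -- close the running chunk, unless it is still empty
      if #current.blocks > 0 then
        table.insert(chunks, current)
      end
      current = { id = block.identifier, blocks = {} }
    end
    table.insert(current.blocks, block)
  end
  if #current.blocks > 0 then
    table.insert(chunks, current)
  end
  return chunks
end

-- Example: two chapters, the second one with a subsection.
local doc = {
  { t = 'Header', level = 1, identifier = 'chapter_1' },
  { t = 'Para' },
  { t = 'Header', level = 1, identifier = 'chapter_2' },
  { t = 'Header', level = 2, identifier = 'anchor_2' },
  { t = 'Para' },
}
for _, chunk in ipairs(split_at_level(doc, 1)) do
  print((chunk.id or 'preamble') .. '.html', #chunk.blocks)
end
```

Lower-level headers simply stay inside the chunk of their enclosing chapter, so splitting at H2 instead of H1 is only a matter of passing a different `level`.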
> And there’s the bonus problem of breaking in-document links:
> [link](#anchor_2) in chapter 1 would have to become
> [link](chapter_2.htm#anchor_2).
That's more difficult, but should still be possible using a filter.
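One possible approach, sketched here in plain Lua under stated assumptions: a first pass records, for every heading id, the name of the chapter file that will contain it; a second pass rewrites each `#id` link target accordingly. The `id_to_file` table and the `rewrite_target` helper are hypothetical. In a real filter the table would be filled from the `Header` elements of each chunk, and the function would be called from a `Link` function on `link.target`.

```lua
-- Rewrite an in-document link target such as '#anchor_2' so that it points
-- into the chapter file containing that anchor, e.g. 'chapter_2.htm#anchor_2'.
-- id_to_file maps heading ids to file names (hypothetical; built in a first pass).
-- current_file is the file being written, so links within it stay local.
local function rewrite_target(target, id_to_file, current_file)
  local id = target:match('^#(.+)$')
  if not id then
    return target            -- not a fragment-only link: leave untouched
  end
  local file = id_to_file[id]
  if file == nil or file == current_file then
    return target            -- unknown id, or same chapter: keep '#id'
  end
  return file .. '#' .. id
end

local id_to_file = {
  chapter_1 = 'chapter_1.htm',
  chapter_2 = 'chapter_2.htm',
  anchor_2  = 'chapter_2.htm',
}
print(rewrite_target('#anchor_2', id_to_file, 'chapter_1.htm'))
-- prints: chapter_2.htm#anchor_2
```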
HTH
--
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-14 19:54 UTC
To: pandoc-discuss
Thanks a lot for looking into this, Albert. I have never written (or used,
for that matter) a Lua filter, but I will give this a try.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John MacFarlane @ 2019-04-15 15:44 UTC
To: Martin Post, pandoc-discuss
This is not an infrequent request. Maybe we should
consider adding an "html chapters" output mode, which
produces a zip file.
After all, we already have code for the EPUB writer
that splits content into chapters and fixes all the
internal references. This could be factored out.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Alexander Krotov @ 2019-04-15 15:55 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
On 15/04/2019 18:44, John MacFarlane wrote:
>
> This is not an infrequent request. Maybe we should
> consider adding an "html chapters" output mode, which
> produces a zip file.
I don't think it needs to be implemented in the HTML writer.
Someone may want to split the document into multiple markdown, RST,
ODT... documents.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John MacFarlane @ 2019-04-15 15:58 UTC
To: Alexander Krotov, pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Alexander Krotov <ilabdsf-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> On 15/04/2019 18:44, John MacFarlane wrote:
>>
>> This is not an infrequent request. Maybe we should
>> consider adding an "html chapters" output mode, which
>> produces a zip file.
>
> I don't think it needs to be implemented in HTML writer.
>
> Someone may want to split the document into multiple markdown, RST,
> ODT... documents.
Good point. Perhaps we could have some generic
feature like "chapters" that applies this
transformation and generates a zip, including
any media, for any output format...
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-16 20:44 UTC
To: pandoc-discuss
I think a generic chapters feature would be great. I am writing and
translating product manuals. Being able to convert the same source file to
a printable PDF and web-friendly short chapters (or knowledge base
articles) would make Pandoc the perfect single source publishing solution
for the rest of us.
(And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
at the end of each chapter would complete the picture. :)
Thank you for considering this.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John MacFarlane @ 2019-04-17 4:39 UTC
To: Martin Post, pandoc-discuss
Martin Post <martinpostberlin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> I think a generic chapters features would be great. I am writing and
> translating product manuals. Being able to convert the same source file to
> a printable PDF and web-friendly short chapters (or knowledge base
> articles) would make Pandoc the perfect single source publishing solution
> for the rest of us.
>
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
> at the end of each chapter would complete the picture. :)
Have you considered using pandoc to produce docbook or
texinfo, then using docbook or texinfo tools to
produce chaptered HTML?
Here's an example from pandoc's online demos, going
pandoc -> texinfo -> chunked HTML:
http://pandoc.org/demo/example19/
Commands:
pandoc MANUAL.txt -s -o example19.texi
makeinfo --no-validate --force example19.texi --html -o example19
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-17 8:28 UTC
To: pandoc-discuss
Hello John,
while I have looked into DocBook and admire its feature set (conditional
content, semantic markup etc.), I’m looking for something simpler that
is easy to replicate – preferably a single-tool, cross-platform solution. I
am working with other technical writers and translators who are unhappy
about anything that doesn’t look & feel like MS Word or MadCap Flare. I
might just get them to install Pandoc, but surely not XSL processors etc.
Markdown > HTML > PDF (via Prince) covers a lot of bases with an acceptable
learning curve and investment. But I realise this is a fairly specific use
case.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: BP Jonsson @ 2019-04-18 15:54 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, Martin Post
Den 2019-04-16 kl. 22:44, skrev Martin Post:
> I think a generic chapters features would be great. I am writing and
> translating product manuals. Being able to convert the same source file to
> a printable PDF and web-friendly short chapters (or knowledge base
> articles) would make Pandoc the perfect single source publishing solution
> for the rest of us.
Have you considered inverting the algorithm and having each chapter
in a separate source file? Since Pandoc will concatenate multiple
input files given on the command line, you can just do something
like this when generating a single-file version:
````sh
pandoc -o book.pdf chap-*.md
````
The age-old trick for getting the globbed input files in the right
order is to include a zero-padded "serial number" in the name of
each input file:
```
chap-001-introduction.md
chap-002-background.md
...
chap-010-something.md
chap-011-whatever.md
```
The point of the zero-padding is that each serial number has the
same number of characters and the "alphabetical" (actually
ASCII-betical) sort order will be correct. Another trick is to
leave some space "between" the chapters in the numbering so that
you can insert new chapters in between without renaming all the
existing ones:
```
chap-0010-introduction.md
chap-0020-background.md
...
chap-0100-something.md
chap-0101-inserted-chapter.md
chap-0110-whatever.md
```
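To see why the padding matters, compare how a plain string sort orders padded and unpadded names. This is a small plain-Lua illustration with made-up file names; shell glob expansion also returns names in sorted order, but its exact collation can depend on the locale, so treat this as an approximation of what the shell does.

```lua
-- Sort a copy of the file names with Lua's '<' on strings, which compares
-- byte by byte in the default C locale.
local function sorted(names)
  local copy = {}
  for i, name in ipairs(names) do copy[i] = name end
  table.sort(copy)
  return copy
end

local unpadded = { 'chap-10-something.md', 'chap-2-background.md', 'chap-1-introduction.md' }
local padded   = { 'chap-010-something.md', 'chap-002-background.md', 'chap-001-introduction.md' }

print(table.concat(sorted(unpadded), ' '))
print(table.concat(sorted(padded), ' '))
```

The unpadded list comes out as chapter 1, 10, 2, because `'0'` sorts after `'-'` but `'1'` sorts before `'2'`; the padded list comes out in chapter order.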
Another important thing to address, whether you concatenate several
files to get a single-file version of a text or split a single
file to get a multi-file version, is that hyperlinks between the
chapters, if there are any, must be converted so that they are
internal links in a single-file version and external links to the
appropriate section in the appropriate file in a multi-file
version. My solution for these "variable URLs" is to define the
links with the appropriate external URL, with a prefix to identify
it as a variable URL, and let a filter remove everything before
the fragment when generating a single-file version, but just
remove the prefix when generating a multi-file version.
Additionally the filter supports using a dummy file extension
'.XXX' in variable URLs, which gets replaced with the appropriate
extension for the file type you are generating for a multi-file
version.
A Lua version of that filter is attached.
You tell the filter which kind of URLs you want by setting the
metadata field `internal_urls` to a true value (i.e. anything
except `false` or null/`nil`) when you want internal links.
The default value is `false`, i.e. output external URLs.
You tell the filter what to replace the dummy file extension
`.XXX` with by setting the value of the metadata field
`external_url_ext` to the wanted extension --- a string including
the leading dot. The default value is `.html`.
The prefix for variable URLs is `+`, so that a link with a
variable URL should look something like this:
````markdown
[link text](+path/to/chap-001.XXX#chap-001-foo)
````
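The string handling at the heart of the filter can be tried out on its own. This plain-Lua sketch reuses the patterns from the attached `var-urls.lua` (the `+` prefix, `^[^#]+` for the internal case, `%.XXX` for the dummy extension), but the wrapper function `resolve_var_url` and its arguments are illustrative only and not part of the attached filter.

```lua
-- Resolve a "variable URL" like '+path/to/chap-001.XXX#chap-001-foo'.
--   internal == true  -> keep only the fragment ('#chap-001-foo')
--   internal == false -> drop the '+' and replace '.XXX' with ext
-- Returns nil for URLs without the '+' prefix (those are left alone).
local function resolve_var_url(url, internal, ext)
  if not url:find('^%+') then
    return nil
  end
  local rest = url:gsub('^%+', '')   -- strip the prefix
  if internal then
    -- remove everything before the fragment
    return (rest:gsub('^[^#]+', ''))
  end
  -- a function replacement avoids surprises if ext contains '%'
  return (rest:gsub('%.XXX', function () return ext or '.html' end))
end

print(resolve_var_url('+path/to/chap-001.XXX#chap-001-foo', true))
print(resolve_var_url('+path/to/chap-001.XXX#chap-001-foo', false, '.pdf'))
```

The first call yields `#chap-001-foo` and the second `path/to/chap-001.pdf#chap-001-foo`, which is why the fragment must always be present.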
Note that the fragment (the "id" part from `#` onwards) is
required so that links work as they should in a single-file
version. You will probably also want to include a part unique for
each source file/main section in every target id to ensure that
ids remain unique in a single-file version. This means that you
probably want to specify an id explicitly for each section like this:
````markdown
### Foo {#chap-001-foo}
````
Thus to produce a single-file PDF version you would say:
````sh
pandoc --lua-filter=var-urls.lua -M internal_urls -o book.pdf chap-*.md
````
A Makefile might look something like this:
````makefile
MD = $(wildcard markdown/*.md)
HTML = $(patsubst markdown/%.md,html/%.html,$(MD))
PDF = $(patsubst markdown/%.md,pdf/%.pdf,$(MD))

all: single.html single.pdf $(HTML) $(PDF)
.PHONY: all

single.html: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w html -so $@ $^

single.pdf: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w latex -so $@ $^

$(HTML): html/%.html: markdown/%.md
	mkdir -p html
	pandoc --lua-filter=var-urls.lua -w html -so $@ $<

$(PDF): pdf/%.pdf: markdown/%.md
	mkdir -p pdf
	pandoc --lua-filter=var-urls.lua -M external_url_ext='.pdf' -w latex -so $@ $<
````
where each file in the `markdown` directory looks something like this:
````markdown
## Chapter 1 {#chap-001}
Illum sapiente non rerum.
### Overview {#chap-001-overview}
See [chapter 3](+chap-003.XXX#chap-003).
````
>
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
> at the end of each chapter would complete the picture. :)
Both may be handled with custom templates (and a filter in the
case of the TOC).
If you create a file `toc-template.md` which contains just the
single line
````md
$table-of-contents$
````
you will get a Markdown file which contains just the TOC (in
Markdown format) with
````sh
pandoc --toc --template=./toc-template.md -w markdown -so toc.md source.md
````
This will give you a file which looks something like this:
````markdown
- [Chapter 1](#chap-001)
- [Abstract](#chap-001-abstract)
- [Chapter 2](#chap-002)
- [Abstract](#chap-002-abstract)
- [Chapter 3](#chap-003)
- [Abstract](#chap-003-abstract)
````
Building on my multi-source model from above you would create the
TOC markdown with something like
````sh
pandoc --toc --template=./toc-template.md -w markdown -so toc.md markdown/*.md
````
which you then can edit as you wish (or put the relevant content
in the template; just make sure to double any dollar signs
which are part of the text!), and then convert to HTML.
However, since the TOC Markdown is created in single-file mode, so
to speak, you will need a filter to correct the links so that
they point to the different chapter files.
Assuming all heading ids are of the form `#chap-NUM-OPTIONAL-PART`,
where `chap-NUM` corresponds to the base of the name of the file
which will contain the chapter (i.e. the chapter files have names
like `chap-001.html`), you will get a working separate `toc.html` by
running this command:
````sh
pandoc --lua-filter=multi-toc-links.lua -w html -so html/toc.html toc.md
````
where `multi-toc-links.lua` contains just
````lua
function Link (link)
link.target = link.target:gsub('^%#(chap%-%d+)','%1.html%0')
return link
end
````
What this filter does is this: if the URL of a link starts with
`#chap-NUM`, it will be changed to start with
`chap-NUM.html#chap-NUM`. This filter does not allow you to
specify the file extension on the command line, but could easily be
modified to do so. Since the replacement (the second argument to
string.gsub()) is essentially a template (`%1` is replaced with
the `chap-NUM` part of the matched text, and `%0` is replaced with
the entire matched text, i.e. `#chap-NUM`), you could even
specify that template via metadata.
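As suggested above, the extension is easy to make configurable. A minimal sketch, assuming the same `chap-NUM` id convention; `toc_target` is a hypothetical helper that a `Link` function could call with a value taken from metadata (the field name would be your own choice, nothing pandoc defines).

```lua
-- Turn '#chap-001-abstract' into 'chap-001<ext>#chap-001-abstract'.
-- Targets that do not start with '#chap-NUM' are returned unchanged.
local function toc_target(target, ext)
  ext = ext or '.html'
  -- a function replacement avoids surprises if ext contains '%'
  local result = target:gsub('^#(chap%-%d+)', function (base)
    return base .. ext .. '#' .. base
  end)
  return result
end

print(toc_target('#chap-001-abstract'))   -- chap-001.html#chap-001-abstract
print(toc_target('#chap-002', '.xhtml'))  -- chap-002.xhtml#chap-002
print(toc_target('https://pandoc.org/'))  -- unchanged
```

If the extension is read from metadata, keep in mind that within a single filter table pandoc runs `Meta` after the inline functions such as `Link`, so the value has to be picked up in an earlier pass; a Lua filter file may return a list of filter tables, which pandoc applies in order.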
Getting the Previous and Next links is easy if you have each
chapter in a separate Markdown file and use a custom HTML template.
To create the custom template you first need to get a copy of the
standard template:
````sh
pandoc -D html >chapter-template.html
````
Now edit `chapter-template.html` and above the `</body>` line
insert something along the lines of
````html
<table id="prev-next-nav">
<tbody>
<tr class="odd">
<td>
$if(prev-chapter)$
<a href="chap-$prev-chapter$.html">Previous</a>
$endif$
</td>
<td><a href="toc.html">ToC</a></td>
<td>
$if(next-chapter)$
<a href="chap-$next-chapter$.html">Next</a>
$endif$
</td>
</tr>
</tbody>
</table>
````
Now to get this to work you just need to add one or two lines to
the metadata of each file:
````yaml
prev-chapter: 001
next-chapter: 003
````
The values of those two fields can of course be full filenames as
well; just change the `href` attribute values of the Previous and
Next links in the custom template accordingly. In the first chapter,
omit the `prev-chapter` field from the metadata; in the last
chapter, omit the `next-chapter` field. The corresponding
link then won't exist.
To style the whole thing just add some CSS for `table#prev-next-nav`.
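Writing the `prev-chapter`/`next-chapter` fields by hand gets tedious as chapters are added or reordered. A plain-Lua sketch that derives them from an ordered list of chapter numbers; `prev_next_yaml` is a hypothetical helper, and its output is meant to be merged into each file's YAML metadata block. The values are quoted because an unquoted `001` may be read as a number by a YAML parser, losing the leading zeros.

```lua
-- Given chapter numbers in reading order, build the YAML metadata lines
-- for each chapter: the first gets no prev-chapter, the last no next-chapter.
local function prev_next_yaml(chapters)
  local result = {}
  for i, num in ipairs(chapters) do
    local lines = {}
    if i > 1 then
      table.insert(lines, string.format("prev-chapter: '%s'", chapters[i - 1]))
    end
    if i < #chapters then
      table.insert(lines, string.format("next-chapter: '%s'", chapters[i + 1]))
    end
    result[num] = table.concat(lines, '\n')
  end
  return result
end

local meta = prev_next_yaml({ '001', '002', '003' })
for _, num in ipairs({ '001', '002', '003' }) do
  print('chap-' .. num .. '.md:')
  print(meta[num])
end
```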
[-- Attachment #2: var-urls.lua --]
[-- Type: text/x-lua, Size: 7350 bytes --]
--[=============================================================================[
var-urls.lua - handle URLs when (not) concatenating multiple sources with Pandoc
An important thing to address when you concatenate several files to get a
single-file version of a text is that hyperlinks between the chapters, if
there are any, must be converted so that they are internal links in a single-
file version and external links to the appropriate section in the
appropriate file in a multi-file version. My solution for these "variable
URLs" is to define the links with the appropriate external URL, with a
prefix to identify it as a variable URL, and let a filter remove everything
before the fragment when generating a single-file version, but just remove
the prefix when generating a multi-file version. Additionally the filter
supports using a dummy file extension '.XXX' in variable URLs, which gets
replaced with the appropriate extension for the file type you are
generating for a multi-file version.
This file is the Lua version of that filter.
You tell the filter which kind of URLs you want by setting the metadata
field `internal_urls` to a true value (i.e. anything except `false` or
null/`nil`) when you want internal links. The default value is `false`,
i.e. output external URLs.
You tell the filter what to replace the dummy file extension `.XXX` with
by setting the value of the metadata field `external_url_ext` to the
wanted extension --- a string including the leading dot. The default
value is `.html`.
The prefix for variable URLs is `+`, so that a link with a variable URL
should look something like this:
````markdown
[link text](+path/to/chap-001.XXX#chap-001-foo)
````
Note that the fragment (the "id" part from `#` onwards) is required so
that links work as they should in a single-file version. You will probably
also want to include a part unique for each source file/main section in
every target id to ensure that ids remain unique in a single-file version.
This means that you probably want to specify an id explicitly for each
section like this:
````markdown
### Foo {#chap-001-foo}
````
Thus to produce a single-file PDF version you would say:
````sh
pandoc --lua-filter=var-urls.lua -M internal_urls -o book.pdf chap-*.md
````
A Makefile might look something like this:
````makefile
MD = $(wildcard markdown/*.md)
HTML = $(patsubst markdown/%.md,html/%.html,$(MD))
PDF = $(patsubst markdown/%.md,pdf/%.pdf,$(MD))

all: single.html single.pdf $(HTML) $(PDF)
.PHONY: all

single.html: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w html -so $@ $^

single.pdf: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w latex -so $@ $^

$(HTML): html/%.html: markdown/%.md
	mkdir -p html
	pandoc --lua-filter=var-urls.lua -w html -so $@ $<

$(PDF): pdf/%.pdf: markdown/%.md
	mkdir -p pdf
	pandoc --lua-filter=var-urls.lua -M external_url_ext='.pdf' -w latex -so $@ $<
````
where each file in the `markdown` directory looks something like this:
````markdown
## Chapter 1 {#chap-001}
Illum sapiente non rerum.
### Overview {#chap-001-overview}
See [chapter 3](+chap-003.XXX#chap-003).
````
The age-old trick for getting the globbed input files in the right order is
to include a zero-padded "serial number" in the name of each input file:
```
chap-001-introduction.md
chap-002-background.md
...
chap-010-something.md
chap-011-whatever.md
```
The point of the zero-padding is that each serial number has the same
number of characters and the "alphabetical" (actually ASCII-betical) sort
order will be correct. Another trick is to leave some space "between" the
chapters in the numbering so that you can insert new chapters in between
without renaming all the existing ones:
```
chap-0010-introduction.md
chap-0020-background.md
...
chap-0100-something.md
chap-0101-inserted-chapter.md
chap-0110-whatever.md
```
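The effect of the padding is easy to check with byte-wise sorting, which is what the C locale gives. This is a small illustration with made-up file names. The exact order in which a shell expands a glob like `chap-*.md` can depend on the shell and locale. For zero-padded names of equal width, though, the byte-wise order and the usual locale orders should agree.

````sh
#!/bin/sh
# Print the arguments one per line in byte-wise ("ASCII-betical") order,
# the order LC_ALL=C gives; the file names below are made-up examples.
sorted() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

sorted chap-2.md chap-10.md chap-1.md
# chap-1.md, chap-10.md, chap-2.md: "10" lands before "2"

sorted chap-002.md chap-010.md chap-001.md
# chap-001.md, chap-002.md, chap-010.md

sorted chap-0110.md chap-0100.md chap-0101.md
# chap-0100.md, chap-0101.md, chap-0110.md
````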
| This software is Copyright (c) 2019 by Benct Philip Jonsson.
|
| This is free software, licensed under:
|
| The MIT (X11) License
| See <http://www.opensource.org/licenses/mit-license.php>.
--]=============================================================================]
local stringify = pandoc.utils and pandoc.utils.stringify or require"pandoc.utils".stringify
assert(stringify, "Couldn't get the pandoc.utils.stringify function")
local config = {
  internal_urls = false,
  external_url_ext = '.html',
}
-- patterns and replacements for predefined substitutions
local url_subst_dispatch = {
  var_url = {
    -- A leading '+' identifies a "variable URL" as such.
    -- To use another prefix just modify this pattern!
    pat = '^%+',
    -- we simply remove the leading character
    repl = "",
  },
  internal_url = {
    -- When we want internal URLs we just remove
    -- everything before the fragment!
    pat = '^[^#]+',
    repl = "",
  },
  external_url = {
    -- When we want external URLs we replace the
    -- dummy extension '.XXX' if any with the
    -- appropriate extension for the current output format
    pat = '(%.XXX)',
    -- This must be a function so that it returns the current
    -- value of config.external_url_ext at the time when
    -- the substitution is made!
    repl = function (match) return config.external_url_ext or match end,
  },
}
-- make a predefined substitution
local function url_subst (str, key)
  str = tostring(str) -- just in case!
  key = tostring(key) -- just in case!
  local disp = assert(url_subst_dispatch[key], "No such dispatch: " .. key)
  -- gsub returns the new string and a match count;
  -- the extra parentheses keep only the string
  return (str:gsub(disp.pat, disp.repl))
end
-- turn a meta value/tree into a "plain" value or "plain" data tree
local function meta2data (meta)
  if 'table' == type(meta) then
    -- find out if the table is an element object,
    -- a meta list/map object or some other table
    if nil == meta.t or 'MetaList' == meta.t or 'MetaMap' == meta.t then
      -- probably container rather than element object, so clone it
      -- (we have to assume that an object without a .t is the metadata
      -- root, because there is no safe way to check for it!)
      local data = {}
      for k,v in pairs(meta) do
        data[k] = meta2data(v) -- yes process recursively!
      end
      return data
    elseif meta.t then
      -- probably an element so stringify it
      return stringify(meta)
    else
      -- something else so just return it
      return meta
    end
  else
    -- not a table so just return it
    return meta
  end
end
local function get_meta_config (meta)
  for k in pairs(config) do
    -- get meta value and turn it into a "plain" value
    local v = meta2data(meta[k])
    if nil ~= v then -- if defined use it
      -- TODO: type check and/or otherwise validate v
      config[k] = v
    end
  end
  return nil
end
local function fix_var_url (link)
  local url = url_subst(link.target, 'var_url')
  if url == link.target then
    return nil
  elseif config.internal_urls then
    link.target = url_subst(url, 'internal_url')
  else
    link.target = url_subst(url, 'external_url')
  end
  return link
end
return {
  { Meta = get_meta_config },
  { Link = fix_var_url },
}
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
[not found] ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-04-17 4:39 ` John MacFarlane
2019-04-18 15:54 ` BP Jonsson
@ 2019-04-19 20:05 ` John Gabriele
[not found] ` <3bcc67bf-59a4-49ad-9a0a-ec5e6a39f7b3-jFIJ+Wc5/Vo7lZ9V/NTDHw@public.gmane.org>
2 siblings, 1 reply; 13+ messages in thread
From: John Gabriele @ 2019-04-19 20:05 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
[-- Attachment #1: Type: text/plain, Size: 2979 bytes --]
Hi Martin,
Do you prefer working with one long file instead of multiple smaller ones?
If you're willing to break the one big file into chapter-sized files, I wrote a little program that uses Pandoc to process them into ordered linked-up html output: http://www.unexpected-vortices.com/sw/rippledoc/index.html , including prev/next links.
-- John
On Tue, Apr 16, 2019, at 4:44 PM, Martin Post wrote:
> I think a generic chapters feature would be great. I am writing and translating product manuals. Being able to convert the same source file to a printable PDF and web-friendly short chapters (or knowledge base articles) would make Pandoc the perfect single source publishing solution for the rest of us.
>
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links at the end of each chapter would complete the picture. :)
>
> Thank you for considering this.
>
>
> On Monday, April 15, 2019 at 5:58:42 PM UTC+2, John MacFarlane wrote:
>> Alexander Krotov <ila...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>> > On 15/04/2019 18:44, John MacFarlane wrote:
>> >>
>> >> This is not an infrequent request. Maybe we should
>> >> consider adding an "html chapters" output mode, which
>> >> produces a zip file.
>> >
>> > I don't think it needs to be implemented in the HTML writer.
>> >
>> > Someone may want to split the document into multiple markdown, RST,
>> > ODT... documents.
>>
>> Good point. Perhaps we could have some generic
>> feature like "chapters" that applies this
>> transformation and generates a zip, including
>> any media, for any output format...
[-- Attachment #2: Type: text/html, Size: 5582 bytes --]
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
[not found] ` <a5eac512-0951-2b0c-c2fc-a9239bc3c553-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-04-20 9:18 ` Martin Post
0 siblings, 0 replies; 13+ messages in thread
From: Martin Post @ 2019-04-20 9:18 UTC (permalink / raw)
To: pandoc-discuss
[-- Attachment #1.1: Type: text/plain, Size: 1434 bytes --]
Hello BP Jonsson,
thank you so much for taking the time to outline your approach and
providing both your Lua filter and a ready-to-use Makefile. And yes, I have
not only considered the chapters-to-long-document approach, I am actually
using it: both by concatenating chapters in Pandoc and by using file
transclusion in a custom LML I am using. Both approaches (slicing long docs
vs. building them from small chapters) have obvious advantages. This is one
of several problems I am trying to solve right now with regard to writing
technical documentation, but I have saved your comments and the filter to
study them over the next few days.
Cheers,
Martin
On Thursday, April 18, 2019 at 5:54:45 PM UTC+2, BP Jonsson wrote:
> Have you considered inverting the algorithm and have each chapter
> in a separate source file?
[-- Attachment #1.2: Type: text/html, Size: 2082 bytes --]
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
[not found] ` <3bcc67bf-59a4-49ad-9a0a-ec5e6a39f7b3-jFIJ+Wc5/Vo7lZ9V/NTDHw@public.gmane.org>
@ 2019-04-20 9:46 ` Martin Post
0 siblings, 0 replies; 13+ messages in thread
From: Martin Post @ 2019-04-20 9:46 UTC (permalink / raw)
To: pandoc-discuss
[-- Attachment #1.1: Type: text/plain, Size: 1732 bytes --]
On Friday, April 19, 2019 at 10:06:27 PM UTC+2, jgabriele wrote:
>
> If you're willing to break the one big file into chapter-sized files, I
> wrote a little program that uses Pandoc to process them into ordered
> linked-up html output:
> http://www.unexpected-vortices.com/sw/rippledoc/index.html , including
> prev/next links.
>
>
Hello John,
This thread just keeps on giving… :)
I just realised I had come across Rippledoc a while ago, but had obviously
forgotten about it when researching this topic (or the “no OS other than
GNU/Linux” line had scared off my inner macOS fanboy).
I downloaded it, and it worked out of the box. Beautiful results (and
obviously easy to customize once it has created CSS and TOC files).
So while it doesn’t help with slicing long documents, it does provide
navigation and will allow me to use folders to structure more complex
projects effectively.
I also like that it is (as you say on your site) “talkative” (most CLI
tools are a bit too tight-lipped for my taste), and your own documentation
is excellent, too.
So, another big thank you.
Cheers,
Martin
[-- Attachment #1.2: Type: text/html, Size: 2689 bytes --]
Thread overview: 13+ messages
2019-04-13 22:24 Multiple HTML files (split at H1 or H2 level) from one source? Martin Post
[not found] ` <caeb876c-6393-45d6-92b5-e7f3bee17f49-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-04-14 9:59 ` Albert Krewinkel
[not found] ` <87o958c3fm.fsf-9EawChwDxG8hFhg+JK9F0w@public.gmane.org>
2019-04-14 19:54 ` Martin Post
2019-04-15 15:44 ` John MacFarlane
[not found] ` <m21s232ryh.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2019-04-15 15:55 ` Alexander Krotov
[not found] ` <319987c6-f761-841f-3c98-47f56b0993cb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-15 15:58 ` John MacFarlane
[not found] ` <m2pnpn1crg.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2019-04-16 20:44 ` Martin Post
[not found] ` <5749e9e9-205c-4b67-9dd1-f5e37d4d891f-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2019-04-17 4:39 ` John MacFarlane
[not found] ` <m28sw91bzj.fsf-pgq/RBwaQ+zq8tPRBa0AtqxOck334EZe@public.gmane.org>
2019-04-17 8:28 ` Martin Post
2019-04-18 15:54 ` BP Jonsson
[not found] ` <a5eac512-0951-2b0c-c2fc-a9239bc3c553-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-04-20 9:18 ` Martin Post
2019-04-19 20:05 ` John Gabriele
[not found] ` <3bcc67bf-59a4-49ad-9a0a-ec5e6a39f7b3-jFIJ+Wc5/Vo7lZ9V/NTDHw@public.gmane.org>
2019-04-20 9:46 ` Martin Post