* Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-13 22:24 UTC
To: pandoc-discuss

Hello.

I’d like to write a long-form, structured document using (Pandoc’s)
Markdown and then create chapter-sized HTML files from it, with the file
names derived from h1 or h2 heading IDs.

I understand that Pandoc will always emit _one_ target document, so I
guess I’d either need to split my Markdown master file first or the target
HTML file using some third-party tool.

And there’s the bonus problem of breaking in-document links:
[link](#anchor_2) in chapter 1 would have to become
[link](chapter_2.htm#anchor_2).

I’d be grateful for suggestions on how to do this. Thanks.

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/caeb876c-6393-45d6-92b5-e7f3bee17f49%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Albert Krewinkel @ 2019-04-14 9:59 UTC
To: pandoc-discuss

Martin Post writes:

> I’d like to write a long-form, structured document using (Pandoc’s)
> Markdown and then create chapter-sized HTML files from it, with the file
> names derived from h1 or h2 heading IDs.
>
> I understand that Pandoc will always emit _one_ target document, so I
> guess I’d either need to split my Markdown master file first or the target
> HTML file using some third-party tool.

I cannot think of a clean solution, so here is a slightly hacky
one, based on Lua filters. It uses `pandoc.utils.hierarchicalize`
to create new subdocuments, which are then each sent to separate
json files.

    local utils = require 'pandoc.utils'
    local List = require 'pandoc.List'

    -- Turn a (nested) list of Sec elements back into a flat list
    -- of blocks, re-creating a Header for each section.
    function flatten (elements)
      local result = List:new {}
      for i, el in ipairs(elements) do
        if el.t == 'Sec' then
          local header = pandoc.Header(el.level, el.label, el.attr)
          table.insert(result, header)
          result:extend(flatten(el.contents))
        else
          table.insert(result, el)
        end
      end
      return result
    end

    function Pandoc (doc)
      local elements = utils.hierarchicalize(doc.blocks)
      for i, sec in ipairs(elements) do
        -- Blocks before the first header are not sections; skip them.
        if sec.t == 'Sec' then
          -- create new metadata. Copy from `doc.meta` as necessary.
          local new_meta = {
            title = sec.label
          }
          local new_doc = pandoc.Pandoc(flatten(sec.contents), new_meta)
          local filename = sec.attr.identifier .. '.json'
          -- Run `tee` as a JSON filter: it writes the subdocument's JSON
          -- to `filename` and echoes it back unchanged.
          utils.run_json_filter(new_doc, 'tee', {filename})
        end
      end
      return pandoc.Pandoc {}
    end

This will give you all sections as `.json` files, which you can
then process further with pandoc in the ways you see fit.
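The splitting idea can also be seen on its own in the sketch below (plain Lua, runnable without pandoc). It walks a flat list of blocks and starts a new chapter at every header whose level is at or above a chosen split level, which is what "split at H1 or H2" amounts to. The block tables are hypothetical stand-ins that only mimic the `t`, `level` and `identifier` fields of pandoc's Header elements, and the `preamble` name for content before the first header is an assumption. Inside a real filter one would pass `doc.blocks` instead, which should work in roughly the same way.

```lua
-- Split a flat list of blocks into chapters at headers of level <= split_level.
-- Returns a list of { id = <header identifier>, blocks = { ... } } tables.
-- Blocks before the first splitting header go into a chapter with the
-- hypothetical id "preamble".
local function split_chapters(blocks, split_level)
  local chapters = {}
  local current = { id = 'preamble', blocks = {} }
  for _, block in ipairs(blocks) do
    if block.t == 'Header' and block.level <= split_level then
      -- close the previous chapter (only if it has content)
      if #current.blocks > 0 then
        table.insert(chapters, current)
      end
      current = { id = block.identifier, blocks = {} }
    end
    table.insert(current.blocks, block)
  end
  if #current.blocks > 0 then
    table.insert(chapters, current)
  end
  return chapters
end

-- Mock document: a paragraph, then H1 / H2 / H1 headers.
local doc = {
  { t = 'Para' },
  { t = 'Header', level = 1, identifier = 'intro' },
  { t = 'Para' },
  { t = 'Header', level = 2, identifier = 'details' },
  { t = 'Para' },
  { t = 'Header', level = 1, identifier = 'usage' },
}

for _, ch in ipairs(split_chapters(doc, 1)) do
  print(ch.id .. '.json', #ch.blocks)
end
```

With a split level of 2 the `details` section would become a chapter of its own.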
> And there’s the bonus problem of breaking in-document links:
> [link](#anchor_2) in chapter 1 would have to become
> [link](chapter_2.htm#anchor_2).

That's more difficult, but should still be possible using a filter.

HTH

-- 
Albert Krewinkel
GPG: 8eed e3e2 e8c5 6f18 81fe e836 388d c0b2 1f63 1124
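The link-fixing step mentioned above could work in two passes: first record which output file each identifier ends up in, then rewrite every `#id` link whose target lives in a different file. A minimal plain-Lua sketch of that logic follows. The chapter table, its `file`/`ids` fields and the `.htm` file names are hypothetical. In an actual filter the ids would have to be collected from the headers (and any other elements carrying identifiers) of each chapter, and `fix_target` would be applied to `link.target` in a `Link` function.

```lua
-- Pass 1: map every identifier to the file that will contain it.
local function build_id_map(chapters)
  local id_to_file = {}
  for _, ch in ipairs(chapters) do
    for _, id in ipairs(ch.ids) do
      id_to_file[id] = ch.file
    end
  end
  return id_to_file
end

-- Pass 2: rewrite a link target found in `current_file`.
-- '#anchor' pointing into another chapter becomes 'other.htm#anchor';
-- same-chapter links, unknown ids and external URLs are left unchanged.
local function fix_target(target, current_file, id_to_file)
  local id = target:match('^#(.+)$')
  if not id then
    return target -- not an in-document link
  end
  local file = id_to_file[id]
  if file == nil or file == current_file then
    return target
  end
  return file .. target
end

-- Hypothetical chapters matching the example in the question.
local chapters = {
  { file = 'chapter_1.htm', ids = { 'chapter_1', 'anchor_1' } },
  { file = 'chapter_2.htm', ids = { 'chapter_2', 'anchor_2' } },
}
local map = build_id_map(chapters)
print(fix_target('#anchor_2', 'chapter_1.htm', map)) -- chapter_2.htm#anchor_2
```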
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-14 19:54 UTC
To: pandoc-discuss

Thanks a lot for looking into this, Albert. I have never written (or used,
for that matter) a Lua filter, but I will give this a try.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John MacFarlane @ 2019-04-15 15:44 UTC
To: Martin Post, pandoc-discuss

This is not an infrequent request. Maybe we should
consider adding an "html chapters" output mode, which
produces a zip file.

After all, we already have code for the EPUB writer that
splits content into chapters and fixes all the internal
references. This could be factored out.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Alexander Krotov @ 2019-04-15 15:55 UTC
To: pandoc-discuss

On 15/04/2019 18:44, John MacFarlane wrote:
>
> This is not an infrequent request. Maybe we should
> consider adding an "html chapters" output mode, which
> produces a zip file.

I don't think it needs to be implemented in the HTML writer.

Someone may want to split the document into multiple Markdown, RST,
ODT... documents.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John MacFarlane @ 2019-04-15 15:58 UTC
To: Alexander Krotov, pandoc-discuss

Alexander Krotov writes:

> On 15/04/2019 18:44, John MacFarlane wrote:
>>
>> This is not an infrequent request. Maybe we should
>> consider adding an "html chapters" output mode, which
>> produces a zip file.
>
> I don't think it needs to be implemented in the HTML writer.
>
> Someone may want to split the document into multiple Markdown, RST,
> ODT... documents.

Good point. Perhaps we could have some generic
feature like "chapters" that applies this
transformation and generates a zip, including
any media, for any output format...
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-16 20:44 UTC
To: pandoc-discuss

I think a generic chapters feature would be great. I am writing and
translating product manuals. Being able to convert the same source file to
a printable PDF and web-friendly short chapters (or knowledge base
articles) would make Pandoc the perfect single-source publishing solution
for the rest of us.

(And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
at the end of each chapter would complete the picture. :)

Thank you for considering this.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John MacFarlane @ 2019-04-17 4:39 UTC
To: Martin Post, pandoc-discuss

Martin Post writes:

> I think a generic chapters feature would be great. I am writing and
> translating product manuals. Being able to convert the same source file to
> a printable PDF and web-friendly short chapters (or knowledge base
> articles) would make Pandoc the perfect single-source publishing solution
> for the rest of us.
>
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
> at the end of each chapter would complete the picture. :)

Have you considered using pandoc to produce docbook or
texinfo, then using docbook or texinfo tools to
produce chaptered HTML?

Here's an example from pandoc's online demos, going
pandoc -> texinfo -> chunked HTML:

http://pandoc.org/demo/example19/

Commands:

    pandoc MANUAL.txt -s -o example19.texi
    makeinfo --no-validate --force example19.texi --html -o example19
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-17 8:28 UTC
To: pandoc-discuss

Hello John,

while I have looked into DocBook and admire its feature set (conditional
content, semantic markup etc.), I’m looking for something simpler that is
easy to replicate – preferably a single-tool, cross-platform solution.

I am working with other technical writers and translators who are unhappy
about everything that doesn’t look & feel like MS Word or Madcap Flare. I
might just get them to install Pandoc, but surely not XSL processors etc.

Markdown > HTML > PDF (via Prince) covers a lot of bases with an acceptable
learning curve and investment. But I realise this is a fairly specific use
case.
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: BP Jonsson @ 2019-04-18 15:54 UTC
To: pandoc-discuss, Martin Post

On 2019-04-16 at 22:44, Martin Post wrote:
> I think a generic chapters feature would be great. I am writing and
> translating product manuals. Being able to convert the same source file to
> a printable PDF and web-friendly short chapters (or knowledge base
> articles) would make Pandoc the perfect single-source publishing solution
> for the rest of us.

Have you considered inverting the algorithm and keeping each chapter in a
separate source file? Since Pandoc concatenates multiple input files given
on the command line, you can just do something like this when generating a
single-file version:

````sh
pandoc -o book.pdf chap-*.md
````

The age-old trick for getting the globbed input files in the right order is
to include a zero-padded "serial number" in the name of each input file:

```
chap-001-introduction.md
chap-002-background.md
...
chap-010-something.md
chap-011-whatever.md
```

The point of the zero-padding is that each serial number has the same
number of characters, so the "alphabetical" (actually ASCII-betical) sort
order will be correct.

Another trick is to leave some space "between" the chapters in the
numbering so that you can insert new chapters in between without renaming
all the existing ones:

```
chap-0010-introduction.md
chap-0020-background.md
...
chap-0100-something.md
chap-0101-inserted-chapter.md
chap-0110-whatever.md
```

Another important thing to address, whether you concatenate several files
to get a single-file version of a text or split a single file to get a
multi-file version, is that hyperlinks between the chapters, if there are
any, must be converted so that they are internal links in a single-file
version and external links to the appropriate section in the appropriate
file in a multi-file version.

My solution for these "variable URLs" is to define the links with the
appropriate external URL, with a prefix to identify it as a variable URL,
and let a filter remove everything before the fragment when generating a
single-file version, but just remove the prefix when generating a
multi-file version. Additionally, the filter supports using a dummy file
extension `.XXX` in variable URLs, which gets replaced with the appropriate
extension for the file type you are generating for a multi-file version.

A Lua version of that filter is attached. You tell the filter which kind
of URLs you want by setting the metadata field `internal_urls` to a true
value (i.e. anything except `false` or null/`nil`) when you want internal
links. The default value is `false`, i.e. output external URLs. You tell
the filter what to replace the dummy file extension `.XXX` with by setting
the value of the metadata field `external_url_ext` to the wanted extension,
a string including the leading dot. The default value is `.html`.

The prefix for variable URLs is `+`, so a link with a variable URL should
look something like this:

````markdown
[link text](+path/to/chap-001.XXX#chap-001-foo)
````

Note that the fragment (the "id" part from `#` onwards) is required so that
links work as they should in a single-file version. You will probably also
want to include a part unique to each source file/main section in every
target id, to ensure that ids remain unique in a single-file version.
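To see what the two modes do to a single link, here is a standalone sketch that uses the same Lua patterns as the attached var-urls.lua (`^%+`, `^[^#]+` and `%.XXX`), but without pandoc, so it runs in a plain Lua interpreter. `resolve_var_url` is a hypothetical helper and not part of the attached filter; the replacement for `.XXX` is a function, as in the filter, so the extension is inserted literally.

```lua
-- Resolve a "variable URL" (one starting with '+').
-- internal_urls = true  -> keep only the fragment (single-file version)
-- internal_urls = false -> drop the prefix and replace '.XXX' with ext
local function resolve_var_url(url, internal_urls, ext)
  local stripped, n = url:gsub('^%+', '')
  if n == 0 then
    return url -- not a variable URL: leave it alone
  end
  if internal_urls then
    return (stripped:gsub('^[^#]+', ''))
  end
  return (stripped:gsub('%.XXX', function () return ext or '.html' end))
end

local link = '+path/to/chap-001.XXX#chap-001-foo'
print(resolve_var_url(link, true))          -- #chap-001-foo
print(resolve_var_url(link, false, '.pdf')) -- path/to/chap-001.pdf#chap-001-foo
```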
This means that you probably want to specify an id explicitly for each
section like this:

````markdown
### Foo {#chap-001-foo}
````

Thus to produce a single-file PDF version you would say:

````sh
pandoc --lua-filter=var-urls.lua -M internal_urls -o book.pdf chap-*.md
````

A Makefile might look something like this:

````makefile
MD = $(wildcard markdown/*.md)
HTML = $(patsubst markdown/%.md,html/%.html,$(MD))
PDF = $(patsubst markdown/%.md,pdf/%.pdf,$(MD))

all: single.html single.pdf $(HTML) $(PDF)

.PHONY: all

single.html: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w html -so $@ $^

single.pdf: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w latex -so $@ $^

$(HTML): html/%.html: markdown/%.md
	mkdir -p html
	pandoc --lua-filter=var-urls.lua -w html -so $@ $<

$(PDF): pdf/%.pdf: markdown/%.md
	mkdir -p pdf
	pandoc --lua-filter=var-urls.lua -M external_url_ext='.pdf' -w latex -so $@ $<
````

where each file in the `markdown` directory looks something like this:

````markdown
## Chapter 1 {#chap-001}

Illum sapiente non rerum.

### Overview {#chap-001-overview}

See [chapter 3](+chap-003.XXX#chap-003).
````

> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links
> at the end of each chapter would complete the picture. :)

Both may be handled with custom templates (and a filter in the case of the
TOC).
If you create a file `toc-template.md` which contains just the single line

````md
$table-of-contents$
````

you will get a Markdown file which contains just the TOC (in Markdown
format) with

````sh
pandoc --toc --template=./toc-template.md -w markdown -so toc.md source.md
````

This will give you a file which looks something like this:

````markdown
-   [Chapter 1](#chap-001)
    -   [Abstract](#chap-001-abstract)
-   [Chapter 2](#chap-002)
    -   [Abstract](#chap-002-abstract)
-   [Chapter 3](#chap-003)
    -   [Abstract](#chap-003-abstract)
````

Building on my multi-source model from above, you would create the TOC
Markdown with something like

````sh
pandoc --toc --template=./toc-template.md -w markdown -so toc.md markdown/*.md
````

which you can then edit as you wish (or put the relevant content in the
template; just make sure to double any dollar signs which are part of the
text!), and then convert to HTML.

However, since the TOC Markdown is created in single-file mode, so to
speak, you will need a filter to correct its links so that they point to
the different chapter files. Assuming all heading ids are of the form
`#chap-NUM-OPTIONAL-PART`, where `chap-NUM` corresponds to the base name of
the file which will contain the chapter (i.e. the chapter files have names
like `chap-001.html`), you will get a working separate `toc.html` by
running this command:

````sh
pandoc --lua-filter=multi-toc-links.lua -w html -so html/toc.html toc.md
````

where `multi-toc-links.lua` contains just

````lua
function Link (link)
  link.target = link.target:gsub('^%#(chap%-%d+)','%1.html%0')
  return link
end
````

What this filter does is that if the URL of a link starts with `#chap-NUM`,
it will be changed to start with `chap-NUM.html#chap-NUM`. This filter does
not allow you to specify the file extension on the command line, but could
easily be modified to do so.
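One way to make the extension configurable is sketched below: the same pattern as in multi-toc-links.lua, but with a function as the replacement, so the extension is inserted literally (a plain replacement string would treat any `%` in it specially). `toc_target` is a hypothetical helper. In a filter, the extension could be read from metadata in a separate `Meta` pass, as var-urls.lua does, before a `Link` function calls it on `link.target`.

```lua
-- Rewrite '#chap-NUM...' to 'chap-NUM<ext>#chap-NUM...'.
-- The captured 'chap-NUM' plays the role of %1, and '#' .. chap is %0.
local function toc_target(target, ext)
  return (target:gsub('^%#(chap%-%d+)', function (chap)
    return chap .. ext .. '#' .. chap
  end))
end

print(toc_target('#chap-001', '.html'))           -- chap-001.html#chap-001
print(toc_target('#chap-002-abstract', '.xhtml')) -- chap-002.xhtml#chap-002-abstract
```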
Since the replacement (the second argument to `string.gsub()`) essentially
is a template (`%1` is replaced with the `chap-NUM` part of the matched
text, and `%0` is replaced with the entire matched text, i.e. `#chap-NUM`),
you could even specify that template via metadata.

Getting the Previous and Next links is easy if you have each chapter in a
separate Markdown file and use a custom HTML template. To create the custom
template you first need to get a copy of the standard template:

````sh
pandoc -D html >chapter-template.html
````

Now edit `chapter-template.html` and above the `</body>` line insert
something along the lines of

````html
<table id="prev-next-nav">
<tbody>
<tr class="odd">
<td>$if(prev-chapter)$<a href="chap-$prev-chapter$.html">Previous</a>$endif$</td>
<td><a href="toc.html">ToC</a></td>
<td>$if(next-chapter)$<a href="chap-$next-chapter$.html">Next</a>$endif$</td>
</tr>
</tbody>
</table>
````

and use it with `--template=chapter-template.html` when converting each
chapter. Now to get this to work you just need to add one or two lines to
the metadata of each file:

````yaml
prev-chapter: 001
next-chapter: 003
````

The values of those two fields can of course be full filenames as well;
just change the `href` attribute values of the Previous and Next links in
the custom template. In the first chapter you just omit the `prev-chapter`
field from the metadata, and in the last chapter you omit the
`next-chapter` field, and the corresponding link won't exist.

To style the whole thing just add some CSS for `table#prev-next-nav`.
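With many chapters, the `prev-chapter`/`next-chapter` lines need not be written by hand. The sketch below is a hypothetical helper in plain Lua that computes them from the ordered list of chapter numbers, leaving out `prev-chapter` for the first chapter and `next-chapter` for the last. It quotes the values as a precaution, so that a YAML parser keeps the leading zeros as text. How the lines get into each file's metadata block is left open.

```lua
-- Build the navigation metadata for an ordered list of chapter numbers.
-- Returns a table mapping each chapter number to its YAML lines.
local function nav_metadata(chapters)
  local result = {}
  for i, num in ipairs(chapters) do
    local lines = {}
    if i > 1 then
      table.insert(lines, "prev-chapter: '" .. chapters[i - 1] .. "'")
    end
    if i < #chapters then
      table.insert(lines, "next-chapter: '" .. chapters[i + 1] .. "'")
    end
    result[num] = table.concat(lines, '\n')
  end
  return result
end

local meta = nav_metadata({ '001', '002', '003' })
print(meta['002'])
```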
For more options, visit https://groups.google.com/d/optout.

[-- Attachment #2: var-urls.lua --]
[-- Type: text/x-lua, Size: 7350 bytes --]

--[=============================================================================[

var-urls.lua - handle URLs when (not) concatenating multiple sources with Pandoc

An important thing to address when you concatenate several files to get a
single-file version of a text is that hyperlinks between the chapters, if
there are any, must be converted so that they are internal links in a
single-file version and external links to the appropriate section in the
appropriate file in a multi-file version.

My solution for these "variable URLs" is to define the links with the
appropriate external URL, with a prefix to identify it as a variable URL, and
let a filter remove everything before the fragment when generating a
single-file version, but just remove the prefix when generating a multi-file
version. Additionally the filter supports using a dummy file extension '.XXX'
in variable URLs, which gets replaced with the appropriate extension for the
file type you are generating for a multi-file version.

A Lua version of that filter is attached.

You tell the filter which kind of URLs you want by setting the metadata field
`internal_urls` to a true value (i.e. anything except `false` or null/`nil`)
when you want internal links. The default value is `false`, i.e. output
external URLs.

You tell the filter what to replace the dummy file extension `.XXX` with by
setting the value of the metadata field `external_url_ext` to the wanted
extension, as a string including the leading dot. The default value is
`.html`.

The prefix for variable URLs is `+`, so a link with a variable URL should
look something like this:

````markdown
[link text](+path/to/chap-001.XXX#chap-001-foo)
````

Note that the fragment (the "id" part from `#` onwards) is required so that
links work as they should in a single-file version.

You will probably also want to include a part unique to each source file/main
section in every target id, to ensure that ids remain unique in a single-file
version. This means that you probably want to specify an id explicitly for
each section, like this:

````markdown
### Foo {#chap-001-foo}
````

Thus to produce a single-file PDF version you would say:

````sh
pandoc --lua-filter=var-urls.lua -M internal_urls -o book.pdf chap-*.md
````

A Makefile might look something like this:

````makefile
MD = $(wildcard markdown/*.md)
HTML = $(patsubst markdown/%.md,html/%.html,$(MD))
PDF = $(patsubst markdown/%.md,pdf/%.pdf,$(MD))

all: single.html single.pdf $(HTML) $(PDF)
.PHONY: all

single.html: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w html -so $@ $^

single.pdf: $(MD)
	pandoc --lua-filter=var-urls.lua -M internal_urls -w latex -so $@ $^

$(HTML): html/%.html: markdown/%.md
	mkdir -p html
	pandoc --lua-filter=var-urls.lua -w html -so $@ $<

$(PDF): pdf/%.pdf: markdown/%.md
	mkdir -p pdf
	pandoc --lua-filter=var-urls.lua -M external_url_ext='.pdf' -w latex -so $@ $<
````

where each file in the `markdown` directory looks something like this:

````markdown
## Chapter 1 {#chap-001}

Illum sapiente non rerum.

### Overview {#chap-001-overview}

See [chapter 3](+chap-003.XXX#chap-003).
````

The age-old trick for getting the globbed input files in the right order is
to include a zero-padded "serial number" in the name of each input file:

```
chap-001-introduction.md
chap-002-background.md
...
chap-010-something.md
chap-011-whatever.md
```

The point of the zero-padding is that each serial number has the same number
of characters, so the "alphabetical" (actually ASCII-betical) sort order will
be correct.

Another trick is to leave some space "between" the chapters in the numbering,
so that you can insert new chapters in between without renaming all the
existing ones:

```
chap-0010-introduction.md
chap-0020-background.md
...
chap-0100-something.md
chap-0101-inserted-chapter.md
chap-0110-whatever.md
```

| This software is Copyright (c) 2019 by Benct Philip Jonsson.
|
| This is free software, licensed under:
|
|   The MIT (X11) License
| See <http://www.opensource.org/licenses/mit-license.php>.

--]=============================================================================]

local stringify = pandoc.utils and pandoc.utils.stringify or require"pandoc.utils".stringify
assert(stringify, "Couldn't get the pandoc.utils.stringify function")

local config = {
  internal_urls = false,
  external_url_ext = '.html',
}

-- patterns and replacements for predefined substitutions
local url_subst_dispatch = {
  var_url = {
    -- A leading '+' identifies a "variable URL" as such.
    -- To use another prefix just modify this pattern!
    pat = '^%+',
    -- we simply remove the leading character
    repl = "",
  },
  internal_url = {
    -- When we want internal URLs we just remove
    -- everything before the fragment!
    pat = '^[^#]+',
    repl = "",
  },
  external_url = {
    -- When we want external URLs we replace the
    -- dummy extension '.XXX' if any with the
    -- appropriate extension for the current output format
    pat = '(%.XXX)',
    -- This must be a function so that it returns the current
    -- value of config.external_url_ext at the time when
    -- the substitution is made!
    repl = function (match)
      return config.external_url_ext or match
    end,
  },
}

-- make a predefined substitution
local function url_subst (str, key)
  str = tostring(str) -- just in case!
  key = tostring(key) -- just in case!
  local disp = assert(url_subst_dispatch[key], "No such dispatch: " .. key)
  return str:gsub(disp.pat, disp.repl)
end

-- turn a meta value/tree into a "plain" value or "plain" data tree
local function meta2data (meta)
  if 'table' == type(meta) then
    -- find out if the table is an element object,
    -- a meta list/map object or some other table
    if nil == meta.t or 'MetaList' == meta.t or 'MetaMap' == meta.t then
      -- probably container rather than element object, so clone it
      -- (we have to assume that an object without a .t is the metadata
      -- root, because there is no safe way to check for it!)
      local data = {}
      for k,v in pairs(meta) do
        data[k] = meta2data(v) -- yes process recursively!
      end
      return data
    elseif meta.t then
      -- probably an element so stringify it
      return stringify(meta)
    else
      -- something else so just return it
      return meta
    end
  else
    -- not a table so just return it
    return meta
  end
end

local function get_meta_config (meta)
  for k in pairs(config) do
    -- get meta value and turn it into a "plain" value
    local v = meta2data(meta[k])
    if nil ~= v then -- if defined use it
      -- TODO: type check and/or otherwise validate v
      config[k] = v
    end
  end
  return nil
end

local function fix_var_url (link)
  local url = url_subst(link.target, 'var_url')
  if url == link.target then
    return nil
  elseif config.internal_urls then
    link.target = url_subst(url, 'internal_url')
  else
    link.target = url_subst(url, 'external_url')
  end
  return link
end

return {
  { Meta = get_meta_config },
  { Link = fix_var_url },
}
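To see what var-urls.lua does to a link target without running pandoc, its three patterns can be applied to a plain string. This is a minimal sketch in plain Lua 5.1 or later; `rewrite_var_url` is a hypothetical helper that mirrors the filter's logic, it is not part of the attached file and does not call it.

```lua
-- Hypothetical stand-alone mirror of the three rules in var-urls.lua:
-- strip the '+' marker, then either keep only the '#fragment'
-- (single-file output) or swap the dummy '.XXX' extension (multi-file).
local function rewrite_var_url(target, internal_urls, ext)
  local url, n = target:gsub('^%+', '')
  if n == 0 then
    return target -- not a variable URL; the filter leaves it alone
  end
  if internal_urls then
    -- drop everything before the fragment
    return (url:gsub('^[^#]+', ''))
  end
  -- a function replacement is used literally, so '%' in ext is harmless
  return (url:gsub('%.XXX', function() return ext or '.html' end))
end

local link = '+chap-003.XXX#chap-003'
print(rewrite_var_url(link, true))                 -- #chap-003
print(rewrite_var_url(link, false))                -- chap-003.html#chap-003
print(rewrite_var_url(link, false, '.pdf'))        -- chap-003.pdf#chap-003
print(rewrite_var_url('https://pandoc.org', true)) -- https://pandoc.org
```

The last call shows why the `+` prefix matters: ordinary links pass through untouched in both modes.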
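The zero-padding advice in var-urls.lua can be checked with a byte-wise sort such as Lua's default string comparison (in the default C locale). A small illustrative sketch; the file names are made up and `sorted` is a hypothetical helper.

```lua
-- Why zero-padding matters: a byte-wise string sort compares names one
-- character at a time, so 'chap-10' lands before 'chap-2'.
local function sorted(names)
  local copy = {}
  for i, name in ipairs(names) do copy[i] = name end
  table.sort(copy) -- default '<' on strings
  return copy
end

local unpadded, padded = {}, {}
for _, n in ipairs({ 2, 11, 1, 10 }) do
  unpadded[#unpadded + 1] = 'chap-' .. n .. '.md'
  padded[#padded + 1] = string.format('chap-%03d.md', n)
end

print(table.concat(sorted(unpadded), ' '))
-- chap-1.md chap-10.md chap-11.md chap-2.md
print(table.concat(sorted(padded), ' '))
-- chap-001.md chap-002.md chap-010.md chap-011.md
```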
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-20 9:18 UTC
To: pandoc-discuss

Hello BP Jonsson,

thank you so much for taking the time to outline your approach and for providing both your Lua filter and a ready-to-use Makefile.

And yes, I have not only considered the chapters-to-long-document approach, I am actually using it, both by concatenating chapters in Pandoc and by using file transclusion in a custom LML I am using. Both approaches (slicing long docs vs. building them from small chapters) have obvious advantages.

This is one of several problems I am trying to solve right now with regard to writing technical documentation, but I have saved your comments and the filter to study them over the next few days.

Cheers,
Martin

On Thursday, April 18, 2019 at 5:54:45 PM UTC+2, BP Jonsson wrote:
> Have you considered inverting the algorithm and have each chapter
> in a separate source file?
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: John Gabriele @ 2019-04-19 20:05 UTC
To: pandoc-discuss

Hi Martin,

Do you prefer working with one long file instead of multiple smaller ones?

If you're willing to break the one big file into chapter-sized files, I wrote a little program that uses Pandoc to process them into ordered, linked-up HTML output, including prev/next links: http://www.unexpected-vortices.com/sw/rippledoc/index.html

-- John

On Tue, Apr 16, 2019, at 4:44 PM, Martin Post wrote:
> I think a generic chapters feature would be great. I am writing and translating product manuals. Being able to convert the same source file to a printable PDF and web-friendly short chapters (or knowledge base articles) would make Pandoc the perfect single source publishing solution for the rest of us.
>
> (And while I’m dreaming: a separate TOC file and “Previous” / “Next” links at the end of each chapter would complete the picture. :)
>
> Thank you for considering this.
>
> On Monday, April 15, 2019 at 5:58:42 PM UTC+2, John MacFarlane wrote:
>> Alexander Krotov <ila...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>> > On 15/04/2019 18:44, John MacFarlane wrote:
>> >>
>> >> This is not an infrequent request. Maybe we should
>> >> consider adding an "html chapters" output mode, which
>> >> produces a zip file.
>> >
>> > I don't think it needs to be implemented in HTML writer.
>> >
>> > Someone may want to split the document into multiple markdown, RST,
>> > ODT... documents.
>>
>> Good point. Perhaps we could have some generic
>> feature like "chapters" that applies this
>> transformation and generates a zip, including
>> any media, for any output format...
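The generic "chapters" transformation discussed above starts by cutting the block list wherever a header of the chosen level (or a higher one) begins. The following is a minimal sketch of that one step, not pandoc's or Rippledoc's implementation, and a different technique from the `hierarchicalize`-based filter earlier in the thread. `split_at_level` is a hypothetical helper that only reads the `t`, `level` and `identifier` fields that pandoc Header elements carry, so plain tables stand in for real blocks here. Inside a Lua filter one could, as an untested assumption, wrap each chunk with `pandoc.Pandoc(chunk.blocks, doc.meta)` and write it out separately.

```lua
-- Hypothetical helper (not a pandoc API): cut a flat list of blocks into
-- chunks, starting a new chunk at every Header whose level is <= `level`.
local function split_at_level(blocks, level)
  local chunks = {}
  -- Anything before the first matching header goes into an id-less chunk.
  local current = { id = nil, blocks = {} }
  for _, block in ipairs(blocks) do
    if block.t == 'Header' and block.level <= level then
      if #current.blocks > 0 then
        chunks[#chunks + 1] = current
      end
      current = { id = block.identifier, blocks = {} }
    end
    current.blocks[#current.blocks + 1] = block
  end
  if #current.blocks > 0 then
    chunks[#chunks + 1] = current
  end
  return chunks
end

-- Plain tables standing in for pandoc blocks.
local blocks = {
  { t = 'Para' },
  { t = 'Header', level = 1, identifier = 'intro' },
  { t = 'Para' },
  { t = 'Header', level = 2, identifier = 'intro-details' },
  { t = 'Header', level = 1, identifier = 'usage' },
  { t = 'Para' },
}

for _, chunk in ipairs(split_at_level(blocks, 1)) do
  print(string.format('%s: %d blocks', chunk.id or '(front matter)', #chunk.blocks))
end
-- (front matter): 1 blocks
-- intro: 3 blocks
-- usage: 2 blocks
```

Splitting at level 2 instead would also start a chunk at `intro-details`, which matches the "H1 or H2" choice in the thread title.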
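Once chapters live in separate files, two pieces Martin asks for remain: "Previous"/"Next" targets, and the link fix-up from the original question (`#anchor_2` in chapter 1 becoming a link into chapter 2's file). A sketch under the assumption that every heading id and the file it ends up in are already known; `navigation`, `id_owners`, `rewrite_anchor` and the file names are all hypothetical.

```lua
-- Hypothetical helpers (not part of pandoc or Rippledoc).
-- Chapters are listed in reading order; each knows its output file and the
-- heading ids it contains.
local chapters = {
  { file = 'chapter_1.html', ids = { 'chapter_1', 'anchor_1' } },
  { file = 'chapter_2.html', ids = { 'chapter_2', 'anchor_2' } },
  { file = 'chapter_3.html', ids = { 'chapter_3' } },
}

-- Previous/next file for each chapter (nil at either end).
local function navigation(chaps)
  local nav = {}
  for i, chap in ipairs(chaps) do
    local prev, nxt = chaps[i - 1], chaps[i + 1]
    nav[chap.file] = {
      prev = prev and prev.file or nil,
      next = nxt and nxt.file or nil,
    }
  end
  return nav
end

-- Map every heading id to the file that will contain it.
local function id_owners(chaps)
  local owner = {}
  for _, chap in ipairs(chaps) do
    for _, id in ipairs(chap.ids) do owner[id] = chap.file end
  end
  return owner
end

-- Turn '#id' into 'file#id' when the id lives in another chapter file.
local function rewrite_anchor(target, owner, current_file)
  local id = target:match('^#(.+)$')
  if not id or not owner[id] or owner[id] == current_file then
    return target -- external URL, unknown id, or same-file link
  end
  return owner[id] .. target
end

local nav = navigation(chapters)
local owner = id_owners(chapters)
print(nav['chapter_2.html'].prev .. ' / ' .. nav['chapter_2.html'].next)
-- chapter_1.html / chapter_3.html
print(rewrite_anchor('#anchor_2', owner, 'chapter_1.html'))
-- chapter_2.html#anchor_2
```

In a pandoc filter, `rewrite_anchor` would be applied to each `Link` element's `target` field, much as var-urls.lua rewrites `link.target` in `fix_var_url`.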
* Re: Multiple HTML files (split at H1 or H2 level) from one source?
From: Martin Post @ 2019-04-20 9:46 UTC
To: pandoc-discuss

On Friday, April 19, 2019 at 10:06:27 PM UTC+2, jgabriele wrote:
>
> If you're willing to break the one big file into chapter-sized files, I
> wrote a little program that uses Pandoc to process them into ordered
> linked-up html output:
> http://www.unexpected-vortices.com/sw/rippledoc/index.html , including
> prev/next links.
>

Hello John,

This thread just keeps on giving… :)

I just realised that I had come across Rippledoc a while ago, but had obviously forgotten about it when researching this topic (or the “no OS other than GNU/Linux” line had scared off my inner macOS fanboy).

I downloaded it, and it worked out of the box. Beautiful results (and obviously easy to customize once it has created the CSS and TOC files). So while it doesn’t help with slicing long documents, it does provide navigation and will allow me to use folders to structure more complex projects effectively.

I also like that it is (as you say on your site) “talkative” (most CLI tools are a bit too tight-lipped for my taste), and your own documentation is excellent, too.

So, another big thank you.

Cheers,
Martin
end of thread

Thread overview: 13+ messages
2019-04-13 22:24 Multiple HTML files (split at H1 or H2 level) from one source? Martin Post
2019-04-14  9:59 ` Albert Krewinkel
2019-04-14 19:54   ` Martin Post
2019-04-15 15:44 ` John MacFarlane
2019-04-15 15:55   ` Alexander Krotov
2019-04-15 15:58     ` John MacFarlane
2019-04-16 20:44       ` Martin Post
2019-04-17  4:39         ` John MacFarlane
2019-04-17  8:28           ` Martin Post
2019-04-18 15:54         ` BP Jonsson
2019-04-20  9:18           ` Martin Post
2019-04-19 20:05         ` John Gabriele
2019-04-20  9:46           ` Martin Post