* Getting LaTeX process custom output files?
From: Julien Dutant @ 2021-03-25 13:49 UTC
To: pandoc-discuss
Hi all,
Suppose I have a LaTeX template that generates custom output files as it is
processed, for instance to record the page number at which each chapter
starts. Is there a way I can get those files back once pandoc is done? From
what I see, the only way is to run pandoc to generate .tex output, and then
run pdflatex separately to get my custom LaTeX-process output file. Is that
right?
Context: I'm writing a lua filter that generates PDFs for a collection of
papers. But I would also like it to generate separate PDFs for each
chapter. Currently I run my filter with:
pandoc -L collection-builder --template collection.latex collection.md -o collection.pdf
Where collection.md is a 'driver' file containing only a YAML block with
fields for every chapter:
---
collection:
- title: My collection
- editor: Jane Doe
chapters:
- filename: chapter1.md
- filename: chapter2.md
---
The Lua filter runs Pandoc on each chapter to generate a .tex file for
each. It then populates its own metadata with the names of each .tex file:
---
...
chapters:
- filename: chapter1.md
  texoutput: chapter1.tmp.tex
- filename: chapter2.md
  texoutput: chapter2.tmp.tex
---
The template collection.latex then imports all these chapter files with:
$for(chapters)$
\import{$chapters.texoutput$}
$endfor$
So my Pandoc command generates a collection.pdf with all the chapters.
However, to generate a separate PDF for each chapter, I need to know the
page on which it starts in the collection PDF. Ideally, I would like to
extract this information and add it to each chapter's metadata block (e.g.
in chapter1.md) in a page-start field, so that I can later generate single
chapters directly from those files; but it'd be ok if I had to generate the
entire collection each time I want to regenerate a single chapter PDF.
I can add code to the LaTeX template to tell the LaTeX engine to generate a
custom output file, e.g. pagenumbers.yaml:
---
chapter1: 1
chapter2: 17
---
But as far as I can tell, if the LaTeX engine is run by Pandoc itself,
there's no way of getting that file back: Pandoc discards all LaTeX output
besides the PDF. So it looks to me like my filter should only use Pandoc to
generate (temporary) .tex files, and then run the PDF engine on them itself
to get the pagenumbers.yaml file. Is that the best solution? Or is there a
trick I haven't thought of for getting custom output of the LaTeX engine
back after Pandoc generates a PDF (e.g. via stdout)?
Any suggestions welcome,
Julien
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/04f715c6-bf37-4f03-a780-1c10d4d09740n%40googlegroups.com.
* Re: Getting LaTeX process custom output files?
From: John MacFarlane @ 2021-03-25 19:48 UTC
To: Julien Dutant, pandoc-discuss
Your best bet is to do this, as noted in the manual:
--pdf-engine=latexmk --pdf-engine-opt=-outdir=foo
Now all the latex output and aux files will go into the foo directory,
which won't be deleted. I think your custom output would be there
too.
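As a rough sketch of the follow-up step: assuming the template really does write a flat "name: page" file called pagenumbers.yaml (the file name comes from the original message) and that it ends up in the foo directory, a standalone Lua script could read it back once pandoc has finished. The function names parse_page_numbers and read_page_numbers and the path foo/pagenumbers.yaml are made up for illustration, and the parser only handles simple "key: integer" lines, not general YAML.

```lua
-- Parse flat "key: integer" lines, skipping "---" markers and anything else.
-- This is NOT a general YAML parser; it assumes the simple layout shown in
-- the original message.
local function parse_page_numbers(text)
  local pages = {}
  for line in text:gmatch("[^\r\n]+") do
    local key, page = line:match("^%s*([%w_%-]+)%s*:%s*(%d+)%s*$")
    if key then
      pages[key] = tonumber(page)
    end
  end
  return pages
end

-- Read the file that the LaTeX run left behind in the output directory.
-- The path is an assumption, e.g. "foo/pagenumbers.yaml".
local function read_page_numbers(path)
  local file = assert(io.open(path, "r"))
  local text = file:read("*a")
  file:close()
  return parse_page_numbers(text)
end

-- Example with inline data instead of a real file:
local pages = parse_page_numbers("---\nchapter1: 1\nchapter2: 17\n---\n")
print(pages.chapter1, pages.chapter2)
```

Note that a Lua filter runs before the PDF engine does, so a filter can only pick up this file on a later pandoc run, not during the run that produces it.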
* Re: Getting LaTeX process custom output files?
From: denis.maier-FfwAq0itz3ofv37vnLkPlQ @ 2021-03-25 23:07 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw, julien.dutant-Re5JQEeQqe8AvxtiuMwx3w
> -----Original Message-----
> From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> On Behalf Of John MacFarlane
> Sent: Thursday, 25 March 2021 20:49
> To: Julien Dutant <julien.dutant-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>; pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
> Subject: Re: Getting LaTeX process custom output files?
>
[...]
> >
> > Context: I'm writing a lua filter that generates PDFs for a collection
> > of papers. But I would also like it to generate separate PDFs for each
> > chapter. Currently I run my filter with:
> >
[...]
> >
> > So my Pandoc command generates a collection.pdf with all the chapters.
> > However, to generate PDFs for each chapter separately, I need to know
> > at which page it starts in the collection PDF. Ideally, I would like
> > to extract this information and add it to chapter1.md's metadata block
> > in a page-start field, so that I can later generate single chapters
> > directly from those; but it'd be ok if I had to generate the entire
> > collection each time I want to regenerate a single chapter PDF.
In my workflow I currently let every paper start at page 1, as it's for an online-only journal and I don't think continuous pagination is of much help there.
Anyway, when I tested a workflow similar to what you have in mind, I had the best (and easiest) results with always generating the entire issue and then splitting the PDF with pdftk. That's a trivial task for a script if you have the split points available somewhere.
Denis
* Re: Getting LaTeX process custom output files?
From: Julien Dutant @ 2021-03-26 13:14 UTC
To: pandoc-discuss
Thanks both, great suggestions. Denis: I'm planning to add a single-chapter
cover page to each separately printed chapter, so I suspect it's neater to
generate a new PDF for each. Some comments in case someone else needs this:
* Using latexmk with --pdf-engine-opt works well. My .tex is for LuaLaTeX or
XeLaTeX, so I need to call it with two options, but that seems to work fine:
pandoc --pdf-engine=latexmk --pdf-engine-opt="-lualatex" --pdf-engine-opt="-outdir=foo"
* Splitting the PDF by bookmarks could be done with a script like this:
https://stackoverflow.com/a/10086073 . Haven't tested it out yet. Looks
reasonably easy to replicate within a Lua filter.
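One way the bookmark-based split might look, sketched under assumptions rather than taken from the linked answer: the input text is what `pdftk collection.pdf dump_data` prints (NumberOfPages, BookmarkTitle, BookmarkLevel and BookmarkPageNumber records), and every chapter is a level-1 bookmark. The helper name chapter_ranges, the sample data, and the articleN.pdf output names are made up for illustration, and this has not been run against a real collection.

```lua
-- Turn `pdftk <file> dump_data` output into page ranges, one per level-1
-- bookmark. Assumes BookmarkLevel precedes BookmarkPageNumber in each record.
local function chapter_ranges(dump)
  local last_page = assert(tonumber(dump:match("NumberOfPages:%s*(%d+)")),
                           "NumberOfPages not found")
  local starts, level = {}, nil
  for line in dump:gmatch("[^\r\n]+") do
    local key, value = line:match("^(%a+):%s*(.*)$")
    if key == "BookmarkLevel" then
      level = tonumber(value)
    elseif key == "BookmarkPageNumber" and level == 1 then
      table.insert(starts, tonumber(value))
    end
  end
  table.sort(starts)

  local ranges = {}
  for i, start in ipairs(starts) do
    -- A chapter ends just before the next one starts; the last one runs
    -- to the end of the document.
    local stop = starts[i + 1] and (starts[i + 1] - 1) or last_page
    table.insert(ranges, { start = start, stop = stop })
  end
  return ranges
end

-- Hypothetical sample data in the dump_data layout:
local sample = table.concat({
  "NumberOfPages: 30",
  "BookmarkBegin",
  "BookmarkTitle: Chapter 1",
  "BookmarkLevel: 1",
  "BookmarkPageNumber: 1",
  "BookmarkBegin",
  "BookmarkTitle: A section",
  "BookmarkLevel: 2",
  "BookmarkPageNumber: 5",
  "BookmarkBegin",
  "BookmarkTitle: Chapter 2",
  "BookmarkLevel: 1",
  "BookmarkPageNumber: 17",
}, "\n")

for i, r in ipairs(chapter_ranges(sample)) do
  -- The command is only printed here; pass it to os.execute to run pdftk.
  print(string.format("pdftk collection.pdf cat %d-%d output article%d.pdf",
                      r.start, r.stop, i))
end
```

In a real run the dump text would come from pdftk itself, for instance via io.popen. Two chapters that start on the same page would need extra handling.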
All best,
Julien
* Re: Getting LaTeX process custom output files?
From: denis.maier-FfwAq0itz3ofv37vnLkPlQ @ 2021-03-26 14:31 UTC
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> On Behalf Of Julien Dutant
Sent: Friday, 26 March 2021 14:14
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Getting LaTeX process custom output files?

> Thanks both, great suggestions. Denis: I'm planning to add a single-chapter cover page to each separately printed chapter, so I suspect it's neater to generate a new PDF for each.

As you know, I use ConTeXt, which really shines in these use cases.

> Some comments in case someone else needs this:
> * Using latexmk with --pdf-engine-opt works well. My .tex is for LuaLaTeX or XeLaTeX, so I need to call it with two options, but that seems to work fine:
> pandoc --pdf-engine=latexmk --pdf-engine-opt="-lualatex" --pdf-engine-opt="-outdir=foo"
> * Splitting the PDF by bookmarks could be done with a script like this: https://stackoverflow.com/a/10086073 . Haven't tested it out yet. Looks reasonably easy to replicate within a Lua filter.

E.g.:
```
-- Which files are we working with?
local base = assert(arg[1], "No file given")
local pdf = base .. ".pdf"
local tuc = base .. ".tuc"

-- Import the .tuc file (it is Lua code)
local utilitydata = dofile(tuc)
local last_page = utilitydata.structures.counters.collected["realpage"][1][1]
print("Last page: " .. last_page)

-- Iterate over the .tuc lists to collect the breakpoints (chapter start pages)
local breakpoints = {}
for _, content in pairs(utilitydata.structures.lists.collected) do
  if content["titledata"] and content["titledata"]["label"] == "chapter" then
    table.insert(breakpoints, content["references"]["realpage"])
  end
end
-- pairs() does not guarantee any order, so sort the pages explicitly
table.sort(breakpoints)

-- Which breakpoints did we find, and how many?
print("We have the following breakpoints:")
for _, page in ipairs(breakpoints) do
  print(page)
end
print("We have " .. #breakpoints .. " breakpoints.")

-- Determine the page ranges to extract
local extractions = {}
for index, startregion in ipairs(breakpoints) do
  local nextstartregion = breakpoints[index + 1]
  -- A chapter runs up to the page before the next one, or to the last page
  local stopregion = nextstartregion and (nextstartregion - 1) or last_page
  table.insert(extractions, { start = startregion, stop = stopregion })
end

print("Extracting ...")
for index, region in ipairs(extractions) do
  print("from " .. region["start"] .. " to " .. region["stop"])
  local outputfile = "article" .. index .. ".pdf"
  local extract_command = "pdftk " .. pdf .. " cat " .. region["start"] .. "-" .. region["stop"] .. " output " .. outputfile
  os.execute(extract_command)
end
```
Not a filter and based on the .tuc-file produced by ConTeXt, but you get the idea. Run it with `lua split.lua basename`, and you’ll end up with one file per chapter.
Denis