Getting LaTeX process custom output files?

public inbox archive for pandoc-discuss@googlegroups.com

* Getting LaTeX process custom output files?
@ 2021-03-25 13:49 ` Julien Dutant
From: Julien Dutant @ 2021-03-25 13:49 UTC
  To: pandoc-discuss


Hi all,

Suppose I have a LaTeX template that generates a custom output file as it is 
processed, for example to record the page number at which each chapter starts. 
Is there a way I can get that file back once pandoc is done? From what I see, 
the only way is to run pandoc to generate .tex output, and then run pdflatex 
separately to get my custom LaTeX-process output file. Is that right? 

Context: I'm writing a Lua filter that generates PDFs for a collection of 
papers. But I would also like it to generate separate PDFs for each 
chapter. Currently I run my filter with:

pandoc -L collection-builder --template collection.latex collection.md -o collection.pdf

Where collection.md is a 'driver' file containing only a YAML block with 
fields for every chapter:

---
collection:
- title: My collection
- editor: Jane Doe
chapters:
- filename: chapter1.md
- filename: chapter2.md
---

The Lua filter runs Pandoc on each chapter to generate a .tex file for 
each. It then populates its own metadata with the name of each .tex file:

---
...
chapters:
- filename: chapter1.md
  texoutput: chapter1.tmp.tex
- filename: chapter2.md
  texoutput: chapter2.tmp.tex
---

The template collection.latex then imports all these chapter files with:

$for(chapters)$
\import{$chapters.texoutput$}
$endfor$

So my Pandoc command generates a collection.pdf with all the chapters. 
However, to generate PDFs for each chapter separately, I need to know the 
page on which each chapter starts in the collection PDF. Ideally, I would 
like to extract this information and add it to each chapter's metadata block 
(e.g. in chapter1.md) in a page-start field, so that I can later generate 
single chapters directly from those; but it'd be ok if I had to generate the 
entire collection each time I want to regenerate a single chapter PDF.

I can add code to the LaTeX template to tell the LaTeX engine to generate a 
custom output file, e.g. pagenumbers.yaml:

---
chapter1: 1
chapter2: 17
---

But as far as I can tell, if the LaTeX engine is run by Pandoc itself 
there's no way of getting that file back: Pandoc discards all LaTeX output 
besides the PDF. So it looks to me like my filter should only use Pandoc to 
generate (temporary) .tex files, and then itself run the PDF engine on them 
to get the pagenumbers.yaml file. Is that the best solution? Or is there a 
trick I haven't thought of for getting some custom output of the LaTeX 
engine back after Pandoc generates a PDF (e.g. via stdout)?
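
For illustration, the two-step approach (Pandoc writes the .tex, then the 
engine is run separately so its side files survive) might look roughly like 
this in Python. This is only a sketch under assumptions: the helper name 
two_step_commands and the "build" directory are made up, latexmk with 
-lualatex is assumed as the way to run the engine, and the directory is 
assumed to exist already. The function only builds the argument lists; each 
one could then be passed to subprocess.run(cmd, check=True).

```python
from pathlib import Path


def two_step_commands(driver, template, lua_filter, outdir="build"):
    """Return argv lists for (1) pandoc -> .tex and (2) latexmk -> PDF.

    Hypothetical helper: the name and the default outdir are assumptions.
    """
    # e.g. collection.md -> build/collection.tex
    tex = (Path(outdir) / (Path(driver).stem + ".tex")).as_posix()
    # Step 1: stop at LaTeX source instead of letting pandoc build the PDF.
    pandoc_cmd = [
        "pandoc", "-L", lua_filter,
        "--template", template,
        driver, "-o", tex,
    ]
    # Step 2: run the engine ourselves; auxiliary files are kept in outdir.
    latexmk_cmd = ["latexmk", "-lualatex", "-outdir=" + outdir, tex]
    return [pandoc_cmd, latexmk_cmd]
```

Whether a file such as pagenumbers.yaml ends up in that directory depends on 
how the template writes it, so that part would need checking.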

Any suggestions welcome,

Julien

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/04f715c6-bf37-4f03-a780-1c10d4d09740n%40googlegroups.com.


* Re: Getting LaTeX process custom output files?
@ 2021-03-25 19:48     ` John MacFarlane
From: John MacFarlane @ 2021-03-25 19:48 UTC
  To: Julien Dutant, pandoc-discuss


Best bet is to do this, as noted in the manual:

    --pdf-engine=latexmk --pdf-engine-opt=-outdir=foo

Now all the latex output and aux files will go into the foo directory,
which won't be deleted. I think your custom output would be there
too.
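
Once the run has finished, a filter or wrapper script could read the custom 
file back from that directory. A minimal sketch in Python, assuming the 
template writes a flat pagenumbers.yaml like the one in the original message 
(one "name: page" pair per line between --- markers); the helper name 
read_page_numbers is made up, and no YAML library is used:

```python
def read_page_numbers(text):
    """Parse flat 'name: page' lines, skipping '---' markers and blank lines.

    Hypothetical helper; it only handles the simple format shown in the thread.
    """
    pages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "---":
            continue
        # Split on the first colon: 'chapter2: 17' -> ('chapter2', ' 17')
        key, _, value = line.partition(":")
        pages[key.strip()] = int(value)
    return pages
```

With the options above, the text would come from something like 
open("foo/pagenumbers.yaml").read(), assuming the file does land in foo.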


* AW: Getting LaTeX process custom output files?
@ 2021-03-25 23:07         ` denis.maier-FfwAq0itz3ofv37vnLkPlQ
From: denis.maier-FfwAq0itz3ofv37vnLkPlQ @ 2021-03-25 23:07 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw,
	julien.dutant-Re5JQEeQqe8AvxtiuMwx3w

> -----Original Message-----
> From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
> <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> On Behalf Of John MacFarlane
> Sent: Thursday, 25 March 2021 20:49
> To: Julien Dutant <julien.dutant-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>;
> pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
> Subject: Re: Getting LaTeX process custom output files?
> 
> 

[...]

> >
> > Context: I'm writing a lua filter that generates PDFs for a collection
> > of papers. But I would also like it to generate separate PDFs for each
> > chapter. Currently I run my filter with:
> >
[...] 

> >
> > So my Pandoc command generates a collection.pdf with all the chapters.
> > However, to generate PDFs for each chapter separately, I need to know
> > at which page it starts in the collection PDF. Ideally, I would like
> > to extract this information and add it to chapter1.md's metadata block
> > in a page-start field, so that I can later generate single chapters
> > directly from those; but it'd be ok if I had to generate the entire
> > collection each time I want to regenerate a single chapter PDF.

In my workflow I currently let papers always start at page 1 as it's for an online-only journal, and I don't think continuous pagination is of much help here.

Anyway, when I've tested a workflow similar to what you have in mind, I've had the best (and easiest) results with always generating an entire issue, and then splitting the pdf with pdftk. That's a trivial task for a script if you have the splitpoints available somewhere.
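
To illustrate the splitting step: given the chapter start pages and the total 
page count, the page ranges and the pdftk calls can be derived as sketched 
below in Python. The helper names, the chapterN.pdf output names and the page 
count of 30 in the usage note are assumptions for illustration, and the start 
pages are assumed to be physical PDF page numbers, sorted in ascending order.

```python
def page_ranges(starts, last_page):
    """Turn sorted chapter start pages into inclusive (start, stop) ranges."""
    ranges = []
    for i, start in enumerate(starts):
        # A chapter ends on the page before the next chapter starts;
        # the final chapter runs to the last page of the PDF.
        stop = starts[i + 1] - 1 if i + 1 < len(starts) else last_page
        ranges.append((start, stop))
    return ranges


def pdftk_commands(pdf, starts, last_page):
    """Build one 'pdftk IN cat A-B output OUT' argv list per chapter."""
    return [
        ["pdftk", pdf, "cat", f"{a}-{b}", "output", f"chapter{n}.pdf"]
        for n, (a, b) in enumerate(page_ranges(starts, last_page), start=1)
    ]
```

For start pages 1 and 17 in a 30-page collection.pdf, this gives the ranges 
1-16 and 17-30; each argument list could be run with subprocess.run.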

Denis 


* Re: Getting LaTeX process custom output files?
@ 2021-03-26 13:14             ` Julien Dutant
From: Julien Dutant @ 2021-03-26 13:14 UTC
  To: pandoc-discuss



Thanks both, great suggestions. Denis: I'm planning to add a single-chapter 
cover page to each separately printed chapter, so I suspect it's neater to 
generate a new PDF for each. Some comments in case someone else needs this:

* Using latexmk with pdf-engine-opt works well. My .tex is for LuaLaTeX or 
XeLaTeX, so I need to call it with two options, but that seems to work fine:

pandoc --pdf-engine=latexmk --pdf-engine-opt="-lualatex" --pdf-engine-opt="-outdir=foo"

* Splitting the PDF by bookmarks could be done with a script like 
this: https://stackoverflow.com/a/10086073 . Haven't tested it out yet. 
Looks reasonably easy to replicate within a Lua filter. 

All best,

Julien



* AW: Getting LaTeX process custom output files?
@ 2021-03-26 14:31                 ` denis.maier-FfwAq0itz3ofv37vnLkPlQ
From: denis.maier-FfwAq0itz3ofv37vnLkPlQ @ 2021-03-26 14:31 UTC
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw




From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org> On Behalf Of Julien Dutant
Sent: Friday, 26 March 2021 14:14
To: pandoc-discuss <pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
Subject: Re: Getting LaTeX process custom output files?

> Thanks both, great suggestions. Denis: I'm planning to add a single-chapter cover page to each separately printed chapter, so I suspect it's neater to generate a new PDF for each.

As you know, I use ConTeXt, which really shines in these use-cases.

> Some comments in case someone else needs this:
>
> * Using latexmk with pdf-engine-opt works well. My .tex is for LuaLaTeX or XeLaTeX, so I need to call it with two options, but that seems to work fine:
>
> pandoc --pdf-engine=latexmk --pdf-engine-opt="-lualatex" --pdf-engine-opt="-outdir=foo"
>
> * Splitting the PDF by bookmarks could be done with a script like this: https://stackoverflow.com/a/10086073 . Haven't tested it out yet. Looks reasonably easy to replicate within a Lua filter.


E.g.:

```
-- Which files are we working with?

local base = assert(arg[1], "No file given")
local pdf = base .. ".pdf"
local tuc = base .. ".tuc"


-- Load the .tuc file (it is itself Lua code, so dofile can read it)
local utilitydata = dofile(tuc)
local breakpoints = {}

local last_page = utilitydata.structures.counters.collected["realpage"][1][1]
print("Last page: " .. last_page)

-- Iterate over the .tuc list entries and collect the chapter start pages
for _, content in pairs(utilitydata.structures.lists.collected) do
    if content["titledata"]["label"] == "chapter" then
        table.insert(breakpoints, content["references"]["realpage"])
    end
end

-- pairs() does not guarantee iteration order, so sort the start pages
table.sort(breakpoints)

-- Which breakpoints did we find?
print("We have the following breakpoints:")
for _, page in ipairs(breakpoints) do
    print(page)
end

-- How many breakpoints do we have?
print("We have " .. #breakpoints .. " breakpoints.")

-- Determine the page ranges to extract
local extractions = {}

for index, breakpoint in ipairs(breakpoints) do
    local nextstart = breakpoints[index + 1]
    -- A chapter runs up to the page before the next one starts;
    -- the last chapter runs to the last page of the document.
    local stop = nextstart and (nextstart - 1) or last_page
    table.insert(extractions, { start = breakpoint, stop = stop })
end


print("Extracting ...")
for index, region in ipairs(extractions) do
    print("from " .. region["start"] .. " to " .. region["stop"])
    local outputfile = "article" .. index .. ".pdf"
    local extract_command = "pdftk " .. pdf .. " cat " .. region["start"] .. "-" .. region["stop"] .. " output " .. outputfile
    os.execute(extract_command)
end
```

Not a filter and based on the .tuc-file produced by ConTeXt, but you get the idea. Run it with `lua split.lua basename`, and you’ll end up with one file per chapter.

Denis




