* [Markdown=>PDF] Performance and other differences by switching between pdflatex/xelatex/lualatex
@ 2014-01-11 15:12 kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
[not found] ` <66137e52-b12d-476a-b79c-9afb3ca612bb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2014-01-11 15:12 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
A few weeks ago I was playing with different settings to create PDFs
from my own Markdown files, using --latex-engine=pdflatex|xelatex|lualatex.
At the time I noticed there were significant performance differences:
- *pdflatex* was the fastest (but sometimes had problems with special
characters, like German umlauts)
- *xelatex* was significantly slower (but handled my umlauts out of the
box)
- *lualatex* was extremely slow, and in many cases didn't finish the job
at all but threw an error
But I didn't have much time to investigate more deeply -- I decided to
write most of my content in Markdown first, before turning to fine-tuning
the style details of the different output formats.
Today I found some time to start taking a deeper look at the performance.
In order to have a common (and stable) base for these measurements, I'm
using the main README file from pandoc's Git repository as my Markdown
source.
I'm using the freshly released version 1.12.3, installed via cabal on a
MacBook (running Mavericks); the different LaTeX engines were installed via
MacPorts:
kp@mb:git.pandoc.trunk >* pandoc --version*
pandoc 1.12.3
Compiled with texmath 0.6.6, highlighting-kate 0.5.6.
[...]
kp@mbp:git.pandoc.trunk > *pdflatex --version*
pdfTeX 3.1415926-2.5-1.40.14 (TeX Live 2013/MacPorts 2013_5)
kpathsea version 6.1.1
Copyright 2013 Peter Breitenlohner (eTeX)/Han The Thanh (pdfTeX).
There is NO warranty. Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Peter Breitenlohner (eTeX)/Han The Thanh
(pdfTeX).
Compiled with libpng 1.6.7; using libpng 1.6.8
Compiled with zlib 1.2.8; using zlib 1.2.8
Compiled with poppler version 0.24.4
kp@mbp:git.pandoc.trunk > *xelatex --version*
XeTeX 3.1415926-2.5-0.9999.3-2013122212 (TeX Live 2013/MacPorts 2013_5)
kpathsea version 6.1.1
Copyright 2013 SIL International and Jonathan Kew.
There is NO warranty. Redistribution of this software is
covered by the terms of both the XeTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the XeTeX source.
Primary author of XeTeX: Jonathan Kew.
Compiled with ICU version 51.2; using 51.2
Compiled with zlib version 1.2.8; using 1.2.8
Compiled with FreeType2 version 2.5.2; using 2.5.2
Compiled with Graphite2 version 1.2.4; using 1.2.4
Compiled with HarfBuzz version 0.9.25; using 0.9.25
Using Mac OS X Core Text, Cocoa & ImageIO frameworks
kp@mbp:git.pandoc.trunk > *lualatex --version*
This is LuaTeX, Version beta-0.76.0-2013122212 (TeX Live 2013/MacPorts
2013_5) (rev 4627)
Execute 'luatex --credits' for credits and version details
There is NO warranty. Redistribution of this software is covered by
the terms of the GNU General Public License, version 2 or (at your option)
any later version. For more information about these matters, see the file
named COPYING and the LuaTeX source.
Copyright 2013 Taco Hoekwater, the LuaTeX Team.
*Speed Differences pdflatex vs. xelatex*
Here are first results from my performance testing:
kp@mb:git.pandoc.trunk > *time for i in {1..10}; do pandoc -f markdown
--latex-engine=pdflatex -o myreadme_pdflatex_${i}.pdf README; done*
real 0m19.262s
user 0m23.205s
sys 0m1.032s
kp@mb:git.pandoc.trunk > *time for i in {1..10}; do pandoc -f markdown
--latex-engine=xelatex -o myreadme_xelatex_${i}.pdf README; done*
real 0m44.976s
user 0m50.706s
sys 0m2.519s
So it seems fair to state that the *xelatex* path to PDF takes a bit more
than double the time of the *pdflatex* path (about 4.5 s versus 1.9 s per
run).
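For anyone who wants to redo the arithmetic on their own numbers, here is a small sketch that turns the "real" totals of the two 10-run loops into per-run averages and a ratio. The helper name `summarize` is made up for this illustration; the figures are the ones reported above.

```shell
# Hypothetical helper: print the average wall-clock time per run.
# usage: summarize LABEL TOTAL_SECONDS RUNS
summarize() {
    awk -v l="$1" -v t="$2" -v n="$3" \
        'BEGIN { printf "%s: %.2f s per run\n", l, t / n }'
}

summarize pdflatex 19.262 10   # "real" total of the pdflatex loop
summarize xelatex  44.976 10   # "real" total of the xelatex loop

# Ratio of the two totals (xelatex / pdflatex).
awk 'BEGIN { printf "ratio: %.2f\n", 44.976 / 19.262 }'
```

With the totals above this prints 1.93 s and 4.50 s per run and a ratio of 2.33. Note that each run also includes pandoc's own Markdown-to-LaTeX step, so the ratio between the bare engines is probably somewhat larger.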
*lualatex is b0rken for me*
However, lualatex didn't work at all:
kp@mb:git.pandoc.trunk > *time pandoc -f markdown --latex-engine=lualatex
-o myreadme_lualatex.pdf README*
pandoc: Error producing PDF from TeX source.
This is LuaTeX, Version beta-0.76.0-2013122212 (rev 4627)
restricted \write18 enabled.
(/var/folders/80/3wtx3wys21l921zvl6mp_lp80000gn/T/tex2pdf.60796/input.tex
LaTeX2e <2011/06/27>
Babel <3.9f> and hyphenation patterns for 43 languages loaded.
(/opt/local/share/texmf-texlive/tex/latex/base/article.cls
Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
(/opt/local/share/texmf-texlive/tex/latex/base/size10.clo))
(/opt/local/share/texmf-texlive/tex/latex/base/fontenc.sty
(/opt/local/share/texmf-texlive/tex/latex/base/t1enc.def))
(/opt/local/share/texmf-texlive/tex/latex/lm/lmodern.sty)
(/opt/local/share/texmf-texlive/tex/latex/amsfonts/amssymb.sty
(/opt/local/share/texmf-texlive/tex/latex/amsfonts/amsfonts.sty))
(/opt/local/share/texmf-texlive/tex/latex/amsmath/amsmath.sty
For additional information on amsmath, use the `?' option.
(/opt/local/share/texmf-texlive/tex/latex/amsmath/amstext.sty
(/opt/local/share/texmf-texlive/tex/latex/amsmath/amsgen.sty))
(/opt/local/share/texmf-texlive/tex/latex/amsmath/amsbsy.sty)
(/opt/local/share/texmf-texlive/tex/latex/amsmath/amsopn.sty))
(/opt/local/share/texmf-texlive/tex/generic/ifxetex/ifxetex.sty)
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/ifluatex.sty)
(/opt/local/share/texmf-texlive/tex/latex/base/fixltx2e.sty)
(/opt/local/share/texmf-texlive/tex/latex/upquote/upquote.sty
(/opt/local/share/texmf-texlive/tex/latex/base/textcomp.sty
(/opt/local/share/texmf-texlive/tex/latex/base/ts1enc.def)))
(/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec.sty
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/expl3.sty
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3names.sty
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3bootstrap.sty
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/luatex.sty
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/infwarerr.sty)
(/opt/local/share/texmf-texlive/tex/latex/etex-pkg/etex.sty)
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/luatex-loader.sty
(/opt/local/share/texmf-texlive/scripts/oberdiek/oberdiek.luatex.lua)))
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/pdftexcmds.sty
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/ltxcmds.sty)
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/ifpdf.sty))))
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3basics.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3expan.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3tl.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3seq.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3int.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3quark.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3prg.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3clist.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3token.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3prop.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3msg.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3file.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3skip.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3keys.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3fp.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3box.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3coffins.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3color.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3luatex.sty)
(/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3candidates.sty))
(/opt/local/share/texmf-texlive/tex/latex/l3packages/xparse/xparse.sty)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-compat.sty)
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-modutils.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-loader.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase.loader.lua))
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/modutils.lua))
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-regs.sty)
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-attr.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/attr.lua))
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-cctb.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/cctb.lua))
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-mcb.sty
(/opt/local/share/texmf-texlive/tex/luatex/luatexbase/mcb.lua)))
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-merged.lua)
(using write cache:
/Users/kurtpfeifle/.texlive2013/texmf-var/luatex-cache/generic)
(using read cache: /opt/local/var/db/texmf/luatex-cache/generic
/Users/kurtpfeifle/.texlive2013/texmf-var/luatex-cache/generic)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-lib-dir.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-override.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-loaders.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-database.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-colors.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-features.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-extralibs.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-typo-krn.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-letterspace.lua)
(/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-auxiliary.lua))
(/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec.lua)
(/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec-patches.sty
*************************************************
* LaTeX warning: "xparse/redefine-command"
*
* Redefining document command \oldstylenums with arg. spec. 'm' on line 128.
*************************************************
) (/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec-luatex.sty
(/opt/local/share/texmf-texlive/tex/latex/base/fontenc.sty
(/opt/local/share/texmf-texlive/tex/latex/euenc/eu2enc.def)
(/opt/local/share/texmf-texlive/tex/latex/euenc/eu2lmr.fd
real 3m31.995s
user 3m20.349s
sys 0m7.601s
kp@mb:git.pandoc.trunk >* echo $?*
43
The lualatex engine didn't produce any PDF:
kp@mb:git.pandoc.trunk >* ls -tlar *.pdf*
-rw-r--r-- 1 kp staff 470292 Jan 11 12:01 myreadme_pdflatex.pdf
-rw-r--r-- 1 kp staff 205823 Jan 11 12:02 myreadme_xelatex.pdf
*Fixing the Page Size Differences*
Another significant difference in the output of the two successful PDF
conversions:
- *xelatex* used A4 media format for the PDF pages
- *pdflatex* used Letter format
(I guess these defaults are built into the respective engines and have
nothing to do with pandoc. Or do they?) This page size difference makes it
hard to inspect the two PDFs side by side for any more subtle differences
in their pages' appearance.
So, to make the output of the two working engines easier to compare, I
extended my command line options:
time pandoc \
-V "geometry:paperwidth=8.26387in" \
-V "geometry:paperheight=29.7cm" \
-V "geometry:vmargin=40pt" \
-V "geometry:hmargin=40pt" \
-f markdown \
--latex-engine=pdflatex \
-o myreadme_pdflatex.pdf \
README
time pandoc \
-V "geometry:paperwidth=8.26387in" \
-V "geometry:paperheight=29.7cm" \
-V "geometry:vmargin=40pt" \
-V "geometry:hmargin=40pt" \
-f markdown \
--latex-engine=xelatex \
-o myreadme_xelatex.pdf \
README
The timings didn't change significantly, but now I have two PDFs with the
same page size for inspection, to see if there are any qualitative
differences.
At first glance, both PDFs look nearly identical. However, some word
spacings are slightly different, so some lines wrap differently, and these
differences accumulate until some pages break differently.
Not a big issue, though.
*PDF Metadata*
kp@mb:git.pandoc.trunk > *pdfinfo myreadme_pdflatex.pdf *
Title: Pandoc User's Guide
Subject:
Keywords:
Author: John MacFarlane
Creator: LaTeX with hyperref package
Producer: pdfTeX-1.40.14
CreationDate: Sat Jan 11 14:28:53 2014
ModDate: Sat Jan 11 14:28:53 2014
Tagged: no
Form: none
Pages: 36
Encrypted: no
Page size: 594.999 x 841.89 pts (A4)
Page rot: 0
File size: 455149 bytes
Optimized: no
PDF version: 1.5
kp@mb:git.pandoc.trunk > *pdfinfo myreadme_xelatex.pdf *
Title: Pandoc User's Guide
Author: John MacFarlane
Creator: LaTeX with hyperref package
Producer: xdvipdfmx (0.7.9)
CreationDate: Sat Jan 11 14:28:04 2014
Tagged: no
Form: none
Pages: 37
Encrypted: no
Page size: 595 x 841.89 pts (A4)
Page rot: 0
File size: 189281 bytes
Optimized: no
PDF version: 1.5
As you can see, there are a few significant differences:
- *File size:* pdflatex outputs ~444 kB, xelatex outputs ~185 kB
(difference of ~259 kB).
- *Page count:* pdflatex generates 36 pages, xelatex generates 37
pages.
- *Producer:* pdflatex states "pdfTeX-1.40.14", xelatex states
"xdvipdfmx (0.7.9)". This means xelatex takes a detour via an intermediate
DVI-like file, which xdvipdfmx then converts to PDF.
- *Subject *and* Keywords:* pdflatex puts these metadata fields into the
PDF's document information dictionary but leaves them empty; xelatex omits
them (along with ModDate).
- *Page size:* despite identical command line parameters, there is a
slight difference in the reported page width (594.999 pts vs. 595 pts). I
assume this is because of the DVI detour of xelatex, which may introduce
some rounding errors.
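To see where the reported page sizes come from, the geometry values passed on the command line can be converted to PDF points (1/72 in). This is only a sketch; the helper name `to_pt` is made up for the illustration.

```shell
# Hypothetical helper: convert a length to PDF points (1 pt = 1/72 in).
# usage: to_pt VALUE UNIT   (UNIT is "in" or "cm")
to_pt() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        if (u == "cm") v = v / 2.54    # centimetres to inches
        printf "%.3f\n", v * 72        # inches to points
    }'
}

to_pt 8.26387 in   # paperwidth as passed to geometry  -> 594.999
to_pt 29.7 cm      # paperheight as passed to geometry -> 841.890
to_pt 21.0 cm      # nominal A4 width, for comparison  -> 595.276
```

So the requested width works out to 594.999 pts, exactly what pdfinfo reports for the pdflatex output, and it is a fraction of a point narrower than nominal A4 (595.276 pts). pdfinfo still labels both files as A4.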
*PDF Fonts*
kp@mb:git.pandoc.trunk > *pdffonts myreadme_pdflatex.pdf *
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
YRKMSP+LMRoman17-Regular             Type 1            Custom           yes yes no     347  0
FTOMDN+LMRoman12-Regular             Type 1            Custom           yes yes no     348  0
GUKOVW+LMRoman12-Bold                Type 1            Custom           yes yes no     349  0
CFVARR+LMRoman10-Regular             Type 1            Custom           yes yes no     350  0
SWKNVD+LMRoman10-Italic              Type 1            Custom           yes yes no     351  0
GCGIOZ+LMMono10-Regular              Type 1            Custom           yes yes no     353  0
BPKJXQ+LMMonoLt10-Bold               Type 1            Custom           yes yes no     354  0
WMNHHZ+LMRoman10-BoldItalic          Type 1            Custom           yes yes no     364  0
JKORXP+LMRoman10-Bold                Type 1            Custom           yes yes no     365  0
CFVARR+LMRoman10-Regular             Type 1            Custom           yes yes no     429  0
GCGIOZ+LMMono10-Regular              Type 1            Custom           yes yes no     446  0
UMYEZP+LMRoman7-Regular              Type 1            Custom           yes yes no     462  0
URPVMO+LMRoman6-Regular              Type 1            Custom           yes yes no     463  0
UAEFEH+LMRoman8-Regular              Type 1            Custom           yes yes no     465  0
PGWEIL+LMMono8-Regular               Type 1            Custom           yes yes no     466  0
kp@mb:git.pandoc.trunk > *pdffonts myreadme_xelatex.pdf*
name                                   type            encoding         emb sub uni object ID
-------------------------------------- --------------- ---------------- --- --- --- ---------
ERGCXD+LMRoman17-Regular-Identity-H    CID Type 0C     Identity-H       yes yes yes      5  0
PXEJIZ+LMRoman12-Regular-Identity-H    CID Type 0C     Identity-H       yes yes yes      7  0
SNYTKW+LMRoman12-Bold-Identity-H       CID Type 0C     Identity-H       yes yes yes      9  0
DKWVLY+LMRoman10-Regular-Identity-H    CID Type 0C     Identity-H       yes yes yes     11  0
FKSWYW+LMRoman10-Italic-Identity-H     CID Type 0C     Identity-H       yes yes yes     13  0
TFUYQQ+LMMono10-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes     57  0
EEPFTP+LMMonoLt10-Bold-Identity-H      CID Type 0C     Identity-H       yes yes yes     59  0
UDLNER+LMRoman10-BoldItalic-Identity-H CID Type 0C     Identity-H       yes yes yes     64  0
JLOSBI+LMRoman10-Bold-Identity-H       CID Type 0C     Identity-H       yes yes yes     66  0
QWNYRO+LMRoman7-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes    139  0
IKIBKZ+LMRoman6-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes    142  0
EIDUGE+LMRoman8-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes    144  0
YNRKKN+LMMono8-Regular-Identity-H      CID Type 0C     Identity-H       yes yes yes    146  0
So here is another significant difference:
- *pdflatex* uses Type 1 (PostScript) fonts with a custom encoding
- *xelatex* here converts all fonts to CID Type 0C
(CFF/Compact Font Format) fonts with Identity-H encoding
This font handling IMHO most likely explains a large part of the speed
and size differences of the two PDFs: converting Type 1 fonts to CID takes
time (but saves space), and leads to slight differences in character and
word spacing, which finally end up creating an additional page.
On the other hand, PDFs containing CID fonts with Identity-H encoding, as
well as those containing any font with a custom encoding, sometimes do not
play nicely when it comes to copying and pasting text from their pages, or
to extracting their text altogether.
But let's check both of these statements...
*Font Size Differences*
I used the following commands to extract the fonts from the two PDFs:
kp@mb:git.pandoc.trunk > *mutool extract myreadme_xelatex.pdf*
kp@mb:git.pandoc.trunk > *mutool extract myreadme_pdflatex.pdf*
(mutool is a companion commandline tool to MuPDF). This gave me 13 *.pfa
and 13 *.cid files in the current directory. A (rough) comparison of the
combined file sizes for each of the two groups yields this result:
kp@mb:git.pandoc.trunk > *tar cvzf pfas.tar.gz *.pfa 2>/dev/null && ls
-lh pfas.tar.gz*
-rw-r--r-- 1 kurtpfeifle staff 312K Jan 11 15:53 pfas.tar.gz
kp@mb:git.pandoc.trunk > *tar cvzf cids.tar.gz *.cid 2>/dev/null && ls
-lh cids.tar.gz*
-rw-r--r-- 1 kurtpfeifle staff 52K Jan 11 15:53 cids.tar.gz
Extracted *.pfa fonts are no longer compressed, hence I re-compressed
them inside a tarball. (Inside a PDF, fonts are usually compressed too, so
this should be a more reasonable comparison than the direct sum of the
file sizes of the extracted fonts.)
The size difference between the two font groups is ~260 kB, which
accounts pretty well for the size difference of the respective PDFs.
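As a quick cross-check of that claim, the two differences can be put side by side. This is plain shell arithmetic on the numbers quoted above (the byte counts from pdfinfo and the tarball sizes from ls -lh); I am assuming that the "K" shown by ls means units of 1024 bytes.

```shell
# File sizes in bytes, as reported by pdfinfo above.
pdf_diff_bytes=$(( 455149 - 189281 ))

# Tarball sizes as reported by ls -lh above (312K and 52K).
font_diff_kib=$(( 312 - 52 ))

echo "PDF difference:  $(( pdf_diff_bytes / 1024 )) KiB"
echo "font difference: ${font_diff_kib} KiB"
```

This prints 259 KiB for the PDFs and 260 KiB for the font tarballs, so the fonts do seem to account for practically the whole difference.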
*Summary*
1. The file size difference is worth switching from the default
(pdflatex) engine to xelatex, if output file size is a major concern.
However, you pay for this gain with a conversion speed penalty.
2. xelatex can give you additional benefits, should you need them: you
can more easily switch to different fonts and use advanced OpenType font
features.
3. However, if you use Markdown/Pandoc to write a paper to submit to
some conference organizers who insist on embedding Type 1 fonts into your
PDF, you're probably better off sticking with the (default) pdflatex engine.
4. It needs to be determined why my lualatex engine currently does not
work.
5. I would be grateful if other people on this list could test this too,
especially with --latex-engine=lualatex, and also post their respective
speed and other results [in case it's not b0rken for them as well]...
(It's nice when pandoc works flawlessly -- however, it is quite difficult
to narrow down the cause of a problem when something goes wrong, like in
this case with LuaLaTeX. I think I'll run a Markdown=>LaTeX conversion
next, and then run a LaTeX=>PDF conversion manually on the command line, to
see if I can enable some debugging switches there. Currently I do not have
any experience with running lualatex, xelatex or pdflatex directly on the
command line...)
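A rough sketch of that two-step approach, untested against the failing setup: the function name `debug_engine` and the intermediate file name `debug_input.tex` are made up for this illustration. It writes a standalone LaTeX file with pandoc and then runs the chosen engine on it directly, so the engine's complete output (and its .log file) can be inspected.

```shell
# Hypothetical helper: split the conversion into its two steps.
# usage: debug_engine ENGINE MARKDOWN_FILE [TEX_FILE]
debug_engine() {
    engine=$1
    src=$2
    tex=${3:-debug_input.tex}

    # Refuse to start if either tool is missing from PATH.
    for tool in pandoc "$engine"; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing: $tool" >&2
            return 127
        }
    done

    # Step 1: Markdown => standalone LaTeX (-s adds the preamble).
    pandoc -f markdown -s -o "$tex" "$src" || return

    # Step 2: LaTeX => PDF, without stopping to prompt on errors.
    "$engine" -interaction=nonstopmode "$tex"
}

# Example call (not executed here):
#   debug_engine lualatex README
```

Running the engine by hand this way should show exactly where it spends its time or where it stops, which pandoc's error message only hints at.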
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/66137e52-b12d-476a-b79c-9afb3ca612bb%40googlegroups.com.
* Re: [Markdown=>PDF] Performance and other differences by switching between pdflatex/xelatex/lualatex
[not found] ` <66137e52-b12d-476a-b79c-9afb3ca612bb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2014-01-11 15:31 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
2014-01-11 18:31 ` Axel Kielhorn
2014-01-12 9:52 ` Dirk Laurie
2 siblings, 0 replies; 6+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2014-01-11 15:31 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Sorry, forgot to report the results from my text extraction tests. In
short: everything worked fine.
I ran these two commands:
kp@mb:git.pandoc.trunk > *pdftotext -layout myreadme_pdflatex.pdf
myreadme_pdflatex--pdftotext.text*
kp@mb:git.pandoc.trunk > *pdftotext -layout myreadme_xelatex.pdf
myreadme_xelatex--pdftotext.text*
kp@mb:git.pandoc.trunk >* wc -l *.text*
2212 myreadme_pdflatex--pdftotext.text
2230 myreadme_xelatex--pdftotext.text
This shows the number of text lines extracted (2212 and 2230, respectively).
Visual inspection of the *.text files showed no problem for either of the
source PDF files.
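Since the two PDFs wrap their lines differently, line counts alone don't say much. A less manual check could compare the extracted text word by word, ignoring all whitespace and line breaks. This is only a sketch; the function name `same_words` is made up for the illustration.

```shell
# Hypothetical helper: succeed if both files contain the same sequence of
# words, regardless of how the text is wrapped into lines.
# usage: same_words FILE_A FILE_B
same_words() {
    a=$(tr -s '[:space:]' '\n' < "$1" | sed '/^$/d')
    b=$(tr -s '[:space:]' '\n' < "$2" | sed '/^$/d')
    [ "$a" = "$b" ]
}

# Example call (not executed here):
#   same_words myreadme_pdflatex--pdftotext.text \
#              myreadme_xelatex--pdftotext.text && echo "same words"
```

On the real files I would expect this to report a difference anyway, because hyphenation at line ends and page numbers will not match between the two outputs; it is meant as a starting point, not a verdict.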
Of course, your mileage may vary as soon as you start using custom font
settings with xelatex (or lualatex, should it work for you). However, I'm
very happy that pandoc + pdflatex/xelatex do work so well with their
default settings when it comes to fonts and text extraction (LaTeX-based
PDF files used to be infamous for causing huuuuge problems in the past when
it came to text extraction or merging them with other PDF files).
On Saturday, 11 January 2014 16:12:11 UTC+1, kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org wrote:
>
> A few weeks ago I've been playing with different settings to created PDF
> from my own Markdown files, using --latex-engine=pdflatex|xelatex|lualatex
>
> At the time I noticed there were significant performance differences:
>
> - *pdflatex* was the fastest (but sometimes had problems with special
> characters, like German umlauts)
> - *xelatex* was significantly slower (but handled my umlauts out of
> the box)
> - *lualatex* was extremely slow, and in many cases didn't finish the
> job at all but threw an error
>
> But I didn't have much time to investigate more deeply -- I decided to
> write most of my content in Markdown first, before I turning to fine-tuning
> the style details of the different output formats.
>
> Today I found some time to start taking a deeper look at the performance.
> In order to have a common (and stable) base for these measurements, I'm
> using the main README file from pandoc's Git repository as my Markdown
> source.
>
> I'm using the freshly released version 1.12.3, installed via cabal on a
> Macbook (running Mavericks), the different LaTeX-engines were installed via
> MacPorts:
>
> kp@mb:git.pandoc.trunk >* pandoc --version*
> pandoc 1.12.3
> Compiled with texmath 0.6.6, highlighting-kate 0.5.6.
> [...]
>
>
> kp@mbp:git.pandoc.trunk > *pdflatex --version*
> pdfTeX 3.1415926-2.5-1.40.14 (TeX Live 2013/MacPorts 2013_5)
> kpathsea version 6.1.1
> Copyright 2013 Peter Breitenlohner (eTeX)/Han The Thanh (pdfTeX).
> There is NO warranty. Redistribution of this software is
> covered by the terms of both the pdfTeX copyright and
> the Lesser GNU General Public License.
> For more information about these matters, see the file
> named COPYING and the pdfTeX source.
> Primary author of pdfTeX: Peter Breitenlohner (eTeX)/Han The Thanh
> (pdfTeX).
> Compiled with libpng 1.6.7; using libpng 1.6.8
> Compiled with zlib 1.2.8; using zlib 1.2.8
> Compiled with poppler version 0.24.4
>
> kp@mbp:git.pandoc.trunk > *xelatex --version*
> XeTeX 3.1415926-2.5-0.9999.3-2013122212 (TeX Live 2013/MacPorts 2013_5)
> kpathsea version 6.1.1
> Copyright 2013 SIL International and Jonathan Kew.
> There is NO warranty. Redistribution of this software is
> covered by the terms of both the XeTeX copyright and
> the Lesser GNU General Public License.
> For more information about these matters, see the file
> named COPYING and the XeTeX source.
> Primary author of XeTeX: Jonathan Kew.
> Compiled with ICU version 51.2; using 51.2
> Compiled with zlib version 1.2.8; using 1.2.8
> Compiled with FreeType2 version 2.5.2; using 2.5.2
> Compiled with Graphite2 version 1.2.4; using 1.2.4
> Compiled with HarfBuzz version 0.9.25; using 0.9.25
> Using Mac OS X Core Text, Cocoa & ImageIO frameworks
>
> kp@mbp:git.pandoc.trunk > *lualatex --version*
> This is LuaTeX, Version beta-0.76.0-2013122212 (TeX Live 2013/MacPorts
> 2013_5) (rev 4627)
> Execute 'luatex --credits' for credits and version details
> There is NO warranty. Redistribution of this software is covered by
> the terms of the GNU General Public License, version 2 or (at your
> option)
> any later version. For more information about these matters, see the file
> named COPYING and the LuaTeX source.
> Copyright 2013 Taco Hoekwater, the LuaTeX Team.
>
>
>
> *Speed Differences pdflatex vs. xelatex*
>
> Here are first results from my performance testing:
>
>
> kp@mb:git.pandoc.trunk > *time for i in {1..10}; do pandoc -f markdown
> --latex-engine=pdflatex -o myreadme_pdflatex_${i}.pdf README; done*
> real 0m19.262s
> user 0m23.205s
> sys 0m1.032s
>
> kp@mb:git.pandoc.trunk > *time for i in {1..10}; do pandoc -f markdown
> --latex-engine=xelatex -o myreadme_xelatex_${i}.pdf README; done*
> real 0m44.976s
> user 0m50.706s
> sys 0m2.519s
>
>
> So It seems fair to state that the *xelatex*-path to PDF takes about
> double the time compared to the *pdflatex*-path.
>
> *lualatex is b0rken for me*
>
> However, lualatex didn't work at all:
>
> kp@mb:git.pandoc.trunk > *time pandoc -f markdown --latex-engine=lualatex
> -o myreadme_lualatex.pdf README*
> pandoc: Error producing PDF from TeX source.
> This is LuaTeX, Version beta-0.76.0-2013122212 (rev 4627)
> restricted \write18 enabled.
> (/var/folders/80/3wtx3wys21l921zvl6mp_lp80000gn/T/tex2pdf.60796/input.tex
> LaTeX2e <2011/06/27>
> Babel <3.9f> and hyphenation patterns for 43 languages loaded.
> (/opt/local/share/texmf-texlive/tex/latex/base/article.cls
> Document Class: article 2007/10/19 v1.4h Standard LaTeX document class
> (/opt/local/share/texmf-texlive/tex/latex/base/size10.clo))
> (/opt/local/share/texmf-texlive/tex/latex/base/fontenc.sty
> (/opt/local/share/texmf-texlive/tex/latex/base/t1enc.def))
> (/opt/local/share/texmf-texlive/tex/latex/lm/lmodern.sty)
> (/opt/local/share/texmf-texlive/tex/latex/amsfonts/amssymb.sty
> (/opt/local/share/texmf-texlive/tex/latex/amsfonts/amsfonts.sty))
> (/opt/local/share/texmf-texlive/tex/latex/amsmath/amsmath.sty
> For additional information on amsmath, use the `?' option.
> (/opt/local/share/texmf-texlive/tex/latex/amsmath/amstext.sty
> (/opt/local/share/texmf-texlive/tex/latex/amsmath/amsgen.sty))
> (/opt/local/share/texmf-texlive/tex/latex/amsmath/amsbsy.sty)
> (/opt/local/share/texmf-texlive/tex/latex/amsmath/amsopn.sty))
> (/opt/local/share/texmf-texlive/tex/generic/ifxetex/ifxetex.sty)
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/ifluatex.sty)
> (/opt/local/share/texmf-texlive/tex/latex/base/fixltx2e.sty)
> (/opt/local/share/texmf-texlive/tex/latex/upquote/upquote.sty
> (/opt/local/share/texmf-texlive/tex/latex/base/textcomp.sty
> (/opt/local/share/texmf-texlive/tex/latex/base/ts1enc.def)))
> (/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec.sty
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/expl3.sty
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3names.sty
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3bootstrap.sty
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/luatex.sty
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/infwarerr.sty)
> (/opt/local/share/texmf-texlive/tex/latex/etex-pkg/etex.sty)
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/luatex-loader.sty
> (/opt/local/share/texmf-texlive/scripts/oberdiek/oberdiek.luatex.lua)))
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/pdftexcmds.sty
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/ltxcmds.sty)
> (/opt/local/share/texmf-texlive/tex/generic/oberdiek/ifpdf.sty))))
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3basics.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3expan.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3tl.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3seq.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3int.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3quark.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3prg.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3clist.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3token.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3prop.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3msg.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3file.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3skip.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3keys.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3fp.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3box.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3coffins.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3color.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3luatex.sty)
> (/opt/local/share/texmf-texlive/tex/latex/l3kernel/l3candidates.sty))
> (/opt/local/share/texmf-texlive/tex/latex/l3packages/xparse/xparse.sty)
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload.sty
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase.sty
>
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-compat.sty)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-modutils.sty
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-loader.sty
>
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase.loader.lua))
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/modutils.lua))
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-regs.sty)
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-attr.sty
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/attr.lua))
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-cctb.sty
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/cctb.lua))
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/luatexbase-mcb.sty
> (/opt/local/share/texmf-texlive/tex/luatex/luatexbase/mcb.lua)))
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-merged.lua)(usi
> ng write cache:
> /Users/kurtpfeifle/.texlive2013/texmf-var/luatex-cache/generic)(
> using read cache: /opt/local/var/db/texmf/luatex-cache/generic
> /Users/kurtpfeifl
> e/.texlive2013/texmf-var/luatex-cache/generic)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-lib-dir.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-override.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-loaders.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-database.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-colors.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-features.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-extralibs.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-typo-krn.lua)
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-letterspace.lua
> )
>
> (/opt/local/share/texmf-texlive/tex/luatex/luaotfload/luaotfload-auxiliary.lua))
> (/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec.lua)
> (/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec-patches.sty
> *************************************************
> * LaTeX warning: "xparse/redefine-command"
> *
> * Redefining document command \oldstylenums with arg. spec. 'm' on line
> 128.
> *************************************************
> ) (/opt/local/share/texmf-texlive/tex/latex/fontspec/fontspec-luatex.sty
> (/opt/local/share/texmf-texlive/tex/latex/base/fontenc.sty
> (/opt/local/share/texmf-texlive/tex/latex/euenc/eu2enc.def)
> (/opt/local/share/texmf-texlive/tex/latex/euenc/eu2lmr.fd
>
> real 3m31.995s
> user 3m20.349s
> sys 0m7.601s
>
>
> kp@mb:git.pandoc.trunk >* echo $?*
> 43
>
>
> The lualatex engine didn't produce any PDF:
>
> kp@mb:git.pandoc.trunk >* ls -tlar *.pdf*
> -rw-r--r-- 1 kp staff 470292 Jan 11 12:01 myreadme_pdflatex.pdf
> -rw-r--r-- 1 kp staff 205823 Jan 11 12:02 myreadme_xelatex.pdf
>
>
>
> *Fixing the Page Size Differences*
>
> Another significant difference in the output of the two successful PDF
> conversions:
>
> - *xelatex* used A4 media format for the PDF pages
> - *pdflatex* used Letter format
>
> (but I guess these defaults are built into the respective engines and do
> not have anything to do with pandoc. Or do they?) This page size difference does
> not allow for an easy visual side-by-side inspection of the two PDFs for
> any more subtle differences in their pages' appearance.
>
> So in order to make the output of the two working engines easier to
> compare, I extended my command line options:
>
> time pandoc \
> -V "geometry:paperwidth=8.26387in" \
> -V "geometry:paperheight=29.7cm" \
> -V "geometry:vmargin=40pt" \
> -V "geometry:hmargin=40pt" \
> -f markdown \
> --latex-engine=pdflatex \
> -o myreadme_pdflatex.pdf \
> README
>
>
> time pandoc \
> -V "geometry:paperwidth=8.26387in" \
> -V "geometry:paperheight=29.7cm" \
> -V "geometry:vmargin=40pt" \
> -V "geometry:hmargin=40pt" \
> -f markdown \
> --latex-engine=xelatex \
> -o myreadme_xelatex.pdf \
> README
>
>
> The timings didn't change significantly, but now I have two different PDFs
> for inspection, to see if there are any qualitative differences.
>
> At a first superficial glance, both PDFs look nearly identical. However,
> some word spacings are slightly different, so some lines wrap
> differently, and those differences accumulate into different page breaks
> on later pages.
>
> Not a big issue, though.
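As a cross-check on the geometry values used above: PDF page sizes are measured in points of 1/72 inch, so the two paper dimensions can be converted with a small helper. This is only a sketch; the function name to_pdf_points is made up for this example and is not part of pandoc or any TeX tool.

```shell
#!/bin/sh
# Hypothetical helper: convert a length given as "number unit" (in, cm or mm)
# to PDF points, using 1 in = 72 points and 1 in = 2.54 cm.
to_pdf_points() {
    awk -v v="$1" -v u="$2" 'BEGIN {
        if (u == "in")      f = 72
        else if (u == "cm") f = 72 / 2.54
        else if (u == "mm") f = 72 / 25.4
        else { print "unknown unit: " u > "/dev/stderr"; exit 1 }
        printf "%.3f\n", v * f
    }'
}

to_pdf_points 8.26387 in   # the paperwidth passed to geometry
to_pdf_points 29.7 cm      # the paperheight passed to geometry
```

This prints 594.999 and 841.890, which agrees with the page size that pdfinfo reports for the pdflatex output.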
>
> *PDF Metadata*
>
> kp@mb:git.pandoc.trunk > *pdfinfo myreadme_pdflatex.pdf *
> Title: Pandoc User's Guide
> Subject:
> Keywords:
> Author: John MacFarlane
> Creator: LaTeX with hyperref package
> Producer: pdfTeX-1.40.14
> CreationDate: Sat Jan 11 14:28:53 2014
> ModDate: Sat Jan 11 14:28:53 2014
> Tagged: no
> Form: none
> Pages: 36
> Encrypted: no
> Page size: 594.999 x 841.89 pts (A4)
> Page rot: 0
> File size: 455149 bytes
> Optimized: no
> PDF version: 1.5
>
> kp@mb:git.pandoc.trunk > *pdfinfo myreadme_xelatex.pdf *
> Title: Pandoc User's Guide
> Author: John MacFarlane
> Creator: LaTeX with hyperref package
> Producer: xdvipdfmx (0.7.9)
> CreationDate: Sat Jan 11 14:28:04 2014
> Tagged: no
> Form: none
> Pages: 37
> Encrypted: no
> Page size: 595 x 841.89 pts (A4)
> Page rot: 0
> File size: 189281 bytes
> Optimized: no
> PDF version: 1.5
>
>
> As you can see, there are a few significant differences:
>
> - *File size:* pdflatex outputs ~444 kB, xelatex outputs ~185 kB
> (difference of ~259 kB).
> - *Page numbers:* pdflatex generates 36 pages, xelatex generates 37
> pages.
> - *Producer:* pdflatex states "pdfTeX-1.40.14", xelatex states
> "xdvipdfmx (0.7.9)". This means xelatex takes a detour via DVI to produce
> its PDF.
> - *Subject *and* Keywords:* pdflatex writes these metadata fields into
> the PDF's document information dictionary but leaves them empty; xelatex
> omits them altogether.
> - *Page size:* despite identical command line parameters, there is a
> slight difference in the page width (594.999 pts vs. 595 pts). I assume
> this is because of the DVI detour of xelatex, which may introduce some
> rounding errors in the calculation.
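The file size figures in the list above can be re-derived from the byte counts in the two pdfinfo listings. A minimal sketch, assuming that "kB" is meant as units of 1024 bytes; to_kib is a name invented here:

```shell
#!/bin/sh
# Hypothetical helper: print a byte count in KiB (1024 bytes), one decimal place.
to_kib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1024 }'
}

pdflatex_bytes=455149   # "File size" from pdfinfo myreadme_pdflatex.pdf
xelatex_bytes=189281    # "File size" from pdfinfo myreadme_xelatex.pdf

to_kib "$pdflatex_bytes"
to_kib "$xelatex_bytes"
to_kib "$((pdflatex_bytes - xelatex_bytes))"
```

This prints 444.5, 184.8 and 259.6, so the xelatex file is a bit more than 40% of the size of the pdflatex one.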
>
>
> *PDF Fonts*
>
> kp@mb:git.pandoc.trunk > *pdffonts myreadme_pdflatex.pdf *
> name                                 type              encoding         emb sub uni object ID
> ------------------------------------ ----------------- ---------------- --- --- --- ---------
> YRKMSP+LMRoman17-Regular             Type 1            Custom           yes yes no     347  0
> FTOMDN+LMRoman12-Regular             Type 1            Custom           yes yes no     348  0
> GUKOVW+LMRoman12-Bold                Type 1            Custom           yes yes no     349  0
> CFVARR+LMRoman10-Regular             Type 1            Custom           yes yes no     350  0
> SWKNVD+LMRoman10-Italic              Type 1            Custom           yes yes no     351  0
> GCGIOZ+LMMono10-Regular              Type 1            Custom           yes yes no     353  0
> BPKJXQ+LMMonoLt10-Bold               Type 1            Custom           yes yes no     354  0
> WMNHHZ+LMRoman10-BoldItalic          Type 1            Custom           yes yes no     364  0
> JKORXP+LMRoman10-Bold                Type 1            Custom           yes yes no     365  0
> CFVARR+LMRoman10-Regular             Type 1            Custom           yes yes no     429  0
> GCGIOZ+LMMono10-Regular              Type 1            Custom           yes yes no     446  0
> UMYEZP+LMRoman7-Regular              Type 1            Custom           yes yes no     462  0
> URPVMO+LMRoman6-Regular              Type 1            Custom           yes yes no     463  0
> UAEFEH+LMRoman8-Regular              Type 1            Custom           yes yes no     465  0
> PGWEIL+LMMono8-Regular               Type 1            Custom           yes yes no     466  0
>
> kp@mb:git.pandoc.trunk > *pdffonts myreadme_xelatex.pdf*
> name                                   type            encoding         emb sub uni object ID
> -------------------------------------- --------------- ---------------- --- --- --- ---------
> ERGCXD+LMRoman17-Regular-Identity-H    CID Type 0C     Identity-H       yes yes yes      5  0
> PXEJIZ+LMRoman12-Regular-Identity-H    CID Type 0C     Identity-H       yes yes yes      7  0
> SNYTKW+LMRoman12-Bold-Identity-H       CID Type 0C     Identity-H       yes yes yes      9  0
> DKWVLY+LMRoman10-Regular-Identity-H    CID Type 0C     Identity-H       yes yes yes     11  0
> FKSWYW+LMRoman10-Italic-Identity-H     CID Type 0C     Identity-H       yes yes yes     13  0
> TFUYQQ+LMMono10-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes     57  0
> EEPFTP+LMMonoLt10-Bold-Identity-H      CID Type 0C     Identity-H       yes yes yes     59  0
> UDLNER+LMRoman10-BoldItalic-Identity-H CID Type 0C     Identity-H       yes yes yes     64  0
> JLOSBI+LMRoman10-Bold-Identity-H       CID Type 0C     Identity-H       yes yes yes     66  0
> QWNYRO+LMRoman7-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes    139  0
> IKIBKZ+LMRoman6-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes    142  0
> EIDUGE+LMRoman8-Regular-Identity-H     CID Type 0C     Identity-H       yes yes yes    144  0
> YNRKKN+LMMono8-Regular-Identity-H      CID Type 0C     Identity-H       yes yes yes    146  0
>
> So here is another significant difference:
>
> - *pdflatex* uses Type 1 (PostScript) fonts with a custom encoding
> - *xelatex* here converts all fonts to CID Type 0C
> (CFF/CompactFontFormat) fonts with Identity-H encoding
>
> IMHO this font handling most likely explains a large part of the speed
> and size differences between the two PDFs: converting Type 1 fonts to CID
> takes time (but saves space), and leads to slight differences in character
> and word spacing, which finally result in an additional page being created.
>
> On the other hand, PDFs containing CID fonts with Identity-H encoding, as
> well as those containing any font with a custom encoding, sometimes do not
> play nicely when you copy and paste text from their pages or extract
> their text altogether.
>
> But let's check both of these statements...
>
> *Font Size Differences*
>
> I used the following commands to extract the fonts from the two PDFs:
>
> kp@mb:git.pandoc.trunk > *mutool extract myreadme_xelatex.pdf*
> kp@mb:git.pandoc.trunk > *mutool extract myreadme_pdflatex.pdf*
>
>
> (mutool is a companion commandline tool to MuPDF). This gave me 13 *.pfa
> and 13 *.cid files in the current directory. A (rough) comparison of the
> combined file sizes for each of the two groups yields this result:
>
> kp@mb:git.pandoc.trunk > *tar cvzf pfas.tar.gz *.pfa 2>/dev/null && ls -lh pfas.tar.gz*
>
> -rw-r--r-- 1 kurtpfeifle staff 312K Jan 11 15:53 pfas.tar.gz
> kp@mb:git.pandoc.trunk > *tar cvzf cids.tar.gz *.cid 2>/dev/null && ls -lh cids.tar.gz*
>
> -rw-r--r-- 1 kurtpfeifle staff 52K Jan 11 15:53 cids.tar.gz
>
> Extracted *.pfa fonts are no longer compressed, hence I re-compressed
> them inside a tarball. (Inside a PDF, fonts are usually compressed too,
> so this should be a more reasonable comparison than the direct sum of the
> file sizes of the extracted fonts.)
>
> The size difference between the two font groups is ~260 kB, which
> accounts pretty well for the size difference between the respective PDFs.
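The tarball comparison above could also be done without tar, by piping each group of extracted fonts through gzip and counting the resulting bytes. A rough sketch, assuming gzip is installed; gzipped_bytes is a name invented for this example, and the numbers will differ slightly from the tarball sizes because there is no tar overhead:

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes the given files occupy
# after being concatenated and compressed with gzip.
gzipped_bytes() {
    cat "$@" | gzip -c | wc -c | tr -d ' '
}

# Usage, in the directory where "mutool extract" left the fonts:
#   gzipped_bytes *.pfa
#   gzipped_bytes *.cid
```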
>
>
> *Summary*
>
> 1. The file size difference is worth switching from the default
> (pdflatex) engine to xelatex, if output file size is a major concern.
> However, you pay for this gain with a conversion speed penalty.
> 2. xelatex can give you additional benefits, should you need them: you
> can more easily switch to different fonts and use advanced OpenType font
> features.
> 3. However, if you use Markdown/Pandoc to write a paper to submit to
> some conference organizers who insist on embedding Type 1 fonts into your
> PDF, you're probably better off sticking with the (default) pdflatex engine.
> 4. It needs to be determined why my lualatex engine currently does not
> work.
> 5. I would be grateful if other people on this list could test this too,
> especially with --latex-engine=lualatex, and also post their respective
> speed and other results [in case it's not b0rken for them as well]...
>
> (It's nice when pandoc works flawlessly -- however, it is quite difficult
> to narrow down the cause of a problem when something goes wrong, like in
> this case with LuaLaTeX. I think I'll run a Markdown=>LaTeX conversion
> next, and then run a LaTeX=>PDF conversion manually on the command line, to
> see if I can enable some debugging switches there. Currently I do not have
> any experience with running lualatex, xelatex or pdflatex directly on the
> command line...)
>
>
>
--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To post to this group, send email to pandoc-discuss-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/14ae6813-02f0-4c95-a4d8-33e1ec37874c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
* Re: [Markdown=>PDF] Performance and other differences by switching between pdflatex/xelatex/lualatex
[not found] ` <66137e52-b12d-476a-b79c-9afb3ca612bb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2014-01-11 15:31 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
@ 2014-01-11 18:31 ` Axel Kielhorn
[not found] ` <B219D441-229E-43F4-AC71-DA65D6902CFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-01-12 9:52 ` Dirk Laurie
2 siblings, 1 reply; 6+ messages in thread
From: Axel Kielhorn @ 2014-01-11 18:31 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
On 11.01.2014 at 16:12, kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org wrote:
> A few weeks ago I played with different settings to create PDFs from my own Markdown files, using --latex-engine=pdflatex|xelatex|lualatex
>
> At the time I noticed there were significant performance differences:
> • pdflatex was the fastest (but sometimes had problems with special characters, like German umlauts)
> • xelatex was significantly slower (but handled my umlauts out of the box)
> • lualatex was extremely slow, and in many cases didn't finish the job at all but threw an error
This is well known.
If pdflatex does the job, use it, it is the fastest.
If you need XeLaTeX or luatex features it will take longer.
One problem I often run into is shift-space (non-breaking space) which isn't supported by inputenc.
Another is non ASCII characters in sections.
> But I didn't have much time to investigate more deeply -- I decided to write most of my content in Markdown first, before turning to fine-tuning the style details of the different output formats.
>
> Today I found some time to start taking a deeper look at the performance. In order to have a common (and stable) base for these measurements, I'm using the main README file from pandoc's Git repository as my Markdown source.
>
> I'm using the freshly released version 1.12.3, installed via cabal on a Macbook (running Mavericks), the different LaTeX-engines were installed via MacPorts:
Please use MacTeX and not TeX from MacPorts.
Thus we will be talking about the same binaries (and if you update MacTeX via TeX Live Utility the same version of style files.)
With MacTeX
time pandoc -f markdown --latex-engine=lualatex -o myreadme_lualatex.pdf README
succeeds.
first run:
real 1m4.123s
user 0m38.678s
sys 0m10.288s
second run:
real 0m12.798s
user 0m11.822s
sys 0m1.004s
With pdflatex:
real 0m3.609s
user 0m2.880s
sys 0m0.332s
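To put the wall-clock ("real") times above in relation to each other, the durations printed by the shell's time builtin can be converted to seconds and divided. A small sketch; to_seconds and ratio are names made up for this example, and it only handles the 1m4.123s style shown above:

```shell
#!/bin/sh
# Hypothetical helper: convert a duration like "1m4.123s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Hypothetical helper: print how many times longer the first duration is
# than the second, with one decimal place.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

ratio 1m4.123s 0m3.609s    # lualatex, first run, vs. pdflatex
ratio 0m12.798s 0m3.609s   # lualatex, second run, vs. pdflatex
```

This prints 17.8 and 3.5: the first lualatex run took almost 18 times as long as pdflatex, the second run about 3.5 times as long.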
> Another significant difference in the output of the two successful PDF conversions:
> • xelatex used A4 media format for the PDF pages
> • pdflatex used Letter format
> (but I guess these defaults are built into the respective engines and do not have anything to do with pandoc. Or do they?) This page size difference does not allow for an easy visual side-by-side inspection of the two PDFs for any more subtle differences in their pages' appearance.
You should always set the paper size in the document.
Everything else is undefined.
It is best to use a custom template.
(I have set the page size to A4 via tlmgr, but others may have not.)
> PDF Metadata
>
> As you can see, there are a few significant differences:
> • File size: pdflatex outputs ~444 kB, xelatex outputs ~185 kB (difference of ~259 kB).
Could this be font expansion (via microtype), which works in pdflatex but not in XeTeX?
> • Page numbers: pdflatex generates 36 pages, xelatex generates 37 pages.
microtype changes the line/paragraph/page breaks.
> • Producer: pdflatex states "pdfTeX-1.40.14", xelatex states "xdvipdfmx (0.7.9)". This means xelatex goes a detour via DVI to produce its PDF.
This is the way XeTeX works.
> • Subject and Keywords: pdflatex writes these metadata fields into the PDF's document information dictionary but leaves them empty; xelatex omits them altogether.
\hypersetup{breaklinks=true,
bookmarks=true,
pdfauthor={John MacFarlane},
pdftitle={Pandoc User's Guide},
colorlinks=true,
citecolor=blue,
urlcolor=blue,
linkcolor=magenta,
pdfborder={0 0 0}}
This doesn't define subject and keywords.
You may define pdfsubject and pdfkeywords and fill them with YAML data in a custom template.
(Please submit the changes.)
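One possible way to get there, as a hedged sketch: dump pandoc's default LaTeX template with -D, add the two hyperref keys right after the pdftitle line, and pass the result back with --template. The metadata names subject and keywords are assumptions for this example (pick whatever you use in your YAML block), add_pdf_metadata is an invented name, and the sketch assumes the template contains a line with "pdftitle={" as in the \hypersetup call above.

```shell
#!/bin/sh
# Hypothetical filter: read a pandoc LaTeX template on stdin and write a copy
# to stdout in which pdfsubject and pdfkeywords follow the pdftitle line.
# "subject" and "keywords" are assumed metadata names, not pandoc built-ins.
add_pdf_metadata() {
    awk '
        { print }
        index($0, "pdftitle={") > 0 {
            print "            pdfsubject={$subject$},"
            print "            pdfkeywords={$for(keywords)$$keywords$$sep$, $endfor$},"
        }
    '
}

# Usage:
#   pandoc -D latex | add_pdf_metadata > custom.latex
#   pandoc --template=custom.latex -f markdown -o myreadme.pdf README
```

If the metadata fields are not set, the two keys simply end up empty.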
> • Page size: despite identical commandline parameters, there are slight differences in the page size. I assume this is because of the DVI detour of xelatex which may introduce some rounding errors when calculating stuff.
> (It's nice when pandoc works flawlessly -- however, it is quite difficult to narrow down the cause of a problem when something goes wrong, like in this case with LuaLaTeX. I think I'll run a Markdown=>LaTeX conversion next, and then run a LaTeX=>PDF conversion manually on the command line, to see if I can enable some debugging switches there. Currently I do not have any experience with running lualatex, xelatex or pdflatex directly on the command line...)
My first solution is always to generate LaTeX and examine that.
You don't have to run them on the command line, MacTeX will install TeXShop.
Axel
* Re: [Markdown=>PDF] Performance and other differences by switching between pdflatex/xelatex/lualatex
[not found] ` <B219D441-229E-43F4-AC71-DA65D6902CFB-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-01-12 2:30 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
0 siblings, 0 replies; 6+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2014-01-12 2:30 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
Thank you, Axel, for your valuable feedback!
You seem to be a real LaTeX guru. :)
Some of the things you tell me are above my level of LaTeX understanding,
though.
The only reason why I'm prepared to devote any time to LaTeX now is
pandoc. Without this wonderful tool I would have been too afraid, and
would never have considered learning that "weird" language :)
On Saturday, 11 January 2014 19:31:06 UTC+1, Axel Kielhorn wrote:
>
>
> On 11.01.2014 at 16:12, kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org wrote:
>
> > A few weeks ago I played with different settings to create PDFs
> from my own Markdown files, using --latex-engine=pdflatex|xelatex|lualatex
> >
> > At the time I noticed there were significant performance differences:
> > • pdflatex was the fastest (but sometimes had problems with
> special characters, like German umlauts)
> > • xelatex was significantly slower (but handled my umlauts out
> of the box)
> > • lualatex was extremely slow, and in many cases didn't finish
> the job at all but threw an error
>
> This is well known.
>
It wasn't to me! And most likely not to many others who like me use pandoc
to generate LaTeX or PDF documents. That's why I wrote it all down...
> If pdflatex does the job, use it, it is the fastest.
>
You *did* see my conclusions at the end of my posting, didn't you?
> If you need XeLaTeX or luatex features it will take longer.
>
These features only start to become meaningful once you begin
generating LaTeX or PDF documents.
> One problem I often run into is shift-space (non-breaking space) which
> isn't supported by inputenc.
>
I don't know what 'inputenc' is. Not yet...
> Another is non ASCII characters in sections.
>
> > But I didn't have much time to investigate more deeply -- I decided to
> write most of my content in Markdown first, before turning to fine-tuning
> the style details of the different output formats.
> >
> > Today I found some time to start taking a deeper look at the
> performance. In order to have a common (and stable) base for these
> measurements, I'm using the main README file from pandoc's Git repository
> as my Markdown source.
> >
> > I'm using the freshly released version 1.12.3, installed via cabal on a
> Macbook (running Mavericks), the different LaTeX-engines were installed via
> MacPorts:
>
> Please use MacTeX and not TeX from MacPorts.
>
Why? Just so that the two of us run the same versions?
I remember having installed MacTeX previously. About 18 months ago I
removed it (I can't remember exactly why, but IIRC there was also some kind
of problem with it), because I had MacPorts' TeX anyway. At the
time I was mainly interested in some command line tools for processing PDFs,
which relied on a TeX environment being present. My disk was 95% full,
and I had to remove some large packages...
So even if I were convinced to re-install MacTeX, at the moment my
free space will not suffice...
> Thus we will be talking about the same binaries (and if you update MacTeX
> via TeX Live Utility the same version of style files.)
>
Ok, that would be a reason... But the binaries would only be "the same"
between you and me, right? However, would their versions be newer or older
than the ones I presently have?
One thing that could convince me is this: when I saw my son recently using
MiKTeX on Windows, I witnessed how it automatically installed
packages which it found missing while processing a TeX document.
Does MacTeX sport a similar feature?
> With MacTeX
>
> time pandoc -f markdown --latex-engine=lualatex -o myreadme_lualatex.pdf
> README
>
> succeeds.
>
> first run:
> real 1m4.123s
> user 0m38.678s
> sys 0m10.288s
>
> second run:
>
> real 0m12.798s
> user 0m11.822s
> sys 0m1.004s
>
Why is the second run so much faster than the first one? Is that normal? Or
was your system simply under a higher load during the first run?
> With pdflatex:
>
> real 0m3.609s
> user 0m2.880s
> sys 0m0.332s
>
> > Another significant difference in the output of the two successful PDF
> conversions:
> > • xelatex used A4 media format for the PDF pages
> > • pdflatex used Letter format
> > (but I guess these defaults are built into the respective engines and do
> not have anything to do with pandoc. Or do they?) This page size difference does
> not allow for an easy visual side-by-side inspection of the two PDFs for
> any more subtle differences in their pages' appearance.
>
> You should always set the paper size in the document.
>
Of course. :-)
I'll be setting even more things for my real production files.
However, in this case I was interested in the results from *default* settings
first. Only when I know these can I see how any tweaks I might
apply would improve, worsen or otherwise change my results...
> Everything else is undefined.
> It is best to use a custom template.
> (I have set the page size to A4 via tlmgr, but others may have not.)
>
I don't know what tlmgr is...
> > PDF Metadata
> >
> > As you can see, there are a few significant differences:
> > • File size: pdflatex outputs ~444 kB, xelatex outputs ~185 kB
> (difference of ~259 kB).
>
> Font expansion (via microtype) in pdflatex which does not work in XeTeX?
>
I don't know what you mean by "font expansion". I don't know what
"microtype" is.
> > • Page numbers: pdflatex generates 36 pages, xelatex generates
> 37 pages.
>
> microtype changes the line/paragraph/page breaks.
>
Obviously you started to write your response before you read what my
posting outlined towards the end....
> > • Producer: pdflatex states "pdfTeX-1.40.14", xelatex states
> "xdvipdfmx (0.7.9)". This means xelatex goes a detour via DVI to produce
> its PDF.
>
> This is the way XeTeX works.
>
Obviously :-)
> > • Subject and Keywords: pdflatex writes these metadata
> fields into the PDF's document information dictionary but leaves them
> empty; xelatex omits them altogether.
>
> \hypersetup{breaklinks=true,
> bookmarks=true,
> pdfauthor={John MacFarlane},
> pdftitle={Pandoc User's Guide},
> colorlinks=true,
> citecolor=blue,
> urlcolor=blue,
> linkcolor=magenta,
> pdfborder={0 0 0}}
>
> Doesn't define subject and keywords.
> You may define pdfsubject and pdfkeywords and fill them with YAML data in
> a custom template.
>
I'll see if I can manage to do it. I'm still a beginner when it comes to
fiddling with the templates.
> (Please submit the changes.)
>
Once I succeed, yes.
> > • Page size: despite identical commandline parameters, there are
> slight differences in the page size. I assume this is because of the DVI
> detour of xelatex which may introduce some rounding errors when calculating
> stuff.
>
> > (It's nice when pandoc works flawlessly -- however, it is quite
> difficult to narrow down the cause of a problem when something goes wrong,
> like in this case with LuaLaTeX. I think I'll run a Markdown=>LaTeX
> conversion next, and then run a LaTeX=>PDF conversion manually on the
> command line, to see if I can enable some debugging switches there.
> Currently I do not have any experience with running lualatex,
> xelatex or pdflatex directly on the command line...)
>
> My first solution is always to generate LaTeX and examine that.
>
*You*'re a LaTeX expert. :)
When *I* look at LaTeX code, I feel helpless.
*I*'m a PDF expert :-)
> You don't have to run them on the command line, MacTeX will install
> TeXShop.
>
You know, I'm not afraid of the command line (even if I'm not versed in
LaTeX). Oftentimes I can figure out problems with GUI software when I start it
from the terminal, if it prints its stdout/stderr back to the terminal...
I don't know what TeXShop is. Is it something similar to Texmaker?
Thanks again for your help and feedback.
Cheers,
Kurt
* Re: [Markdown=>PDF] Performance and other differences by switching between pdflatex/xelatex/lualatex
[not found] ` <66137e52-b12d-476a-b79c-9afb3ca612bb-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2014-01-11 15:31 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
2014-01-11 18:31 ` Axel Kielhorn
@ 2014-01-12 9:52 ` Dirk Laurie
[not found] ` <CABcj=tm7ON4n5G6joBFN17Rz+5vim_crbts5mqtEbnSDn7Nq+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2 siblings, 1 reply; 6+ messages in thread
From: Dirk Laurie @ 2014-01-12 9:52 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
2014/1/11 <kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>:
> Summary
> It needs to be determined why my lualatex engine currently does not work.
> I would be grateful if other people on this list could test this too,
> especially with --latex-engine=lualatex, and also post their respective speed
> and other results [in case it's not b0rken for them as well]...
LuaTeX is a work in progress, and new releases often have new bugs.
Version numbers that are a proper multiple of 0.05 tend to be stabler,
in particular those that have been included in a TeX Live release.
I use:
$ lualatex --version
This is LuaTeX, Version beta-0.70.2-2012062812 (TeX Live 2012)
All three versions work for me, with xelatex twice as fast
and pdflatex four times as fast as lualatex.
Note that "-t pdf" always runs LaTeX twice. If the changes to your
source are so small that cross-references don't change, you can
save time by using three-stage processing.
$ pandoc --latex-engine=lualatex -s README -o luaREADME.tex
$ lualatex luaREADME.tex
$ grep "Rerun to get cross-references right" luaREADME.log && lualatex luaREADME.tex
The downside is that you need to do your own cleaning-up of auxiliary files.
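The second and third stage, plus the cleaning-up, could be wrapped in a small function. This is only a sketch: build_pdf and the LATEX_ENGINE variable are names made up here, the list of auxiliary files is a guess at what a typical pandoc-generated document leaves behind, and the function assumes it is run in the directory that contains the .tex file.

```shell
#!/bin/sh
# Hypothetical helper: run a LaTeX engine on a .tex file, rerun it only if
# the log asks for it, then remove the auxiliary files.
# LATEX_ENGINE is an assumed override; the default is lualatex.
build_pdf() {
    tex=$1
    base=${tex%.tex}
    engine=${LATEX_ENGINE:-lualatex}

    "$engine" "$tex" || return 1
    if grep -q "Rerun to get cross-references right" "$base.log"; then
        "$engine" "$tex" || return 1
    fi
    rm -f "$base.aux" "$base.log" "$base.out" "$base.toc"
}

# Usage:
#   pandoc -s README -o luaREADME.tex && build_pdf luaREADME.tex
```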
* Re: [Markdown=>PDF] Performance and other differences by switching between pdflatex/xelatex/lualatex
[not found] ` <CABcj=tm7ON4n5G6joBFN17Rz+5vim_crbts5mqtEbnSDn7Nq+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-12 13:18 ` kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg
0 siblings, 0 replies; 6+ messages in thread
From: kurt.pfeifle-gM/Ye1E23mwN+BqQ9rBEUg @ 2014-01-12 13:18 UTC (permalink / raw)
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw
pandoc, when generating LaTeX from Markdown, doesn't care which
--latex-engine= switch is set. It always outputs the exact same *.tex file.
So this command is sufficient:
pandoc README -o README.tex -s ; ln -s README.tex luaREADME.tex
Using the name luaREADME.tex is only useful for pre-populating the PDF
filename and logfile name when running lualatex on the TeX file...
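Anyone who wants to verify this on their own setup could generate the LaTeX once per engine switch and compare the files byte for byte. A sketch under assumptions: same_tex_for_engines and the PANDOC override are names invented here, and --latex-engine is the spelling used by the pandoc 1.12 release discussed in this thread (later releases renamed it to --pdf-engine).

```shell
#!/bin/sh
# Hypothetical check: succeed only if pandoc writes identical LaTeX for all
# three engine switches. PANDOC is an assumed override for the pandoc binary.
same_tex_for_engines() {
    src=$1
    pandoc_cmd=${PANDOC:-pandoc}
    dir=$(mktemp -d) || return 1
    rc=0
    for engine in pdflatex xelatex lualatex; do
        "$pandoc_cmd" -s -t latex --latex-engine="$engine" "$src" \
            -o "$dir/$engine.tex" || rc=1
    done
    cmp -s "$dir/pdflatex.tex" "$dir/xelatex.tex" || rc=1
    cmp -s "$dir/pdflatex.tex" "$dir/lualatex.tex" || rc=1
    rm -rf "$dir"
    return "$rc"
}

# Usage:
#   same_tex_for_engines README && echo "identical LaTeX for all engines"
```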
On Sunday, 12 January 2014 10:52:45 UTC+1, Dirk wrote:
>
> 2014/1/11 <kurt.p...-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>:
>
> > Summary
>
> > It needs to be determined why my lualatex engine currently does not
> work.
> > I would be grateful if other people on this list could test this too,
> > especially with --latex-engine=lualatex, and also post their respective speed
> > and other results [in case it's not b0rken for them as well]...
>
> LuaTeX is a work in progress, and new releases often have new bugs.
> Version numbers that are a proper multiple of 0.05 tend to be stabler,
> in particular those that have been included in a TeX Live release.
> I use:
>
> $ lualatex --version
> This is LuaTeX, Version beta-0.70.2-2012062812 (TeX Live 2012)
> All three versions work for me, with xelatex twice as fast
> and pdflatex four times as fast as lualatex.
>
> Note that "-t pdf" always runs LaTeX twice. If the changes to your
> source are so small that cross-references don't change, you can
> save time by using three-stage processing.
>
> $ pandoc --latex-engine=lualatex -s README -o luaREADME.tex
> $ lualatex luaREADME.tex
> $ grep "Rerun to get cross-references right" luaREADME.log && lualatex luaREADME.tex
>
> The downside is that you need to do your own cleaning-up of auxiliary
> files.
>