ntg-context - mailing list for ConTeXt users
 help / color / mirror / Atom feed
From: Mojca Miklavec <mojca.miklavec.lists@gmail.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Best way to create a large number of documents from database
Date: Fri, 17 Apr 2020 16:37:48 +0200	[thread overview]
Message-ID: <CALBOmsbY4nCvPaef-DjOS-1MZmrzeNRLTg=Ko2YdPYipimLBqg@mail.gmail.com> (raw)
In-Reply-To: <CALBOmsa8zz=++caJUv+cToXevXbs-g=-b8BJHqF8A1=NseGTXg@mail.gmail.com>

On Thu, 16 Apr 2020 at 16:38, Mojca Miklavec wrote:
> On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> > > On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> > >
> > > I have been asked to create a few thousand PDF documents from a CSV
> > > "database" today
> >
> > In CPU cycles, the fastest way is to do a single context —once
> > run generating all the pages as a single document, then using
> > mutool merge to split it into separate documents using a (shell)
> > loop.
>
> Just to make it clear: I don't really need to optimize on the CPU end,

... says the optimist ... :) :) :)

> as the bottleneck is on the other side of the keyboard, so as long as
> the CPU can process 5k pages today, I'm fine with it :) :) :)

While the bottleneck was in fact at the other side of the keyboard
(preparation was certainly longer than the execution), it still took
cca 2,5 hours to generate the full batch.

(I'm pretty sure I could have further optimised the code, even though
1 second per run is still pretty fast [when I started using context it
was more like 30 seconds per run], it just adds up when talking about
thousands of pages. This greatly reminds me on the awesome speedup
that Hans achieved when rewriting the mplib code & the initial
\sometxt changes inside metapost which also lead to 100-fold speedups
as one no longer needed to start TeX a zillion times.)

While waiting I wanted to start being clever and do the processing in
the same folder in parallel (I have lots of cores after all), and
ended up calling a script with
    context --N={n} --output=doc-{nnnn}.pdf template.tex
    context --purge
only to notice much later that running multiple context runs in the
same folder (some of them compiling and some of them deleting the
temporary files) might not have been the best idea on the planet, many
documents ended up missing, and many corrupted. So I had to rerun half
of the documents.

One of the interesting statistics.
I used a bunch of images (the same png images in all documents; cca.
290k in total).

The generated documents were 1,5 GB in size. When compressed with
tar.gz, there was almost no noticeable difference between the
compressed and non-compressed data size (1,4 GB vs. 1,5 GB). But when
compressing with tar.xz, it compressed 1,5 GB worth of document into
merely 27 MB (a single document is 360 k).

The documents have been e-mailed out, but now they need to print hard
copies for archive. I'm happy I don't need to be the one printing and
storing that :) :) :)

Mojca
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________

  parent reply	other threads:[~2020-04-17 14:37 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-16  9:12 Mojca Miklavec
2020-04-16  9:29 ` Taco Hoekwater
2020-04-16 14:38   ` Mojca Miklavec
2020-04-16 14:52     ` Hans Hagen
2020-04-16 16:39       ` kaddour kardio
2020-04-16 17:46       ` template system (was: Best way to create a large number of documents from database) Henning Hraban Ramm
2020-04-16 17:57         ` template system Wolfgang Schuster
2020-04-16 18:23           ` Henning Hraban Ramm
2020-04-16 18:32       ` Best way to create a large number of documents from database Mojca Miklavec
2020-04-16 19:01         ` Pablo Rodriguez
2020-04-16 20:03         ` Hans Hagen
2020-04-17 14:37     ` Mojca Miklavec [this message]
2020-04-17 19:11       ` Hans Hagen
2020-04-23  6:48         ` Mojca Miklavec

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALBOmsbY4nCvPaef-DjOS-1MZmrzeNRLTg=Ko2YdPYipimLBqg@mail.gmail.com' \
    --to=mojca.miklavec.lists@gmail.com \
    --cc=ntg-context@ntg.nl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).