Best way to create a large number of documents from database

ntg-context - mailing list for ConTeXt users

* Best way to create a large number of documents from database
@ 2020-04-16  9:12 Mojca Miklavec
  2020-04-16  9:29 ` Taco Hoekwater
  0 siblings, 1 reply; 14+ messages in thread
From: Mojca Miklavec @ 2020-04-16  9:12 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi,

I have been asked to create a few thousand PDF documents from a CSV
"database" today (which I can easily transform into any other form,
like XML or a lua table or TeX definitions or whatever).

Generating a few thousand pages would be straightforward, but I'm sure
there are some clever ways to handle this scenario as well, I'm just
not aware of them :)

One option is that I quickly draft a python script that creates a few
thousand TeX documents and compiles them individually, but it might be
easier if there was a way to just create a single template document
and then run something like
    context --some-params --N=42 --output=document-0042.pdf template.tex
or something along those lines.

What's the best approach with the existing functionality? I would be
more than grateful for any hints.

Thank you very much,
    Mojca
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Best way to create a large number of documents from database
  2020-04-16  9:12 Best way to create a large number of documents from database Mojca Miklavec
@ 2020-04-16  9:29 ` Taco Hoekwater
  2020-04-16 14:38   ` Mojca Miklavec
  0 siblings, 1 reply; 14+ messages in thread
From: Taco Hoekwater @ 2020-04-16  9:29 UTC (permalink / raw)
  To: mailing list for ConTeXt users



> On 16 Apr 2020, at 11:12, Mojca Miklavec <mojca.miklavec.lists@gmail.com> wrote:
> 
> Hi,
> 
> I have been asked to create a few thousand PDF documents from a CSV
> "database" today (which I can easily transform into any other form,
> like XML or a lua table or TeX definitions or whatever).
> 
> Generating a few thousand pages would be straightforward, but I'm sure
> there are some clever ways to handle this scenario as well, I'm just
> not aware of them :)

In CPU cycles, the fastest way is to do a single context --once
run generating all the pages as a single document, then using
mutool merge to split it into separate documents using a (shell)
loop.

Starting up mutool is much faster than starting context, even with lmtx.
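
The splitting step could be sketched roughly as follows. This is a minimal sketch, assuming exactly one page per record and a list of target names given in page order; the function name `split_commands` and the file name `all.pdf` are made up for illustration. It only builds the `mutool merge` command lines (output file, input file, page number), so they can be inspected before anything is run.

```python
def split_commands(combined_pdf, names):
    """Build one mutool command per page of the combined PDF.

    names[i] is the (hypothetical) base name for page i + 1.
    Assumption: every record occupies exactly one page.
    """
    return [
        # mutool merge -o <output.pdf> <input.pdf> <page>
        ["mutool", "merge", "-o", f"{name}.pdf", combined_pdf, str(page)]
        for page, name in enumerate(names, start=1)
    ]


for command in split_commands("all.pdf", ["doc-0001", "doc-0002"]):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run(command, check=True)` once the names look right.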


> One option is that I quickly draft a python script that creates a few
> thousand TeX documents and compiles them individually, but it might be
> easier if there was a way to just create a single template document
> and then run something like
>    context --some-params --N=42 --output=document-0042.pdf template.tex
> or something along those lines.

If you want to go this route (and you may have to if not every record
fits exactly within a single page), browse back a day or so in the mailing
list archive for Gerben’s question about

  “Using command line values in a TeX document; writing a script?”

The replies offer various options using either lua or tex code
to get at user-supplied arguments from the commandline.

Best wishes,
Taco




* Re: Best way to create a large number of documents from database
  2020-04-16  9:29 ` Taco Hoekwater
@ 2020-04-16 14:38   ` Mojca Miklavec
  2020-04-16 14:52     ` Hans Hagen
  2020-04-17 14:37     ` Mojca Miklavec
  0 siblings, 2 replies; 14+ messages in thread
From: Mojca Miklavec @ 2020-04-16 14:38 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> > On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> >
> > I have been asked to create a few thousand PDF documents from a CSV
> > "database" today
>
> In CPU cycles, the fastest way is to do a single context --once
> run generating all the pages as a single document, then using
> mutool merge to split it into separate documents using a (shell)
> loop.

Just to make it clear: I don't really need to optimize on the CPU end,
as the bottleneck is on the other side of the keyboard, so as long as
the CPU can process 5k pages today, I'm fine with it :) :) :)

> > One option is that I quickly draft a python script that creates a few
> > thousand TeX documents and compiles them individually, but it might be
> > easier if there was a way to just create a single template document
> > and then run something like
> >    context --some-params --N=42 --output=document-0042.pdf template.tex
> > or something along those lines.
>
> If you want to go this route (and you may have to if not each record
> fits exactly within a single page),

I do have one page per document. The more annoying part is having
strange document names that need more attention when mapping page
number -> name (I'm not saying this is not doable).

> browse back a day or so in the mailing
> list archive for Gerben’s question about
>
>   “Using command line values in a TeX document; writing a script?"

Thanks a lot for the pointer. I didn't have that much time to read
through all the emails recently, I only noticed that he was super
actively working on some metapost stuff, I wasn't paying attention to
this.

> The replies offer various options using either lua or tex code
> to get at user-supplied arguments from the commandline.

Let me see what I come up with, I'm still fiddling with data & layout
at the moment :)

Mojca

* Re: Best way to create a large number of documents from database
  2020-04-16 14:38   ` Mojca Miklavec
@ 2020-04-16 14:52     ` Hans Hagen
  2020-04-16 16:39       ` kaddour kardio
                         ` (2 more replies)
  2020-04-17 14:37     ` Mojca Miklavec
  1 sibling, 3 replies; 14+ messages in thread
From: Hans Hagen @ 2020-04-16 14:52 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Mojca Miklavec

On 4/16/2020 4:38 PM, Mojca Miklavec wrote:
> On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
>>> On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
>>>
>>> I have been asked to create a few thousand PDF documents from a CSV
>>> "database" today
>>
>> In CPU cycles, the fastest way is to do a single context --once
>> run generating all the pages as a single document, then using
>> mutool merge to split it into separate documents using a (shell)
>> loop.
> 
> Just to make it clear: I don't really need to optimize on the CPU end,
> as the bottleneck is on the other side of the keyboard, so as long as
> the CPU can process 5k pages today, I'm fine with it :) :) :)

5K is nothing ... so that will work

>>> One option is that I quickly draft a python script that creates a few
>>> thousand TeX documents and compiles them individually, but it might be
>>> easier if there was a way to just create a single template document
>>> and then run something like
>>>     context --some-params --N=42 --output=document-0042.pdf template.tex
>>> or something along those lines.
>>
>> If you want to go this route (and you may have to if not each record
>> fits exactly within a single page),
> 
> I do have one page per document. The more annoying part is having
> strange document names that need more attention when mapping page
> number -> name (I'm not saying this is not doable).

so, don't make files:

- write a tex file foo.tex
- process it: context --batch --result=1 --once foo

etc ... so, use --result for the target name and use the same input name
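
A minimal sketch of how a driver script might build that command line, including one possible way to tame "strange" document names. The helpers `safe_name` and `context_command` are hypothetical names, and the replacement rule (anything outside ASCII letters, digits, dot, underscore and hyphen becomes a hyphen) is only an assumption about what counts as a safe file name.

```python
import re


def safe_name(raw):
    """Reduce an arbitrary record name to characters that are safe in a file name.

    Assumption: ASCII letters, digits, '.', '_' and '-' are kept; every other
    run of characters (including non-ASCII letters) collapses to one hyphen.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", raw).strip("-")
    return cleaned or "unnamed"


def context_command(result, source="foo"):
    """Same input file every time; only the --result name changes."""
    return ["context", "--batch", f"--result={safe_name(result)}", "--once", source]


print(context_command("Smith, John / 2020"))
```

Two different raw names can collapse to the same safe name, so a real script would want to check for duplicates before starting the runs.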

(I won't bother you with the template system in context that no one 
knows of.)

  Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: Best way to create a large number of documents from database
  2020-04-16 14:52     ` Hans Hagen
@ 2020-04-16 16:39       ` kaddour kardio
  2020-04-16 17:46       ` template system (was: Best way to create a large number of documents from database) Henning Hraban Ramm
  2020-04-16 18:32       ` Best way to create a large number of documents from database Mojca Miklavec
  2 siblings, 0 replies; 14+ messages in thread
From: kaddour kardio @ 2020-04-16 16:39 UTC (permalink / raw)
  To: mailing list for ConTeXt users



A relatively simple way is to use a templating system such as Jinja2 and
render an MkIV template once per record.
Call context with subprocess and you get the result.
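
To keep the illustration free of third-party packages, the sketch below swaps in `string.Template` from the Python standard library as a stand-in for Jinja2; it is not Jinja2 syntax. The class name `TexTemplate`, the `@` delimiter and the field names `name` and `number` are assumptions made for the example. The delimiter is changed because `$` already means math mode in TeX.

```python
from string import Template


class TexTemplate(Template):
    # "$" is the math shift character in TeX, so use "@" for placeholders.
    delimiter = "@"


# Hypothetical one-page letter; the field names are illustrative only.
TEMPLATE = TexTemplate(r"""\starttext
Dear @name,

your reference number is @number.
\stoptext
""")


def render(record):
    """Return the ConTeXt source for one record (a dict of strings).

    Caveat: values are inserted verbatim, so characters that are special
    in TeX (%, #, \ and so on) would need escaping in real data.
    """
    return TEMPLATE.substitute(record)


print(render({"name": "Ada", "number": "42"}))
```

Writing the rendered text to a file and passing that file to `subprocess.run(["context", ...])` completes the loop described above.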



* Re: template system (was: Best way to create a large number of documents from database)
  2020-04-16 14:52     ` Hans Hagen
  2020-04-16 16:39       ` kaddour kardio
@ 2020-04-16 17:46       ` Henning Hraban Ramm
  2020-04-16 17:57         ` template system Wolfgang Schuster
  2020-04-16 18:32       ` Best way to create a large number of documents from database Mojca Miklavec
  2 siblings, 1 reply; 14+ messages in thread
From: Henning Hraban Ramm @ 2020-04-16 17:46 UTC (permalink / raw)
  To: mailing list for ConTeXt users

> On 16.04.2020 at 16:52, Hans Hagen <j.hagen@xs4all.nl> wrote:
> 
> (I won't bother you with the template system in context that no one knows of.)

If you throw such bones, I get hungry – where’s the flesh? (Where is this in the sources? Is there any documentation?)

I often need ConTeXt templates and mostly use simple replacements (like TITLE, CONTENT), but I’m used to Django templates (earlier Smarty/PHP and Freemarker/Java) and recently used Jinja2 with LaTeX.

Best, Hraban

* Re: template system
  2020-04-16 17:46       ` template system (was: Best way to create a large number of documents from database) Henning Hraban Ramm
@ 2020-04-16 17:57         ` Wolfgang Schuster
  2020-04-16 18:23           ` Henning Hraban Ramm
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Schuster @ 2020-04-16 17:57 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Henning Hraban Ramm

Henning Hraban Ramm wrote on 16.04.2020 at 19:46:
> 
>> On 16.04.2020 at 16:52, Hans Hagen <j.hagen@xs4all.nl> wrote:
>>
>> (I won't bother you with the template system in context that no one knows of.)
> 
> If you throw such bones, I get hungry – where’s the flesh? (Where is this in the sources? Is there any documentation?)

Look in the manual folder: templates-mkiv.pdf

Wolfgang

* Re: template system
  2020-04-16 17:57         ` template system Wolfgang Schuster
@ 2020-04-16 18:23           ` Henning Hraban Ramm
  0 siblings, 0 replies; 14+ messages in thread
From: Henning Hraban Ramm @ 2020-04-16 18:23 UTC (permalink / raw)
  To: mailing list for ConTeXt users


> On 16.04.2020 at 19:57, Wolfgang Schuster <wolfgang.schuster.lists@gmail.com> wrote:
> 
> Henning Hraban Ramm wrote on 16.04.2020 at 19:46:
>>> On 16.04.2020 at 16:52, Hans Hagen <j.hagen@xs4all.nl> wrote:
>>> 
>>> (I won't bother you with the template system in context that no one knows of.)
>> If you throw such bones, I get hungry – where’s the flesh? (Where is this in the sources? Is there any documentation?)
> 
> Look in the manual folder: templates-mkiv.pdf

Ah, the LMX templates. Of course I already had this in my bibliography but never had a deeper look, since it looked too Lua-centric to me.
Thank you!

Best, Hraban

* Re: Best way to create a large number of documents from database
  2020-04-16 14:52     ` Hans Hagen
  2020-04-16 16:39       ` kaddour kardio
  2020-04-16 17:46       ` template system (was: Best way to create a large number of documents from database) Henning Hraban Ramm
@ 2020-04-16 18:32       ` Mojca Miklavec
  2020-04-16 19:01         ` Pablo Rodriguez
  2020-04-16 20:03         ` Hans Hagen
  2 siblings, 2 replies; 14+ messages in thread
From: Mojca Miklavec @ 2020-04-16 18:32 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users

On Thu, 16 Apr 2020 at 16:52, Hans Hagen wrote:
> On 4/16/2020 4:38 PM, Mojca Miklavec wrote:
> > On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> >>> On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> >>>
> >>> One option is that I quickly draft a python script that creates a few
> >>> thousand TeX documents and compiles them individually, but it might be
> >>> easier if there was a way to just create a single template document
> >>> and then run something like
> >>>     context --some-params --N=42 --output=document-0042.pdf template.tex
> >>> or something along those lines.
> >>
> >> If you want to go this route (and you may have to if not each record
> >> fits exactly within a single page),
> >
> > I do have one page per document. The more annoying part is having
> > strange document names that need more attention when mapping page
> > number -> name (I'm not saying this is not doable).
>
> so, don't make files:
>
> - write a tex file foo.tex
> - process it: context --batch --result=1 --once foo
>
> etc ... so, use --result for the target name and use the same input name

This works just perfect, thank you very much.

I now have template.tex and process it with
    context --batch --result=doc-0042 --someparam=21a --once template
which generates precisely the desired doc-0042.pdf.

For the moment I'm simply using a combination of
    \doifdocumentargument {someparam} {\getdocumentargument{someparam}}
from TeX and
    environment.arguments
from within the lua code as suggested by Taco and you in the previous
email thread.
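
For the driving side, a minimal sketch of turning CSV rows into such command lines. It assumes that every CSV column should become a `--column=value` argument, that column names do not clash with ConTeXt's own switches (such as `result` or `once`), and that output names follow the `doc-0042` pattern; the function name `commands_from_csv` is hypothetical.

```python
import csv
import io


def commands_from_csv(csv_text, template="template"):
    """Return one context command line per CSV row.

    Assumption: each column name is usable as a command-line argument name
    that the template reads back with \\getdocumentargument.
    """
    commands = []
    rows = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(rows, start=1):
        extra = [f"--{key}={value}" for key, value in row.items()]
        commands.append(
            ["context", "--batch", f"--result=doc-{index:04d}", *extra, "--once", template]
        )
    return commands


for command in commands_from_csv("someparam\n21a\n17b\n"):
    print(" ".join(command))
```

Keeping the commands as lists (rather than one shell string) avoids quoting problems when a value contains spaces.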

Where would be the best place to document this / under what wiki topic,
as I'm sure I'll need it again and forget until then unless I write it
down immediately? "Mail merge"? ;)

Thank you very much,
    Mojca

* Re: Best way to create a large number of documents from database
  2020-04-16 18:32       ` Best way to create a large number of documents from database Mojca Miklavec
@ 2020-04-16 19:01         ` Pablo Rodriguez
  2020-04-16 20:03         ` Hans Hagen
  1 sibling, 0 replies; 14+ messages in thread
From: Pablo Rodriguez @ 2020-04-16 19:01 UTC (permalink / raw)
  To: ntg-context

On 4/16/20 8:32 PM, Mojca Miklavec wrote:
> [...]
> Where would be the best place to document this / under what wiki topic,
> as I'm sure I'll need it again and forget until then unless I write it
> down immediately? "Mail merge"? ;)

Hi Mojca,

“Document merge” could be also fine.

Pablo
--
http://www.ousia.tk

* Re: Best way to create a large number of documents from database
  2020-04-16 18:32       ` Best way to create a large number of documents from database Mojca Miklavec
  2020-04-16 19:01         ` Pablo Rodriguez
@ 2020-04-16 20:03         ` Hans Hagen
  1 sibling, 0 replies; 14+ messages in thread
From: Hans Hagen @ 2020-04-16 20:03 UTC (permalink / raw)
  To: Mojca Miklavec; +Cc: mailing list for ConTeXt users

On 4/16/2020 8:32 PM, Mojca Miklavec wrote:

> Where would be the best place to document this / under what wiki topic,
> as I'm sure I'll need it again and forget until then unless I write it
> down immediately? "Mail merge"? ;)
maybe a 'workflows' entry?

Hans


* Re: Best way to create a large number of documents from database
  2020-04-16 14:38   ` Mojca Miklavec
  2020-04-16 14:52     ` Hans Hagen
@ 2020-04-17 14:37     ` Mojca Miklavec
  2020-04-17 19:11       ` Hans Hagen
  1 sibling, 1 reply; 14+ messages in thread
From: Mojca Miklavec @ 2020-04-17 14:37 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On Thu, 16 Apr 2020 at 16:38, Mojca Miklavec wrote:
> On Thu, 16 Apr 2020 at 11:29, Taco Hoekwater wrote:
> > > On 16 Apr 2020, at 11:12, Mojca Miklavec wrote:
> > >
> > > I have been asked to create a few thousand PDF documents from a CSV
> > > "database" today
> >
> > In CPU cycles, the fastest way is to do a single context --once
> > run generating all the pages as a single document, then using
> > mutool merge to split it into separate documents using a (shell)
> > loop.
>
> Just to make it clear: I don't really need to optimize on the CPU end,

... says the optimist ... :) :) :)

> as the bottleneck is on the other side of the keyboard, so as long as
> the CPU can process 5k pages today, I'm fine with it :) :) :)

While the bottleneck was in fact on the other side of the keyboard
(preparation certainly took longer than the execution), it still took
about 2.5 hours to generate the full batch.

(I'm pretty sure I could have optimised the code further. Even though
1 second per run is still pretty fast [when I started using ConTeXt it
was more like 30 seconds per run], it adds up when talking about
thousands of pages. This greatly reminds me of the awesome speedup
that Hans achieved when rewriting the mplib code & the initial
\sometxt changes inside MetaPost, which also led to 100-fold speedups
as one no longer needed to start TeX a zillion times.)

While waiting I wanted to be clever and do the processing in
the same folder in parallel (I have lots of cores after all), and
ended up calling a script with
    context --N={n} --output=doc-{nnnn}.pdf template.tex
    context --purge
only to notice much later that running multiple context runs in the
same folder (some of them compiling and some of them deleting the
temporary files) might not have been the best idea on the planet: many
documents ended up missing, and many were corrupted. So I had to rerun
half of the documents.
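
One way to avoid such collisions, sketched under assumptions rather than tested against ConTeXt: give every run its own scratch directory, so that temporary files (and any purging) never touch another run, and move only the finished PDF to a shared output folder. The helper names `run_isolated` and `run_all` are hypothetical. The sketch assumes each command writes `<result_name>.pdf` into its working directory, which also means the template and its images would have to be referenced by absolute path.

```python
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_isolated(command, result_name, out_dir):
    """Run one job in a private scratch directory and keep only its PDF."""
    with tempfile.TemporaryDirectory() as scratch:
        # cwd=scratch keeps this run's temporary files away from other runs.
        subprocess.run(command, cwd=scratch, check=True)
        produced = Path(scratch) / f"{result_name}.pdf"
        target = Path(out_dir) / produced.name
        shutil.move(str(produced), str(target))
    # Leaving the with-block deletes the scratch directory and its leftovers.
    return target


def run_all(jobs, out_dir, workers=4):
    """jobs is an iterable of (command, result_name) pairs."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_isolated, command, name, out_dir)
            for command, name in jobs
        ]
        # result() re-raises any failure, so a broken run is not silently lost.
        return [future.result() for future in futures]
```

Threads are enough here because the real work happens in the child processes; removing the scratch directory takes the place of a separate purge step.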

One interesting statistic:
I used a bunch of images (the same PNG images in all documents; ca.
290k in total).

The generated documents were 1.5 GB in size. When compressed with
tar.gz, there was almost no noticeable difference between the
compressed and uncompressed data size (1.4 GB vs. 1.5 GB). But
tar.xz compressed 1.5 GB worth of documents into
merely 27 MB (a single document is 360k).
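
A plausible explanation, offered as an assumption rather than something measured on these files: gzip's DEFLATE can only refer back 32 KiB, so it never sees that the next document repeats the same images, while xz at its default preset uses a dictionary of several MiB and can. The sketch below imitates the situation with synthetic data: an incompressible 100 kB blob stands in for the shared images, and the function name `archive_sizes` is made up.

```python
import lzma
import random
import zlib


def archive_sizes(copies=20, shared_bytes=100_000, seed=0):
    """Return (raw, deflate, xz) sizes for many documents sharing one blob."""
    rng = random.Random(seed)
    # Random bytes imitate already-compressed image data shared by all documents.
    shared = rng.getrandbits(8 * shared_bytes).to_bytes(shared_bytes, "little")
    # Each "document" is a small unique header followed by the shared blob.
    documents = b"".join(b"doc-%04d" % i + shared for i in range(copies))
    deflated = zlib.compress(documents, 9)  # same algorithm as gzip, 32 KiB window
    xz = lzma.compress(documents)           # default preset, multi-MiB dictionary
    return len(documents), len(deflated), len(xz)


raw, deflated, xz = archive_sizes()
print(f"raw={raw} deflate={deflated} xz={xz}")
```

With these settings the DEFLATE output stays close to the raw size, while the xz output shrinks to roughly the size of a single copy of the blob.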

The documents have been e-mailed out, but now they need to print hard
copies for archive. I'm happy I don't need to be the one printing and
storing that :) :) :)

Mojca

* Re: Best way to create a large number of documents from database
  2020-04-17 14:37     ` Mojca Miklavec
@ 2020-04-17 19:11       ` Hans Hagen
  2020-04-23  6:48         ` Mojca Miklavec
  0 siblings, 1 reply; 14+ messages in thread
From: Hans Hagen @ 2020-04-17 19:11 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Mojca Miklavec

On 4/17/2020 4:37 PM, Mojca Miklavec wrote:

> One of the interesting statistics.
> I used a bunch of images (the same png images in all documents; cca.
> 290k in total).

It can actually make a difference what kind of png image you use. Some
png images demand a conversion (or split of map etc) to the format
supported by pdf. Often converting the png to pdf and including those is
faster.
  Hans



* Re: Best way to create a large number of documents from database
  2020-04-17 19:11       ` Hans Hagen
@ 2020-04-23  6:48         ` Mojca Miklavec
  0 siblings, 0 replies; 14+ messages in thread
From: Mojca Miklavec @ 2020-04-23  6:48 UTC (permalink / raw)
  To: Hans Hagen; +Cc: mailing list for ConTeXt users

On Fri, 17 Apr 2020 at 21:11, Hans Hagen wrote:
> On 4/17/2020 4:37 PM, Mojca Miklavec wrote:
>
> > One of the interesting statistics.
> > I used a bunch of images (the same png images in all documents; cca.
> > 290k in total).
>
> It can actually make a difference what kind of png image you use. Some
> png images demand a conversion (or split of map etc) to the format
> supported by pdf. Often converting the png to pdf and including those is
> faster.

Thanks for the hint. But I tested it and it hardly makes any difference.
I had to make another batch for the archive (creating a single
document with 4k+ pages), and the full process ran in 10 minutes
(compared to ca. 2.5 hours to create individual documents). Just as
a test I completely **removed** all the images, and that only
accounted for some 10 or 20 seconds of speedup. So the biggest overhead
still seems to be in warming up the machinery (which includes my share
of overhead for reading in the 1.3 MB Lua table with all data entries),
and Taco's hint of using an external tool for splitting would
probably have scored best :)

I need to add that I'm extremely happy about the resource reuse
(mostly images). As I already mentioned, the individual documents
were 1.5 GB in total, and badly written software would have created
an equally bad cumulative PDF, while ConTeXt generates a mere 17 MB
file with 4k+ pages. It's really impressive.

Mojca
