From: Lars Huttar <lars_huttar@sil.org>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: distributed / parallel TeX?
Date: Tue, 16 Dec 2008 12:13:58 -0600
Message-ID: <4947EFE6.6050302@sil.org>
In-Reply-To: <49476213.9020709@elvenkind.com>

On 12/16/2008 2:08 AM, Taco Hoekwater wrote:
> 
> Hi Lars,
> 
> Lars Huttar wrote:
>> Hello,
>>
>> We've been using TeX to typeset a 1200-page book, and at that size, the
>> time it takes to run becomes a big issue (especially with multiple
>> passes... about 8 on average). It takes us anywhere from 80 minutes on
>> our fastest machine, to 9 hours on our slowest laptop.
> 
> You should not need an average of 8 runs unless your document is
> ridiculously complex and I am curious what you are doing (but that
> is a different issue from what you are asking).
> 
>> So the question comes up, can TeX runs take advantage of parallelized or
>> distributed processing? 
> 
> No. For the most part, this is because of another requisite: for
> applications to make good use of threads, they have to deal with a
> problem that can be parallelized well. And generally speaking,
> typesetting does not fall in this category. A seemingly small change
> on page 4 can easily affect each and every page right to the end
> of the document.

Thank you for your response.

Certainly this is true in general and in the worst case, as things stand
currently. But I don't think it has to be that way. The following could
greatly mitigate that problem:

- You could design your document *specifically* to make the parts
independent, so that the true and authoritative way to typeset them is
to typeset the parts independently. (You can do this part now without
modifying TeX at all... you just have the various sections' .tex files
input common "headers" / macro defs.) Then, by definition, a change in
one section cannot affect another section (except for page numbers, and
possibly left/right pages, q.v. below).

- Most large works are divisible into chunks separated by page breaks,
some of which may force a "recto". This greatly limits the effects
that any section can have on another. The division ("chunking") of the
whole document into fairly separate parts could be done either
manually or, if there are clear page breaks, automatically.

- The remaining problem, as you noted, is how to fix page references
from one section to another. Currently, TeX resolves forward references
by doing a second (or third, ...) pass, which uses page information from
the previous pass. The same technique could be used for resolving
inter-chunk references and determining on what page each chunk should
start. After one pass on each of the independent chunks (ideally performed
simultaneously by separate processing nodes), page information is sent
from each node to a "coordinator" process. E.g. the node that processed
section two tells the coordinator that chapter 11 starts 37 pages after
the beginning of section two. The coordinator knows in what sequence the
chunks are to be concatenated, thanks to a config file. It uses this
information together with info from each of the nodes to build a table
of what page each chunk should start on, and a table giving the absolute
page number of each page reference. If pagination has changed, or is
new, this info is sent back to the various nodes for another round of
processing.
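
To make the coordinator step above more concrete, here is a minimal
sketch in Python. It is only an illustration under assumptions, not an
existing tool: the names ChunkReport, build_tables and
needs_another_round, the chunk and label names, and the 121-page count
are all made up for the example. Only the "chapter 11 starts 37 pages
into section two" figure comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    """What one node reports after a pass (all field names are hypothetical)."""
    name: str
    page_count: int
    # label -> page offset from the start of the chunk (0 = chunk's first page)
    label_offsets: dict = field(default_factory=dict)
    force_recto: bool = False  # chunk must start on a right-hand (odd) page


def build_tables(order, reports, first_page=1):
    """Return (chunk start pages, absolute page of every label) for one round.

    `order` is the concatenation sequence, e.g. read from a config file.
    """
    starts, labels = {}, {}
    page = first_page
    for name in order:
        report = reports[name]
        if report.force_recto and page % 2 == 0:
            page += 1  # leave a blank verso so the chunk opens on a recto
        starts[name] = page
        for label, offset in report.label_offsets.items():
            labels[label] = page + offset
        page += report.page_count
    return starts, labels


def needs_another_round(previous_tables, current_tables):
    """Nodes must re-run if pagination is new or has changed."""
    return previous_tables != current_tables


# Example round: two chunks, the second forced onto a recto.
order = ["section-one", "section-two"]
reports = {
    "section-one": ChunkReport("section-one", page_count=121),
    "section-two": ChunkReport(
        "section-two",
        page_count=90,
        label_offsets={"chapter-11": 37},
        force_recto=True,
    ),
}
starts, labels = build_tables(order, reports)
print(starts)   # {'section-one': 1, 'section-two': 123}
print(labels)   # {'chapter-11': 160}
print(needs_another_round(None, (starts, labels)))  # True: first round
```

The coordinator would send `starts` and `labels` back to the nodes and
repeat until needs_another_round returns False, much as a single TeX
job repeats passes until its references stop changing.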

If this distributed method of typesetting a document takes 1 additional
iteration compared to doing it in series, but you get to split the
document into say 5 roughly equal parts, you could presumably get the
job done a lot quicker in spite of the extra iteration.
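
As a rough back-of-the-envelope check, here is the arithmetic in
Python. It assumes the 80 minutes quoted earlier covers all 8 passes,
that the time per pass scales linearly with chunk size, and that
coordination overhead is negligible. None of these assumptions has
been measured, and total_minutes is just a hypothetical helper.

```python
def total_minutes(passes, minutes_per_pass, parts=1, extra_passes=0):
    """Wall-clock estimate: every pass over a 1/parts chunk takes
    minutes_per_pass / parts, and all chunks run at the same time."""
    return (passes + extra_passes) * minutes_per_pass / parts


# 80 minutes over 8 passes -> 10 minutes per full-document pass (assumed).
serial = total_minutes(passes=8, minutes_per_pass=10)
parallel = total_minutes(passes=8, minutes_per_pass=10, parts=5, extra_passes=1)
print(serial, parallel)  # 80.0 18.0
```

Even with the extra iteration, the estimate under these assumptions is
about 18 minutes instead of 80.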

This is a crude description but hopefully the idea is clear enough.

>> parallel pieces so that you could guarantee that you would get the same
>> result for section B whether or not you were typesetting the whole book
>> at the same time?
> 
> if you are willing to promise yourself that all chapters will be exactly
> 20 pages - no more, no less - then you can split the work off into
> separate job files yourself and take advantage of a whole server
> farm. If you can't ...

Yes, the splitting can be done manually now, and when the pain point
gets high enough, we do some manual separate TeX runs.

However, I'm thinking that for large works, there is enough gain to be
had that it would be worth systematizing the splitting process and
especially the recombining process, since the latter is more error-prone.

I think people would do it a lot more if there were automation support
for it. I know we would.

But then, maybe our situation of having a large book with dual columns
and multipage tables is not common enough in the TeX world.
Maybe others who are typesetting similar books just use commercial
WYSIWYG typesetting tools, as we did in the previous edition of this book.

Lars