From: Lars Huttar
Newsgroups: gmane.comp.tex.context
Subject: Re: distributed / parallel TeX?
Date: Tue, 16 Dec 2008 12:13:58 -0600
Message-ID: <4947EFE6.6050302@sil.org>
References: <4946E2E2.1050108@sil.org> <49476213.9020709@elvenkind.com>
In-Reply-To: <49476213.9020709@elvenkind.com>
To: mailing list for ConTeXt users
Xref: news.gmane.org gmane.comp.tex.context:46359

On 12/16/2008 2:08 AM, Taco Hoekwater wrote:
>
> Hi Lars,
>
> Lars Huttar wrote:
>> Hello,
>>
>> We've been using TeX to typeset a 1200-page book, and at that size, the
>> time it takes to run becomes a big issue (especially with multiple
>> passes... about 8 on average). It takes us anywhere from 80 minutes on
>> our fastest machine, to 9 hours on our slowest laptop.
>
> You should not need an average of 8 runs unless your document is
> ridiculously complex and I am curious what you are doing (but that
> is a different issue from what you are asking).
>
>> So the question comes up, can TeX runs take advantage of parallelized or
>> distributed processing?
>
> No.
> For the most part, this is because of another requisite: for
> applications to make good use of threads, they have to deal with a
> problem that can be parallelized well. And generally speaking,
> typesetting does not fall in this category. A seemingly small change
> on page 4 can easily affect each and every page right to the end
> of the document.

Thank you for your response.

Certainly this is true in general and in the worst case, as things
stand currently. But I don't think it has to be that way. The following
could greatly mitigate that problem:

- You could design your document *specifically* to make the parts
  independent, so that the true and authoritative way to typeset them
  is to typeset the parts independently. (You can do this part now
  without modifying TeX at all... you just have the various sections'
  .tex files input common "headers" / macro defs.) Then, by definition,
  a change in one section cannot affect another section (except for
  page numbers, and possibly left/right pages, q.v. below).

- Most large works are divisible into chunks separated by page breaks,
  and possibly by page breaks that force a "recto". This greatly limits
  the effects that any section can have on another. The division
  ("chunking") of the whole document into fairly separate parts could
  be done either manually or, if there are clear page breaks,
  automatically.

- The remaining problem, as you noted, is how to fix page references
  from one section to another. Currently, TeX resolves forward
  references by doing a second (or third, ...) pass, which uses page
  information from the previous pass. The same technique could be used
  for resolving inter-chunk references and determining on what page
  each chunk should start.

After one pass on each of the independent chunks (ideally performed
simultaneously by separate processing nodes), page information is sent
from each node to a "coordinator" process. E.g. the node that processed
section two tells the coordinator that chapter 11 starts 37 pages after
the beginning of section two. The coordinator knows in what sequence
the chunks are to be concatenated, thanks to a config file. It uses
this information together with info from each of the nodes to build a
table of what page each chunk should start on, and a table giving the
absolute page number of each page reference. If pagination has changed,
or is new, this info is sent back to the various nodes for another
round of processing.

If this distributed method of typesetting a document takes 1 additional
iteration compared to doing it in series, but you get to split the
document into, say, 5 roughly equal parts, you could presumably get the
job done a lot quicker in spite of the extra iteration.

This is a crude description but hopefully the idea is clear enough.

>> parallel pieces so that you could guarantee that you would get the same
>> result for section B whether or not you were typesetting the whole book
>> at the same time?
>
> if you are willing to promiss yourself that all chapters will be exactly
> 20 pages - no more, no less - they you can split the work off into
> separate job files yourself and take advantage of a whole server
> farm. If you can't ...

Yes, the splitting can be done manually now, and when the pain gets
bad enough, we do some manual separate TeX runs. However, I'm thinking
that for large works, there is enough gain to be had that it would be
worth systematizing the splitting process and especially the
recombining process, since the latter is more error-prone. I think
people would do it a lot more if there were automation support for it.
I know we would.

But then, maybe our situation of having a large book with dual columns
and multipage tables is not common enough in the TeX world. Maybe
others who are typesetting similar books just use commercial WYSIWYG
typesetting tools, as we did in the previous edition of this book.
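To make the coordinator idea above a bit more concrete, here is a rough
sketch in Python of the bookkeeping it would do. None of this exists in
TeX or ConTeXt; ChunkReport, build_page_tables, coordinate and the
typeset_chunk callback are all made-up names, and I'm assuming each node
can report its chunk's page count plus the page offset of every label
inside the chunk.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkReport:
    """Hypothetical report one node sends back after typesetting a chunk."""
    name: str
    page_count: int              # pages the chunk occupies
    starts_recto: bool = False   # chunk must begin on a right-hand (odd) page
    # label -> page offset from the start of the chunk (0 = first page)
    labels: dict = field(default_factory=dict)


def build_page_tables(reports, first_page=1):
    """Return (start page of each chunk, absolute page of each label).

    `reports` must already be in concatenation order, i.e. the order
    the coordinator's config file would give.
    """
    starts, absolute = {}, {}
    page = first_page
    for report in reports:
        if report.starts_recto and page % 2 == 0:
            page += 1            # leave a blank verso before the chunk
        starts[report.name] = page
        for label, offset in report.labels.items():
            absolute[label] = page + offset
        page += report.page_count
    return starts, absolute


def coordinate(typeset_chunk, chunk_names, max_rounds=10):
    """Re-run all chunks until the page tables stop changing.

    `typeset_chunk(name, starts, absolute)` is a stand-in for a real
    TeX run on one node; it must return a ChunkReport.
    """
    starts, absolute = {}, {}
    for round_no in range(1, max_rounds + 1):
        # In a real setup these runs would happen in parallel on the nodes.
        reports = [typeset_chunk(name, starts, absolute)
                   for name in chunk_names]
        new_tables = build_page_tables(reports)
        if new_tables == (starts, absolute):
            return starts, absolute, round_no
        starts, absolute = new_tables
    raise RuntimeError("pagination did not converge")
```

With the example from above: if section one is 200 pages and section
two reports chapter 11 at offset 37, section two starts on page 201 and
chapter 11 lands on page 238. As for the timing, under the rough
assumption that a pass over a chunk takes time proportional to its page
count, 9 passes over five equal chunks run in parallel cost about
9/5 = 1.8 full-document passes, against 8 for the serial run.
Coordination overhead and uneven chunks would eat into that.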
Lars

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________