ntg-context - mailing list for ConTeXt users
* Parallelizing typesetting of large documents with lots of cross-references
@ 2020-12-03 11:04 Stephen Gaito
  2020-12-03 11:32 ` Taco Hoekwater
  2020-12-03 12:21 ` Hans Hagen
  0 siblings, 2 replies; 3+ messages in thread
From: Stephen Gaito @ 2020-12-03 11:04 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hello,

This email is largely a simple notification of one "Fool's" dream...

("Only Fools rush in where Angels fear to tread").

I am currently attempting to create "a" (crude) "tool" with which I can
typeset:

- very large (1,000+ pages),
- highly cross-referenced documents,
- with embedded literate-programmed code (which needs
  concurrent compiling and execution),
- containing multiple MetaFun graphics,

all based upon ConTeXt-LMTX.

"In theory", it should be possible to typeset individual "sub-documents"
(any section which is known to start on a page boundary rather than
inside a page), and then re-combine the individual PDFs back into one
single PDF for the whole document (complete with control over the page
numbering).

The inherent problem is that the *whole* of a ConTeXt document depends
upon cross-references from *everywhere* else in the document. TeX and
ConTeXt "solve" this problem by using a multi-pass approach (five
passes, for example, for the `luametatex` document).

Between each pass, ConTeXt saves this multi-pass data (page
numbers and cross-references) in the `*.tuc` file.
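As a first probing step, the top-level layout of a `*.tuc` file can be explored from Lua. This is a minimal sketch that assumes (unverified) the file is plain Lua returning a single table; the actual layout is undocumented, which is exactly what question 3 below asks about, and any key names used with it are illustrative only:

```lua
-- Sketch: inspect the top-level structure of a .tuc file.
-- Assumption (unverified): the .tuc file is plain Lua source that
-- returns one table when executed.

local function load_tuc(path)
  local chunk, err = loadfile(path)
  if not chunk then
    return nil, err
  end
  local ok, data = pcall(chunk)
  if not ok or type(data) ~= "table" then
    return nil, "tuc file did not return a table"
  end
  return data
end

-- List the top-level sections, sorted, so we can discover where the
-- page numbers and cross-references actually live.
local function sections(data)
  local keys = { }
  for k in pairs(data) do
    keys[#keys + 1] = tostring(k)
  end
  table.sort(keys)
  return keys
end
```

Running `sections()` on a real `*.tuc` file would then show which top-level tables exist, as a starting point for reverse-engineering the multi-pass data.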

Clearly any parallelization approach needs to have a process which
coordinates the update and re-distribution of any changes in this
multi-pass data obtained by typesetting each "sub-document".

My current approach is to have a federation of Docker/Podman "pods".
Each "pod" would have a number of ConTeXt workers, as well as
(somewhere in the federation) a Lua based Multi-Pass-Data-coordinator.

All work would be coordinated by messages sent and received over a
corresponding federation of [NATS servers](https://nats.io/). (Neither
[Podman](https://podman.io/) pods nor NATS message coordination is a
problem at the moment.)

--------------------------------------------------------------------
**The real problem**, for typesetting a ConTeXt document, is the design
of the critical process which will act as a
"Multi-Pass-Data-coordinator".
--------------------------------------------------------------------

All ConTeXt sub-documents would be typeset in "once" mode using the
latest complete set of "Multi-Pass-Data" obtained from the central
coordinator. Once each typesetting run completes, the resulting
"Multi-Pass-Data" would be sent back to the coordinator, which would
merge it into its complete set, ready for any required next
typesetting pass.

(From `context --help`:
>mtx-context | --once only run once (no multipass data file is produced)

I will clearly have to patch(?) the `mtx-context.lua` script to allow
multi-pass data to be produced... this is probably not a problem.)

(There would also be a number of additional processes/containers for
dependency analysis, build sequencing, compilation of code,
execution or interpretation of the code, stitching the PDFs back into
one PDF, etc -- these processes are also not the really critical
problem at the moment).

--------------------------------------------------------------------
QUESTIONS:

1. Are there any other known attempts to parallelize context?

2. Are there any other obvious problems with my approach?

3. Is there any existing documentation on the contents of the `*.tuc`
   file?

4. If there is no such documentation, is there any naming pattern of
   the Lua functions which get/set this multi-pass information that I
   should be aware of?
--------------------------------------------------------------------

Many thanks for all of the very useful comments so far...

Regards,

Stephen Gaito
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: Parallelizing typesetting of large documents with lots of cross-references
  2020-12-03 11:04 Parallelizing typesetting of large documents with lots of cross-references Stephen Gaito
@ 2020-12-03 11:32 ` Taco Hoekwater
  2020-12-03 12:21 ` Hans Hagen
  1 sibling, 0 replies; 3+ messages in thread
From: Taco Hoekwater @ 2020-12-03 11:32 UTC (permalink / raw)
  To: mailing list for ConTeXt users



> On 3 Dec 2020, at 12:04, Stephen Gaito <stephen@perceptisys.co.uk> wrote:
> 
> 1. Are there any other known attempts to parallelize context?

Not that I know of, except for the tricks I mentioned in my earlier mail today.

> 2. Are there any other obvious problems with my approach?

The big problem with references is that changed / resolved references can 
change other (future) references because the typeset length can be different,
shifting a following reference to another page, which in turn can push
another reference to yet another page, perhaps changing a page break, et cetera. 

That is why the meta manual needs five runs, otherwise a max of two runs would 
always be enough (assuming no outside processing like generating a bibliography 
or index is needed). So your `--once` approach may fail in some cases, sorry.

Actually, the meta manual really *needs* only four runs. The last run is the one 
that verifies that the .tuc file has not changed (that is why a ConTeXt document
with no cross-references at all uses two runs, and is one of the reasons for 
the existence of the `--once` switch).

Depending on your docs, you may be able to skip a run by using `--runs` yourself.
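The fixed-point behaviour described above can be sketched as a small Lua loop; `typeset_once` and `serialize` are stand-ins (not real ConTeXt APIs) for running one pass and for serializing its multi-pass data:

```lua
-- Sketch of the fixed-point logic: keep running passes until the
-- multi-pass data stops changing. The final run is the one that merely
-- confirms stability, which is why even a document with no
-- cross-references takes two runs.

local function run_to_fixed_point(typeset_once, serialize, max_runs)
  local previous, runs = nil, 0
  repeat
    runs = runs + 1
    local current = serialize(typeset_once())
    local stable = (current == previous)
    previous = current
  until stable or runs >= max_runs
  return runs
end
```

With stand-ins whose output settles after three passes, this returns 4 (three productive runs plus the confirming one), matching the "really needs only four runs" observation.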

Best wishes,
Taco





* Re: Parallelizing typesetting of large documents with lots of cross-references
  2020-12-03 11:04 Parallelizing typesetting of large documents with lots of cross-references Stephen Gaito
  2020-12-03 11:32 ` Taco Hoekwater
@ 2020-12-03 12:21 ` Hans Hagen
  1 sibling, 0 replies; 3+ messages in thread
From: Hans Hagen @ 2020-12-03 12:21 UTC (permalink / raw)
  To: mailing list for ConTeXt users, Stephen Gaito

On 12/3/2020 12:04 PM, Stephen Gaito wrote:

> - very large (1,000+ pages),

not that large, literate code is often verbatim so that doesn't take 
much runtime either

> - highly cross-referenced documents,

ok, that demands runs

> - with embedded literate-programmed code (which needs
>    concurrent compiling and execution),

you only need to process those snippets when something has changed and 
there are ways in context to deal with that (like \typesetbuffer and 
such which only processes when something changed between runs)

> - containing multiple MetaFun graphics,

those don't take time assuming efficient metapost code

Hans


-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
        tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-----------------------------------------------------------------

