[not CCing Matt because his address didn't come through to the list]

Hi Matt,

At 2022-07-09T08:58:10+1000, Warren Toomey via TUHS wrote:
> ----- Forwarded message from Matt Gilmore -----
>
> Subject: Documents for UNIX Collections
>
> Good afternoon everyone, my name is Matt Gilmore, and I recently
> worked with some folks here to help facilitate the scanning and
> release of the "Documents for UNIX" package as well as a few odds and
> ends pertinent to UNIX/TS 4.0. I've been researching pretty heavily
> the history of published memoranda and how they ultimately became the
> formal documents that Western Electric first published with UNIX/TS
> 5.0 and System V. Think the User's Guide, Graphics Guide, etc.

That's excellent work--thank you for doing it!

> One of the projects I'm working on (slowly) is comparing these
> documents with the 4.0 docs I scanned for Arnold and making edits to
> the *ROFF sources with the hopes I could then use them to produce 1:1
> clean copies of the 4.0 docs, while providing an easy means for
> diff'ing the documents as well (to flush out changes between 3.0 and
> 4.0).

Are you using groff to do your rendering? If so, please consider me a
resource; I've been the most active groff developer for the past 4
years. (I am, however, not the release manager--we're feeling heavily
pregnant with groff 1.23, 3.5 years in the making.)

Some of the following issues may be familiar to you; I apologize if I
wear a rut in well-trodden ground here.

I am wondering what you mean by "1:1 clean copies".

I embarked on a similar exercise only about a week ago with the
Kernighan & Cherry document "Typesetting Mathematics -- User's Guide
(Second Edition)", which was part of Volume 2 of the V7 Unix
Programmer's Manual. In the course of that effort I learned several
things.

I identified (and fixed) bugs in groff's ms(7) implementation, and to my
surprise also discovered one in, apparently, V7 troff that caused an
equation at the bottom of a column to go missing.
Because groff was independently developed, the equation sprung back to
life in its rendering.

You can find a narrative of my experiences at the following thread,
along with commentary from others.

  https://lists.gnu.org/archive/html/groff/2022-07/msg00000.html

Pixel-perfect matching of C/A/T (or APS-5, etc.) output will be
impossible because the fonts are different. More than that, the font
_metrics_ are different, which means lines will not always fill the same
when comparing historical typesetter output and a modern
implementation's (this will be true even if you use Heirloom Doctools
Troff, which is descended from V7 Unix, but has seen many changes over
the years, starting with Kernighan's revision for device-independence
ca. 1980, plus many changes for the commercial Documenter's Workbench
product, and then many more by Gunnar Ritter and his successors in the
Heirloom project).

Beyond that, Unix troff and groff use different hyphenation systems. I
don't know how stable Unix troff's was over time.

All of that said, with the Kernighan and Cherry document, by spending
just a few minutes eyeballing old scans and groff PostScript output,
flicking between two fullscreen viewers like an ersatz blink comparator,
and using binary search to tweak the ms(7) LL, PO, and MINGW registers,
I was able to _almost_ perfectly match column and page breaks between
the two renderings, which was a higher fidelity of reproduction than I
expected. The risen equation noted above was the most dramatic change.

Encouraged by that experience, I also reset the V7 Unix version of the
article "A System for Typesetting Mathematics". This apparently was
_not_ published in the Programmer's Manual, possibly because much of its
content was duplicated in the user's guide. But the amount of effort
required of me was shockingly low.
On the other hand, for this I didn't have an authentically typeset copy
to compare to, so all I did was look for what I would consider rendering
errors as opposed to cosmetic changes. (Maybe this is the standard you
want to apply in your own work?) I'm attaching a diff.

Another apparent difference arises between V7 Unix eqn and groff eqn; in
eqn input such as "lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1", V7
eqn will recognize "->" as beginning a new token and convert it to a
right arrow glyph in the output, despite the manual (as I understand it)
implying that it won't. groff eqn _does_ require token separation in
this case.

I say that differences are "apparent", rather than making the stronger
claim of outright bugs in V7 Unix tools mainly because I don't have a
cat2dit(1) tool I can run in my V7 Unix environment in SIMH. In my
opinion such a tool (in K&R C, of course) would be well worth having.

Right now, to satisfy myself of V7 Unix troff behavior I have to produce
an octal dump of the typesetter output, pull it out of the emulation
environment with copy-and-paste, undump it with a custom program (xxd is
not helpful), and then give the reconstructed C/A/T stream to an
interpreter written by John Gardner in JavaScript. John's tool (and his
personal assistance) has proven invaluable, but it's a component of a
larger project of his that renders device-independent troff output in a
Web browser window. For this to be practical he has to introduce
additional device-independent troff commands into the output. I'd
prefer something more rabidly puritan (and, if I'm honest, something
written in a more traditional Unix system programming language).

  https://github.com/Alhadis/Roff.js/

The big advantage of a V7 Unix/PDP-11 cat2dit(1) would be that
device-independent troff output is plain text and much easier to spirit
out of the emulated environment to the host system.
Also, some people, who may be pitied, have taught themselves to read it,
making more observations possible and hypotheses testable within the
PDP-11 environment. (In principle, this is also true of C/A/T command
streams, whether raw or octal-encoded, but I'll just let the pity roll
downhill.)

Thanks largely to Henry Spencer, the information to write a new
cat2dit(1) from scratch is available. Eventually, if no one else does
so, I will undertake it myself; but my queue is deep (mostly with groff
defect reports and feature requests).

  https://github.com/Alhadis/otroff/blob/92683053f9aad5b926fc447843bf2092ad59cebf/cat.5

Dan Plassche pointed me toward Adobe Transcript, but my understanding is
that it falls short of my needs in 3 ways: it produces PostScript, which
I can't easily read, not device-independent troff output (which I can);
it's not available in a version ready to run in a modern Unix
environment; and it has a licensing encumbrance. I'd like a cat2dit(1)
we can all trade around libre and gratis.

Alternatively, if someone leaked the troff sources from UNIX/TS 4.0,
that would bring a grin of Jack Nicholsonian proportions to my face.
That should be buildable in vivo on a PDP-11 and would facilitate much
other historical research besides. (With it, someone could annotate a
diff of the troff/nroff source trees between V7 and UNIX/TS 4.0, which I
wager constitutes a highly positive and teachable moment in software
design and engineering.)

Okay, brain dump terminated. Please let me know if I can help.

Regards,
Branden
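
Appended illustration: a minimal sketch of the "undump" step described
above, which turns an octal dump pasted out of the emulator back into a
binary C/A/T stream. It is not the custom program mentioned in the
message. It assumes the dump was made with "od -b", so each row is an
octal offset followed by up to 16 octal byte values; it also assumes
that a lone "*" row stands for repeats of the previous row and that the
last row carries only the final offset. The function name
undump_od_bytes is hypothetical.

```python
def undump_od_bytes(text: str) -> bytes:
    """Rebuild binary data from `od -b`-style text.

    Assumed row shape: an octal offset, then octal byte values.
    A lone "*" row means "previous row repeated up to the next offset".
    """
    out = bytearray()
    last_row = b""
    pending_repeat = False

    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines introduced by copy-and-paste
        if fields[0] == "*":
            pending_repeat = True
            continue

        offset = int(fields[0], 8)
        if pending_repeat:
            if not last_row:
                raise ValueError("'*' row with no preceding data row")
            # Re-expand the rows that od collapsed into "*".
            while len(out) < offset:
                out += last_row
            del out[offset:]
            pending_repeat = False

        # The offset check catches rows lost or mangled in transit.
        if offset != len(out):
            raise ValueError(
                f"offset {offset:o} does not match {len(out):o} bytes read"
            )

        row = bytes(int(field, 8) for field in fields[1:])
        if row:
            out += row
            last_row = row

    return bytes(out)
```

To use it as a filter, something like
sys.stdout.buffer.write(undump_od_bytes(sys.stdin.read())) should
suffice. Checking every row's offset against the byte count so far is a
deliberate choice: a line dropped during copy-and-paste raises an error
instead of silently shifting the rest of the stream.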