The Unix Heritage Society mailing list
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: tuhs@tuhs.org
Subject: [TUHS] Retypesetting Unix documents (was: Documents for UNIX Collections)
Date: Fri, 8 Jul 2022 19:40:14 -0500
Message-ID: <20220709004014.jakdxeqmpsvwrlvj@illithid>
In-Reply-To: <Ysi2guVshhJPwLP5@minnie.tuhs.org>

[not CCing Matt because his address didn't come through to the list]

Hi Matt,

At 2022-07-09T08:58:10+1000, Warren Toomey via TUHS wrote:
> ----- Forwarded message from Matt Gilmore -----
> 
> Subject: Documents for UNIX Collections
> 
> Good afternoon everyone, my name is Matt Gilmore, and I recently
> worked with some folks here to help facilitate the scanning and
> release of the "Documents for UNIX" package as well as a few odds and
> ends pertinent to UNIX/TS 4.0.  I've been researching pretty heavily
> the history of published memoranda and how they ultimately became the
> formal documents that Western Electric first published with UNIX/TS
> 5.0 and System V.  Think the User's Guide, Graphics Guide, etc.

That's excellent work--thank you for doing it!

> One of the projects I'm working on (slowly) is comparing these
> documents with the 4.0 docs I scanned for Arnold and making edits to
> the *ROFF sources with the hopes I could then use them to produce 1:1
> clean copies of the 4.0 docs, while providing an easy means for
> diff'ing the documents as well (to flush out changes between 3.0 and
> 4.0).

Are you using groff to do your rendering?  If so, please consider me a
resource; I've been the most active groff developer for the past 4
years.  (I am, however, not the release manager--we're feeling heavily
pregnant with groff 1.23, 3.5 years in the making.)

Some of the following issues may be familiar to you; I apologize if I
wear a rut in well-trodden ground here.

I am wondering what you mean by "1:1 clean copies".  I embarked on a
similar exercise only about a week ago with the Kernighan & Cherry
document "Typesetting Mathematics -- User's Guide (Second Edition)",
which was part of Volume 2 of the V7 Unix Programmer's Manual.

In the course of that effort I learned several things.  I identified
(and fixed) bugs in groff's ms(7) implementation, and to my surprise
also discovered one in, apparently, V7 troff that caused an equation at
the bottom of a column to go missing.  Because groff was independently
developed, the equation sprang back to life in its rendering.  You can
find a narrative of my experiences at the following thread, along with
commentary from others.

https://lists.gnu.org/archive/html/groff/2022-07/msg00000.html

Pixel-perfect matching of C/A/T (or APS-5, etc.) output will be
impossible because the fonts are different.  More than that, the font
_metrics_ are different, which means lines will not always fill the same
when comparing historical typesetter output and a modern
implementation's.  This will be true even if you use Heirloom Doctools
Troff, which is descended from V7 Unix troff but has seen many changes
over the years: Kernighan's revision for device-independence ca. 1980,
then many changes for the commercial Documenter's Workbench product,
and then many more by Gunnar Ritter and his successors in the Heirloom
project.
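
To illustrate the point about metrics, here is a minimal sketch in
Python of greedy line filling.  It is not troff's actual algorithm; the
width tables, the 160-unit line length, and the name fill_lines are all
invented for illustration.  All it shows is that changing the width of
a single glyph can move a line break.

```python
def fill_lines(words, widths, line_length, space_width):
    """Greedily pack words into lines no wider than line_length.

    widths maps each character to its width in (hypothetical) device
    units; space_width is the width of an inter-word space.
    """
    lines, current, used = [], [], 0
    for word in words:
        word_width = sum(widths[ch] for ch in word)
        needed = word_width if not current else used + space_width + word_width
        if current and needed > line_length:
            # The word does not fit; close the line and start a new one.
            lines.append(" ".join(current))
            current, used = [word], word_width
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(" ".join(current))
    return lines


# Two made-up fonts that differ only in the width of "m".
narrow = {ch: 10 for ch in "abcdefghijklmnopqrstuvwxyz"}
wide_m = dict(narrow, m=14)
words = "the font metrics differ".split()

print(fill_lines(words, narrow, 160, 10))
print(fill_lines(words, wide_m, 160, 10))
```

With the first table the word "metrics" just fits on the first line;
with the second it is pushed to the next one, and every later break
shifts with it.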

Beyond that, Unix troff and groff use different hyphenation systems.  I
don't know how stable Unix troff's was over time.

All of that said, with the Kernighan and Cherry document, by spending
just a few minutes eyeballing old scans and groff PostScript output,
flicking between two fullscreen viewers like an ersatz blink comparator,
and using binary search to tweak the ms(7) LL, PO, and MINGW registers,
I was able to _almost_ perfectly match column and page breaks between
the two renderings, which was a higher fidelity of reproduction than I
expected.  The risen equation noted above was the most dramatic change.
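
The binary search itself is simple enough to sketch.  This is a
hypothetical helper, not anything from groff or ms(7): compare() stands
in for the human judgment made while flicking between the scan and the
new output, and the sketch assumes that a larger register value always
pushes breaks later, which holds only roughly in practice.

```python
def bisect_register(low, high, compare, tolerance=0.01):
    """Narrow a register value by repeatedly halving [low, high].

    compare(value) is supplied by the caller and returns a negative
    number if the trial value is too small (text breaks too early), a
    positive number if it is too large, and 0 if the breaks match.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        verdict = compare(mid)
        if verdict == 0:
            return mid
        if verdict < 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Stand-in judge: pretend the breaks match when the value is near 5.4.
def judge(value, target=5.4, slack=0.02):
    if abs(value - target) < slack:
        return 0
    return -1 if value < target else 1


print(bisect_register(4.0, 7.0, judge))
```

Each register would be searched separately, re-rendering the document
for every trial value.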

Encouraged by that experience, I also reset the V7 Unix version of the
article "A System for Typesetting Mathematics".  This apparently was
_not_ published in the Programmer's Manual, possibly because much of its
content was duplicated in the user's guide.  But the amount of effort
required of me was shockingly low.  On the other hand, for this I didn't
have an authentically typeset copy to compare to, so all I did was look
for what I would consider rendering errors as opposed to cosmetic
changes.  (Maybe this is the standard you want to apply in your own work?)
I'm attaching a diff.

Another apparent difference arises between V7 Unix eqn and groff eqn; in
eqn input such as "lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1", V7
eqn will recognize "->" as beginning a new token and convert it to a
right arrow glyph in the output, despite the manual (as I understand it)
implying that it won't.  groff eqn _does_ require token separation in
this case.
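
A rough way to find input that needs this treatment is to scan eqn
blocks for "->" glued to the preceding character.  The sketch below is
a hypothetical helper, not part of groff; it looks only between .EQ and
.EN lines and ignores inline equations set off with delim characters,
so treat its output as candidates to inspect rather than a verdict.

```python
import re

# "->" immediately preceded by a non-space character, as in "x->".
GLUED_ARROW = re.compile(r"\S->")


def find_glued_arrows(lines):
    """Yield (line_number, text) for eqn lines containing a glued "->".

    Only lines between .EQ and .EN requests are examined.
    """
    in_eqn = False
    for number, text in enumerate(lines, start=1):
        if text.startswith(".EQ"):
            in_eqn = True
        elif text.startswith(".EN"):
            in_eqn = False
        elif in_eqn and GLUED_ARROW.search(text):
            yield number, text


sample = [
    ".EQ",
    "lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1",
    ".EN",
    ".EQ",
    "lim from {n -> inf} x sub n =0",
    ".EN",
]
for number, text in find_glued_arrows(sample):
    print(number, text)
```

Only the first equation in the sample is reported; the second already
has the space that groff eqn wants.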

I say that differences are "apparent", rather than making the stronger
claim of outright bugs in V7 Unix tools mainly because I don't have a
cat2dit(1) tool I can run in my V7 Unix environment in SIMH.  In my
opinion such a tool (in K&R C, of course) would be well worth having.
Right now, to satisfy myself of V7 Unix troff behavior I have to produce
an octal dump of the typesetter output, pull it out of the emulation
environment with copy-and-paste, undump it with a custom program (xxd is
not helpful), and then give the reconstructed C/A/T stream to an
interpreter written by John Gardner in JavaScript.  John's tool (and his
personal assistance) has proven invaluable, but it's a component of a
larger project of his that renders device-independent troff output in a
Web browser window.  For this to be practical he has to introduce
additional device-independent troff commands into the output.  I'd
prefer something more rabidly puritan (and, if I'm honest, something
written in a more traditional Unix system programming language).

https://github.com/Alhadis/Roff.js/
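
For reference, the undump step can be sketched in a few lines of
Python.  This is not the custom program mentioned above.  It assumes
the dump was made byte-wise in octal ("od -b" style: an octal offset
followed by octal byte values) and that a lone "*" line stands for
repeated rows, as od implementations commonly print.  The function name
is invented.

```python
def undump_od_bytes(dump_text):
    """Rebuild the original bytes from "od -b" style text.

    Each line is an octal offset followed by zero or more octal byte
    values.  A line containing only "*" means the previous row repeats
    until the offset given on the next line.
    """
    out = bytearray()
    last_row = b""
    repeat_pending = False
    for line in dump_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "*":
            repeat_pending = True
            continue
        offset = int(fields[0], 8)
        if repeat_pending:
            # Fill the elided region with copies of the previous row.
            while last_row and len(out) < offset:
                out.extend(last_row[: offset - len(out)])
            repeat_pending = False
        row = bytes(int(field, 8) for field in fields[1:])
        out.extend(row)
        if row:
            last_row = row
    return bytes(out)


pasted = "0000000 100 101 102\n0000003\n"
print(undump_od_bytes(pasted))
```

A word-oriented dump (od's default octal words) would need the same
loop plus a decision about byte order, which is left out here.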

The big advantage of a V7 Unix/PDP-11 cat2dit(1) would be that
device-independent troff output is plain text and much easier to spirit
out of the emulated environment to the host system.  Also, some people,
who may be pitied, have taught themselves to read it, making more
observations possible and hypotheses testable within the PDP-11
environment.  (In principle, this is also true of C/A/T command streams,
whether raw or octal-encoded, but I'll just let the pity roll downhill.)

Thanks largely to Henry Spencer, the information to write a new
cat2dit(1) from scratch is available.  Eventually, if no one else does
so, I will undertake it myself; but my queue is deep (mostly with groff
defect reports and feature requests).

https://github.com/Alhadis/otroff/blob/92683053f9aad5b926fc447843bf2092ad59cebf/cat.5

Dan Plassche pointed me toward Adobe Transcript, but my understanding is
that it falls short of my needs in 3 ways: it produces PostScript, which
I can't easily read, not device-independent troff output (which I can);
it's not available in a version ready to run in a modern Unix
environment; and it has a licensing encumbrance.  I'd like a cat2dit(1)
we can all trade around libre and gratis.

Alternatively, if someone leaked the troff sources from UNIX/TS 4.0,
that would bring a grin of Jack Nicholsonian proportions to my face.
That should be buildable in vivo on a PDP-11 and would facilitate much
other historical research besides.  (With it, someone could annotate a
diff of the troff/nroff source trees between V7 and UNIX/TS 4.0, which I
wager constitutes a highly positive and teachable moment in software
design and engineering.)

Okay, brain dump terminated.  Please let me know if I can help.

Regards,
Branden

[-- Attachment #1.2: eqn.diff --]
[-- Type: text/x-diff, Size: 10918 bytes --]

diff -urN eqn-v7-pure/Makefile eqn-v7-hacked/Makefile
--- eqn-v7-pure/Makefile	1969-12-31 18:00:00.000000000 -0600
+++ eqn-v7-hacked/Makefile	2022-07-06 05:34:09.196687749 -0500
@@ -0,0 +1,21 @@
+# Set this macro to where your groff executable is located.  For best
+# results, ensure that it locates an s.tmac file with some fixes applied
+# (Savannah #62686, #62687, #62688).
+GROFF:=/home/branden/src/GIT/groff/build/test-groff
+GROFFOPTS:=-ww -e -ms -M . -m sbtl -P -e -P -pletter -T pdf
+
+ALL=eqnsystem.pdf eqnuser.pdf
+
+all: $(ALL)
+
+EQNSYSTEMSRCS:=e.mac e0 e1 e2 e3 e4 e5 e6 e7
+EQNUSERSRCS:=g.mac g0 g1 g2 g3 g4 g5
+
+eqnsystem.pdf: sbtl.tmac $(EQNSYSTEMSRCS)
+	$(GROFF) $(GROFFOPTS) $(EQNSYSTEMSRCS) >$@
+
+eqnuser.pdf: sbtl.tmac $(EQNUSERSRCS)
+	$(GROFF) $(GROFFOPTS) $(EQNUSERSRCS) >$@
+
+clean:
+	rm -f $(ALL)
diff -urN eqn-v7-pure/e0 eqn-v7-hacked/e0
--- eqn-v7-pure/e0	2022-07-01 23:44:19.064152612 -0500
+++ eqn-v7-hacked/e0	2022-07-06 05:14:01.254315338 -0500
@@ -1,11 +1,11 @@
 .nr PS 9
 .nr VS 11
-....ND "Revised  April, 1977"
+.\"...ND "Revised  April, 1977"
 .EQ
 delim $$
 gsize 9
 .EN
-....TR 17
+.\"...TR 17
 .TL
 A System for Typesetting Mathematics
 .AU
diff -urN eqn-v7-pure/e1 eqn-v7-hacked/e1
--- eqn-v7-pure/e1	2022-07-01 23:44:19.064152612 -0500
+++ eqn-v7-hacked/e1	2022-07-06 05:09:20.715604554 -0500
@@ -20,8 +20,9 @@
 is the multiplicity of characters,
 sizes, and fonts.
 An expression such as
+.\" Added space before "->", GBR 2022
 .EQ
-lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1
+lim from {x -> pi /2} ( tan~x) sup{sin~2x}~=~1
 .EN
 requires an intimate mixture of roman, italic and greek letters, in three sizes,
 and a special character or two.
diff -urN eqn-v7-pure/g.mac eqn-v7-hacked/g.mac
--- eqn-v7-pure/g.mac	2022-06-30 16:28:19.984161912 -0500
+++ eqn-v7-hacked/g.mac	2022-07-01 11:44:16.157564585 -0500
@@ -2,6 +2,7 @@
 .de SC
 .NH
 \\$1 \\$2 \\$3 \\$4 \\$5 \\$6 \\$7 \\$8 \\$9 
+.pdfbookmark 1 \\$*
 ..
 .de UC
 \&\\$3\s-2\\$1\\s+2\\$2
diff -urN eqn-v7-pure/g0 eqn-v7-hacked/g0
--- eqn-v7-pure/g0	2022-06-30 18:38:30.234150358 -0500
+++ eqn-v7-hacked/g0	2022-07-08 19:19:39.087648085 -0500
@@ -1,9 +1,26 @@
+.\" Adapted for groff by G. Branden Robinson, 2022-06-30/07-02.  Thanks
+.\" to Deri James for PDF support.
+.\" These values, used with URW Times fonts at 10 points, produce column
+.\" and page breaks nearly identical to the C/A/T typeset original.
+.nr LL 5.4i
+.nr PO 1.55i
+.\" Define a fake Bell System logo.
+.char \[bs] \o'\[ci]|'
 .EQ
 delim $$
 .EN
-\".ND "June 2, 1976"
-.RP
-\".TM "76-1273-4 76-1271-4" 39199 39199-11
+.
+.af year 0000
+.af mo 00
+.af dy 00
+.ND "August 15, 1978 \fI(retypeset with \fPgroff\fI\
+ \n[year]-\n(mo-\n(dy)"
+.RP no \" suppress repeat of document description on first body page
+.\" Force page 1 to be numbered; it follows the cover page and the Unix
+.\" Programmer's Manual page headings (which these sources don't produce
+.\" anyway) are not appropriate for this document in isolation.
+.nr pg*P1 1
+.\"TM "76-1273-4 76-1271-4" 39199 39199-11
 .TL
 Typesetting Mathematics _ User's Guide
 \&\ \ \ \ \ (Second\ Edition)
@@ -12,13 +29,18 @@
 .AI
 .MH
 .AB
-.in
-.ll
+.\" This document uses the full page width for the abstract.
+.\"in
+.\"ll
+.nr 0:li 0
+.nr 0:ri 0
+.pdfinfo /Title Typesetting Mathematics - User's Guide (Second Edition)
+.pdfinfo /Author Brian W. Kernighan and Lorinda L. Cherry
 .PP
 This is the user's guide for a system for typesetting
 mathematics,
 using
-the phototypesetters on the
+the \%photo\%typesetters on the
 .UX
 and
 .UC GCOS
@@ -28,35 +50,42 @@
 designed to be easy to use
 by people who know neither mathematics nor typesetting.
 Enough of the language to set in-line expressions like
-$lim from {x-> pi /2} ( tan~x) sup{sin~2x}~=~1$
+.\" Correct missing space after x for GNU eqn.  GBR, 2022.
+$lim from {x -> pi /2} ( tan~x) sup{sin~2x}~=~1$
 or display equations like
-.in .5i
+.\"in .5i
 .EQ I
 G(z)~mark =~ e sup { ln ~ G(z) }
 ~=~ exp left ( 
 sum from k>=1 {S sub k z sup k} over k right )
 ~=~  prod from k>=1 e sup {S sub k z sup k /k}
 .EN
+.\" Use "cdots" for ellipses on the math axis; "ldots" on the baseline.
+.\" Redefine them to space them more widely.  Use ~ more generously than
+.\" K&C; groff eqn tends to horizontally pack more tightly.  Keep the
+.\" copy of this in file "g4" in sync.
 .EQ I
+define cdots "{ \[md]~\[md]~\[md] }"
+define ldots "{ .~.~. }"
 lineup = left ( 1 + S sub 1 z + 
-{ S sub 1 sup 2 z sup 2 } over 2! + ... right )
-left ( 1+ { S sub 2 z sup 2 } over 2
+{ S sub 1 sup 2 z sup 2 } over 2! + ~ cdots right )
+~ left ( 1+ { S sub 2 z sup 2 } over 2
 + { S sub 2 sup 2 z sup 4 } over { 2 sup 2 cdot 2! }
-+ ... right ) ...
++ ~ cdots right ) ~ cdots
 .EN
 .EQ I
 lineup =  sum from m>=0 left (
 sum from
-pile { k sub 1 ,k sub 2 ,..., k sub m  >=0
+pile { k sub 1 ,k sub 2 , ~ ldots ~ , k sub m  >=0
 above
-k sub 1 +2k sub 2 + ... +mk sub m =m}
+k sub 1 +2k sub 2 + ~ cdots ~ +mk sub m =m}
 { S sub 1 sup {k sub 1} } over {1 sup k sub 1 k sub 1 ! } ~
 { S sub 2 sup {k sub 2} } over {2 sup k sub 2 k sub 2 ! } ~
-...
+cdots
 { S sub m sup {k sub m} } over {m sup k sub m k sub m ! } 
-right ) z sup m
+right ) ~ z sup m
 .EN
-.in 0
+.\"in 0
 can be learned in an hour or so.
 .PP
 The language interfaces directly with
@@ -73,11 +102,13 @@
 .UC UNIX
 formatter
 .UC NROFF
-to set mathematical expressions on 
+to set mathematical \%expressions on 
 .UC DASI
 and
 .UC GSI
 terminals
 and Model 37 teletypes.
+.\" TODO: Force the first body page to be numbered "2" to align with PDF
+.\" page numbers.
 .AE
-.CS 11 0 11 0 0 3
+.\"CS 11 0 11 0 0 3
diff -urN eqn-v7-pure/g1 eqn-v7-hacked/g1
--- eqn-v7-pure/g1	2022-06-30 16:28:19.984161912 -0500
+++ eqn-v7-hacked/g1	2022-07-01 20:39:52.463882265 -0500
@@ -1,3 +1,6 @@
+.\" Force extra-large blank area at top of page to match original.
+.sp 10v
+.nr MINGW 5.875n
 .if t .2C
 .SC Introduction
 .PP
@@ -400,6 +403,7 @@
 e sup {i omega t}
 .P2
 is
+.\" This equation got lost in the V7 Volume 2 manual!
 .EQ
 e sup {i omega t}
 .EN
@@ -523,6 +527,7 @@
 .EQ
 sqrt a+b + 1 over sqrt {ax sup 2 +bx+c}
 .EN
+.KS
 Warning _ square roots of tall quantities look lousy,
 because a root-sign 
 big enough to cover the quantity is
@@ -534,6 +539,7 @@
 .EQ
 sqrt{a sup 2 over b sub 2}
 .EN
+.KE
 Big square roots are generally better written as something
 to the power \(12:
 .EQ
@@ -603,6 +609,7 @@
 lim from {n \(mi> inf} x sub n =0
 .P2
 is
+.\" Correct missing space after n for GNU eqn.  GBR, 2022.
 .EQ
-lim from {n-> inf} x sub n =0
+lim from {n -> inf} x sub n =0
 .EN
diff -urN eqn-v7-pure/g2 eqn-v7-hacked/g2
--- eqn-v7-pure/g2	2022-06-30 16:28:19.984161912 -0500
+++ eqn-v7-hacked/g2	2022-06-30 22:42:34.436500637 -0500
@@ -181,9 +181,11 @@
 is not subject to any of the font changes and spacing
 adjustments normally done by the equation setter.
 This provides a way to do your own spacing and adjusting if needed:
+.KS
 .P1
 italic "sin(x)" + sin (x)
 .P2
+.KE
 is
 .EQ
 italic "sin(x)" + sin (x)
@@ -281,6 +283,7 @@
 if at all possible.
 Thus, for example,
 you can say
+.KS
 .P1
 ^EQ I
 x+y mark = z
@@ -296,6 +299,7 @@
 .EQ I
 x lineup = 1
 .EN
+.KE
 For reasons too complicated to talk about,
 when you use
 .UC EQN
@@ -428,11 +432,18 @@
 right ]
 .P2
 will make
+.\" Cheat #1: AT&T eqn stacked piles more tightly than GNU eqn.
 .EQ
+set baseline_sep 1v
 A ~=~ left [
 pile { a above b above c } ~~ pile { x above y above z }
 right ]
 .EN
+.\" Reset equation baseline separation to the default, which is not
+.\" documented anywhere and has no syntactical access.  :-/
+.EQ
+set baseline_sep 140
+.EN
 The elements of the pile (there can be as many as you want)
 are centered one above another, at the right height for
 most purposes.
@@ -466,6 +477,9 @@
 .ul
 cpiles
 than it is for ordinary piles.
+.\" Cheat #2: Shrink the display distance a bit to fit this display on
+.\" the page.
+.nr DD -0.25v
 .P1 2
 roman sign (x)~=~ 
 left {
@@ -473,6 +487,7 @@
    ~~ lpile
     {if~x>0 above if~x=0 above if~x<0}
 .P2
+.nr DD +0.25v
 makes
 .EQ
 roman sign (x)~=~ 
diff -urN eqn-v7-pure/g4 eqn-v7-hacked/g4
--- eqn-v7-pure/g4	2022-06-30 16:28:19.984161912 -0500
+++ eqn-v7-hacked/g4	2022-07-01 20:51:50.768616704 -0500
@@ -2,10 +2,10 @@
 .PP
 Here is the complete source for the three display equations
 in the abstract of this guide.
-.sp
-.nf
+.DS L
 .ps -2
 .vs -2
+.\" Keep the copy of this in file "g0" in sync.
  .EQ I
  G(z)~mark =~ e sup { ln ~ G(z) }
  ~=~ exp left ( 
@@ -13,26 +13,27 @@
  ~=~  prod from k>=1 e sup {S sub k z sup k /k}
  .EN
  .EQ I
+ define cdots "{ \[md]~\[md]~\[md] }"
+ define ldots "{ .~.~. }"
  lineup = left ( 1 + S sub 1 z + 
- { S sub 1 sup 2 z sup 2 } over 2! + ... right )
- left ( 1+ { S sub 2 z sup 2 } over 2
+ { S sub 1 sup 2 z sup 2 } over 2! + ~ cdots right )
+ ~ left ( 1+ { S sub 2 z sup 2 } over 2
  + { S sub 2 sup 2 z sup 4 } over { 2 sup 2 cdot 2! }
- + ... right ) ...
+ + ~ cdots right ) ~ cdots
  .EN
  .EQ I
  lineup =  sum from m>=0 left (
  sum from
- pile { k sub 1 ,k sub 2 ,..., k sub m  >=0
+ pile { k sub 1 ,k sub 2 , ~ ldots ~ , k sub m  >=0
  above
- k sub 1 +2k sub 2 + ... +mk sub m =m}
+ k sub 1 +2k sub 2 + ~ cdots ~ +mk sub m =m}
  { S sub 1 sup {k sub 1} } over {1 sup k sub 1 k sub 1 ! } ~
  { S sub 2 sup {k sub 2} } over {2 sup k sub 2 k sub 2 ! } ~
- ...
+ cdots
  { S sub m sup {k sub m} } over {m sup k sub m k sub m ! } 
- right ) z sup m
+ right ) ~ z sup m
  .EN
-.sp
-.fi
+.DE
 .ps +2
 .vs +2
 .SC "Keywords, Precedences, Etc."
@@ -61,7 +62,7 @@
 Digits, parentheses, brackets, punctuation marks, and these mathematical words
 are converted
 to Roman font when encountered:
-.P1
+.P1 3 \" reduce indentation, GBR 2022
 sin  cos  tan  sinh  cosh  tanh  arc
 max  min  lim  log  ln  exp
 Re  Im  and  if  for  det
@@ -140,7 +141,7 @@
 .sp
 .nf
 .in .2i
-.ta .7i 1.4i 2.1i
+.ta .7i 1.4i 2.0i \" reduce last tab stop, GBR 2022
 above	17, 18	lpile	17
 back	21	mark	15
 bar	13	matrix	18
diff -urN eqn-v7-pure/sbtl.tmac eqn-v7-hacked/sbtl.tmac
--- eqn-v7-pure/sbtl.tmac	1969-12-31 18:00:00.000000000 -0600
+++ eqn-v7-hacked/sbtl.tmac	2022-06-30 21:04:56.846814044 -0500
@@ -0,0 +1,15 @@
+.de MH
+Bell Laboratories
+Murray Hill, New Jersey 07974
+..
+.de UX
+.nr btl*seen-UX-macro 0
+.ds btl*UX-suffix \(dg\"
+\s[\\n[.s]*8u/10u]UNIX\s0\\$1\\*[btl*UX-suffix]
+.if !\\n[btl*seen-UX-macro] \{\
+.  FS \\*[btl*UX-suffix]
+.  nop UNIX is a Trademark of Bell Laboratories.\" sic
+.  FE
+.  nr btl*seen-UX-macro 1
+.\}
+..
