* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-06 11:48 [TUHS] " Paul Ruizendaal via TUHS
@ 2026-01-06 16:31 ` Thalia Archibald via TUHS
2026-01-06 17:46 ` Al Kossow via TUHS
2026-01-07 9:14 ` Paul Ruizendaal via TUHS
2026-01-06 23:54 ` Jonathan Gray via TUHS
1 sibling, 2 replies; 9+ messages in thread
From: Thalia Archibald via TUHS @ 2026-01-06 16:31 UTC (permalink / raw)
To: Paul Ruizendaal; +Cc: tuhs@tuhs.org
Hi Paul,
Excellent history on yacc and Waterloo! I’ll be adding your sources to my early
UNIX history project :).
I was aware of Wes Graham’s work on WATFOR at Waterloo and that the Computer
Systems Group acquired UNIX in 1976 (which wasn’t necessarily the first group at
Waterloo to do so), but I didn’t know of anything earlier. I found some
interesting UNIX documents in the University of Waterloo archives and have
uploaded them.
https://archive.org/search?query=subject%3A%22University+of+Waterloo+Archives%22
Also, there are some items in the University of Waterloo Computer Museum that
I’m hoping to get more information on, once I follow up with the curator.
https://github.com/thaliaarchi/unix-history/blob/main/users/waterloo/butterworth.md
> All this makes the Yacc binary on the 4th edition tape interesting to me, as it
> gives a window on the state of Yacc late in 1973 when Johnson returned to Bell
> Labs. The binary appears truncated at the 16kb mark
If you’re using Angelo’s tar, you should fetch it again. He’s fixed some bugs
that truncated several files, including yacc.
http://squoze.net/UNIX/v4/unix_v4.tar
Also you should know the tape dates to 12 June 1974, but the manual received was
V4, so it’s V5 minus a week or so (though I only know that the V5 manual dates
to the month of June). Although you could consider it a near-clean V5, I still
call it V4, since the system was versioned by its manual.
Thalia
> On Jan 6, 2026, at 04:48, Paul Ruizendaal via TUHS <tuhs@tuhs.org> wrote:
>
>
> Perusing the 4th edition archive I noticed that the usr/bin directory has a binary for Yacc. This reminded me of a project on my to-do list: recreating the yacc used at the Uni of Waterloo for their Thoth project. Unfortunately, there was no source for Yacc on the 4th edition tape. The oldest version I am aware of is the source as included with 6th edition. However, this looks quite promising.
>
> I offer the below timeline analysis for some sanity checking by the people who were there and have some specific questions at the end.
>
> For background: my interest is driven by an underlying interest in the “Eh” and “Zed” languages that evolved from B at the Uni of Waterloo. DMR mentions these languages in his paper on the history of C (https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist.pdf).
>
> First on the timeline:
>
> - By 1970 there were B compilers written in B for the PDP-7, the GE600 / Honeywell 6000 and for the PDP-11. The GE compiler generated machine code, not threaded code (DMR writes “The most ambitious enterprise I undertook was a genuine cross-compiler that translated B to GE-635 machine instructions, not threaded code. It was a small tour de force: a full B compiler, written in its own language and generating code for a 36-bit mainframe, that ran on [the PDP-7, ] an 18-bit machine with 4K words of user address space.”). As the compiler was written in B, I assume this means that it next also ran on the GE itself. This compiler seems to have been the basis for the nascent C compilers (AAP writes “According to dmr's history of the C language NB had a machine code generator and ken told me (by email) that dmr's work on the code generator started on the Honeywell mainframe and that NB was always in machine code.” - http://squoze.net/NB/README).
>
> - DMR also writes in that paper: "By 1971, our miniature computer center was beginning to have users. We all wanted to create interesting software more easily. Using assembler was dreary enough that B, despite its performance problems, had been supplemented by a small library of useful service routines and was being used for more and more new programs. Among the more notable results of this period was Steve Johnson’s first version of the yacc parser-generator” So, Yacc first appears in 1971 and is written in B. As such, it ran on both the PDP-11 and the GE/Honeywell.
>
> - It is a guess, but I would hypothesize that the c0/c1 structure of the early 1972/1973 C compilers goes all the way back to the GE/Honeywell implementation of B. In this respect it is suggestive that the “last1120” C compiler names its passes "nc0" and “nc1”, following shortly on the transitional “new B” / “nb”. If true, it would stand to reason to assume that this mainframe B compiler also used a similar recursive descent / operator precedence parsing scheme.
>
> - The DMR history paper then goes on to say that Johnson had a sabbatical at the University of Waterloo in 1972, but I think this might be a slip of the pen. A Uni of Waterloo retrospective says that he arrived late in 1972 (“In August 1972, […] a new arrival was causing a stir in the Math & Computer building at University of Waterloo – a brand new Honeywell 6050 mainframe size computer. […] Shortly after the arrival of the Honeywell, Steve Johnson came to the Math Faculty on sabbatical from Bell Labs.”). He brought B and Yacc with him ("I suspect that few people realize his key role in introducing Bell Labs culture to University of Waterloo so early, including B Programming Language, getchar(), putchar(), the beginnings of the notion of software portability and, of course, yacc.”). https://randalljhoward.com/tag/dead-whale/ The year 1973 is also supported by a resume from 1982 ("I spent a 9-month Sabbatical in 1973 at the University of Waterloo, where I taught courses in Advanced Applications Techniques and Algebraic Manipulation.” — https://stacks.stanford.edu/file/druid:ws821cy1376/ws821cy1376.pdf). 1973 is also a better match with the internship of Alan Snyder in that year.
>
> - In an interview Johnson mentions "When YACC first ran, it was very slow […] I set out to improve the size and space characteristics. Over the next several years, I rewrote the program over a dozen times, speeding it up by a factor of 10,000 or so. Many of my speedups involved proving theorems that we could cut this or that corner and still have a valid parser. The introduction of precedence was one example of this.” (https://www.computerworld.com/article/1570304/yacc-unix-and-advice-from-bell-labs-alumni-stephen-johnson.html). I suspect that a fair bit of this improvement happened in 1972, because he continues with "Dennis was actively working on B while I was writing YACC. One day, I came in and YACC would not compile – it was out of space. It turns out that I had been using every single slot in the symbol table. The night before, Dennis had added the ‘for’ statement to B, and the word ‘for’ took a slot, so YACC no longer fit!”. This suggests 1972 much more than 1974 as the timeframe he had in mind when saying this.
>
> - This also tallies with DMR’s account, writing: "When Steve Johnson visited the University of Waterloo on sabbatical in 1972, he brought B with him. It became popular on the Honeywell machines there, and later spawned Eh and Zed (the Canadian answers to ‘what follows B?’). When Johnson returned to Bell Labs in 1973, he was disconcerted to find that the language whose seeds he brought to Canada had evolved back home; even his own yacc program had been rewritten in C, by Alan Snyder.”. As explained above, I think this should be read as “late 1972” and “late 1973”. So: a first, early C version of Yacc can be placed at mid 1973.
>
> - Alan Snyder did the Honeywell version of his portable compiler in 1973 (the PDP-10 version and his thesis are from 1975) (https://retrocomputingforum.com/t/some-materials-on-early-c-and-the-history-of-c/3016/2). This compiler used yacc, which implies that by (late) 1973 yacc must have been stable, fast and compact enough to handle a sizable grammar. I can understand converting it to nascent C, as I have recently found yacc to be a great compiler test input. In the timeline, this Snyder version is close to the binary on the 4th edition tape.
>
> - B evolved at Waterloo. Report CS-75-23 from September 1975 says “Current efforts center on the language ‘B’ which is already implemented on the HIS 6050 and PDP 11; we hope to have a version of B for the Microdata before January, 1976. […] The problem is now reduced to that of recoding the B compiler code generation section and the basic I/O primitives.” And report CS-75-29 from November of that year says “The B compiler is well suited for our preliminary experimentation with portability because it is nicely structured and therefore easily modified to generate code for other machines. This is largely due to the fact that it is a syntax directed compiler for a language which has a simple and compact syntax. The one-pass compiler is implemented in B.” I assume “syntax directed” in this context to mean that the Honeywell B compiler was recoded to use Yacc for its parser, somewhere between 1973 and 1975. If so, that effort probably used the B version of Yacc that Johnson brought in 1973. The 1976 Eh and the 1978 Zed compilers certainly use Yacc to build their parser.
>
> - All this makes the Yacc binary on the 4th edition tape interesting to me, as it gives a window on the state of Yacc late in 1973 when Johnson returned to Bell Labs. The binary appears truncated at the 16kb mark, but a first quick look at the strings suggests it is quite similar to the surviving 6th edition Yacc source code. Similar, but not fully identical. This is in a context where the surviving 1975 Yacc source looks decidedly 1973 in style. For instance the yyparse function in file “parser.c” looks like a B function that has been minimally edited to make it early C (https://www.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/yacc/lib/parser.c). Another example is in y2.c, where in function “setup()” the “foutput” variable is set to -2 by default; I believe this to be a remnant from B on the Honeywell, where that means to output to the batch console.
>
> I wonder how much this 1975 yacc source has diverged from the 1973 Snyder B => C port; I presume not much. In fact, this version of Yacc proved quite easy to revert back to B (or Eh, actually): https://gitlab.com/pnru/Thoth/-/tree/master/user/yacc
>
> - An interesting sidetrack is the evolution of Johnson’s Yacc manual / paper. Several versions appear to exist, all a bit different. Later versions (1979?) have the “=” sign before actions as a deprecated feature, but the 1975 source code still insists on the ‘=’ sign. AAP appears to have the oldest version (1975?) of the document and this version still has the equals sign as mandatory in its text (http://squoze.net/UNIX/v6/files/doc/yacc.pdf). In the 7th edition manual and code base the use of this ‘=’ has become optional. I wonder when and why this change was made; the old syntax seems harmless.
>
> - The 6th edition Yacc has an optional optimizer pass, “/usr/yacc/yopti”, which could be run after Yacc completed. As far as I can tell, the source for this optimizer is lost. I have found no materials explaining this optimizer pass.
>
> - Between 6th edition and 7th edition the code base changes substantially, presumably further compaction and speed-up. It grows from ~1700 to ~2200 lines. The optimizer pass is integrated into the base package, support for Ratfor is dropped, etc. The source also starts to look like ‘real C’. Alternatively, the yacc source in 6th edition might not reflect the latest internal Bell version and actual yacc development was perhaps more gradual. Although I use the 7th edition version in my C recreation of the Eh compiler, it does not seem like it is a good base to approximate what the Uni of Waterloo might have used in 1975-1977.
>
> Now for the questions:
>
> - Do the above timeline and assumptions sound correct (or at least plausible) to those who were there?
>
> - Does anybody know of Yacc source code older than what is included in 6th edition (other than attempting to reverse engineer the recently recovered 4th edition binary)?
>
> - Does anybody know more about the missing Yacc optimizer in 6th edition, what it did, etc.? Or is the only way to compare and contrast with 7th edition where the (that?) optimizer is integrated?
>
>
>
>
* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-06 16:31 ` [TUHS] " Thalia Archibald via TUHS
@ 2026-01-06 17:46 ` Al Kossow via TUHS
2026-01-07 9:14 ` Paul Ruizendaal via TUHS
1 sibling, 0 replies; 9+ messages in thread
From: Al Kossow via TUHS @ 2026-01-06 17:46 UTC (permalink / raw)
To: tuhs
On 1/6/26 8:31 AM, Thalia Archibald via TUHS wrote:
> Also, there are some items in the University of Waterloo Computer Museum that
> I’m hoping to get more information on, once I follow up with the curator.
> https://github.com/thaliaarchi/unix-history/blob/main/users/waterloo/butterworth.md
I would really be interested if any of the Thoth software has survived there.
Nothing is known to exist of Cheriton's UBC version.
* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-06 11:48 [TUHS] " Paul Ruizendaal via TUHS
2026-01-06 16:31 ` [TUHS] " Thalia Archibald via TUHS
@ 2026-01-06 23:54 ` Jonathan Gray via TUHS
1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Gray via TUHS @ 2026-01-06 23:54 UTC (permalink / raw)
To: Paul Ruizendaal; +Cc: tuhs
On Tue, Jan 06, 2026 at 12:48:33PM +0100, Paul Ruizendaal via TUHS wrote:
> - The DMR history paper then goes on to say that Johnson had a
> sabbatical at the University of Waterloo in 1972, but I think this
> might be a slip of the pen. A Uni of Waterloo retrospective says
> that he arrived late in 1972 (“In August 1972, […] a new arrival
> was causing a stir in the Math & Computer building at University
> of Waterloo – a brand new Honeywell 6050 mainframe size computer.
> […] Shortly after the arrival of the Honeywell, Steve Johnson came
> to the Math Faculty on sabbatical from Bell Labs.”). He brought B
> and Yacc with him ("I suspect that few people realize his key role
> in introducing Bell Labs culture to University of Waterloo so early,
> including B Programming Language, getchar(), putchar(), the beginnings
> of the notion of software portability and, of course, yacc.”).
> https://randalljhoward.com/tag/dead-whale/ The year 1973 is also
> supported by a resume from 1982 ("I spent a 9-month Sabbatical in
> 1973 at the University of Waterloo, where I taught courses in
> Advanced Applications Techniques and Algebraic Manipulation.” —
> https://stacks.stanford.edu/file/druid:ws821cy1376/ws821cy1376.pdf).
> 1973 is also a better match with the internship of Alan Snyder in
> that year.
"sabbatical in 1973, from January to September"
Steve Johnson in: Salus - A Quarter Century of UNIX, p 100
* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-06 16:31 ` [TUHS] " Thalia Archibald via TUHS
2026-01-06 17:46 ` Al Kossow via TUHS
@ 2026-01-07 9:14 ` Paul Ruizendaal via TUHS
1 sibling, 0 replies; 9+ messages in thread
From: Paul Ruizendaal via TUHS @ 2026-01-07 9:14 UTC (permalink / raw)
To: Thalia Archibald; +Cc: tuhs@tuhs.org
Thank you for these clarifications.
Refetching the v4 archive indeed brings a complete yacc binary, and it is unstripped. I guess at some point I need to find a tool that can disassemble a PDP-11 a.out file, taking into account the symbols.
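As a starting point for such a tool, here is a minimal sketch in Python that reads the header and symbol table of an unstripped PDP-11 a.out image. The layout is an assumption taken from the later a.out manual pages rather than checked against the V4 binary: eight little-endian 16-bit header words, then text and data, then relocation bits unless the eighth word is nonzero, then 12-byte symbol entries (8-byte name, type word, value word). The name read_aout is made up for illustration.

```python
import struct

# Assumed magic numbers for the 8-word header (octal 407, 410, 411).
MAGICS = {0o407, 0o410, 0o411}

def read_aout(blob):
    """Return (header dict, [(name, type, value), ...]) for an a.out image.

    Assumed layout, not verified against the V4 tape: 16-byte header,
    text, data, optional relocation bits, then 12-byte symbol entries.
    """
    magic, text, data, bss, syms, _w6, _w7, relflag = struct.unpack_from("<8H", blob, 0)
    if magic not in MAGICS:
        raise ValueError("not a PDP-11 a.out: magic %o" % magic)
    # Relocation bits mirror text+data unless the flag word says they were suppressed.
    reloc = 0 if relflag else text + data
    symoff = 16 + text + data + reloc
    table = []
    for off in range(symoff, symoff + syms, 12):
        raw, typ, val = struct.unpack_from("<8sHH", blob, off)
        name = raw.rstrip(b"\0").decode("ascii", "replace")
        table.append((name, typ, val))
    header = {"magic": magic, "text": text, "data": data, "bss": bss, "syms": syms}
    return header, table
```

Feeding it the bytes of the refetched yacc binary should list the symbols, which can then be used to label addresses in a disassembly of the text segment.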
Indeed, the tape is from 1974, not 1973 — which makes quite a difference as things were moving fast in that era — but still closer to the B version than the yacc sources in v6. Thank you for highlighting this.
Paul
* [TUHS] Re: Yacc binary on 4th edition tape
@ 2026-01-11 22:21 Paul Ruizendaal via TUHS
2026-01-13 1:21 ` Jonathan Gray via TUHS
0 siblings, 1 reply; 9+ messages in thread
From: Paul Ruizendaal via TUHS @ 2026-01-11 22:21 UTC (permalink / raw)
To: tuhs@tuhs.org
Jonathan Gray very kindly found two sources for “yopti” in the TUHS archives.
The first is in the UNSW 110 archive (https://www.tuhs.org/Archive/Distributions/UNSW/110/). The archive is from 1981, but it appears to be the 1975 yacc of 6th edition.
This yacc has the source for the optimizer (“yopti.c”). It reads the y.tab.c file that plain 6th edition yacc generates, does some processing and writes out a new y.tab.c. It also comes with a new yyparse routine and this routine is essentially identical to the yyparse of 7th edition (which I think is more or less the final version of classic yacc).
The other is in the PWB1 distribution (https://www.tuhs.org/cgi-bin/utree.pl?file=PWB1/sys/source/s2/yacc.d), from 1977. The optimizer is now in the source file “y5.c” and is largely, but not fully, the same. It has a new way to compress the “yyact” table. The y5.c file can build both as a separate program and as a subroutine of the yacc core, reusing table space from the earlier phases. The new way to compress is also used in 7th edition, but there the optimizer has been fully integrated in the core yacc codebase.
I’ve also disassembled the first part (y1.c) of the yacc binary in 4th edition. There is more instrumentation (regular calls to “times()” to track time spent) and there is no code to call out to an optimizer pass, suggesting that it might not have existed yet (but of course absence of proof is not proof of absence).
The development of table sizes is interesting, in particular “yyact”; I have tested with the 1978 grammar for the Eh compiler.
- The 1975 core yacc generates a yyact table with about 1650 entries. Here the yyparse routine has to do a lot of searching through the tables whilst parsing: the representation is compact, but slow to process.
- The 1975 optimizer changes the table structure to avoid the searching (arriving at about the final form of “yaccpar”), but the yyact table that it produces is huge: the 1600 entries are expanded into almost 11,000 entries, many of which are “0”. Probably it was considered work-in-progress at this stage and hence not included on the 6th edition tape. (I had to fix one bug in yopti.c to get it to run; maybe this was a bad fix but at the moment I don’t think so.)
- The algorithm in the 1977 PWB1 version is a spectacular improvement: the “yyact” table goes down in size to about 650 entries. Note that “yaccpar” does not change: it is ‘only' a superior algorithm to find the optimal compression of a sparse table.
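To illustrate the trade-off between the three table shapes above, here is a small Python sketch of one generic way to compress a sparse, directly indexed table: first-fit row displacement, where rows are overlaid in a shared array and a check array records which row owns each slot. This is not taken from yopti.c or y5.c and is not claimed to be the algorithm Johnson used; the names pack_rows and lookup are made up for illustration.

```python
def pack_rows(rows):
    """First-fit row displacement over sparse rows (0 means "no entry").

    Returns (base, act, chk): entry (r, c) lives at act[base[r] + c],
    and chk[base[r] + c] == r confirms that the slot belongs to row r.
    """
    act, chk = [], []
    base = [0] * len(rows)
    # Place the densest rows first; they are the hardest to fit.
    order = sorted(range(len(rows)), key=lambda r: -sum(1 for v in rows[r] if v))
    for r in order:
        cols = [c for c, v in enumerate(rows[r]) if v]
        off = 0
        # Slide the row along until none of its entries hits an occupied slot.
        while any(off + c < len(act) and chk[off + c] != -1 for c in cols):
            off += 1
        need = off + (max(cols) + 1 if cols else 0)
        if need > len(act):
            grow = need - len(act)
            act.extend([0] * grow)
            chk.extend([-1] * grow)
        for c in cols:
            act[off + c] = rows[r][c]
            chk[off + c] = r
        base[r] = off
    return base, act, chk

def lookup(base, act, chk, r, c):
    """Direct-indexed fetch: no searching, one bounds check and one owner check."""
    i = base[r] + c
    return act[i] if i < len(act) and chk[i] == r else 0
```

The lookup stays a single index operation, as with the expanded 11,000-entry table, while the storage shrinks towards the number of non-zero entries. How close a real packer gets to the roughly 650 entries reported above depends on details this sketch does not attempt to model.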
All in all, my hypothesis on the timeline would now be:
- 1971: first version of yacc created in B
- 1972: improvements to make it more practical
- 1973: yacc introduced to Waterloo (relevant for Eh)
- 1973: conversion from B to C
- 1974: improvements on speeding up table generation
- 1975: improvements on speeding up yyparse execution
- 1976: improvement on reducing optimized table size
- 1977/78: cleaning up code base, portability, tracking C changes
- 1979: more or less final version of classic yacc
The above matches with interviews with Johnson and Aho, where both say that yacc was improved over a number of years and with about a dozen rewrites in that period.
====
Now back to the timeline for the Waterloo Eh compiler.
- I think it is a fact that the early B compiler on the Honeywell 6000 was a two-pass compiler, three if you count the assembler. Steve Johnson writes “The './bj' command works as follows: the first two passes, ./b1 and ./b2, of the B compiler are run in MH-TSS. The result of the second pass of the compiler is a GMAP program, which must be sent to the batch world, compiled, and loaded with the B I/O library (on file ./blib) to create an executable H* file. The ./bj command calls ./b1 and ./b2, and if no errors are detected, the GMAP deck is submitted to the batch world; currently, this is done using 'jrun'.” (https://www.nokia.com/bell-labs/about/dennis-m-ritchie/bref.html). I don’t know much about the H6000, but apparently GMAP is the system macro assembler. Not a fact, but I assume that the “b1” pass is Ken Thompson’s B front end (lexing, parsing, threaded/intermediate code generation) and the “b2” pass is Dennis Ritchie’s “tour de force” native code generator.
- In November 1975 the Thoth project writes that they like the B compiler, and “this is largely due to the fact that it is a syntax directed compiler for a language which has a simple and compact syntax. The one-pass compiler is implemented in B.” This sounds like a different compiler. The B compiler above is clearly not one pass and only syntax directed in the sense that any compiler is syntax directed. This compiler remains a bit of a mystery. Maybe it is something that Steve Johnson wrote during his sabbatical there.
- However, in February 1976 they write (CS-76-11): “As the first stage of the project, we have designed a high-level implementation language (called Eh) which will be common to all machines. […] The Eh compiler consists of two phases: the first phase does the syntax analysis and outputs intermediate language. The intermediate language resembles the order code for a stack oriented machine. During the second phase of compilation, the intermediate language is translated to the target machine language.” This sounds like reverting to the structure of the original Honeywell B compiler, with the difference that it uses yacc in its front-end and the back-end outputs relocatable object code, not assembler source.
[as an aside: the 1978 version of this Eh intermediate code is well documented and it has some intriguing similarities to the PDP-7 threaded code for B as reconstructed by AAP; it needs more investigation, but it suggests that the Eh intermediate code was influenced by the intermediate code between the b1 and b2 passes of the original H6000 B compiler.]
The question now is what version of yacc they would have used for the Eh compiler. It may be the yacc that Steve Johnson brought to Waterloo in 1973, but in view of the above timeline, it was probably a bit unwieldy to use at that point in time. Another possibility is that they used yacc from Unix 6th edition (the Waterloo Math department had a PDP-11 running Unix) and later back-ported it to Eh to make their tool chain self-hosting. With V6 being dated around May 1975, it would stand to reason that they had V6 running by the end of that year. From what I understand so far, they are unlikely to have used the optimizer, as that appears to have been unpolished / unfinished until a bit later in time.
* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-11 22:21 [TUHS] Re: Yacc binary on 4th edition tape Paul Ruizendaal via TUHS
@ 2026-01-13 1:21 ` Jonathan Gray via TUHS
2026-01-19 8:38 ` Jonathan Gray via TUHS
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Gray via TUHS @ 2026-01-13 1:21 UTC (permalink / raw)
To: Paul Ruizendaal; +Cc: tuhs
On Sun, Jan 11, 2026 at 11:21:39PM +0100, Paul Ruizendaal via TUHS wrote:
> Jonathan Gray very kindly found two sources for “yopti” in the TUHS archives.
>
> The first is in the UNSW 110 archive (https://www.tuhs.org/Archive/Distributions/UNSW/110/). The archive is from 1981, but it appears to be the 1975 yacc of 6th edition.
>
> This yacc has the source for the optimizer (“yopti.c”). It reads the y.tab.c file that plain 6th edition yacc generates, does some processing and writes out a new y.tab.c. It also comes with a new yyparse routine and this routine is essentially identical to the yyparse of 7th edition (which I think is more or less the final version of classic yacc).
>
> The other is in the PWB1 distribution (https://www.tuhs.org/cgi-bin/utree.pl?file=PWB1/sys/source/s2/yacc.d), from 1977. The optimizer is now in the source file "y5.c” and largely, but not fully the same. It has a new way to compress the “yyact” table. The y5.c file can build both as a separate program and as a subroutine of the yacc core, reusing table space from the earlier phases. The new way to compress is also used in 7th edition, but the optimizer has been fully integrated in the core yacc codebase.
Also appears (without source) in a listing of files.
Documentation/TechReports/USG_Library/
1046_UNIX_Support_Classifications_for_PG_1C300_Issue_2.pdf
UNIX Support Classifications for PG-1C300-1 Issue 2
January 30, 1976
opar.c yacc optimization parser
yopti.c yacc optimizer postpass
> All in all, my hypothesis on the timeline would now be:
>
> - 1971: first version of yacc created in B
> - 1972: improvements to make it more practical
> - 1973: yacc introduced to Waterloo (relevant for Eh)
Unix Programmer's Manual, Third Edition, February 1973
YACC (VI) 1/20/73
mentions /crp/scj/bpar.b, output tables in actn.b
"If your grammar is too big for yacc, you may try /crp/scj/bigyacc"
> - 1973: conversion from B to C
Unix Programmer's Manual, Fourth Edition, November 1973
YACC (VI) 6/6/73
no longer mentions b files
> - 1974: improvements on speeding up table generation
Unix Programmer's Manual, Sixth Edition, May 1975
YACC (I) 11/25/74
/usr/yacc/opar.c parser for optimized tables
/usr/yacc/yopti optimizer postpass
> - 1975: improvements on speeding up yyparse execution
> - 1976: improvement on reducing optimized table size
> - 1977/78: cleaning up code base, portability, tracking C changes
> - 1979: more or less final version of classic yacc
>
> The above matches with interviews with Johnson and Aho, where both say that yacc was improved over a number of years and with about a dozen rewrites in that period.
* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-13 1:21 ` Jonathan Gray via TUHS
@ 2026-01-19 8:38 ` Jonathan Gray via TUHS
2026-01-20 5:40 ` Lars Brinkhoff via TUHS
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Gray via TUHS @ 2026-01-19 8:38 UTC (permalink / raw)
To: tuhs; +Cc: Paul Ruizendaal
On Tue, Jan 13, 2026 at 12:21:25PM +1100, Jonathan Gray via TUHS wrote:
> On Sun, Jan 11, 2026 at 11:21:39PM +0100, Paul Ruizendaal via TUHS wrote:
> > Jonathan Gray very kindly found two sources for “yopti” in the TUHS archives.
> >
> > The first is in the UNSW 110 archive (https://www.tuhs.org/Archive/Distributions/UNSW/110/). The archive is from 1981, but it appears to be the 1975 yacc of 6th edition.
> >
> > This yacc has the source for the optimizer (“yopti.c”). It reads the y.tab.c file that plain 6th edition yacc generates, does some processing and writes out a new y.tab.c. It also comes with a new yyparse routine and this routine is essentially identical to the yyparse of 7th edition (which I think is more or less the final version of classic yacc).
> >
> > The other is in the PWB1 distribution (https://www.tuhs.org/cgi-bin/utree.pl?file=PWB1/sys/source/s2/yacc.d), from 1977. The optimizer is now in the source file "y5.c” and largely, but not fully the same. It has a new way to compress the “yyact” table. The y5.c file can build both as a separate program and as a subroutine of the yacc core, reusing table space from the earlier phases. The new way to compress is also used in 7th edition, but the optimizer has been fully integrated in the core yacc codebase.
>
> Also appears (without source) in a listing of files.
>
> Documentation/TechReports/USG_Library/
> 1046_UNIX_Support_Classifications_for_PG_1C300_Issue_2.pdf
>
> UNIX Support Classifications for PG-1C300-1 Issue 2
> January 30, 1976
>
> opar.c yacc optimization parser
> yopti.c yacc optimizer postpass
>
> > All in all, my hypothesis on the timeline would now be:
> >
> > - 1971: first version of yacc created in B
"yacc was written in 1972"
Stephen C. Johnson interview in
The C Journal, Vol 3, No 2, Fall 1987, p 58
https://archive.org/details/the-c-journal-v-3-n-2-1987-fall/page/n63/mode/1up
> > - 1972: improvements to make it more practical
> > - 1973: yacc introduced to Waterloo (relevant for Eh)
>
> Unix Programmer's Manual, Third Edition, February 1973
> YACC (VI) 1/20/73
> mentions /crp/scj/bpar.b, output tables in actn.b
>
> "If your grammar is too big for yacc, you may try /crp/scj/bigyacc"
>
> > - 1973: conversion from B to C
>
> Unix Programmer's Manual, Fourth Edition, November 1973
> YACC (VI) 6/6/73
> no longer mentions b files
>
> > - 1974: improvements on speeding up table generation
>
> Unix Programmer's Manual, Sixth Edition, May 1975
> YACC (I) 11/25/74
>
> /usr/yacc/opar.c parser for optimized tables
> /usr/yacc/yopti optimizer postpass
>
> > - 1975: improvements on speeding up yyparse execution
> > - 1976: improvement on reducing optimized table size
> > - 1977/78: cleaning up code base, portability, tracking C changes
> > - 1979: more or less final version of classic yacc
> >
> > The above matches with interviews with Johnson and Aho, where both say that yacc was improved over a number of years and with about a dozen rewrites in that period.
* [TUHS] Re: Yacc binary on 4th edition tape
2026-01-19 8:38 ` Jonathan Gray via TUHS
@ 2026-01-20 5:40 ` Lars Brinkhoff via TUHS
0 siblings, 0 replies; 9+ messages in thread
From: Lars Brinkhoff via TUHS @ 2026-01-20 5:40 UTC (permalink / raw)
To: Jonathan Gray via TUHS; +Cc: Jonathan Gray, Paul Ruizendaal
Jonathan Gray via TUHS <tuhs@tuhs.org> writes:
>> > - 1971: first version of yacc created in B
> "yacc was written in 1972"
Alan Snyder was at Bell Labs in this time period, and brought C and yacc
back to MIT. The input file to his yacc looks rather archaic, perhaps
influenced by the B yacc? I asked Johnson about it, but he didn't
recognize it. Here's an example, the grammar for Snyder's C compiler.
https://github.com/PDP-10/its/blob/master/src/c/c.grammr
* [TUHS] Re: Yacc binary on 4th edition tape
@ 2026-01-21 6:57 Paul Ruizendaal via TUHS
0 siblings, 0 replies; 9+ messages in thread
From: Paul Ruizendaal via TUHS @ 2026-01-21 6:57 UTC (permalink / raw)
To: tuhs@tuhs.org; +Cc: Jonathan Gray
> Alan Snyder was at Bell Labs in this time period, and brought C and yacc
> back to MIT. The input file to his yacc looks rather archaic, perhaps
> influenced by the B yacc? I asked Johnson about it, but he didn't
> recognize it. Here's an example, the grammar for Snyder's C compiler.
>
> https://github.com/PDP-10/its/blob/master/src/c/c.grammr
That is an interesting link. It uses some of the syntax that is mentioned as deprecated in the Johnson Yacc paper on your site:
<quote>
This appendix mentions synonyms and features which are supported for historical continuity, but, for various reasons, are not encouraged.
1. Literals may be delimited by double quotes ‘‘"’’ as well as single quotes ‘‘´’’.
2. Literals may be more than one character long. If all the characters are alphabetic, numeric, or _, the type number of the literal is defined, just as if the literal did not have the quotes around it. Otherwise, it is difficult to find the value for such literals.
3. The use of multi-character literals is likely to mislead those unfamiliar with Yacc, since it suggests that Yacc is doing a job which must be actually done by the lexical analyzer.
4. Most places where % is legal, backslash ‘‘\’’ may be used. In particular, \\ is the same as %%, \left the same as %left, etc.
There are a number of other synonyms:
%< is the same as %left
%> is the same as %right
%binary and %2 are the same as %nonassoc
%0 and %term are the same as %token
%= is the same as %prec
5. The curly braces ‘‘{’’ and ‘‘}’’ around an action are optional if the action consists of a single C statement. (They are always required in Ratfor).
</quote>
These old forms are still accepted by Yacc as present in the surviving V6 source files. The Snyder grammar uses backslash instead of percent and also the short forms (i.e. “\>” for “%right”). It does not put “\0” in front of the token definitions, and it does use multi-character literals. When I have time, I will attempt to disassemble/reverse the file “y2.c” from the 1974 binary as well (this has the lexer / parser part of yacc). This should give a view on Yacc grammar in mid-1974.
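The synonym list in the quoted appendix can be read as a small lookup table. A minimal sketch, assuming the input is one whitespace-delimited directive word at a time; the names SYNONYMS and normalize_directive are made up for this illustration and are not taken from any yacc source:

```python
# Deprecated spellings from the quoted appendix, mapped to the preferred ones.
SYNONYMS = {
    "%<": "%left",
    "%>": "%right",
    "%binary": "%nonassoc",
    "%2": "%nonassoc",
    "%0": "%token",
    "%term": "%token",
    "%=": "%prec",
}


def normalize_directive(word):
    """Map one directive word to its preferred spelling.

    Handles the backslash-for-percent rule first, then the short forms.
    Words that are not directives are returned unchanged.
    """
    if word == "\\\\":          # two backslashes stand for %%
        return "%%"
    if word.startswith("\\"):   # \left -> %left, \> -> %>, ...
        word = "%" + word[1:]
    return SYNONYMS.get(word, word)


# The short form mentioned above: backslash plus ">" for %right.
print(normalize_directive("\\>"))
```

This only works on isolated directive words. It does not tokenize a grammar file, so applying it blindly to text inside literals or actions (where a backslash means something else) would give wrong results.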
As to the grammar itself, I am a little confused by the single-letter tokens ‘l’ through ‘s’, which don’t appear to be used in the grammar, and I’m intrigued by the use of the name “.expression” for a rule to allow empty expressions in the FOR statement: the Eh grammar from Waterloo uses that name as well for this purpose (suggesting a common root).
===
I was focused on the optimizer earlier and missed two relevant files from the PWB1 yacc source tree:
https://www.tuhs.org/cgi-bin/utree.pl?file=PWB1/sys/source/s2/yacc.d/INDEX
https://www.tuhs.org/cgi-bin/utree.pl?file=PWB1/sys/source/s2/yacc.d/yaccdiffs
"The archive file contains information for testing and installing Version 2 of Yacc.”
So at the time it was seen as Yacc 1 and Yacc 2, perhaps not surprising considering the material improvement in performance (much smaller tables, much faster parsing). The earlier Johnson papers appear to talk about Yacc 1 and the later versions about Yacc 2. The source in the V6 tree is Yacc 1 and the source in the PWB1 tree is Yacc 2.