The Unix Heritage Society mailing list
From: Clem Cole <clemc@ccc.com>
To: Will Senn <will.senn@gmail.com>
Cc: TUHS main list <tuhs@minnie.tuhs.org>, simh@groups.io
Subject: Re: [TUHS] 2bsd tarball
Date: Tue, 28 Jul 2020 20:21:19 -0400
Message-ID: <CAC20D2NRF2CHESt_Virro2Op4mVDH2JCRBN7g5a2CvU1X=kUAw@mail.gmail.com>
In-Reply-To: <6a0063f8-128d-751d-114f-a0f811d02098@gmail.com>

Cross posting to simh - since much of this has been discussed in the last
few days there also....

In for a penny, in for a pound ... here is the history ...  man ... I lived
this and I'll need a strong drink later tonight after I write it all up.


On Tue, Jul 28, 2020 at 7:04 PM Will Senn <will.senn@gmail.com> wrote:

> I recall having to do something with cont.a files, which are not present on these images. So, my question is, does anyone know of or have an actual 2bsd tape/tape image?
>
> cont.a is a tp-v6 and earlier ism.

DECtape had a directory at the front of the tape (think superblock/ilist),
but could do cool things and be treated more like a disk.
When tp was created for a very early version of Unix (I'm not sure which,
could be V2), Ken/Dennis et al had DECtape units and so the original scheme
followed the media.   This also meant that the program could write files
and go back and update the directory, which is a no-no with many tape
systems.  Then research got a 9-track unit.   So tp was changed to
calculate how much space was going to be needed, write the directory, then
the datablocks.  All good ... except...

9-track could write more files than the directory could take.   So for many
years, people would use the ar(1) program to take a number of files in a
directory, create a file called cont.a and then delete the files.  Then the
tree would be written with tp; when you read it back, you reversed the ar(1)
process.  If you look at the USENIX/Harvard tape in the TUHS archive you'll
see this scheme in use.

BTW: tp was written in assembler and all the data structures were using
PDP-11 binary formats.  Eventually, Harvard wrote stp (super-tp) in C
(which is on the USENIX tape Warren has in the archives) that worked like
the original assembler tp but also put a redundant directory at the end of
the tape.  Another issue with tp was that if you had a bad block in the
first few blocks you could not decode the rest of the tape.  [There were
some other issues with the UNIX tree structure as disks got bigger but I'm
going to ignore all that - other than to say, tp had lived its life].

Enter Mashey and the PWB 1.0 folks (which is based on V6).  Someone in USG
created cpio (and volcpy) as part of the PWB 1.0.   Like tp it was a PDP-11
binary format, but unlike tp, the tape directory is threaded.  *i.e.* block
one describes the first file only and includes the size of the following
file, then the file itself, then a new directory block for the next file and
again that file (rinse and repeat).  So it improved on tp in the directory
threading, but was still binary and, for reasons I'll leave out, had a
different user interface.
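
A minimal sketch of the threading idea described above, in Python.  The
header layout here (a 14-byte name plus a 32-bit size) is a toy assumption
made up for illustration; it is not the real cpio or tar header.  The point
is only that each file carries its own small directory entry, so nothing
has to be sized or written up front.

```python
import struct

# Hypothetical toy header: 14-byte NUL-padded name, 32-bit little-endian size.
# This is NOT the cpio or tar layout, just enough to show the threading.
HDR = struct.Struct("<14sI")

def write_threaded(files):
    """Emit header, data, header, data ... with no directory at the front."""
    out = bytearray()
    for name, data in files:
        out += HDR.pack(name.encode("ascii"), len(data))
        out += data
    return bytes(out)

def read_threaded(blob):
    """Walk the chain: each header says how far away the next header is."""
    pos = 0
    while pos < len(blob):
        raw_name, size = HDR.unpack_from(blob, pos)
        pos += HDR.size
        yield raw_name.rstrip(b"\0").decode("ascii"), blob[pos:pos + size]
        pos += size

files = [("f%04d" % i, b"x" * i) for i in range(1000)]
tape = write_threaded(files)
```

Because there is no fixed-size table at the front, the number of files is
limited only by the length of the tape, which is the problem the cont.a
trick was working around with tp.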

As part of V7, Ken wrote a new program, tar [you can ask him why].  Like
cpio, it used a threaded tape directory, but unlike cpio it was always
ASCII and not PDP-11 specific.  Furthermore, the user interface was made to
parrot tp.  So, certainly, it had the advantage that changing tp scripts to
use tar was pretty easy, i.e. s/tp/tar/ ... not so for cpio.  And it was
muscle memory compliant.

For PWB 2.0, cpio was updated to allow a -c option to write the header in
ASCII and -s to byte swap the binary.   But the damage had been done ...

Thus began 'tar wars', a battle that raged officially over tape archive
formats, but really was an argument about user interfaces.  Since tar was
part of Research, the universities and commercial people used it.  Only USG
and the folks inside the Bell System were using cpio, as officially none of
us had it, since PWB was not released to us (although thanks to many AT&T
employees doing an OYOC year, many schools like UCB, MIT and CMU had the
sources to cpio anyway -- for instance you'll see it hidden away on Kirk's
CD).

I personally had used both, and preferred tar for ease of use and ASCII
directories.  Note that I had written car at Masscomp: this was our trick
to use the file-list scripting that cpio could do, but create tar format
tapes, which was handy.  I never wrote tpio, which would have been cpio
format with the tp/tar user interface, as I did not need it.

Roll forward to the /usr/group UNIX standard that Heinz chaired.  We ended
up not being able to agree on a distribution format, and the ISVs were PO'd
because now they could create UNIX programs that might actually work across
systems, but they had no standard way to distribute them.

Roll forward again to IEEE.  Heinz's committee was officially disbanded
(story discussed elsewhere) and we were created as IEEE P1003 with Jim
Isaak as Chair.  This time the ISVs said we had to have a distribution
format.  Since *.1 was only an API we were allowed to avoid the user
interface issue and examine only the on-tape format.

It turns out, while it seems to have been unintended, that Ken's original V7
implementation has an interesting coding feature/bug which turned out to be
what clinched the deal.   When Ken created the directory block for each
file, he did a bcopy of 0's to the buffer before he wrote the data that
fills it in.  Then when he calculated the checksum for the directory header
block, he summed the entire block (the unused part of which, because of the
bcopy, was zeros).  This means if you write beyond the end of Ken's original
header and include that extra data in the chksum, the original program will
ignore the new information but accept the directory block as valid.  i.e. he
had built an extension mechanism into the tar on-tape format.
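
To make the checksum point concrete, here is a small Python sketch (not
Ken's code).  It assumes the commonly documented tar header layout: a
512-byte block, the size field at bytes 124-135, the checksum field at
bytes 148-155 (counted as spaces while summing), and the original fields
ending at byte 257.  The function names and the fixed_size_reader_accepts
contrast are hypothetical, added only for illustration.

```python
BLOCK = 512
CHKSUM = slice(148, 156)   # checksum field, counted as 8 spaces when summing
V7_END = 257               # assumed end of the original V7 header fields

def checksum(block):
    """Sum every byte of the whole block, with the checksum field as spaces."""
    assert len(block) == BLOCK
    tmp = bytearray(block)
    tmp[CHKSUM] = b" " * 8
    return sum(tmp)

def make_header(name, size, extra=b""):
    """Zero the block, fill in a few fields, optionally write past V7_END."""
    block = bytearray(BLOCK)                       # the zero fill
    block[0:len(name)] = name
    block[124:136] = b"%011o " % size              # size as octal ASCII
    block[V7_END:V7_END + len(extra)] = extra      # "new" data beyond the old header
    block[CHKSUM] = b"%06o\0 " % checksum(bytes(block))
    return bytes(block)

def stored_checksum(block):
    return int(block[148:154].decode("ascii"), 8)

def whole_block_reader_accepts(block):
    """Old-style reader: sums all 512 bytes, never looks at the extra fields."""
    return stored_checksum(block) == checksum(block)

def fixed_size_reader_accepts(block):
    """Hypothetical contrast: a reader that sums only the known header fields."""
    tmp = bytearray(block[:V7_END])
    tmp[CHKSUM] = b" " * 8
    return stored_checksum(block) == sum(tmp)

plain = make_header(b"hello.c", 42)
extended = make_header(b"hello.c", 42, extra=b"ustar\x0000")
```

With these definitions, whole_block_reader_accepts() is true for both plain
and extended, while the hypothetical fixed-size reader accepts plain but
rejects extended, because the writer's sum includes bytes it never looks at.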

cpio's ASCII on-tape format was not able to do that, as its checksum routine
used sizeof(header struct).

USTAR was born ... Ken had written things like the UID/GID as ASCII
representations of the integer value in the original header.  USTAR added
the ASCII representation of the username and the group name since that was
more often portable between systems than the numbers.   There were other
additions like more room for the pathname, new file types, *etc*.  But the
key is that a USTAR tape can be read by the original V7 (and follow-on)
tar programs, although they may not recognize all the file types or use all
of the new information.
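
As an illustration, Python's standard tarfile module can write a USTAR
header, and the raw block shows both sets of fields side by side.  The
byte offsets used below (uid at 108, magic at 257, uname at 265, gname at
297) are the commonly documented ustar layout; the file name, user name
and ids are made-up example values.

```python
import io
import tarfile

def build_ustar(name, data, uname, gname, uid, gid):
    """Write a one-member archive in USTAR format and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.uid, info.gid = uid, gid          # numeric ids, as in the V7 header
        info.uname, info.gname = uname, gname  # the names USTAR added
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = build_ustar("README", b"hello\n", "clem", "staff", 1042, 10)
header = archive[:512]

# Fields an old reader already understands: octal ASCII numbers.
uid = int(header[108:115].decode("ascii"), 8)

# Fields that sit beyond the end of the original header.
magic = header[257:263]
uname = header[265:297].rstrip(b"\0")
gname = header[297:329].rstrip(b"\0")

# A whole-block checksum (checksum field counted as spaces) still validates.
tmp = bytearray(header)
tmp[148:156] = b" " * 8
whole_block_ok = sum(tmp) == int(header[148:154].decode("ascii"), 8)
```

A reader that only knows the original fields would still find the name,
size and numeric ids where it expects them, and would simply never look at
uname and gname.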

A few years later during *.2 discussions, we finally got into the user
interface stuff and pax(1) was born.  Knowing my hack with car, Keith
Bostic, Jim McGuiness and I wrote up a description of a program that could
work with both user interface schemes.  USENIX provided funding for a
student to implement it and put the sources out on comp.sources.unix at
some point.  That proposal was originally accepted as the first tape user
interface program in *.2 [a few years later, after I stopped being part of
the committee, the USG folks did get an alternate CPIO format accepted and
cpio as an allowed program.   USENIX paid to have the program updated to
operate like cpio if it was called that, pure V7 tar if called that and, if
called pax, to use USTAR].

'nuf said ... I hope.

Clem
