The Unix Heritage Society mailing list
* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15 22:00 Douglas McIlroy
  2021-07-15 22:12 ` John Cowan
  0 siblings, 1 reply; 34+ messages in thread
From: Douglas McIlroy @ 2021-07-15 22:00 UTC (permalink / raw)
  To: TUHS main list

>> -f is a strange feature that effectively turns a regular file into a pipe
>> with memory by polling for new data. A clean general alternative
>> might be to provide an open(2) mode that makes reads at the current
>> file end block if some process has the file open for writing.

> OTOH, this would mean adding more functionality (read: complexity)
> into the kernel, and there has always been a general desire to avoid
> pushing <stuff> into the kernel when it can be done in userspace.  Do
> you really think using a blocking read(2) is somehow superior
> to using select(2) to wait for new data to be appended to the file?

I'm showing my age. tail -f antedated select(2) and was implemented
by alternately sleeping and reading. select(2) indeed overcomes that
clumsiness.
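
For readers who have not seen it, the sleep-and-read loop looks
roughly like the Python sketch below. It is an illustration only,
not the historical tail source; the name follow, the poll interval,
and the idle_polls cutoff (there only so the sketch terminates) are
all assumptions.

```python
import time


def follow(f, interval=0.01, idle_polls=3):
    """Yield data appended to the open binary file f, polling at end of file.

    A real tail -f loops forever; idle_polls is a hypothetical cutoff
    that stops the sketch after that many consecutive empty reads.
    """
    idle = 0
    while idle < idle_polls:
        chunk = f.read()          # returns b"" once end of file is reached
        if chunk:
            idle = 0
            yield chunk
        else:
            idle += 1
            time.sleep(interval)  # the "sleeping" half of the loop
```

To mimic tail -f, seek to the end of the file before iterating, so
that only data written afterwards is reported.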

> I'll note, with amusement, that -r is one option which is *NOT* in the
> GNU version of tail.  I see it in FreeBSD, but this looks like a
> BSD'ism.

-r came from Bell Labs. This reinforces the point that the ancients
had their imperfections.

Doug

* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-18 20:07 Douglas McIlroy
  0 siblings, 0 replies; 34+ messages in thread
From: Douglas McIlroy @ 2021-07-18 20:07 UTC (permalink / raw)
  To: TUHS main list, charles.unix.pro

> For Multics C, ... NULL != 0

I know what you mean, but the formulation is paradoxical,
as the expression NULL==0 is always true in C :)

Doug

[parent not found: <CAKH6PiW58PDPb5HRi12aKE+mT+O8AjETr9R51Db6U3KcEp_KkA@mail.gmail.com>]
* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-16 12:09 Douglas McIlroy
  2021-07-16 14:32 ` Bakul Shah
  0 siblings, 1 reply; 34+ messages in thread
From: Douglas McIlroy @ 2021-07-16 12:09 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

>> -r is weird because it enables backwards reading, but only as
>> limited by count. Better would be a program, say revfile, that simply
>> reads backwards by lines. Then tail p has an elegant implementation:
>>      revfile p | head | revfile

> tail -n can be smarter in that it can simply read the last K bytes
> and see if there are n lines. If not, it can read back further.
> revfile would have to read the whole file, which could be a lot
> more than n lines! tail -n < /dev/tty may never terminate but it
> will use a small finite amount of memory.

Revfile would work the same way. When head has seen enough
and terminates, revfile will get SIGPIPE and stop. I agree that,
depending on scheduling and buffer management, revfile might
read more than tail -n, but it wouldn't read the whole of a
humongous file.
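
A rough Python sketch of the idea, under stated assumptions: revfile
here is a hypothetical generator that reads fixed-size blocks from
the end of the file and yields lines last-to-first, without their
newlines. Laziness stands in for SIGPIPE: once the consumer has taken
its lines, no further blocks are read. The block size and the
tail_lines helper are illustrative, not taken from any real tool.

```python
import os
from itertools import islice


def revfile(path, block_size=4096):
    """Yield the lines of path from last to first, reading blocks backwards."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return                      # empty file: no lines at all
        carry = b""                     # start of a line cut by a block edge
        at_end = True
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step)
            if at_end and data.endswith(b"\n"):
                data = data[:-1]        # final newline ends the last line
            at_end = False
            pieces = (data + carry).split(b"\n")
            carry = pieces[0]           # may continue into the previous block
            for line in reversed(pieces[1:]):
                yield line
        yield carry                     # first line of the file


def tail_lines(path, n=10):
    """Equivalent in spirit to: revfile p | head | revfile."""
    return list(islice(revfile(path), n))[::-1]
```

Because islice stops pulling after n lines, the generator is
abandoned with most of a large file still unread, which is the
behaviour argued for above.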

Doug

* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15 22:26 Nelson H. F. Beebe
  2021-07-15 23:18 ` Jim Davis
  2021-07-16  0:02 ` Clem Cole
  0 siblings, 2 replies; 34+ messages in thread
From: Nelson H. F. Beebe @ 2021-07-15 22:26 UTC (permalink / raw)
  To: tuhs

Clem Cole asks:

>> Did you know that before PCC the 'second' C compiler was a PDP-10
>> target Alan Snyder did for his MIT Thesis?
>> [https://github.com/PDP-10/Snyder-C-compiler]

I was unaware of that compiler until sometime in the 21st Century,
long after our PDP-10 was retired on 31-Oct-1990.  

The site

	https://github.com/PDP-10/Snyder-C-compiler/tree/master/tops20
 
supplies a list of some of Snyder's files, but they don't match
anything in our TOPS-20 archives of almost 180,000 files.

I then looked into our 1980s-vintage pcc source tree and compared
it with a snapshot of the current pcc source code taken three
weeks ago.  The latter has support for these architectures

	aarch64  hppa  m16c  mips64  pdp11    sparc64
	amd64    i386  m68k  nova    pdp7     superh
	arm      i86   mips  pdp10   powerpc  vax

and the pdp10 directory contains these files:

	CVS  README  code.c  local.c  local2.c  macdefs.h  order.c  table.c

All 5 of those *.c files are present in our TOPS-20 archives.  I then
grepped those archives for familiar strings:

	% find . -name '*.[ch]' | sort | \
	       xargs egrep -n -i 'scj|feldman|johnson|snyder|bell|at[&]t|mit|m.i.t.'
	./code.c:8: * Based on Steve Johnson's pdp-11 version
	./code2.c:19: * Based on Steve Johnson's pdp-11 version
	./cpp.c:1678:		stsym("TOPS20");	/* for compatibility with Snyder */
	./local.c:4: * Based on Steve Johnson's pdp-11 version
	./local2.c:4: * Based on Steve Johnson's pdp-11 version
	./local2.c:209:		case 'A':		/* emit a label */
	./match.c:2: * match.c - based on Steve Johnson's pdp11 version
	./optim.c:318:						 * Turn 'em into regular PCONV's
	./order.c:5: * Based on Steve Johnson's pdp-11 version
	./pftn.c:967:			 * fill out previous word, to permit pointer
	./pftn.c:1458:	register	commflag = 0;  /* flag for labelled common declarations */
	./pftn2.c:1011:			 * fill out previous word, to permit pointer
	./pftn2.c:1502:	register	commflag = 0;  /* flag for labelled common declarations */
	./reader.c:632:		p2->op = NOASG p2->op;	   /* this was omitted in 11 & /6 !! */
	./table.c:128:		"	movei	A1,1\nZN",	/* ZN = emit branch */
	./xdefs.c:13: *	symbol table maintainence

Thus, I'm confident that Jay's work was based on Steve Johnson's
compiler, rather than Alan Snyder's.

Norman Wilson asks:

>> ...
>> How did that C implementation handle ASCII text on the DEC-10?
>> Were it a from-scratch UNIX port it might make sense to store
>> four eight- or nine-bit bytes to a word, but if (as I sense it
>> was) it was C running on TOPS-10 or TOPS-20, it would have had
>> to work comfortably with DEC's convention of five 7-bit characters
>> (plus a spare bit used by some programs as a flag).
>> ...

Our pcc compiler treated char* as a pointer to 7-bit ASCII strings,
stored in the top 35 bits of a word, with the low-order bit normally
zero; a 1-bit there meant that the word contained a 5-digit line
number that some compilers and editors would report.  Of course, that
low-order non-character bit meant that memset(), memcpy(), and
memmove() had somewhat dicey semantics, but I no longer recall their
specs.
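
As an illustration of that layout, here is a small Python sketch
that packs five 7-bit characters into the top 35 bits of a 36-bit
word and keeps the low-order bit as the flag. The function names are
hypothetical, and both the NUL padding of short strings and placing
the first character in the highest-order bits are assumptions of the
sketch, not a transcription of what pcc did.

```python
def pack_word(text, flag=0):
    """Pack up to five 7-bit ASCII characters into one 36-bit word."""
    if len(text) > 5:
        raise ValueError("at most five characters fit in one word")
    codes = [ord(c) for c in text]
    if any(c > 0x7F for c in codes):
        raise ValueError("not 7-bit ASCII")
    codes += [0] * (5 - len(codes))     # assumption: pad short text with NULs
    word = 0
    for c in codes:
        word = (word << 7) | c          # first character ends up highest
    return (word << 1) | (flag & 1)     # low-order bit is the flag


def unpack_word(word):
    """Return (text, flag) for a word built by pack_word; NULs are dropped."""
    flag = word & 1
    codes = [(word >> (1 + 7 * i)) & 0x7F for i in range(4, -1, -1)]
    return "".join(chr(c) for c in codes if c), flag
```

The sketch also shows why byte-oriented routines are awkward here:
the characters do not sit on 8- or 9-bit boundaries, and one bit of
every word is not character data at all.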

kcc later gave us access to the PDP-10's 1- to 36-bit byte
instructions.

For text processing, the 5 x 7b + 1b layout matched the conventions for all
other programming languages on the PDP-10.  When it came time to
implement NFS, and exchange files and data with 32-bit-word machines,
we needed the ability to handle files of 4 x 8b + 4b and 9 x 8b (in
two 36-bit words), and kcc provided that.

The one's-complement 36-bit Univac 1108 machines chose instead to
store text in a 4 x 9b format, because that architecture had
quarter-word load/store instructions, but not the general variable
byte instructions of the PDP-10.  Our campus had an 1108 at the
University of Utah Computer Center, but I chose to avoid it, because
it was run in batch mode with punched cards, and never got networking.
By contrast, our TOPS-20, BSD, RSX-11, SunOS, and VMS systems all had
interactive serial-line terminals, and there was no punched card
support at all.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15 21:26 Paul Ruizendaal
  0 siblings, 0 replies; 34+ messages in thread
From: Paul Ruizendaal @ 2021-07-15 21:26 UTC (permalink / raw)
  To: TUHS main list


> Message: 7
> Date: Thu, 15 Jul 2021 10:28:04 -0400
> From: "Theodore Y. Ts'o" 
> Subject: Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
> 
> On Wed, Jul 14, 2021 at 10:38:06PM -0400, Douglas McIlroy wrote:
>> Head might not have been written if tail didn't exist. But, unlike head,
>> tail strayed from the tao of "do one thing well". Tail -r and tail -f are
>> as cringeworthy as cat -v.
>> 
>> -f is a strange feature that effectively turns a regular file into a pipe
>> with memory by polling for new data. A clean general alternative
>> might be to provide an open(2) mode that makes reads at the current
>> file end block if some process has the file open for writing.
> 
> OTOH, this would mean adding more functionality (read: complexity)
> into the kernel, and there has always been a general desire to avoid
> pushing <stuff> into the kernel when it can be done in userspace.  Do
> you really think using a blocking read(2) is somehow superior
> to using select(2) to wait for new data to be appended to the file?
> 
> And even if we did this using a new open(2) mode, are you saying we
> should have a separate executable in /bin which would then be
> identical to cat, except that it uses a different open(2) mode?

Yes, it would put more complexity into the kernel, but maybe it is conceptually elegant.

Consider a classic pipe or a socket and the behaviour of read(2) for those objects. The behaviour of read(2) that Doug proposes for a file would make it in line with that for a classic pipe or a socket. Hence, maybe it should not be a mode, but the standard behaviour.
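
The difference in today's behaviour can be seen in a small Python sketch (the two function names are made up for illustration, and the 0.05 second delay is arbitrary). A read at the end of a regular file returns zero bytes immediately, whereas a read on an empty pipe blocks until a writer supplies data.

```python
import os
import tempfile
import threading


def read_at_end_of_file():
    """Read at the end of a regular file: returns b"" without waiting."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log")
        with open(path, "wb") as w:
            w.write(b"log line\n")
        fd = os.open(path, os.O_RDONLY)
        try:
            os.lseek(fd, 0, os.SEEK_END)
            return os.read(fd, 100)
        finally:
            os.close(fd)


def read_from_empty_pipe():
    """Read from an empty pipe: blocks until the delayed writer runs."""
    r, w = os.pipe()

    def late_writer():
        os.write(w, b"late data\n")
        os.close(w)

    writer = threading.Timer(0.05, late_writer)
    writer.start()
    try:
        return os.read(r, 100)
    finally:
        writer.join()
        os.close(r)
```

Under the proposal, the first function would behave like the second for as long as some process still had the file open for writing.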

I often think that around 1981 the Unix community missed an opportunity to really think through how networking should integrate with the foundations of Unix. It seems to me that at that time there was an opportunity to merge files, pipes and sockets into a coherent, simple framework. If the 8th edition file-system-switch had been introduced already in V6 or V7, maybe this would have happened.

On the other hand, the installed base was probably already too large in 1981 to still make breaking changes to core concepts. V7 may have been the last chance saloon for that.

Paul


* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15 19:01 Norman Wilson
  2021-07-15 19:27 ` Clem Cole
  0 siblings, 1 reply; 34+ messages in thread
From: Norman Wilson @ 2021-07-15 19:01 UTC (permalink / raw)
  To: tuhs

Nelson H. F. Beebe:

  P.S. Jay was the first to get Steve Johnson's Portable C Compiler,
  pcc, to run on the 36-bit PDP-10, and once we had pcc, we began the
  move from writing utilities in Pascal and PDP-10 assembly language to
  doing them in C.

======

How did that C implementation handle ASCII text on the DEC-10?
Were it a from-scratch UNIX port it might make sense to store
four eight- or nine-bit bytes to a word, but if (as I sense it
was) it was C running on TOPS-10 or TOPS-20, it would have had
to work comfortably with DEC's convention of five 7-bit characters
(plus a spare bit used by some programs as a flag).

Norman Wilson
Toronto ON

* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15 16:54 Nelson H. F. Beebe
  0 siblings, 0 replies; 34+ messages in thread
From: Nelson H. F. Beebe @ 2021-07-15 16:54 UTC (permalink / raw)
  To: tuhs

On the subject of tac (concatenate and print files in reverse), I can
report that the tool was written by my late friend Jay Lepreau in the
Department of Computer Science (now, School of Computing) at the
University of Utah.  In the GNU coreutils distribution, src/tac.c
carries a copyright notice for 1988-2020.

I searched my TOPS-20 PDP-10 archives, and found no source code for
tac, but I did find an older TOPS-20 executable in Jay's personal
directory with a file date of 17-Mar-1987.  There isn't much else in
that directory, so I suspect that he just copied over a needed tool
from his Department of Computer Science TOPS-20 system to ours in the
College of Science.

----------------------------------------

P.S. Jay was the first to get Steve Johnson's Portable C Compiler,
pcc, to run on the 36-bit PDP-10, and once we had pcc, we began the
move from writing utilities in Pascal and PDP-10 assembly language to
doing them in C.  The oldest C file for pcc in our PDP-10 archives is
dated 17-Mar-1981, with other pcc files dated to mid-1983, and final
compiler executables dated 12-May-1986.  Four system header files are
dated as late as 4-Oct-1986, presumably patched after the compiler was
built.

Later, Kok Chen and Ken Harrenstien's kcc provided another C compiler
that added support for byte datatypes, where a byte could be anything
from 1 to 36 bits.  The oldest distribution of kcc in our archives is
labeled "Fifth formal distribution snapshot" and dated 20-Apr-1988.
My info-kcc mailing list archives date from the list beginning, with
an initial post from Ken dated 27-Jul-1986 announcing the availability
of kcc at sri-nic.arpa.
	
By mid-1987, we had a dozen Sun workstations and an NFS fileserver; they
marked the beginning of our move to a Unix workstation environment,
away from large, expensive, and electricity-gulping PDP-10 and VAX
mainframes.

By the summer of 1991, those mainframes were retired.  I recall
speaking to a used-equipment vendor about our VAX 8600, which cost
about US$450K (discounted academic pricing) in 1986, and was told that
its value was depreciating about 20% per month.  Although many of us
missed TOPS-20 features, I don't think anyone was sad to say goodbye
to VMS.  We always felt that the VMS developers worked in isolation
from the PDP-10 folks, and thus learned nothing from them.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

* Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15 15:44 Norman Wilson
  0 siblings, 0 replies; 34+ messages in thread
From: Norman Wilson @ 2021-07-15 15:44 UTC (permalink / raw)
  To: tuhs

Some comments from someone (me) who tends to be pickier than
most about cramming programs together and endless sets of
options:

I, too, had always thought sed was older than head.  I stand
corrected.  I have a long-standing habit of typing sed 10q but
don't spend much time fussing about head.

When I arrived at Bell Labs in late summer 1984, tail -f was
in /usr/bin and in the manual, readslow was only in /usr/bin.
readslow was like tail -f, except it either printed the entire
file first or (option -e) started at the end of the file.

I was told readslow had come first, and had been invented in a
hurry because people wanted to watch in real time the moves
logged by one of Belle's chess matches.  Vague memory says it
was written by pjw; the name and the code style seem consistent
with that.

Personally I feel like tail -r and tail -f both fit reasonably
well within what tail does, since both have to do with the
bottom of the file, though -r's implementation does make for
a special extra code path in tail so maybe a separate program
is better.  What I think is a bigger deal is that I have
frequently missed tail -r on Linux systems, and somehow hadn't
spotted tac; thanks to whoever here (was it Ted?) pointed it
out first!

On the other hand, adding data-processing functions to cat has
never made sense to me.  It seems to originate from a mistaken
notion that cat's focus is printing data on terminals, rather
than concatenating data from different places.  Here is a test:
if cat -v and cat -n and all that make sense, why shouldn't
cat also subsume tr and pr and even grep?  What makes converting
control characters and numbering lines so different from swapping
case and adding page headers?  I don't see the distinction, and
so I think vis(1) (in later Research) makes more sense than cat -v
and nl(1) (in Linux for a long time) more sense than cat -n.
(I'd also happily argue that given nl, pr shouldn't number lines.
That a program was in V6 or V7 doesn't make it perfect.)

And all those special options to wc that amounted to doing
arithmetic on the output were always just silly.  I'm glad
they were retracted.

On the other other hand, why didn't I know about tac?  Because
there are so damn many programs in /usr/bin these days.  When
I started with UNIX ca. 1980, the manual (even the BSD version)
was still short enough that one could sit down and read it through,
section by section, and keep track of what one had read, and
remember what all the different tools did.  That hasn't been
true for decades.  This could be an argument for adding to
existing programs (which many people already know about) rather
than adding new programs (which many people will never notice).

The real problem is that the system is just too damn big.  On
an Ubuntu 18.04 system I run, ls /usr/bin | wc -l shows 3242
entries.  How much of that is redundant?  How much is rarely or
never used?  Nobody knows, and I suspect few even try to find
out.  And because nobody knows, few are brave enough to throw
things away, or even trim out bits of existing things.

One day in the late 1980s, I helped out with an Introduction
to UNIX talk at a DECUS symposium.  One of the attendees noticed
the `total' line in the output of ls, and asked: why is that there?
Doesn't that contradict the principles of tools' output you've
just been talking about?  I thought about it, and said yes,
you're right, that's a bit of old history and shouldn't be
there any more.  When I got home to New Jersey, I took the
`total' line out of Research ls.

Good luck doing anything like that today.

Norman Wilson
Toronto ON

* [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
@ 2021-07-15  2:38 Douglas McIlroy
  2021-07-15  4:19 ` arnold
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Douglas McIlroy @ 2021-07-15  2:38 UTC (permalink / raw)
  To: TUHS main list

This somewhat stale note was sent some time ago, but was ignored
because it was sent from an unregistered email address.

> And if the Unix patriarchs were perhaps mistaken about how useful
> "head" might be and whether or not it should have been considered
> verboten.

Point well taken.

I don't know which of head(1) and sed(1) came first. They appeared in
different places at more or less the same time. We in Research
declined to adopt head because we already knew the idiom "sed 10q".
However one shouldn't have to do related operations in unrelated ways.
We finally admitted head in v10.

Head was independently invented by Mike Lesk. It was Lesk's
program that was deemed superfluous.

Head might not have been written if tail didn't exist. But, unlike head,
tail strayed from the tao of "do one thing well". Tail -r and tail -f are
as cringeworthy as cat -v.

-f is a strange feature that effectively turns a regular file into a pipe
with memory by polling for new data. A clean general alternative
might be to provide an open(2) mode that makes reads at the current
file end block if some process has the file open for writing.

-r is weird because it enables backwards reading, but only as
limited by count. Better would be a program, say revfile, that simply
reads backwards by lines. Then tail p has an elegant implementation:
       revfile p | head | revfile

Doug


end of thread, other threads:[~2021-07-18 20:08 UTC | newest]

Thread overview: 34+ messages
2021-07-15 22:00 [TUHS] head/sed/tail (was The Unix shell: a 50-year view) Douglas McIlroy
2021-07-15 22:12 ` John Cowan
  -- strict thread matches above, loose matches on Subject: below --
2021-07-18 20:07 Douglas McIlroy
     [not found] <CAKH6PiW58PDPb5HRi12aKE+mT+O8AjETr9R51Db6U3KcEp_KkA@mail.gmail.com>
2021-07-16 14:17 ` Nelson H. F. Beebe
2021-07-16 16:13   ` Theodore Y. Ts'o
2021-07-16 12:09 Douglas McIlroy
2021-07-16 14:32 ` Bakul Shah
2021-07-15 22:26 Nelson H. F. Beebe
2021-07-15 23:18 ` Jim Davis
2021-07-16  0:02   ` John Floren
2021-07-16  1:02     ` Nelson H. F. Beebe
2021-07-16  8:27     ` Lars Brinkhoff
2021-07-16 15:28       ` John Floren
2021-07-16  0:02 ` Clem Cole
2021-07-16  0:25   ` Nelson H. F. Beebe
2021-07-16  8:50     ` Lars Brinkhoff
2021-07-15 21:26 Paul Ruizendaal
2021-07-15 19:01 Norman Wilson
2021-07-15 19:27 ` Clem Cole
2021-07-15 19:28   ` Clem Cole
2021-07-15 19:34   ` Warner Losh
2021-07-16  7:38     ` arnold
2021-07-16 16:09       ` Warner Losh
2021-07-16  8:05   ` Lars Brinkhoff
2021-07-16 14:19     ` Clem Cole
2021-07-17  0:34       ` Charles Anthony
2021-07-15 16:54 Nelson H. F. Beebe
2021-07-15 15:44 Norman Wilson
2021-07-15  2:38 Douglas McIlroy
2021-07-15  4:19 ` arnold
2021-07-15  4:25   ` Adam Thornton
2021-07-15  7:20   ` Thomas Paulsen
2021-07-15 14:28 ` Theodore Y. Ts'o
2021-07-15 22:29 ` Bakul Shah
