The Unix Heritage Society mailing list
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: TUHS main list <tuhs@tuhs.org>
Subject: [TUHS] Re: If forking is bad, how about buffering?
Date: Tue, 14 May 2024 06:10:32 -0500
Message-ID: <20240514111032.2kotrrjjv772h5f4@illithid>
In-Reply-To: <CAKzdPgwr6=vND7vF-3+Amof=WEf6fqCN2gOsPmXB0_9Gy9U_rA@mail.gmail.com>

I've wondered about the cat flag war myself, and have a theory.  Might
as well air it here since the real McCoy (and McIlroy) are available to
shoot it down.  :)

I'm sure the following attempt at knot-slashing is not novel, but people
relentlessly return to this issue as if the presence of _flags_ is the
problem.  (Plan 9 fans recite this point ritually, like a mantra.)

I say it isn't.

At 2024-05-14T17:10:38+1000, Rob Pike wrote:
> I agree with your (as usual) perceptive analysis. Only stopping by to
> point out that I took the buffering out of cat. I didn't have your
> perspicacity on why it should happen, just a desire to remove all the
> damn flags. When I was done, cat.c was 35 lines long. Do a read, do a
> write, continue until EOF. Guess what? That's all you need if you want
> to cat files.
> 
> Sad to say Bell Labs's cat door was hard to open and most of the world
> still has a cat with flags. And buffers.

I think this dispute is a proxy fight between two communities, or more
precisely two views of what cat(1), and other elementary Unix commands,
primarily exist to achieve.  In my opinion both perspectives are valid,
and it's better to consider what each perspective wants than mandate
that either is superior.

Viewpoint 1: Perspective from Pike's Peak

Elementary Unix commands should be elementary.  Unix is a kernel.
Programs that do simple things with system calls should remain simple.
This practice makes the system (the kernel interface) easier to learn,
and to motivate and justify to others.  Programs therefore test the
simplicity and utility of, and can reveal flaws in, the set of
primitives that the kernel exposes.  This is valuable stuff for a
research organization.  "Research" was right there in the CSRC's name.

Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]

cat(1)'s man page does not advertise the traits in the foregoing
viewpoint as objectives, and never did.[2]  Its avowed purpose was to
copy, without interruption or separation, 1..n files from storage to an
output channel or stream (which might be redirected).

I don't need to convince anyone here that this is a worthwhile application.
But when we think about the many possible ways--and destinations--a
person might have in mind for that I/O channel, we have to face the
necessity of buffering or performance goes through the floor.

It is 1978.  Some VMS or, ugh, CP/M advocate from those piddly little
toy machines will come along.  "Ha ha," they will say, "our OS is way
faster than the storied Unix even at the simple task of dumping files".

Nowhere[citation needed] outside of C tutorials is cat implemented as

int c;
while((c = getchar()) != EOF) putchar(c);

or its read()/write() system call equivalent.
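
To make the cost concrete, here is a minimal sketch of that system call
equivalent.  The name copy_bytewise is made up for illustration; the
point is only that every byte costs one read() and one write().

```c
#include <unistd.h>

/* Copy fd `in` to fd `out` one byte per system call.
 * Returns the number of bytes copied, or -1 on error.
 * (copy_bytewise is a hypothetical name, not from any real cat.) */
long copy_bytewise(int in, int out)
{
    char c;
    long bytes = 0;
    ssize_t n;

    while ((n = read(in, &c, 1)) == 1) {   /* one syscall per byte in */
        if (write(out, &c, 1) != 1)        /* one syscall per byte out */
            return -1;
        bytes++;
    }
    return n == 0 ? bytes : -1;            /* n == 0 means EOF */
}
```

Copying an n-byte file this way takes about 2n system calls, plus one
final read() to see EOF.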

The output channel might be across a network in a distributed computing
environment.  Nobody wants to work with one byte at a time in that
situation.  Ethernet's minimum frame size is 64 bytes, so a one-byte
payload would be almost entirely overhead.

While composing this mail, I had a look at an early, pre-C version of
cat, spelling error in the only comment line and all.

https://minnie.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/cat.s

putc:
	movb	r0,(r2)+
	cmp	r2,$obuf+512.
	blo	1f
	mov	$1,r0
	sys	write; obuf; 512.
	mov	$obuf,r2

Well, look at that.  Buffering.  The author of this tool of course knew
the kernel well, including the size of its internal disk buffers (on the
assumption that I/O would mainly be happening to and from disks).
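
For readers who don't speak PDP-11 assembly, a rough C rendering of the
excerpt might look like the sketch below.  obuf and the 512 come from
the excerpt; putc_buffered, flush_buffered, op, and out_fd are names
invented here, and the flush routine is an assumption about what has to
happen at end of input, not something the excerpt shows.

```c
#include <unistd.h>

#define OBUFSZ 512              /* the "512." in the excerpt */

char obuf[OBUFSZ];              /* output buffer, as in the excerpt */
char *op = obuf;                /* next free slot; plays the role of r2 */
int out_fd = 1;                 /* the excerpt hard-codes fd 1; a variable
                                   here only so the sketch can be tested */

/* Store one byte; issue a write only when the buffer is full.
 * Returns 0 on success, -1 if the write failed or was short. */
int putc_buffered(int c)
{
    *op++ = (char)c;                    /* movb r0,(r2)+         */
    if (op < obuf + OBUFSZ)             /* cmp r2,$obuf+512.     */
        return 0;                       /* blo 1f                */
    op = obuf;                          /* mov $obuf,r2          */
    return write(out_fd, obuf, OBUFSZ) == OBUFSZ ? 0 : -1;
}

/* Assumed helper: write out a partially filled buffer at EOF. */
int flush_buffered(void)
{
    size_t n = (size_t)(op - obuf);

    op = obuf;
    if (n == 0)
        return 0;
    return write(out_fd, obuf, n) == (ssize_t)n ? 0 : -1;
}
```

With this arrangement, 512 calls to putc_buffered cost one write rather
than 512.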

But that's a "leaky abstraction", or a "layering violation".  (That'll
be two tickets to the eternal fires of Brogrammer Hell, thanks.)  Once
we sweep away the break room buzzwords, we see that cat is presuming
things that it should not (the size of the kernel's buffers, and the
nature of devices serving as source and sink).

And this, as we all know, is one of the reasons the standard I/O library
came into existence.  Mike Lesk, I surmise, understood that the
"applications programmer" having knowledge of kernel internals was in
general neither necessary nor desirable.

What _should_ have happened, IMAO, as stdio.h came into existence and
the commercialization and USG/PWB-ification of Unix became truly
inevitable, is that Viewpoint 1 should have been salvaged for the
benefit of continuing operating systems research and kernel development.

But!

We should have kept cat(1), and let it grow as many flags as practical
use demanded--_except_ for `-u`--and at the _same time_ developed a new
kcat(1) command that really was just a thin wrapper around system calls.
Then you'd be a lot closer to measuring what the kernel was really
doing, what you were paying for it, and you could still boast of your
elegance in OS textbooks.
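
As a sketch of what such a kcat might amount to, assuming nothing more
than "do a read, do a write, continue until EOF": kcat_fd, kcat_path,
and the 8192-byte buffer size are all hypothetical choices made for
this example, not anything a real kcat(1) ever specified.

```c
#include <fcntl.h>
#include <unistd.h>

enum { KCAT_BUFSZ = 8192 };     /* assumed size; nothing above fixes one */

/* Copy everything from fd `in` to fd `out` with bare system calls.
 * Returns 0 at EOF, -1 on a read error or a failed/short write. */
int kcat_fd(int in, int out)
{
    char buf[KCAT_BUFSZ];
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n)
            return -1;          /* no retry: keep the wrapper thin */
    }
    return n == 0 ? 0 : -1;
}

/* Open a file by name, copy it to `out`, and close it. */
int kcat_path(const char *path, int out)
{
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = kcat_fd(fd, out);
    close(fd);
    return rc;
}
```

Treating a short write as an error instead of retrying is a deliberate
simplification here: what you measure is then the kernel's behavior, not
the wrapper's.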

I concede that the name "kcat" would have been twice the length a
certain prominent user of the Unix kernel would have tolerated.  Maybe
"kc" would have been better.  The remaining 61 alphanumeric sigils that
might follow the 'k' would have been reserved for other exercises of the
kernel interface.  If your kernel is sufficiently lean,[3] 62 cases
exercising it ought to be enough for anybody.

Regards,
Branden

[1] https://news.ycombinator.com/item?id=29082014
[2] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1
[3] https://dl.acm.org/doi/10.1145/224056.224075
