[TUHS] Re: If forking is bad, how about buffering?

The Unix Heritage Society mailing list

From: Dan Cross <crossd@gmail.com>
To: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: TUHS main list <tuhs@tuhs.org>
Subject: [TUHS] Re: If forking is bad, how about buffering?
Date: Wed, 15 May 2024 10:42:33 -0400
Message-ID: <CAEoi9W6dsfSJUpNpKdhDeTasM8ecVxWs6PaRQwDqvo7w7LNTcA@mail.gmail.com>
In-Reply-To: <20240514111032.2kotrrjjv772h5f4@illithid>

On Tue, May 14, 2024 at 7:10 AM G. Branden Robinson
<g.branden.robinson@gmail.com> wrote:
> [snip]
> Viewpoint 1: Perspective from Pike's Peak

Clever.

> Elementary Unix commands should be elementary.  Unix is a kernel.
> Programs that do simple things with system calls should remain simple.
> This practice makes the system (the kernel interface) easier to learn,
> and to motivate and justify to others.  Programs therefore test the
> simplicity and utility of, and can reveal flaws in, the set of
> primitives that the kernel exposes.  This is valuable stuff for a
> research organization.  "Research" was right there in the CSRC's name.

I believe this at once makes a more complex argument than was
proffered, and at the same time misses the context in which Unix was
created.

> Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]
>
> cat(1)'s man page did not advertise the traits in the foregoing
> viewpoint as objectives, and never did.[2]  Its avowed purpose was to
> copy, without interruption or separation, 1..n files from storage to an
> output channel or stream (which might be redirected).
>
> I don't need to convince you that this is a worthwhile application.
> But when we think about the many possible ways--and destinations--a
> person might have in mind for that I/O channel, we have to face the
> necessity of buffering or performance goes through the floor.
>
> It is 1978.  Some VMS

I don't know about that; VMS IO is notably slower than Unix IO by
default. Unlike VMS, Unix uses the buffer cache to serialize access to
the underlying storage device(s). Ironically, caching here is a major
win, not just for speed, but to make it relatively easy to reason
about the state of a block, since that state is removed from the
minutiae of the underlying storage device and instead handled in the
bio layer. Treating the block cache as a fixed-size pool yields a
relatively simple state machine for synchronizing between the
in-memory and on-disk representations of data.
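As a rough illustration of the fixed-size pool idea, here is a minimal sketch in Python. It is not the actual Unix bio code: the class name `BlockCache`, the dict standing in for a storage device, the 512-byte block size, and the least-recently-used eviction policy are all assumptions made for the example. Each buffer is either clean or dirty, and a dirty buffer reaches the "disk" only on eviction or on an explicit sync.

```python
from collections import OrderedDict

BLOCK_SIZE = 512  # assumed block size, for illustration only


class BlockCache:
    """Toy fixed-size block cache (hypothetical; not the real bio layer)."""

    def __init__(self, disk, nbufs):
        self.disk = disk           # dict: block number -> bytes, stands in for a device
        self.nbufs = nbufs         # the pool never holds more than this many buffers
        self.bufs = OrderedDict()  # block number -> [data, dirty], least recently used first

    def _get(self, blkno):
        if blkno in self.bufs:
            self.bufs.move_to_end(blkno)       # cache hit: mark as recently used
        else:
            if len(self.bufs) >= self.nbufs:   # pool full: recycle the oldest buffer
                old, (data, dirty) = self.bufs.popitem(last=False)
                if dirty:
                    self.disk[old] = data      # write back before reuse
            fresh = self.disk.get(blkno, b"\0" * BLOCK_SIZE)
            self.bufs[blkno] = [fresh, False]  # loaded from "disk", so clean
        return self.bufs[blkno]

    def read(self, blkno):
        return self._get(blkno)[0]

    def write(self, blkno, data):
        buf = self._get(blkno)
        buf[0], buf[1] = data, True            # delayed write: only mark dirty

    def sync(self):
        for blkno, buf in self.bufs.items():
            if buf[1]:
                self.disk[blkno] = buf[0]
                buf[1] = False
```

The point of the sketch is that callers only ever see `read` and `write` on block numbers; whether a block is in memory, dirty, or on the device is decided in one place.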

> [snip]
> And this, as we all know, is one of the reasons the standard I/O library
> came into existence.  Mike Lesk, I surmise, understood that the
> "applications programmer" having knowledge of kernel internals was in
> general neither necessary nor desirable.

I'm not sure about that.  I suspect that the justification _may_ have
been more along the lines of noting that many programs implemented
their own, largely similar buffering strategies, and that it was
preferable to centralize those into a single library, and also noting
that building some kinds of programs was inconvenient using raw system
calls. For instance, something like `gets` is handy, but is _annoying_
to write using just read(2). It can obviously be done, but if I don't
have to, I'd prefer not to.
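To make the `gets` point concrete, here is a minimal sketch in Python, whose `os.read` is a thin wrapper over read(2). The helper name `read_line` is hypothetical. It takes the naive route of one system call per byte; the alternative is to read larger chunks and carry the leftover bytes from one call to the next, which is exactly the bookkeeping a shared buffering library would centralize.

```python
import os


def read_line(fd):
    """Return the next line from fd without its newline, or None at end of file."""
    line = bytearray()
    while True:
        ch = os.read(fd, 1)   # one read(2) per byte: simple, but slow
        if not ch:            # zero bytes back means end of file
            return bytes(line) if line else None
        if ch == b"\n":
            return bytes(line)
        line += ch
```

Reading a byte at a time avoids consuming input past the newline, but pays a system call for every character.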

> [snip]
> We should have kept cat(1), and let it grow as many flags as practical
> use demanded--_except_ for `-u`--and at the _same time_ developed a new
> kcat(1) command that really was just a thin wrapper around system calls.
> Then you'd be a lot closer to measuring what the kernel was really
> doing, what you were paying for it, and you could still boast of your
> elegance in OS textbooks.
> [snip]

Here's where I think this misses the mark: this focuses too much on
the idea that simple programs exist to be tests for, and exemplars
of, the kernel system call interface, but what evidence do you have
for that? A simpler explanation is that simple programs are easier to
write, easier to read, easier to reason about, test, and examine for
correctness. Unix amplified this with Doug's "garden hoses of data"
idea and the advent of pipes; here, it was found that small, simple
programs could be combined in often surprisingly unanticipated ways.
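As a small illustration of that kind of composition, here is a sketch in Python that connects a few standard utilities with pipes. The helper name `pipeline` is hypothetical, and the sketch assumes a Unix-like system with `tr`, `sort`, and `uniq` on the path. It is only suitable for small inputs, since it writes all of the data before reading any output.

```python
import subprocess


def pipeline(stages, data):
    """Run each argv in stages, connected by pipes; return the last stage's output."""
    first = prev = subprocess.Popen(
        stages[0], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    procs = [first]
    for argv in stages[1:]:
        nxt = subprocess.Popen(argv, stdin=prev.stdout, stdout=subprocess.PIPE)
        prev.stdout.close()   # parent drops its copy so end-of-file propagates
        procs.append(nxt)
        prev = nxt
    first.stdin.write(data)   # fine for small inputs; large ones could block
    first.stdin.close()
    out = prev.stdout.read()
    for p in procs:
        p.wait()
    return out


# Split text into words, sort them, and drop duplicates.
words = pipeline(
    [["tr", "-cs", "A-Za-z", "\\n"], ["sort"], ["uniq"]],
    b"the cat and the hat\n",
)
```

None of the three programs knows about the others; the combination exists only in how their inputs and outputs are connected.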

Unix built up a philosophy about _how_ to write programs that was
rooted in the problems that were interesting when Unix was first
created. Something we often forget is that research systems are built
to address problems that are interesting _to the researchers who build
them_. This context can shape a system, and we see that with Unix: a
highly synchronous system call interface, because overly elaborate
async interfaces were hard to program; a simple file abstraction that
was easy to use (open/creat/read/write/close/seek/stat) because files
on other contemporary systems were baroque things that were difficult
to use; a simple primitive for the creation of processes because,
again, on other systems processes were very heavy, complicated things
that were difficult to use. Unix took problems related to IO and
processes and made them easy. By the 80s, these were pretty well
understood, so focus shifted to other things (languages, networking,
etc).
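For a sense of how little the file abstraction asks of a program, here is a minimal sketch of a cat-like copy loop in Python, using `os.open`, `os.read`, `os.write`, and `os.close`, which wrap the corresponding system calls. The function name `raw_cat` and the 4096-byte chunk size are assumptions for the example, not anything from a historical cat(1).

```python
import os


def raw_cat(paths, out_fd=1, bufsize=4096):
    """Copy each file in paths to out_fd using only open/read/write/close."""
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            while True:
                chunk = os.read(fd, bufsize)
                if not chunk:              # end of file
                    break
                while chunk:               # write(2) may accept fewer bytes than offered
                    n = os.write(out_fd, chunk)
                    chunk = chunk[n:]
        finally:
            os.close(fd)
```

A file here is just a descriptor and a stream of bytes; there are no record formats or access modes to negotiate.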

Unix is one of those rare beasts that escaped the lab and made it out
there in the wild. It became the workhorse that begat a whole two or
three generations of commercial work; it's unsurprising that when the
web explosion happened, Unix became the basis for it: it was there, it
was familiar, and by then it wasn't a research project anymore, but a
basis for serious commercial work. That it has retained the original
system call interface is almost incidental; perhaps that fits with
your broccoli-man analogy.

        - Dan C.
