The Unix Heritage Society mailing list
From: Jon Steinhart <jon@fourwinds.com>
To: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: Re: [TUHS] Tech Sq elevator (Was: screen editors) [ really I think efficiency now ]
Date: Sun, 12 Jan 2020 13:37:21 -0800
Message-ID: <202001122137.00CLbMrw582813@darkstar.fourwinds.com>
In-Reply-To: <CAK7dMtB0-dpyZHsxuLpL8dCEJGV24xuD9VE+ueYFM_dbFxPicg@mail.gmail.com>

Kevin Bowling writes:
> On Sun, Jan 12, 2020 at 1:45 PM Jon Steinhart <jon@fourwinds.com> wrote:
> >
> > Kevin Bowling writes:
> > > I honestly can't tell if this is genius level snark :) in case you're
> > > sincere we generally go to great lengths to build up data types and
> > > structures (in C lingo) when programming only to tear those useful
> > > attributes off often at inopportune times.  Basically type
> > > systems/type safety have been too expensive or too difficult to use
> > > through history.
> > >
> > > Think of sitting at an SQL prompt as a counterpoint.  You can pretty
> > > easily get at the underlying representation and relationships of the
> > > data and the output is just a side effect.  Not saying SQL is the
> > > ultimate answer, just that most people have a bit of experience with
> > > it and UNIX so can mentally compare the two for themselves and see the
> > > pros and cons to preserving the underlying representations.
> > >
> > > Regards,
> > > Kevin
> > >
> > > On Sun, Jan 12, 2020 at 1:34 PM Jon Steinhart <jon@fourwinds.com> wrote:
> > > >
> > > > Kevin Bowling writes:
> > > > > This is kind of illustrative of the '60s acid trip that perpetuates in
> > > > > programming "Everything's a string maaaaan".  The output is seen as
> > > > > truth because the representation is for some reason too hard to get at
> > > > > or too hard to cascade through the system.
> > > > >
> > > > > There's a total comedy of work going on in the unix way of a wc
> > > > > pipeline versus calling a length function on a list.  Nonetheless, the
> > > > > unix pipeline was and is often magnitude easier for a single user to
> > > > > get at.  This kind of thing is amusing and endearing to me about our
> > > > > profession in modern day.
> > > > >
> > > > > Regards,
> > > > > Kevin
> > > >
> > > > Can you please elaborate?  I read your post, and while I can see that it
> > > > contains English words I can't make any sense out of what you said.
> > > >
> > > > Thanks,
> > > >         Jon
> >
> > I wasn't being snarky.  You said
> >
> >   "The output is seen as truth because the representation is for some
> >   reason too hard to get at or too hard to cascade through the system."
> >
> > I honestly have no idea what that means.
>
> If the SQL prompt example did not clarify you are welcome to go one on
> one if this is something you think is curious to you, I think I've
> explained the point I was making adequately for a general audience.
>
> >
> > Likewise,
> >
> >   "There's a total comedy of work going on in the unix way of
> >   a wc pipeline versus calling a length function on a list."
> >
> > I just don't know what you mean.
> >
>
> Reason through what happens in a shell pipeline, the more detail the
> better.  A quick nudge is fork/exec, what happens in the kernel, what
> happens in page tables, what happens at the buffered output, tty layer
> etc.  Very few people actually understand all these steps even at a
> general level in modern systems.
>
> If you had a grocery list on a piece of paper would you
> a) count the lines or read it directly off the last line number on the
> paper if it is numbered
>
> b) copy each character letter by letter to a new piece of equipment
> (say, a word processor), until you encounter a special character that
> happens to be represented as a space on the screen, increment a
> counter, repeat until you reach another special character, output the
> result and then destroy and throw away both the list and the word
> processor equipment.
>
> This kind of thing doesn't really matter in the small or at all for
> performance because computers are fast.  But most programming bugs in
> the large eventually boil down to some kind of misunderstanding where
> the representation was lost and recast in a way that does not make
> sense.
>
> Regards,
> Kevin

OK, I have trouble correlating this with your original post but I think
that I understand it well enough to comment.

I agree that it is a problem that very few people understand what's going on
inside anything today from a toaster to a computer.  On the computer end of
things this concerns me a lot and improving the quality of education in this
area is one of my main late-in-life missions.  I'm under the illusion that
I've helped some based on comments that I've received from people who have
tracked me down and let me know how much the information in my book helped them.

On to your example...

If I had a grocery list on a piece of paper I would count the lines because I
don't number my grocery lists.  I'm going to guess that few people do.  So, I
would count the lines in my head and remember the result.  This is pretty much
equivalent to what happens when something is piped into wc.

I don't see much difference between a and b in your example.  That's because
when I count up the number of lines in the list, I am making a temporary copy
of the list in my head and then forgetting what was on the list (which may
account for the late night trip to the grocery store a couple of days ago).

So I think that the point that you're trying to make, correct me if I'm wrong,
is that if lists just knew how long they were you could just ask and that it
would be more efficient.

While that may be true, it sort of assumes that this is something so common that
the extra overhead for line counting should be part of every list.  And it doesn't
address the issue that while maybe you want a line count I may want a character
count or a count of all lines that begin with the letter A.  Limiting this example
to just line numbers ignores the fact that different people might want different
information that can't all be predicted in advance and built into every program.
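
A minimal sketch of this point, in Python, with hypothetical function names
that are not from the thread: each of the three counts mentioned above is a
small, separate pass over the same plain-text lines, so none of them has to be
built into the list itself. The grocery list contents are made up for
illustration.

```python
import io


def count_lines(stream):
    """Count lines, roughly what `wc -l` reports."""
    return sum(1 for _ in stream)


def count_chars(stream):
    """Count characters, newlines included."""
    return sum(len(line) for line in stream)


def count_lines_starting_with(stream, prefix):
    """Count only the lines that begin with `prefix`."""
    return sum(1 for line in stream if line.startswith(prefix))


# Hypothetical grocery list stored as ordinary text, one item per line.
grocery_list = "Apples\nbread\nAvocados\nmilk\n"

print(count_lines(io.StringIO(grocery_list)))                     # 4
print(count_chars(io.StringIO(grocery_list)))                     # 27
print(count_lines_starting_with(io.StringIO(grocery_list), "A"))  # 2
```

Each function accepts any iterable of lines, so the same code works on an open
file or on sys.stdin at the end of a pipeline.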

It also seems to me that the root problem here is that the data in the original
example was in an emacs-specific format instead of the default UNIX text file
format.

The beauty of UNIX is that with a common file format one can create tools that
process data in different ways that then operate on all data.  Yes, it's not as
efficient as creating a custom tool for a particular purpose, but is much better
for casual use.  One can always create a special purpose tool if a particular
use becomes so prevalent that the extra efficiency is worthwhile.  If you're not
familiar with it, find a copy of the Communications of the ACM issue where Knuth
presented a word-frequency program (if I remember correctly) and McIlroy did a
critique.  One of the things that Doug pointed out was that while Don's code was
more efficient, by creating a new pile of special-purpose code he introduced bugs.

Many people have claimed, incorrectly in my opinion, that this model fails in the
modern era because it only works on text data.  They change the subject when I
point out that ImageMagick works on binary data.  And, there are now stream
processing utilities for JSON data and such that show that the UNIX model still
works IF you understand it and know how to use it.
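
As a rough illustration of the stream-processing idea applied to JSON, here is
a minimal Python sketch of a filter over newline-delimited JSON records. The
one-record-per-line input format, the function name select, and the sample
records are all assumptions made for this example; it is not how any
particular existing utility is implemented.

```python
import io
import json


def select(stream, key, value):
    """Yield each record (one JSON object per line) whose `key` equals `value`."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get(key) == value:
            yield record


# Hypothetical sample input; a real filter would read sys.stdin instead.
records = io.StringIO(
    '{"item": "apples", "aisle": 1}\n'
    '{"item": "milk", "aisle": 7}\n'
    '{"item": "pears", "aisle": 1}\n'
)

for record in select(records, "aisle", 1):
    print(json.dumps(record))
```

Because the filter reads records one at a time and writes records back out, it
can sit in the middle of a pipeline the same way a line-oriented tool does.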

I don't agree with your closing comment about "most programming bugs".  Do you
have any data to support this or is it just an opinion?  My opinion is that most
programming bugs today result from total incompetence as one can pretty much get
a computer science degree today without ever learning that programs run on
computers or what a computer is.  That's something I'm trying to change, but it's
probably a lost cause.  A long topic, and not necessarily appropriate for this list.

Jon
