From: Dan Cross
Date: Mon, 13 Jan 2020 18:46:43 -0500
To: Jon Steinhart
Cc: The Eunuchs Hysterical Society
Subject: Re: [TUHS] Tech Sq elevator (Was: screen editors) [ really I think efficiency now ]

[Resending as this got squashed a few days ago. Jon, sorry for the
duplicate. Again.]

On Sun, Jan 12, 2020 at 4:38 PM Jon Steinhart <jon@fourwinds.com> wrote:
> [snip]
> So I think that the point that you're trying to make, correct me if I'm wrong,
> is that if lists just knew how long they were you could just ask and that it
> would be more efficient.

What I understood was that, by translating into a lowest-common-denominator
format like text, one loses much of the semantic information implicit in a
richer representation. In particular, much of the internal knowledge (like
type information...) is lost during translation and presentation. Put
another way, with text as usually used by the standard suite of Unix tools,
type information is implicit, rather than explicit. I took this to be less
an issue of efficiency and more one of expressiveness.

It is, perhaps, important to remember that Unix works so well because of
heavy use of convention: to take Doug's example, the total number of
commands might be easy to find with `wc` because one assumes each command is
presented on a separate line, with no gaudy header or footer information or
extraneous explanatory text.

This sort of convention, where each logical "record" is a line by itself, is
pervasive on Unix systems, but is not guaranteed. In some sense, those
representations are fragile: a change in output might break something else
downstream in the pipeline, whereas a representation that captures more
semantic meaning is more robust in the face of change but, as in Doug's
example, often harder to use. The Lisp Machine had all sorts of cool
information in the image and a good Lisp hacker familiar with the machine's
structures could write programs to extract and present that information. But
doing so wasn't trivial in the way that '| wc -l' in response to a casual
query is.
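
To make that concrete, a small illustration (the choice of /bin is
arbitrary; any directory of commands would do):

    $ ls /bin | wc -l      # one name per line: the count is the number of commands
    $ ls -l /bin | wc -l   # same files, but ls -l adds a "total" header line: off by one

The first pipeline works only because of the one-record-per-line convention;
the second shows how a single extra header line quietly breaks it.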

> While that may be true, it sort of assumes that this is something so common that
> the extra overhead for line counting should be part of every list.  And it doesn't
> address the issue that while maybe you want a line count I may want a character
> count or a count of all lines that begin with the letter A.  Limiting this example
> to just line numbers ignores the fact that different people might want different
> information that can't all be predicted in advance and built into every program.

This, I think, illustrates an important point: Unix conventions worked well
enough in practice that many interesting tasks were not just tractable, but
easy and in some cases trivial. Combining programs was easy via pipelines.
Harder stuff involving more elaborate data formats was possible, but, well,
harder and required more involved programming. By contrast, the Lisp machine
could do the hard stuff, but the simple stuff also required non-trivial
programming.

The SQL database point was similarly interesting: having written programs to
talk to relational databases, yes, one can do powerful things, but the
amount of programming required is significant at a minimum and often
substantial.

> It also seems to me that the root problem here is that the data in the original
> example was in an emacs-specific format instead of the default UNIX text file
> format.
>
> The beauty of UNIX is that with a common file format one can create tools that
> process data in different ways that then operate on all data.  Yes, it's not as
> efficient as creating a custom tool for a particular purpose, but is much better
> for casual use.  One can always create a special purpose tool if a particular
> use becomes so prevalent that the extra efficiency is worthwhile.  If you're not
> familiar with it, find a copy of the Communications of the ACM issue where Knuth
> presented a clever search algorithm (if I remember correctly) and McIlroy did a
> critique.  One of the things that Doug pointed out was that while Don's code was
> more efficient, by creating a new pile of special-purpose code he introduced bugs.

The flip side is that one often loses information in the conversion to text:
yes, there are structured data formats with text serializations that can
preserve the lost information, but consuming and processing those with the
standard Unix tools can be messy. Seemingly trivial changes in text, like
reversing the order of two fields, can break programs that consume that
data. Data must be suitable for pipelining (e.g., perhaps free-form text
must be free of newlines or something). These are all limitations. Where I
think the argument went awry is in not recognizing that very often those
problems, while real, are at least tractable.
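
As a sketch of that kind of breakage (the two-column "name uid" format here
is hypothetical):

    # A producer emits "name uid" and a consumer extracts field 2:
    $ printf 'root 0\ndaemon 1\n' | awk '{print $2}'
    0
    1
    # If the producer later swaps the columns to "uid name", the very same
    # pipeline still "runs" -- it just silently prints names instead of uids.

Nothing fails loudly; the consumer keeps going and quietly produces wrong
answers.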

> Many people have claimed, incorrectly in my opinion, that this model fails in the
> modern era because it only works on text data.  They change the subject when I
> point out that ImageMagick works on binary data.  And, there are now stream
> processing utilities for JSON data and such that show that the UNIX model still
> works IF you understand it and know how to use it.

Certainly. I think you hit the nail on the head with the proviso that one
must _understand_ the Unix model and how to use it. If one does so, it's
very powerful indeed, and it really is applicable more often than not. But
it is not a panacea (not that anyone suggested it is). As an example, how do
I apply an unmodified `grep` to arbitrary JSON data (which may span more
than one line)? Perhaps there is a way (I can imagine a 'record2line'
program that consumes a single JSON object and emits it as a syntactically
valid one-liner...) but I can also imagine all sorts of ways that might go
wrong.
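
As it happens, a JSON tool in the jq family can play more or less that
'record2line' role. A sketch (records.json is a hypothetical file holding
one pretty-printed object):

    $ cat records.json
    {
      "cmd": "wc",
      "desc": "word, line, and byte count"
    }
    $ jq -c . records.json | grep '"cmd":"wc"'
    {"cmd":"wc","desc":"word, line, and byte count"}

jq -c re-emits each top-level value as a single syntactically valid line, so
the line-oriented tools apply again -- though this just trades one set of
conventions, and failure modes, for another.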

- Dan C.