Message-Id: <202001122137.00CLbMrw582813@darkstar.fourwinds.com>
From: Jon Steinhart
To: The Eunuchs Hysterical Society
References: <202001121343.00CDhYHK132101@tahoe.cs.Dartmouth.EDU>
 <202001122034.00CKYQ39571239@darkstar.fourwinds.com>
 <202001122044.00CKiUNV573279@darkstar.fourwinds.com>
Comments: In-reply-to Kevin Bowling message dated "Sun, 12 Jan 2020 14:03:39 -0700."
Date: Sun, 12 Jan 2020 13:37:21 -0800
Subject: Re: [TUHS] Tech Sq elevator (Was: screen editors) [ really I think efficiency now ]
List-Id: The Unix Heritage Society mailing list

Kevin Bowling writes:
> On Sun, Jan 12, 2020 at 1:45 PM Jon Steinhart wrote:
> >
> > Kevin Bowling writes:
> > > I honestly can't tell if this is genius level snark :) in case you're
> > > sincere we generally go to great lengths to build up data types and
> > > structures (in C lingo) when programming only to tear those useful
> > > attributes off often at inopportune times. Basically type
> > > systems/type safety have been too expensive or too difficult to use
> > > through history.
> > >
> > > Think of sitting at an SQL prompt as a counterpoint. You can pretty
> > > easily get at the underlying representation and relationships of the
> > > data and the output is just a side effect. Not saying SQL is the
> > > ultimate answer, just that most people have a bit of experience with
> > > it and UNIX so can mentally compare the two for themselves and see the
> > > pros and cons to preserving the underlying representations.
> > >
> > > Regards,
> > > Kevin
> > >
> > > On Sun, Jan 12, 2020 at 1:34 PM Jon Steinhart wrote:
> > > >
> > > > Kevin Bowling writes:
> > > > > This is kind of illustrative of the '60s acid trip that perpetuates in
> > > > > programming "Everything's a string maaaaan". The output is seen as
> > > > > truth because the representation is for some reason too hard to get at
> > > > > or too hard to cascade through the system.
> > > > >
> > > > > There's a total comedy of work going on in the unix way of a wc
> > > > > pipeline versus calling a length function on a list. Nonetheless, the
> > > > > unix pipeline was and is often magnitude easier for a single user to
> > > > > get at. This kind of thing is amusing and endearing to me about our
> > > > > profession in modern day.
> > > > >
> > > > > Regards,
> > > > > Kevin
> > > >
> > > > Can you please elaborate? I read your post, and while I can see that it
> > > > contains English words I can't make any sense out of what you said.
> > > >
> > > > Thanks,
> > > > Jon
> >
> > I wasn't being snarky. You said
> >
> > "The output is seen as truth because the representation is for some
> > reason too hard to get at or too hard to cascade through the system."
> >
> > I honestly have no idea what that means.
>
> If the SQL prompt example did not clarify you are welcome to go one on
> one if this is something you think is curious to you, I think I've
> explained the point I was making adequately for a general audience.
> >
> > Likewise,
> >
> > "There's a total comedy of work going on in the unix way of
> > a wc pipeline versus calling a length function on a list."
> >
> > I just don't know what you mean.
> >
>
> Reason through what happens in a shell pipeline, the more detail the
> better. A quick nudge is fork/exec, what happens in the kernel, what
> happens in page tables, what happens at the buffered output, tty layer
> etc. Very few people actually understand all these steps even at a
> general level in modern systems.
>
> If you had a grocery list on a piece of paper would you
> a) count the lines or read it directly off the last line number on the
> paper if it is numbered
>
> b) copy each character letter by letter to a new piece of equipment
> (say, a word processor), until you encounter a special character that
> happens to be represented as a space on the screen, increment a
> counter, repeat until you reach another special character, output the
> result and then destroy and throw away both the list and the word
> processor equipment.
>
> This kind of thing doesn't really matter in the small or at all for
> performance because computers are fast. But most programming bugs in
> the large eventually boil down to some kind of misunderstanding where
> the representation was lost and recast in a way that does not make
> sense.
>
> Regards,
> Kevin

OK, I have trouble correlating this with your original post but I think
that I understand it well enough to comment.

I agree that it is a problem that very few people understand what's going
on inside anything today from a toaster to a computer. On the computer
end of things this concerns me a lot and improving the quality of
education in this area is one of my main late-in-life missions. I'm under
the illusion that I've helped some based on comments that I've received
from people who have tracked me down and let me know how much the
information in my book helped them.

On to your example...

If I had a grocery list on a piece of paper I would count the lines
because I don't number my grocery lists. I'm going to guess that few
people do. So, I would count the lines in my head and remember the
result. This is pretty much equivalent to what happens when something is
piped into wc. I don't see much difference between a and b in your example.
That's because when I count up the number of lines in the list, I am
making a temporary copy of the list in my head and then forgetting what
was on the list (which may account for the late night trip to the grocery
store a couple of days ago).

So I think that the point that you're trying to make, correct me if I'm
wrong, is that if lists just knew how long they were you could just ask
and that it would be more efficient. While that may be true, it sort of
assumes that this is something so common that the extra overhead for line
counting should be part of every list. And it doesn't address the issue
that while maybe you want a line count I may want a character count or a
count of all lines that begin with the letter A. Limiting this example to
just line numbers ignores the fact that different people might want
different information that can't all be predicted in advance and built
into every program.

It also seems to me that the root problem here is that the data in the
original example was in an emacs-specific format instead of the default
UNIX text file format. The beauty of UNIX is that with a common file
format one can create tools that process data in different ways that then
operate on all data. Yes, it's not as efficient as creating a custom tool
for a particular purpose, but is much better for casual use. One can
always create a special purpose tool if a particular use becomes so
prevalent that the extra efficiency is worthwhile.

If you're not familiar with it, find a copy of the Communications of the
ACM issue where Knuth presented a clever search algorithm (if I remember
correctly) and McIlroy did a critique. One of the things that Doug
pointed out was that while Don's code was more efficient, by creating a
new pile of special-purpose code he introduced bugs.

Many people have claimed, incorrectly in my opinion, that this model
fails in the modern era because it only works on text data.
They change the subject when I point out that ImageMagick works on binary
data. And, there are now stream processing utilities for JSON data and
such that show that the UNIX model still works IF you understand it and
know how to use it.

I don't agree with your closing comment about "most programming bugs".
Do you have any data to support this or is it just an opinion? My opinion
is that most programming bugs today result from total incompetence as one
can pretty much get a computer science degree today without ever learning
that programs run on computers or what a computer is. That's something
I'm trying to change, but it's probably a lost cause.

A long topic, and not necessarily appropriate for this list.

Jon
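
To make the trade-off discussed above concrete, here is a minimal Python
sketch. It is not from the thread: the grocery list and the helper name
count_matching are made up for illustration. It contrasts asking a list
for its stored length with a wc-style pass over plain text, where one
generic loop can also answer questions that were not anticipated when the
data was written.

```python
import io

# Hypothetical data: the same grocery list held two ways.
items = ["Apples", "bread", "Avocados", "milk"]  # a list that knows its own length
text = "\n".join(items) + "\n"                    # plain text, one item per line


def count_matching(stream, predicate=lambda line: True):
    """wc-style pass: read the stream line by line and count the lines
    for which predicate(line) is true."""
    return sum(1 for line in stream if predicate(line))


# Structured answer: the length is stored with the list, nothing is re-read.
list_length = len(items)

# Text-stream answers: each new question is one more pass with a different filter.
line_count = count_matching(io.StringIO(text))  # roughly `wc -l`
a_lines = count_matching(io.StringIO(text),
                         lambda line: line.startswith("A"))  # roughly `grep -c '^A'`
char_count = len(text)  # roughly `wc -c`, assuming ASCII text

print(list_length, line_count, a_lines, char_count)
```

The stored length is cheaper for the one question it was built to answer;
the generic pass costs a re-read each time but needs no change to the
data format when the question changes.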