From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE,RDNS_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 Received: (qmail 20342 invoked from network); 9 Mar 2020 21:07:17 -0000 Received-SPF: pass (minnie.tuhs.org: domain of minnie.tuhs.org designates 45.79.103.53 as permitted sender) receiver=inbox.vuxu.org; client-ip=45.79.103.53 envelope-from= Received: from unknown (HELO minnie.tuhs.org) (45.79.103.53) by inbox.vuxu.org with ESMTP; 9 Mar 2020 21:07:17 -0000 Received: by minnie.tuhs.org (Postfix, from userid 112) id 529C893D2E; Tue, 10 Mar 2020 07:07:16 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id 9555293D2B; Tue, 10 Mar 2020 07:06:37 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="UtpSUAfB"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id 592E393D29; Tue, 10 Mar 2020 07:06:34 +1000 (AEST) Received: from mail-vs1-f52.google.com (mail-vs1-f52.google.com [209.85.217.52]) by minnie.tuhs.org (Postfix) with ESMTPS id F107093D06 for ; Tue, 10 Mar 2020 07:06:32 +1000 (AEST) Received: by mail-vs1-f52.google.com with SMTP id u26so7043563vsg.2 for ; Mon, 09 Mar 2020 14:06:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=C3LNxEG7Izinzrlk/QSKwYzyTZENqAr/AY+Q0BCMfJU=; b=UtpSUAfBHL/CMJisO0vjHpe5qtD3YPSaLadiUHGgbfJ8qSSbQjf9a0r+EvtO1wcoDe lvtPHOLA/Pa5nzolbUQj3bogehpXn51d8Fkoh5lcmBtBXJS3hS23yQ7/ECblIfh4L2Ls X2/RKaSk29lQLa7mWKwR29JO5PZZNfj/QE+VSb2X3e9H2EaN/jPD2i+j4vo63iD2BuGK xEP7BaIva/+dYCCUvJym495vMmGVfsWOp/YiBkMTr4Jjr/Nq9jj9PU7oJqM8LesaoBMj 3/4HZ3wz5E0gbQ/n3EsY3q5E2gsuFs30QKepJIo3lCk0i0+uGeGUSauo+6oZYx5q8OX3 WRMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=C3LNxEG7Izinzrlk/QSKwYzyTZENqAr/AY+Q0BCMfJU=; b=eLjrxQcEkNVrp+Kb/eSQVc7N8F31lNRENHdoj+CTEoQjn1i9V+n9OFrenB2wYazk1X Xb+nC6NLEOeUTIhdS6o6Lc5v00ac5AmKlhWp3G3REUPwBaDDX4rkrTBCxZH+rXtMPHRJ zZrxoXxj9Y6qMpeUNpQRb7FXID8hmXNBRltdjZmgysRuV4PdaGd9PAYoEXSx0776hf4u YRuoL/8q3Rgcq77SY4KQjuJ+b0tX06Pn97dao95V41sWQAabZKh5Q6ahcX+NC2oG+VN0 Fc1wJ+hQ0Bxjta4yy5lBsQLAECqYKpNRkmXsNKLswe/qvVRjJMD90q7WIeePyJHVzj+L oDNw== X-Gm-Message-State: ANhLgQ0Q/uBfNoafPxMIbiSt3ylalflafN3cHnXrgqoEF4SZiEjw46YU 1qlMdOQKFmPR9n62OXMWRQ8Wied6cosaDJo1Va8= X-Google-Smtp-Source: ADFU+vvJo40Bc4DNNJmr/l5QQ0F8mFkcchKeGKdsLS5K4z/zuxritoF6VH7xOVSImMwBYqZWhenDhrmvIaW4fLqKW6U= X-Received: by 2002:a05:6102:2dc:: with SMTP id h28mr11287719vsh.169.1583787991876; Mon, 09 Mar 2020 14:06:31 -0700 (PDT) MIME-Version: 1.0 References: <20200305020517.GA13872@wopr> <20200308052632.GD20478@eureka.lemis.com> <202003080532.0285WcWn1544496@darkstar.fourwinds.com> In-Reply-To: From: "John P. Linderman" Date: Mon, 9 Mar 2020 17:06:20 -0400 Message-ID: To: Tyler Adams Content-Type: multipart/alternative; boundary="000000000000a7cbac05a072601c" Subject: Re: [TUHS] Command line options and complexity X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: The Unix Heritage Society Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --000000000000a7cbac05a072601c Content-Type: text/plain; charset="UTF-8" Nothing I'm aware of. I didn't mind throwing "tac" over the wall, because it was trivial, probably a couple hours work for me, under a minute for Ken. But the rsort source is not at all trivial, and still of potential value to AT&T. The source managed to get out as part of the "Hancock" project. I found a link in https://www.wired.com/2007/10/att-invents-pro/ but the page is gone. It probably didn't help that Wired titled the article *AT&T Invents Programming Language for Mass Surveillance* That's horse-pucky, akin to "Pitchfork makers invent device for spearing babies". I'm trying to track down a copy that was released publicly. I'm not hopeful. On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams wrote: > Woah, this sounds really useful, is there anything like it today? > > On Sun, Mar 8, 2020, 16:32 John P. Linderman wrote: > >> In the "UNIX SYSTEM" issue of the BSTJ back in October of 1984, I >> suggested that it might be better, both for functionality *and* >> performance, to have a sort that only worked on records with a *single* >> key to be sorted *lexicographically*, and put all the complexity of >> dealing with native integers, dates, case-mapping, etc into a key-building >> front end. I wrote such a sort built around a radix sort. The sort >> itself sported very few options relating to record format (fixed-length, >> newline terminated, and header-based, where an ascii header identified >> record length, and, optionally, key position and key length), where to find >> the key in fixed-length and newline terminated records, merge-only, check >> sort order only, unique, strip off the sort key (to avoid the need for a >> post-process in many cases). Key-building was usually near-trivial using >> awk or perl or a few commands for tweaking native integer and floating >> point values so they would sort lexicographically. The sort was stable and >> blazingly fast. Some summer students once complained to me that I was >> messing up a paper they were writing because my external sort was faster >> than an internal qsort... the kind of complaint that warms one's heart. At >> the back of my mind was a generic key-building library that would >> accommodate (decimal) numbers of arbitrary length, with or without "E" >> exponents, dates in various formats, string collation for Unicode, etc. It >> remains at the back of my mind. >> >> On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams wrote: >> >>> The idea of a simple rule is great, but the suggested rule fails on sort >>> -u which afaik came after sort | uniq for performance reasons. >>> >>> Another idea on the same vein is that a flag should be added only when >>> the job can be done inside the program and not with stdin/stdout (or no >>> flag can be added if one can reproduce the same behavior using pipelines). >>> >>> So, you need sort -u because only within sort can you get the >>> performance needed to get the job done. >>> >>> But you don't need -h in ls -lh. All the information to render a human >>> readable number is present on stdout of ls -l. You could easily have a >>> filter which renders numbers with options like adding commas, dots, >>> scientific notation, precision, money, units, etc. >>> >>> Tyler >>> >>> On Sun, Mar 8, 2020, 07:33 Jon Steinhart wrote: >>> >>>> After following this discussion, I guess that I have a simplistic way to >>>> determine whether something should be a dash option or a filter. In >>>> general, I'd make a filter if whatever it was doing was applicable to >>>> more than one command, a dash option otherwise. >>>> >>>> Jon >>>> >>> --000000000000a7cbac05a072601c Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Not= hing I'm aware of. I didn't mind throwing "tac" over the = wall, because it was trivial, probably a couple hours work for me, under a = minute for Ken. But the rsort source is not at all trivial, and still of po= tential value to AT&T.

The source managed to get out as part of the "Hancock" project.= I found a link in


but the page is gone. It probably didn't help that Wired titled= the article
AT&T = Invents Programming Language for Mass Surveillance

That's horse-pucky, akin to &= quot;Pitchfork makers invent device for spearing babies". I'm tryi= ng to track down a copy that was released publicly. I'm not hopeful.

On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams <coppero1237@gmail.com> wrote:
Woah, this sou= nds really useful, is there anything like it today?

On Sun, Mar 8, 2020, 16:32= John P. Linderman <jpl.jpl@gmail.com> wrote:
In the "UNIX SYSTEM" issue of the BSTJ back in O= ctober of 1984, I suggested that it might be better, both for functionality= and performance, to have a sort that only worked on records with a = single key to be sorted lexicographically, and put all the co= mplexity of dealing with native integers, dates, case-mapping, etc into a k= ey-building front end. I wrote such a sort built around a radix sort. The s= ort itself=C2=A0sported very few options relating to record format (fixed-l= ength, newline terminated, and header-based, where an ascii header identifi= ed record length, and, optionally, key position and key length), where to f= ind the key in fixed-length and newline terminated records, merge-only, che= ck sort order only, unique, strip off the sort key (to avoid the need for a= post-process in many cases). Key-building was usually near-trivial using a= wk or perl or a few commands for tweaking native integer and floating point= values so they would sort lexicographically. The sort was stable and blazi= ngly fast. Some summer students once complained to me that I was messing up= a paper they were writing because my external sort was faster than an inte= rnal qsort... the kind of complaint that warms one's heart. At the back= of my mind was a generic key-building library that would accommodate (deci= mal) numbers of arbitrary length, with or without "E" exponents, = dates in various formats, string collation for Unicode, etc. It remains at = the back of my mind.

On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams <coppero1237@gmail.com> wrote:
The idea of a simple rule is g= reat, but the suggested rule fails on sort -u which afaik came after sort |= uniq for performance reasons.

Another idea on the same vein is that a flag should be added only wh= en the job can be done inside the program and not with stdin/stdout (or no = flag can be added if one can reproduce the same behavior using pipelines).<= br>

So, you need sort -u= because only within sort can you get the performance needed to get the job= done.

But you don't= need -h in ls -lh. All the information to render a human readable number i= s present on stdout of ls -l. You could easily have a filter which renders = numbers with options like adding commas, dots, scientific notation, precisi= on, money, units, etc.

T= yler

On Sun, Mar 8, 2020, 07:33 Jon Steinhart= <jon@fourwinds.com> wrote:
After following this discussion, I guess that I have a= simplistic way to
determine whether something should be a dash option or a filter.=C2=A0 In general, I'd make a filter if whatever it was doing was applicable to more than one command, a dash option otherwise.

Jon
--000000000000a7cbac05a072601c--