The Unix Heritage Society mailing list
From: "Alan D. Salewski" <salewski@att.net>
To: tuhs@minnie.tuhs.org
Subject: Re: [TUHS] Command line options and complexity
Date: Thu, 5 Mar 2020 01:12:24 -0500
Message-ID: <20200305061224.GL24454@att.net>
In-Reply-To: <5019a751-d69a-4839-9a56-b977b275070d@www.fastmail.com>

On 2020-03-04 16:50:34, Random832 spake thus:
[...]
> Sure, but "stdin is a sequence of any type, and the argument is an expression that operates on that type or the name of a property that that type has" is universal enough.
> 
> The part that has to operate on a specific structure isn't the command, it's the arguments.
> 
> For example, a powershell pipeline to produce a list of files sorted by modified date is:
> 
> gci . | sort lastwritetime | select name
> 
> all three *commands* are universal - not all objects have a "lastwritetime" and "name" property, but sort and select can operate on any property that the sequence of objects passed into it has.

There are some examples of that type of thing in widely used Unix tools;
my use of 'sort -k1,1n' further down is demonstrating such a use case (the
'sort' command is being told that it is operating on numbers). But beyond
some lowest common denominator types ("number", "string", ...) how many
commands can really usefully operate on a large number of types? For
example, a program that can operate on IP addresses is probably doing
something different than a program that wants to operate on email
addresses.
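
To illustrate the 'sort -k1,1n' point above, here is a minimal sketch of
what the type hint changes. The three input values are made up for the
example, and LC_ALL=C is set only to pin the string collation.

```shell
# Three sample values (fabricated for illustration).
input='10
9
100'

# Without a type hint, sort compares the lines as strings
# (LC_ALL=C pins the collation to plain byte order).
lexical=$(printf '%s\n' "$input" | LC_ALL=C sort)

# -k1,1n tells sort that field 1 holds numbers.
numeric=$(printf '%s\n' "$input" | sort -k1,1n)

printf 'lexical:\n%s\n\nnumeric:\n%s\n' "$lexical" "$numeric"
```

The string sort puts "9" last because it compares character by character;
the numeric sort orders the same lines by value.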

I could see where named properties of some object can be used more
generally than types, but again there are widely used tools that do do
that (e.g., jq(1)). IMHO, though, they are more cumbersome to use than
most of the commands I need to use minute to minute.


> (gci is an alias for get-childitem... it also has aliases ls and dir, but I'm emphasizing that it's not exclusive to directories)
> 
> *assuming that ls -t didn't exist*, to do this with unix tools that operate on text you would need:
> 
> ls -l | [somehow convert the date to a sortable format, probably in awk] | sort | [somehow pick the filename alone out of the output - possibly with cut or sed or awk again]

(Just nit-picking at this particular example)

You could do it without ls[0]:

    $ stat -c '%Y %n' * | sort -k1,1n | xargs -L1 sh -c 'echo "$@"'

That doesn't seem so bad to me, but if it was something I needed regularly
I'd of course put it in an alias[1] or (more likely) a short script file.
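
The last stage of that pipeline is terse, so here is a small sketch of why
it prints only the names. It feeds two fabricated "mtime name" records
through the same sort and xargs stages, standing in for the output of
'stat -c' (itself a GNU option), so it can be run anywhere.

```shell
# Two fabricated "mtime name" records, deliberately out of order.
records='1583388744 notes.txt
1583300000 todo.txt'

# xargs -L1 runs sh once per input line. With sh -c, the first word of
# the line lands in $0 and the remaining words in "$@", so echo "$@"
# prints the name and silently drops the timestamp.
names=$(printf '%s\n' "$records" | sort -k1,1n | xargs -L1 sh -c 'echo "$@"')

printf '%s\n' "$names"
```

One caveat of this sketch: xargs does its own word splitting and quote
handling, so names containing quotes, backslashes, or runs of spaces will
not come through unchanged.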


> and it's very difficult to get tools like awk, sort, and cut to work on formats that contain more than one field that may contain embedded spaces (you can just about get away with it for ls output because the date is always three "words").
[...]

Yes, that's often true. And when I encounter it I typically start out by
seeing if I can inject and remove tokens in the data at key places in the
pipeline. Beyond anything trivial, though, I then quickly start reaching
for tools to put the data into some form that more easily allows for it
(CSV, JSON, ...). But that invariably adds other complications (such as
the need to find or build tools to marshal/unmarshal the data, and to
deal with data-domain-specific notions of null-vs-empty-string).
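
One possible reading of "inject and remove tokens" is sketched below: put
a delimiter that cannot occur in the data (a tab here, which is an
assumption about the data) between the fields at the source, sort on it,
and cut it away at the end. The records and the 'name'/'owner' layout are
fabricated for the example.

```shell
# A literal tab, used as the injected delimiter.
tab=$(printf '\t')

# Fabricated "name<TAB>owner" records; both fields contain spaces.
records=$(printf '%s\t%s\n' \
    'my notes.txt'  'Bob Smith' \
    'old draft.txt' 'Alice Jones')

# Sort on the second tab-separated field, then keep only the first.
# cut splits on tabs by default, which removes the injected token.
by_owner=$(printf '%s\n' "$records" | LC_ALL=C sort -t "$tab" -k2,2 | cut -f1)

printf '%s\n' "$by_owner"
```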

For the (more common (for me)) case where there is only one field that
contains embedded spaces, I just try to get 'em at the end of the line
and let the shell deal with it:

    $ some-command | while read -r first second rest; do ... ; done
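
A filled-in sketch of that loop, with fabricated "size mtime name" records
standing in for 'some-command' (the variable names are placeholders):

```shell
# Fabricated records; only the last field contains embedded spaces.
records='120 1583388744 release notes.txt
64 1583300000 todo list.txt'

names=$(printf '%s\n' "$records" |
    while read -r size mtime name; do
        # read assigns everything after the second field, embedded
        # spaces included, to the last variable.
        printf '%s\n' "$name"
    done)

printf '%s\n' "$names"
```

Leading and trailing whitespace on each line is still stripped by read,
so this only preserves spaces inside the last field.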


> Maybe it would be enough to have the universal interface be "tables" (i.e. text streams in some format that supports adequate escaping of embedded row and column delimiters)... or maybe even just table rows, and let the user deal with memorizing column numbers (or let each originating command support a fully general way to specify what columns are requested, as ps alone does on modern systems) Of course, this isn't *really* different from allowing any data structure - after all, the value for any field could itself be a fully escaped table in text format.
[...]

Well, in some sense with byte streams you have a table of newline-delimited
bytes (rows), and byte subfields separated by whitespace (columns). And
anything on top of that could (in some context, and with some syntax) be
considered just further escaped tables in text format. I think that's
essentially the same thing that you said, only with the outermost table
syntax removed. But like you said, this isn't really different from
allowing any data structure. Importantly, though, it doesn't impose any
particular data structure, either.

I've worked at a couple of different places that had in-house tools for
working with explicit table semantics in command line suites, and where
they fit the data domain, that was hugely useful. Generally speaking, they
were special purpose enough to warrant their own tools, but still general
purpose enough to be composable (were designed for use in shell pipelines)
and applicable in domains beyond the intentions of their original authors.

Still, the burden of "thinking in tables" would make them too heavyweight
for a lot of common use cases. Sometimes my data structure is "paragraphs
of text":

    $ lorem -p 3 | perl -00 -wnle '2 == $. && print' | wc -w

Other times I want a tree (JSON, s-expressions, ...), or even a stream of
trees[2]. I consider it a feature that these more complex data structures
are not assumed or imposed in contexts where they are not needed.

Take care,
-Al 


[0] You could get 'ls' to do it, too (without '-t'), but here the use of
    TIME_STYLE is a presumably non-portable (but handy!) GNU-ism:

        $ TIME_STYLE='+%s' ls -l | tail -n +2 | sort -k6,6n | xargs -L1 sh -c 'shift 5; echo "$@"'

    It's different from the '-t' option, though, in that it forces a
    predictable date field format in the output of 'ls -l', so side-steps
    the need for downstream date parsing altogether and simply jumps into
    sorting (after chopping off the 'total N' header (groans all around)).

[1] E.g.,

        $ # read 'bmt' as: "by mtime"

        $ alias bmt='stat -c "%Y %n" * | sort -k1,1n | xargs -L1 sh -c '"'echo "'"$@"'"'"

        $ bmt

[2] Probably flattened.

