The Unix Heritage Society mailing list
* [TUHS] Command line options and complexity
@ 2020-03-03 18:15 Jon Steinhart
  2020-03-03 18:44 ` Adam Thornton
  2020-03-10 23:03 ` Dan Stromberg
  0 siblings, 2 replies; 68+ messages in thread
From: Jon Steinhart @ 2020-03-03 18:15 UTC (permalink / raw)
  To: tuhs

OK, this should be good for some conversation.  A friend sent me this
link today: http://danluu.com/cli-complexity/

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-03 18:15 [TUHS] Command line options and complexity Jon Steinhart
@ 2020-03-03 18:44 ` Adam Thornton
  2020-03-04  4:11   ` Tyler Adams
                     ` (2 more replies)
  2020-03-10 23:03 ` Dan Stromberg
  1 sibling, 3 replies; 68+ messages in thread
From: Adam Thornton @ 2020-03-03 18:44 UTC (permalink / raw)
  To: Jon Steinhart, The Eunuchs Hysterical Society

> I've heard people say that there isn't really any alternative to this kind
> of complexity for command line tools, but people who say that have never
> really tried the alternative, something like PowerShell. I have plenty of
> complaints about PowerShell, but passing structured data around and easily
> being able to operate on structured data without having to hold metadata
> information in my head so that I can pass the appropriate metadata to the
> right command line tools at the right places in the pipeline isn't among my
> complaints [3] <https://danluu.com/cli-complexity/#fn:W>.

Somewhat disingenuous.  I mean, yes, that's true, but on the other hand it
means that you have to keep the "what Powershell commands operate on what
structure" in your head instead, since you can no longer assume the
pipelines to be a universal interface.

Same basic problem as CMS Pipelines.  Fantastically powerful, and nowhere
near as easy to compose good functionality as "it's just a byte stream."

Adam

On Tue, Mar 3, 2020 at 11:16 AM Jon Steinhart <jon@fourwinds.com> wrote:

> OK, this should be good for some conversation.  A friend sent me this
> link today: http://danluu.com/cli-complexity/
>


* Re: [TUHS] Command line options and complexity
  2020-03-03 18:44 ` Adam Thornton
@ 2020-03-04  4:11   ` Tyler Adams
  2020-03-04  6:03     ` Dave Horsfall
  2020-03-04 21:50   ` Random832
  2020-03-04 22:03   ` Random832
  2 siblings, 1 reply; 68+ messages in thread
From: Tyler Adams @ 2020-03-04  4:11 UTC (permalink / raw)
  To: Adam Thornton; +Cc: The Eunuchs Hysterical Society

> These go all the way back to v7 unix, where ls has an option to reverse
the sort order (which could have been done by passing the output to tac).

Good point. Why was this done in v7 unix and why wasn't it thrown out?

Tyler



* Re: [TUHS] Command line options and complexity
  2020-03-04  4:11   ` Tyler Adams
@ 2020-03-04  6:03     ` Dave Horsfall
  2020-03-04  6:48       ` arnold
  0 siblings, 1 reply; 68+ messages in thread
From: Dave Horsfall @ 2020-03-04  6:03 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Wed, 4 Mar 2020, Tyler Adams wrote:

> > These go all the way back to v7 unix, where ls has an option to 
> > reverse the sort order (which could have been done by passing the 
> > output to tac).
> 
> Good point. Why was this done in v7 unix and why wasn't it thrown out?

I seem to recall that "sort -r" was in V6, or perhaps that was one of the 
programs I'd back-ported from V7 (being stuck with 11/40-class boxes).

And speaking of "tac" (which I never saw), I couldn't think of a single 
use for "rev" (although no doubt I'll now get told).  Mind you, you get 
some amusing output with the "man" command because of the way that the 
underlining works...

-- Dave


* Re: [TUHS] Command line options and complexity
  2020-03-04  6:03     ` Dave Horsfall
@ 2020-03-04  6:48       ` arnold
  2020-03-04 21:17         ` Dave Horsfall
  2020-03-05  0:49         ` Lyndon Nerenberg
  0 siblings, 2 replies; 68+ messages in thread
From: arnold @ 2020-03-04  6:48 UTC (permalink / raw)
  To: tuhs, dave

Dave Horsfall <dave@horsfall.org> wrote:

> On Wed, 4 Mar 2020, Tyler Adams wrote:
>
> > > These go all the way back to v7 unix, where ls has an option to 
> > > reverse the sort order (which could have been done by passing the 
> > > output to tac).
> > 
> > Good point. Why was this done in v7 unix and why wasn't it thrown out?

There was no tac in V7 Unix. It was first posted to USENET, I don't
know by whom, and picked up by Linux and *BSD.

> And speaking of "tac" (which I never saw), I couldn't think of a single 
> use for "rev" (although no doubt I'll now get told).

It's useful for reading Hebrew sent in plain text email :-). Hebrew is
read right to left but stored in physical order (left to right) in files.

Arnold


* Re: [TUHS] Command line options and complexity
  2020-03-04  6:48       ` arnold
@ 2020-03-04 21:17         ` Dave Horsfall
  2020-03-05  0:49         ` Lyndon Nerenberg
  1 sibling, 0 replies; 68+ messages in thread
From: Dave Horsfall @ 2020-03-04 21:17 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Tue, 3 Mar 2020, arnold@skeeve.com wrote:

>> And speaking of "tac" (which I never saw), I couldn't think of a single 
>> use for "rev" (although no doubt I'll now get told).
>
> It's useful for reading Hebrew sent in plain text email :-). Hebrew is 
> read right to left but stored in physical order (left to right) in 
> files.

Ah, of course :-)  And Arabic too, as I recall.

-- Dave


* Re: [TUHS] Command line options and complexity
  2020-03-03 18:44 ` Adam Thornton
  2020-03-04  4:11   ` Tyler Adams
@ 2020-03-04 21:50   ` Random832
  2020-03-04 23:19     ` Steffen Nurpmeso
  2020-03-05  6:12     ` Alan D. Salewski
  2020-03-04 22:03   ` Random832
  2 siblings, 2 replies; 68+ messages in thread
From: Random832 @ 2020-03-04 21:50 UTC (permalink / raw)
  To: Grant Taylor via TUHS

On Tue, Mar 3, 2020, at 13:44, Adam Thornton wrote:
> I've heard people say that there isn't really any alternative to this 
> kind of complexity for command line tools, but people who say that have 
> never really tried the alternative, something like PowerShell. I have 
> plenty of complaints about PowerShell, but passing structured data 
> around and easily being able to operate on structured data without 
> having to hold metadata information in my head so that I can pass the 
> appropriate metadata to the right command line tools at the right 
> places in the pipeline isn't among my complaints [3] 
> <https://danluu.com/cli-complexity/#fn:W>.
> 
> Somewhat disingenuous. I mean, yes, that's true, but on the other hand 
> it means that you have to keep the "what Powershell commands operate on 
> what structure" in your head instead, since you can no longer assume 
> the pipelines to be a universal interface.

Sure, but "stdin is a sequence of any type, and the argument is an expression that operates on that type or the name of a property that that type has" is universal enough.

The part that has to operate on a specific structure isn't the command, it's the arguments.

For example, a powershell pipeline to produce a list of files sorted by modified date is:

gci . | sort lastwritetime | select name

all three *commands* are universal - not all objects have a "lastwritetime" and "name" property, but sort and select can operate on any property that the sequence of objects passed into it has.

(gci is an alias for get-childitem... it also has aliases ls and dir, but I'm emphasizing that it's not exclusive to directories)

*assuming that ls -t didn't exist*, to do this with unix tools that operate on text you would need:

ls -l | [somehow convert the date to a sortable format, probably in awk] | sort | [somehow pick the filename alone out of the output - possibly with cut or sed or awk again]

and it's very difficult to get tools like awk, sort, and cut to work on formats that contain more than one field that may contain embedded spaces (you can just about get away with it for ls output because the date is always three "words").

A significant portion of ls's options are related to sorting, because you can sort based on fields that are either not present in the output, or are not in a format that can be sorted textually.

Maybe it would be enough to have the universal interface be "tables" (i.e. text streams in some format that supports adequate escaping of embedded row and column delimiters)... or maybe even just table rows, and let the user deal with memorizing column numbers (or let each originating command support a fully general way to specify what columns are requested, as ps alone does on modern systems) Of course, this isn't *really* different from allowing any data structure - after all, the value for any field could itself be a fully escaped table in text format.

The benefit of having actual data structures with types is that when you *don't* end the pipeline with select, each object knows how to print itself [files print mode, mtime, size, and name in a human-readable format, more or less equivalent to ls -l] rather than just dumping out every single field that you might want sort or select to operate on.
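
A minimal Python sketch of the "universal commands, property names as arguments" idea described above. The names FileInfo, get_child_items, sort_by, select and demo are hypothetical stand-ins for illustration, not PowerShell's actual implementation; the point is only that sorting and projecting work on any record type once the property name is passed in.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical record type standing in for a file object in a pipeline."""
    name: str
    mtime: float
    size: int


def get_child_items(path):
    """Yield one FileInfo per directory entry (loosely analogous to gci)."""
    with os.scandir(path) as entries:
        for entry in entries:
            st = entry.stat()
            yield FileInfo(entry.name, st.st_mtime, st.st_size)


def sort_by(items, prop):
    """Sort any sequence of records by a named property."""
    return sorted(items, key=lambda item: getattr(item, prop))


def select(items, prop):
    """Project each record onto one named property."""
    return [getattr(item, prop) for item in items]


def demo():
    """Create three files with known mtimes and list them oldest first."""
    with tempfile.TemporaryDirectory() as d:
        for i, name in enumerate(["b file.txt", "a.txt", "c.txt"]):
            path = os.path.join(d, name)
            with open(path, "w") as f:
                f.write("x")
            os.utime(path, (1000 + i, 1000 + i))  # set atime and mtime
        return select(sort_by(get_child_items(d), "mtime"), "name")
```

Note that the file name containing a space needs no special handling here, because the name never has to be re-parsed out of a line of text.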


* Re: [TUHS] Command line options and complexity
  2020-03-03 18:44 ` Adam Thornton
  2020-03-04  4:11   ` Tyler Adams
  2020-03-04 21:50   ` Random832
@ 2020-03-04 22:03   ` Random832
  2020-03-04 23:25     ` Terry Jones
  2 siblings, 1 reply; 68+ messages in thread
From: Random832 @ 2020-03-04 22:03 UTC (permalink / raw)
  To: Grant Taylor via TUHS

I put a lot of thoughts in my previous message, but hit send before thinking of a good way to summarize my main point...

On Tue, Mar 3, 2020, at 13:44, Adam Thornton wrote:
> Somewhat disingenuous. I mean, yes, that's true, but on the other hand 
> it means that you have to keep the "what Powershell commands operate on 
> what structure" in your head instead, since you can no longer assume 
> the pipelines to be a universal interface.

The thing is, each Unix command imposes an implied structure on its
input, so it's not *really* a universal interface. Some operate on
lines as free text, some operate on space-delimited fields [with no
good way to escape them, though some do support an IFS environment
variable to at least change the delimiter], some work best with
fixed-width fields. Few provide a way to embed delimiters [be they
newline/null for record separator, tab/comma/space field separators, or
a user-defined separator for commands that support that] within a
value. Sort requires all values to be comparable as either strings or
numbers. Most commands you might want to use as a source in a pipeline
also expect to be used directly for human-readable output, so they
produce output that can be difficult to use for further processing
(e.g. dates in ls, which not only can't be sorted directly, but also
are limited to minutes for dates in the past year, and days for dates
before that, and are in the local time zone)

Hardly *any* commands you'd use in a pipeline really operate on unstructured bytes. Compression, I suppose. But other than that, you have just as much need to know what commands operate on what structure in Unix as in Powershell - the only difference is that the serialization is explicitly part of the interface... and due to the typical inability to escape delimiters, leaky.
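
To make the "leaky serialization" point concrete, here is a small Python sketch comparing a space-delimited text format, which has no way to escape its delimiter, with CSV, which does. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def naive_roundtrip(rows):
    """Serialize as space-separated fields, then parse back by splitting."""
    text = "\n".join(" ".join(row) for row in rows)
    return [line.split(" ") for line in text.split("\n")]


def csv_roundtrip(rows):
    """Serialize with the csv module, which quotes embedded delimiters."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


# Hypothetical sample data: the first name contains a space and a comma.
ROWS = [["my file, v2.txt", "2020-03-04"], ["plain.txt", "2020-03-03"]]
```

With these rows, naive_roundtrip(ROWS) splits the first record into four fields instead of two, while csv_roundtrip(ROWS) returns the rows unchanged.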


* Re: [TUHS] Command line options and complexity
  2020-03-04 21:50   ` Random832
@ 2020-03-04 23:19     ` Steffen Nurpmeso
  2020-03-05  6:12     ` Alan D. Salewski
  1 sibling, 0 replies; 68+ messages in thread
From: Steffen Nurpmeso @ 2020-03-04 23:19 UTC (permalink / raw)
  To: Random832; +Cc: Grant Taylor via TUHS

Random832 wrote in
<5019a751-d69a-4839-9a56-b977b275070d@www.fastmail.com>:
 |On Tue, Mar 3, 2020, at 13:44, Adam Thornton wrote:
 |> I've heard people say that there isn't really any alternative to this 
 |> kind of complexity for command line tools, but people who say that have 
 |> never really tried the alternative, something like PowerShell. I have 
 |> plenty of complaints about PowerShell, but passing structured data 
 |> around and easily being able to operate on structured data without 
 |> having to hold metadata information in my head so that I can pass the 
 |> appropriate metadata to the right command line tools at the right 
 |> places in the pipeline isn't among my complaints [3] 
 |> <https://danluu.com/cli-complexity/#fn:W>.
 |> 
 |> Somewhat disingenuous. I mean, yes, that's true, but on the other hand 
 |> it means that you have to keep the "what Powershell commands operate on 
 |> what structure" in your head instead, since you can no longer assume 
 |> the pipelines to be a universal interface.
 |
 |Sure, but "stdin is a sequence of any type, and the argument is an \
 |expression that operates on that type or the name of a property that \
 |that type has" is universal enough.
 |
 |The part that has to operate on a specific structure isn't the command, \
 |it's the arguments.
 |
 |For example, a powershell pipeline to produce a list of files sorted \
 |by modified date is:
 |
 |gci . | sort lastwritetime | select name
 ...
 |*assuming that ls -t didn't exist*, to do this with unix tools that \
 |operate on text you would need:
 |
 |ls -l | [somehow convert the date to a sortable format, probably in \
 |awk] | sort | [somehow pick the filename alone out of the output - \
 |possibly with cut or sed or awk again]
 |
 |and it's very difficult to get tools like awk, sort, and cut to work \
 |on formats that contain more than one field that may contain embedded \
 |spaces (you can just about get away with it for ls output because the \
 |date is always three "words").

Yes, that is really bad, except that a lot of output has been
pretty portable for a very long time.  FreeBSD started using
libxo in many base utilities, which can output in structured
formats.  This includes CSV and even CBOR :), though I do not know
how the latter integrates with Unix text utilities.  (I think
the format string syntax, which may partly originate in Qt (?),
could have been shaped into something better, like the Python ones,
plus further extensions.  But it is an improvement over what the
standard formats end up with when reordering etc. comes into
play.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


* Re: [TUHS] Command line options and complexity
  2020-03-04 22:03   ` Random832
@ 2020-03-04 23:25     ` Terry Jones
  0 siblings, 0 replies; 68+ messages in thread
From: Terry Jones @ 2020-03-04 23:25 UTC (permalink / raw)
  To: Random832; +Cc: Grant Taylor via TUHS

On Wed, Mar 4, 2020 at 11:04 PM Random832 <random832@fastmail.com> wrote:

> Hardly *any* commands you'd use in a pipeline really operate on
> unstructured bytes. Compression, I suppose. But other than that, you have
> just as much need to know what commands operate on what structure in Unix
> as in Powershell - the only difference is that the serialization is
> explicitly part of the interface... and due to the typical inability to
> escape delimiters, leaky.
>

Another difference is that probably most people on this list are extremely
familiar with the various quirks and I/O nuances of the tools many have
been using every day for decades. Just as the native speakers of a natural
language can't so easily see/appreciate its complexity (e.g., pronunciation
in English!), I suspect many of us have internalized these idiosyncrasies.
I teach occasional shell/Python courses to absolute beginners (no computing
experience at all) and came to appreciate how weird the shell is (in the
sense of having baked-in historical accidents that cannot / will not /
should not be "corrected"). Some of my appreciation of that was due to
discussions on this list (e.g., regarding comment syntax, and the :
command) - so thanks!

I know what follows won't be to everyone's taste, but I like Python and I
love shell pipelines, so I tried to write a shell that gave you both and
which allowed fairly free mixing of invoking UNIX tools and running Python.
You can send anything down its pipelines - lines of text, atoms, numbers,
Python objects, whatever (in the Python _ variable).  Of course the
receiving end of the pipeline needs to know (or figure out) what it's
getting. One advantage is that you have a carefully designed programming
language (no offence intended!) underlying the shell, so you can e.g.,
write shell functions in Python (and put them in a start-up file if you
want) and just pipe regular UNIX output into them and pipe their output
into whatever's next (more Python, another UNIX command, etc). Probably
almost no one would actually want to regularly do the following on the
command line, but you could:

>>> from os import stat
>>> def fd(): return [name for (mtime, name) in sorted((stat(f).st_mtime, f) for f in _)]
>>> ls | fd() | tail -n 3

Here I've stuck a simple (DSU - see [1]) Python function in between two
UNIX commands and use it to get the most recently modified files.
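
For readers unfamiliar with the idiom, a minimal plain-Python sketch of decorate-sort-undecorate follows. The function name newest and the mtime_of callback are hypothetical; the essential detail is that the sort key goes first in each tuple, so that tuple comparison orders by modification time rather than by name.

```python
def newest(names, mtime_of, n=3):
    """Return the n most recently modified names, oldest of them first."""
    # Decorate: pair each name with its sort key, key first.
    decorated = [(mtime_of(name), name) for name in names]
    # Sort: tuples compare element by element, so mtime decides the order.
    decorated.sort()
    # Undecorate: drop the key again and keep the last n names.
    return [name for _, name in decorated][-n:]
```

In modern Python the same result (for distinct keys) is usually written as sorted(names, key=mtime_of)[-n:].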

You probably wouldn't want to do this either, but you could:

>>> seq 0 9 | list(map(lambda x: 2 ** int(x), _)) | tee /tmp/powers-of-two | sum(map(int, _))
1023
>>> cat /tmp/powers-of-two
1
2
4
8
16
32
64
128
256
512

Of course it also lets you do things you *would* want to do :-)

More at https://github.com/terrycojones/daudin   Python has fairly nice
tools for reading and evaluating Python code, which meant that getting a
first version of this implemented took only one evening of playing around.
It's pretty simple (and still has plenty of rough edges).  Apologies if
this seems like self-promotion, but I very much enjoy thinking about things
in this thread and about how we work with information. I'm also constantly
blown away by how elegant UNIX is and how the core ideas have endured.
Pipelines are really wonderful, as a "natural" alternative to function
composition as a mathematician or programmer would do it (see point #1 at
https://github.com/terrycojones/daudin#background--thanks), and I wanted to
build a shell that preserved that, while giving you Python. The overview of
their history on pages 67-70 of bwk's recent book [2] is very interesting.

Terry

[1] https://en.wikipedia.org/wiki/Schwartzian_transform
[2] https://www.amazon.com/UNIX-History-Memoir-Brian-Kernighan/dp/1695978552


* Re: [TUHS] Command line options and complexity
  2020-03-04  6:48       ` arnold
  2020-03-04 21:17         ` Dave Horsfall
@ 2020-03-05  0:49         ` Lyndon Nerenberg
  2020-03-05 20:54           ` Dave Horsfall
  1 sibling, 1 reply; 68+ messages in thread
From: Lyndon Nerenberg @ 2020-03-05  0:49 UTC (permalink / raw)
  To: tuhs

> > And speaking of "tac" (which I never saw), I couldn't think of a single 
> > use for "rev" (although no doubt I'll now get told).

It's handy for building rhyming dictionaries:

  rev < /usr/share/dict/web2 | sort | rev > rhymes
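
The same trick can be sketched in Python: sorting on the reversed spelling groups words with common endings together. The function name rhyme_order is made up, and the ordering follows Python's code point comparison, which may differ from sort(1) under a non-C locale.

```python
def rhyme_order(words):
    """Sort words by their reversed spelling, like rev | sort | rev."""
    return sorted(words, key=lambda word: word[::-1])
```

For example, rhyme_order(["cat", "dog", "bat", "log", "hat"]) puts "dog" and "log" next to each other, followed by "bat", "cat" and "hat".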

--lyndon


* Re: [TUHS] Command line options and complexity
  2020-03-04 21:50   ` Random832
  2020-03-04 23:19     ` Steffen Nurpmeso
@ 2020-03-05  6:12     ` Alan D. Salewski
  1 sibling, 0 replies; 68+ messages in thread
From: Alan D. Salewski @ 2020-03-05  6:12 UTC (permalink / raw)
  To: tuhs

On 2020-03-04 16:50:34, Random832 spake thus:
[...]
> Sure, but "stdin is a sequence of any type, and the argument is an expression that operates on that type or the name of a property that that type has" is universal enough.
> 
> The part that has to operate on a specific structure isn't the command, it's the arguments.
> 
> For example, a powershell pipeline to produce a list of files sorted by modified date is:
> 
> gci . | sort lastwritetime | select name
> 
> all three *commands* are universal - not all objects have a "lastwritetime" and "name" property, but sort and select can operate on any property that the sequence of objects passed into it has.

There are some examples of that type of thing in widely used Unix tools;
my use of 'sort -k1,1n' further down is demonstrating such a use case (the
'sort' command is being told that it is operating on numbers). But beyond
some lowest common denominator types ("number", "string", ...) how many
commands can really usefully operate on a large number of types? For
example, a program that can operate on IP addresses is probably doing
something different than a program that wants to operate on email
addresses.

I could see where named properties of some object can be used more
generally than types, but again there are widely used tools that do do
that (e.g., jq(1)). IMHO, though, they are more cumbersome to use than
most of the commands I need to use minute to minute.


> (gci is an alias for get-childitem... it also has aliases ls and dir, but I'm emphasizing that it's not exclusive to directories)
> 
> *assuming that ls -t didn't exist*, to do this with unix tools that operate on text you would need:
> 
> ls -l | [somehow convert the date to a sortable format, probably in awk] | sort | [somehow pick the filename alone out of the output - possibly with cut or sed or awk again]

(Just nit-picking at this particular example)

You could do it without ls[0]:

    $ stat -c '%Y %n' * | sort -k1,1n | xargs -L1 sh -c 'echo "$@"'

That doesn't seem so bad to me, but if it was something I needed regularly
I'd of course put it in an alias[1] or (more likely) a short script file.


> and it's very difficult to get tools like awk, sort, and cut to work on formats that contain more than one field that may contain embedded spaces (you can just about get away with it for ls output because the date is always three "words").
[...]

Yes, that's often true. And when I encounter it I typically start out by
seeing if I can inject and remove tokens in the data at key places in the
pipeline. Beyond anything trivial, though, I then quickly start reaching
for tools to put the data into some form that more easily allow for it
(CSV, JSON, ...). But that invariably adds other complications (such as
the need to find or build tools to marshal/unmarshal the data, and to
deal with data-domain-specific notions of null-vs-empty-string).

For the (more common (for me)) case where there is only one field that
contains embedded spaces, I just try to get 'em at the end of the line
and let the shell deal with it:

    $ some-command | while read -r first second rest; do ... ; done
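
The "put the field with embedded spaces last" approach carries over to other languages. A rough Python counterpart of read -r first second rest, with the hypothetical helper name split_fields, might look like this:

```python
def split_fields(line, nfields=3):
    """Split a line into nfields fields; the last one keeps its spaces."""
    # Split on runs of whitespace at most nfields - 1 times, so everything
    # after the leading fields lands in the final field untouched.
    parts = line.strip().split(None, nfields - 1)
    # Pad with empty strings when the line has fewer fields than expected.
    return parts + [""] * (nfields - len(parts))
```

For instance, split_fields("42 alice My Summer  Vacation.txt") yields "42", "alice" and the full file name, including its internal double space.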


> Maybe it would be enough to have the universal interface be "tables" (i.e. text streams in some format that supports adequate escaping of embedded row and column delimiters)... or maybe even just table rows, and let the user deal with memorizing column numbers (or let each originating command support a fully general way to specify what columns are requested, as ps alone does on modern systems) Of course, this isn't *really* different from allowing any data structure - after all, the value for any field could itself be a fully escaped table in text format.
[...]

Well, in some sense with byte streams you have a table of newline-delimited
bytes (rows), and byte subfields separated by whitespace (columns). And
anything on top of that could (in some context, and with some syntax) be
considered just further escaped tables in text format. I think that's
essentially the same thing that you said, only with the outermost table
syntax removed. But like you said, this isn't really different from
allowing any data structure. Importantly, though, it doesn't impose any
particular data structure, either.

I've worked at a couple of different places that had in-house tools for
working with explicit table semantics in command line suites, and where
they fit the data domain, that was hugely useful. Generally speaking, they
were special purpose enough to warrant their own tools, but still general
purpose enough to be composable (were designed for use in shell pipelines)
and applicable in domains beyond the intentions of their original authors.

Still, the burden of "thinking in tables" would make them too heavyweight
for a lot of common use cases. Sometimes my data structure is "paragraphs
of text":

    $ lorem -p 3 | perl -00 -wnle '2 == $. && print' | wc -w

Other times I want a tree (JSON, s-expressions, ...), or even a stream of
trees[2]. I consider it a feature that these more complex data structures
are not assumed or imposed in contexts where they are not needed.
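
As an illustration of the "paragraphs of text" case above, here is a small Python sketch that treats runs of empty lines as record separators, roughly what perl -00 does, and counts the words in the n-th paragraph. The function names are hypothetical.

```python
import re


def paragraphs(text):
    """Split text into paragraphs separated by one or more empty lines."""
    return [p for p in re.split(r"\n{2,}", text.strip("\n")) if p]


def words_in_paragraph(text, n):
    """Count whitespace-separated words in the n-th paragraph (1-based)."""
    return len(paragraphs(text)[n - 1].split())
```

The byte stream itself carries no paragraph structure; the consumer imposes it only where it is wanted.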

Take care,
-Al 


[0] You could get 'ls' to do it, too, (without '-t') but here the use of
    TIME_STYLE is a presumably non-portable (but handy!) GNU-ism:

        $ TIME_STYLE='+%s' ls -l | tail -n +2 | sort -k6,6n | xargs -L1 sh -c 'shift 5; echo "$@"'

    It's different from the '-t' option, though, in that it forces a
    predictable date field format in the output of 'ls -l', so side-steps
    the need for downstream date parsing altogether and simply jumps into
    sorting (after chopping off the 'total N' header (groans all around)).

[1] E.g.,

        $ # read 'bmt' as: "by mtime"

        $ alias bmt='stat -c "%Y %n" * | sort -k1,1n | xargs -L1 sh -c '"'echo "'"$@"'"'"

        $ bmt

[2] Probably flattened.



* Re: [TUHS] Command line options and complexity
  2020-03-05  0:49         ` Lyndon Nerenberg
@ 2020-03-05 20:54           ` Dave Horsfall
  2020-03-05 22:01             ` William Cheswick
  0 siblings, 1 reply; 68+ messages in thread
From: Dave Horsfall @ 2020-03-05 20:54 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Wed, 4 Mar 2020, Lyndon Nerenberg wrote:

(Uses for "rev")

> It's handy for building rhyming dictionaries:
>
>  rev < /usr/share/dict/web2 | sort | rev > rhymes

Neat!

-- Dave


* Re: [TUHS] Command line options and complexity
  2020-03-05 20:54           ` Dave Horsfall
@ 2020-03-05 22:01             ` William Cheswick
  0 siblings, 0 replies; 68+ messages in thread
From: William Cheswick @ 2020-03-05 22:01 UTC (permalink / raw)
  Cc: The Eunuchs Hysterical Society

My use for rev(1):

uniq(1)’s -f <n> ignores the first <n> fields of a line.  If you want it to ignore the last <n> fields:

rev | uniq -f <n> | rev
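
A rough Python sketch of the same effect, for comparison. It is only an approximation of uniq -f, since it compares whitespace-split fields rather than the raw remainder of the line, and the function name is made up.

```python
from itertools import groupby


def uniq_ignoring_last(lines, n):
    """Collapse adjacent lines that agree once the last n fields are ignored."""
    def key(line):
        fields = line.split()
        # Keep everything except the last n fields (never a negative count).
        return fields[:max(len(fields) - n, 0)]

    # Like uniq, keep the first line of each run of equal keys.
    return [next(group) for _, group in groupby(lines, key=key)]
```

As with uniq, only adjacent duplicates are collapsed, so unsorted input can still contain repeats.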

ches



* Re: [TUHS] Command line options and complexity
  2020-03-03 18:15 [TUHS] Command line options and complexity Jon Steinhart
  2020-03-03 18:44 ` Adam Thornton
@ 2020-03-10 23:03 ` Dan Stromberg
  2020-03-11  3:18   ` Dave Horsfall
  1 sibling, 1 reply; 68+ messages in thread
From: Dan Stromberg @ 2020-03-10 23:03 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: tuhs

When I took a comparative languages class in school, the teacher said that
the complexity of a programming language varies with the square of its
number of features.

I wonder if it's similar for command line options in shell-callables?

On the other hand, adding command line options was (at least at one time)
seen as a way of distinguishing GNU tools from Unix tools - that is, they
were seen as a way of avoiding the copyright lawsuits that were nipping at
BSD's heels.


* Re: [TUHS] Command line options and complexity
  2020-03-10 23:03 ` Dan Stromberg
@ 2020-03-11  3:18   ` Dave Horsfall
  2020-03-11  4:02     ` Steve Nickolas
  2020-03-11 22:56     ` Greg 'groggy' Lehey
  0 siblings, 2 replies; 68+ messages in thread
From: Dave Horsfall @ 2020-03-11  3:18 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Tue, 10 Mar 2020, Dan Stromberg wrote:

> When I took a comparative languages class in school, the teacher said 
> that the complexity of a programming language varies with the square of 
> its number of features.

That sort of makes sense from a mathematical point of view, if you regard 
it as a matrix of side effects.  I hate to think about how it affects Perl 
(my favourite language) though :-)
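
One way to read the "matrix of side effects" remark, offered only as an assumption about what is meant: if any two features may interact, the number of pairs to consider grows roughly with the square of the feature count. A tiny Python sketch with a hypothetical function name:

```python
from itertools import combinations


def pairwise_interactions(features):
    """Count the unordered pairs of features that could interact."""
    return sum(1 for _ in combinations(features, 2))
```

Ten options give 45 possible pairs and twenty give 190, i.e. n(n-1)/2, so doubling the options roughly quadruples the pairs.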

> I wonder if it's similar for command line options in shell-callables?

I'm starting to think that if a utility requires many options then perhaps 
they ought to be split into filters (or at least environment variables); I 
despair at how *ix is drifting from "one tool, one job" to "one size fits 
all"...

The "ls" command for example really needs an option-ectomy; I find that I 
don't really care about the exact number of bytes there are in a file as 
the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be 
happy if "-h" was the default with some way to turn it off (yes, I know 
that it's occasionally useful to add them all up in a column, but that 
won't tell you how many media blocks are required).

Quickly now, without looking: which option shows unprintable characters in 
a filename?  Unless you use it regularly (in which case you have real 
problems) you would have to look it up; I find "ls ... | od -bc" to 
be quicker, especially on filenames with trailing blanks etc (which "-B" 
won't show).
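
In the same spirit as piping into od, a short Python sketch that makes trailing blanks and control characters in file names visible by printing an escaped, quoted form. The helper name visible is hypothetical; ascii() is the standard built-in.

```python
def visible(name):
    """Return a quoted, escaped form of a file name for display."""
    # ascii() quotes the string and escapes control and non-ASCII characters,
    # so trailing spaces sit inside the quotes and tabs show up as \t.
    return ascii(name)
```

Applied to each entry of os.listdir("."), this shows a name such as "notes " with its trailing space inside the quotes.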

> On the other hand, adding command line options was (at least at one 
> time) seen as a way of distinguishing GNU tools from Unix tools - 
> that is, they were seen as a way of avoiding the copyright lawsuits that 
> were nipping at BSD's heels.

I've never liked GNU's "--bloody-long-option" convention as you still have 
to look up which one does what, but I've never thought about that view; a 
lot of long options still accept a single character (subject to feeping 
creaturism, of course).

-- Dave


* Re: [TUHS] Command line options and complexity
  2020-03-11  3:18   ` Dave Horsfall
@ 2020-03-11  4:02     ` Steve Nickolas
  2020-03-11 22:56     ` Greg 'groggy' Lehey
  1 sibling, 0 replies; 68+ messages in thread
From: Steve Nickolas @ 2020-03-11  4:02 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

On Wed, 11 Mar 2020, Dave Horsfall wrote:

> I'm starting to think that if a utility requires many options then perhaps 
> they ought to be split into filters (or at least environment variables); I 
> despair at how *ix is drifting from "one tool, one job" to "one size fits 
> all"...
>
> The "ls" command for example really needs an option-ectomy; I find that I 
> don't really care about the exact number of bytes there are in a file as the 
> nearest KiB or MiB (or even GiB) is usually good enough, so I'd be happy if 
> "-h" was the default with some way to turn it off (yes, I know that it's 
> occasionally useful to add them all up in a column, but that won't tell you 
> how many media blocks are required).
>
> Quickly now, without looking: which option shows unprintable characters in a 
> filename?  Unless you use it regularly (in which case you have real problems) 
> you would have to look it up; I find that "ls ... | od -bc" to be quicker, 
> especially on filenames with trailing blanks etc (which "-B" won't show).

It would probably be interesting to define a simplified standard, because 
yeesh, trying to implement even a command as basic as ls is just torture 
(mainly because it basically requires putting all of "column" and most of 
"sort" into it)!

> I've never liked GNU's "--bloody-long-option" convention as you still have to 
> look up which one does what, but I've never thought about that view; a lot of 
> long options still accept a single character (subject to feeping creaturism, 
> of course).

I'm still into the one-character switch thing, personally.

-uso.

* Re: [TUHS] Command line options and complexity
  2020-03-11  3:18   ` Dave Horsfall
  2020-03-11  4:02     ` Steve Nickolas
@ 2020-03-11 22:56     ` Greg 'groggy' Lehey
  2020-03-11 23:14       ` Dan Cross
                         ` (2 more replies)
  1 sibling, 3 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-11 22:56 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
>
> The "ls" command for example really needs an option-ectomy; I find that I
> don't really care about the exact number of bytes there are in a file as
> the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
> happy if "-h" was the default with some way to turn it off (yes, I know
> that it's occasionally useful to add them all up in a column, but that
> won't tell you how many media blocks are required).

A good example.  But you're not removing options, you're just
redefining them.  In fact I find the -h option particularly emetic, so
a better choice in removing options would be to remove -h and use a
filter to mutilate the sizes:

  $ ls -l | humanize

But that's a pain, isn't it?  That's why there's a -h option for
people who like it.  Note that you can't do it the other way round:
you can't get the exact size from -h output.
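
For illustration only, a filter of this kind could be sketched in awk as
below.  "humanize" is not a standard utility; the function name, the
optional column argument (default 5, the size field of "ls -l") and the
binary units are all assumptions of this sketch.

```sh
# humanize [column]: hypothetical filter, not a standard utility.
# Rewrites a byte count found in one whitespace-separated column
# (default 5) using binary units K, M, G, T.
humanize() {
    awk -v col="${1:-5}" '
    BEGIN { split("K M G T", unit, " ") }
    $col ~ /^[0-9]+$/ {
        n = $col; u = 0
        # divide down until the value fits, at most up to T
        while (n >= 1024 && u < 4) { n /= 1024; u++ }
        if (u > 0) $col = sprintf("%.1f%s", n, unit[u])
    }
    { print }'
}

# usage: ls -l | humanize
```

Assigning to a field makes awk rebuild the line with single blanks, so the
column alignment of the original listing is lost on rewritten lines, and
the filter has to be told which column holds the size.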

And then there's the question why you don't like the standard output.
Because the number strings are too long and difficult to read, maybe?
That's the rationale for the -, option.

> Quickly now, without looking: which option shows unprintable
> characters in a filename?  Unless you use it regularly (in which
> case you have real problems) you would have to look it up; I find
> that "ls ... | od -bc" to be quicker, especially on filenames with
> trailing blanks etc (which "-B" won't show).

This is arguably a bug in the -B option.  I certainly don't think the
pipe notation is quicker.  But it's nice to have both alternatives.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

* Re: [TUHS] Command line options and complexity
  2020-03-11 22:56     ` Greg 'groggy' Lehey
@ 2020-03-11 23:14       ` Dan Cross
  2020-03-12  0:42         ` Greg 'groggy' Lehey
  2020-03-12  0:53       ` Steve Nickolas
  2020-03-12  5:22       ` Dave Horsfall
  2 siblings, 1 reply; 68+ messages in thread
From: Dan Cross @ 2020-03-11 23:14 UTC (permalink / raw)
  To: Greg 'groggy' Lehey; +Cc: The Eunuchs Hysterical Society

On Wed, Mar 11, 2020 at 6:57 PM Greg 'groggy' Lehey <grog@lemis.com> wrote:

> On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
> >
> > The "ls" command for example really needs an option-ectomy; I find that I
> > don't really care about the exact number of bytes there are in a file as
> > the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
> > happy if "-h" was the default with some way to turn it off (yes, I know
> > that it's occasionally useful to add them all up in a column, but that
> > won't tell you how many media blocks are required).
>
> A good example.  But you're not removing options, you're just
> redefining them.  In fact I find the -h option particularly emetic, so
> a better choice in removing options would be to remove -h and use a
> filter to mutilate the sizes:
>
>   $ ls -l | humanize
>
> But that's a pain, isn't it?


I don't know; that's subjective.


> That's why there's a -h option for
> people who like it.


That's incomplete, in that it implies that an option is the only way to
achieve the goal of reducing the perceived pain, but that's not the case.
(Note I'm not saying you intended that as an interpretation, but it's a
reasonable intuition for an intention.)

An interesting counterpoint to this argument is how columnized "ls" is
handled under Plan 9: there is no `-C` option to `ls` there; instead,
there's a general-purpose `mc` filter that figures out the size of the
window it's running in, reads its input, decides how many columns the input
will fit into, and emits it columnized. But yes, it would be a pain to type
`ls | mc` every time one wanted columnized `ls` output, so this is wrapped
up into a shell script called `lc`. Note that this lets you do stuff like,
`lc -l` and see multi-column long listings if the window is wide enough.

I got so used to this from plan9 that I keep an approximation in
$HOME/bin/lc: `exec ls -ACF "$@"`.
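
The filter-plus-wrapper arrangement can be sketched roughly as follows.
This is not the Plan 9 source: mc_sketch is an invented stand-in for mc
that takes its width from an optional argument or $COLUMNS (default 80)
instead of asking the window, and it assumes one item per input line.

```sh
# mc_sketch [width]: hypothetical stand-in for the Plan 9 mc filter.
# Reads one item per line and prints as many columns as fit in width.
mc_sketch() {
    awk -v width="${1:-${COLUMNS:-80}}" '
    { item[NR] = $0; if (length($0) > max) max = length($0) }
    END {
        if (NR == 0) exit
        cell = max + 2                        # widest item plus a gutter
        ncol = int(width / cell)
        if (ncol < 1) ncol = 1
        nrow = int((NR + ncol - 1) / ncol)    # rows needed, rounded up
        for (r = 1; r <= nrow; r++) {
            line = ""
            for (c = 0; c < ncol; c++) {
                i = c * nrow + r              # fill down the columns
                if (i > NR) break
                if (c == ncol - 1 || i + nrow > NR)
                    line = line item[i]       # last cell: no padding
                else
                    line = line sprintf("%-" cell "s", item[i])
            }
            print line
        }
    }'
}

# lc: the wrapper, so nobody has to type the pipe each time.
lc() { ls "$@" | mc_sketch; }
```

Because the column logic lives in the filter, any other command that
prints one item per line can be piped through it as well.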

For the `humanize` thing, I don't see why one couldn't have an `lh` command
that generated "human-friendly long output from ls."


> Note that you can't do it the other way round:
> you can't get the exact size from -h output.
>

That's true, but now the logic is specialized to ls, and not applicable to
anything else (e.g., du? df? wc, perhaps?). Similarly with `-,`. It is not
general purpose, which is unfortunate.

Granted, combining these things would be a little challenging, but is it
likely that one would want `ls -l,h`? Optimize for the common case, etc....

> And then there's the question why you don't like the standard output.
> Because the number strings are too long and difficult to read, maybe?
> That's the rationale for the -, option.
>
> > Quickly now, without looking: which option shows unprintable
> > characters in a filename?  Unless you use it regularly (in which
> > case you have real problems) you would have to look it up; I find
> > that "ls ... | od -bc" to be quicker, especially on filenames with
> > trailing blanks etc (which "-B" won't show).
>
> This is arguably a bug in the -B option.  I certainly don't think the
> pipe notation is quicker.  But it's nice to have both alternatives.


By default, plan9 would quote filenames that had characters that were
special to the shell (there wasn't really the concept of "non-printable
characters" in the Unix/TTY sense); this could be disabled by specifying the
`-Q` option.

        - Dan C.

* Re: [TUHS] Command line options and complexity
  2020-03-11 23:14       ` Dan Cross
@ 2020-03-12  0:42         ` Greg 'groggy' Lehey
  0 siblings, 0 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-12  0:42 UTC (permalink / raw)
  To: Dan Cross; +Cc: The Eunuchs Hysterical Society

On Wednesday, 11 March 2020 at 19:14:32 -0400, Dan Cross wrote:
> On Wed, Mar 11, 2020 at 6:57 PM Greg 'groggy' Lehey <grog@lemis.com> wrote:
>
>> On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
>>>
>>> The "ls" command for example really needs an option-ectomy; I find that I
>>> don't really care about the exact number of bytes there are in a file as
>>> the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
>>> happy if "-h" was the default with some way to turn it off (yes, I know
>>> that it's occasionally useful to add them all up in a column, but that
>>> won't tell you how many media blocks are required).
>>
>> A good example.  But you're not removing options, you're just
>> redefining them.  In fact I find the -h option particularly emetic, so
>> a better choice in removing options would be to remove -h and use a
>> filter to mutilate the sizes:
>>
>>   $ ls -l | humanize
>>
>> But that's a pain, isn't it?
>
> I don't know; that's subjective.

It's certainly more work than -h.

>> That's why there's a -h option for people who like it.
>
> That's incomplete, in that it implies that an option is the only way
> to achieve the goal of reducing the perceived pain, but that's not
> the case.  (Note I'm not saying you intended that as an
> interpretation, but it's a reasonable intuition for an intention.)

What I meant (and this is certainly my interpretation) was that
somebody added the -h option because of perceived pain with piping
output through another program.  I didn't intend to imply that it was
the only alternative.

> An interesting counterpoint to this argument is how columnized "ls"
> is handled under Plan 9: there is no `-C` option to `ls` there;
> instead, there's a general-purpose `mc` filter that figures out the
> size of the window it's running in, reads its input, decides how
> many columns the input will fit into, and emits it columnized. But
> yes, it would be a pain to type `ls | mc` every time one wanted
> columnized `ls` output, so this is wrapped up into a shell script
> called `lc`. Note that this lets you do stuff like, `lc -l` and see
> multi-column long listings if the window is wide enough.

Yes, that sounds like an excellent method.

> For the `humanize` thing, I don't see why one couldn't have an `lh`
> command that generated "human-friendly long output from ls."

And yes, I deliberately didn't mention this option, though it occurred
to me.  I have a couple of scripts like this, like:

    alias l="ls -lbL,"

>> Note that you can't do it the other way round: you can't get the
>> exact size from -h output.
>
> That's true, but now the logic is specialized to ls, and not
> applicable to anything else (e.g., du? df? wc, perhaps?). Similarly
> with `-,`. It is not general purpose, which is unfortunate.

Yes, this is an issue that I mentioned in an earlier message (I added
a positional parameter to work around it).  But this is in the nature
of the output.  mc doesn't have this issue.

> Granted, combining these things would be a little challenging, but is it
> likely that one would want `ls -l,h`? Optimize for the common case,
> etc....

Heh.  Never thought of that.  But since -h (apparently) never produces
output with 4 digits, the -, doesn't ever come into effect.  I've just
tried it on some big files, and the -, is effectively ignored.

> And then there's the question why you don't like the standard
> output.

I don't like the standard output because things like this are hard to
read:

  -rw-r--r--  1 grog  lemis   8234010624 22 Mar  2012 Casanova-TV-1-5
  -rw-r--r--  1 grog  home   13225168900 31 Aug  2019 Movie:_Sahara_2005-2016-04-11-2028

I find this easier to read:

  -rw-r--r--  1 grog  lemis   8,234,010,624 22 Mar  2012 Casanova-TV-1-5
  -rw-r--r--  1 grog  home   13,225,168,900 31 Aug  2019 Movie:_Sahara_2005-2016-04-11-2028

I can't speak for Dave, but this is also less painful:

  -rw-r--r--  1 grog  lemis   7.7G 22 Mar  2012 Casanova-TV-1-5
  -rw-r--r--  1 grog  home     12G 31 Aug  2019 Movie:_Sahara_2005-2016-04-11-2028

The problem for me there is the difficulty comparing lengths, and the
implicit inaccuracy.
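
As a rough sketch of the same grouping done outside ls, the filter below
adds a comma every three digits to one column of its input.  The name
"commas" and the column argument (default 5) are assumptions made for the
example; it is not how the -, option is implemented.

```sh
# commas [column]: hypothetical filter that groups the digits of one
# whitespace-separated column (default 5) in threes, like "ls -,".
commas() {
    awk -v col="${1:-5}" '
    $col ~ /^[0-9]+$/ {
        s = $col; out = ""
        # peel off three digits at a time from the right
        while (length(s) > 3) {
            out = "," substr(s, length(s) - 2) out
            s = substr(s, 1, length(s) - 3)
        }
        $col = s out
    }
    { print }'
}

# usage: ls -l | commas
```

Unlike a human-readable size, this output loses nothing: deleting the
commas gives back the exact byte count.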

>> Because the number strings are too long and difficult to read, maybe?
>> That's the rationale for the -, option.
>>
>>> Quickly now, without looking: which option shows unprintable
>>> characters in a filename?  Unless you use it regularly (in which
>>> case you have real problems) you would have to look it up; I find
>>> that "ls ... | od -bc" to be quicker, especially on filenames with
>>> trailing blanks etc (which "-B" won't show).
>>
>> This is arguably a bug in the -B option.  I certainly don't think the
>> pipe notation is quicker.  But it's nice to have both alternatives.
>
> By default, plan9 would quote filenames that had characters that
> were special to the shell (there wasn't really the concept of
> "non-printable characters" in the Unix/TTY sense); this could be
> disabled by specifying the `-Q` option.

Hmm.  In this particular case, so does Linux:

  === grog@bilbo (/dev/pts/11) ~ 2 -> touch "foo   "
  === grog@bilbo (/dev/pts/11) ~ 4 -> l foo*
  -rw-r--r-- 1 grog grog 1499570 Jun 30  2012  foo
  -rw-r--r-- 1 grog grog       0 Mar 12 10:40 'foo   '

I wonder if that's something we should emulate in FreeBSD.  At the
very least we should consider whether the lack of identification of
trailing blanks is a bug in the FreeBSD implementation of -B.  This
option isn't in POSIX, and in Linux it means

     -B, --ignore-backups
              do not list implied entries ending with ~

So maybe it's a candidate for fixing.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

* Re: [TUHS] Command line options and complexity
  2020-03-11 22:56     ` Greg 'groggy' Lehey
  2020-03-11 23:14       ` Dan Cross
@ 2020-03-12  0:53       ` Steve Nickolas
  2020-03-12  3:09         ` Greg 'groggy' Lehey
                           ` (2 more replies)
  2020-03-12  5:22       ` Dave Horsfall
  2 siblings, 3 replies; 68+ messages in thread
From: Steve Nickolas @ 2020-03-12  0:53 UTC (permalink / raw)
  To: Greg 'groggy' Lehey; +Cc: The Eunuchs Hysterical Society

On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:

> On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
>>
>> The "ls" command for example really needs an option-ectomy; I find that I
>> don't really care about the exact number of bytes there are in a file as
>> the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
>> happy if "-h" was the default with some way to turn it off (yes, I know
>> that it's occasionally useful to add them all up in a column, but that
>> won't tell you how many media blocks are required).
>
> A good example.  But you're not removing options, you're just
> redefining them.  In fact I find the -h option particularly emetic, so
> a better choice in removing options would be to remove -h and use a
> filter to mutilate the sizes:
>
>  $ ls -l | humanize
>
> But that's a pain, isn't it?  That's why there's a -h option for
> people who like it.  Note that you can't do it the other way round:
> you can't get the exact size from -h output.
>
> And then there's the question why you don't like the standard output.
> Because the number strings are too long and difficult to read, maybe?
> That's the rationale for the -, option.
>
>> Quickly now, without looking: which option shows unprintable
>> characters in a filename?  Unless you use it regularly (in which
>> case you have real problems) you would have to look it up; I find
>> that "ls ... | od -bc" to be quicker, especially on filenames with
>> trailing blanks etc (which "-B" won't show).
>
> This is arguably a bug in the -B option.  I certainly don't think the
> pipe notation is quicker.  But it's nice to have both alternatives.
>
> Greg
> --
> Sent from my desktop computer.
> Finger grog@lemis.com for PGP public key.
> See complete headers for address and phone numbers.
> This message is digitally signed.  If your Microsoft mail program
> reports problems, please read http://lemis.com/broken-MUA
>

I went through all the switches defined by POSIX, and figured that those 
26 could be cut down.  My concept reduced the number of switches from 26 
to 9 (FLRadfiln).  Of course, the idea is to be more minimalist than 
POSIX, so some people's opinions on what is or isn't necessary may differ 
from mine.

Of course, this changes the default behavior of ls because it no longer 
would be able to do columnar listings (|column for that).

I felt -A was a redundant "almost -a".
I felt -C and -x were redundant because a tool like column(1) could be 
used to do the same job (even though column(1) isn't POSIX).
I felt -H was a redundant "almost -L".
I felt -S, -r and -t could be implemented in other ways using sort(1).
I felt -c and -u were meaningless, but that's because of the filesystems I 
usually work with that do not have functional equivalents.  -u for one is 
completely useless on VFAT even though it has such timestamps!  YMMV.
I felt -g and -o could be replaced by cut(1).
I felt -k wasn't really all that important.  Just halve the numbers.
I felt -m wasn't really all that important.  There are other ways to convert 
to that format, no doubt, through filters.
I felt -p was a redundant "almost -F".
I felt -q could be done just fine with something like tr(1).
I felt -s was a redundant "kindasorta -l".
And -1 becomes the new default, so it's redundant. ;)

Again, YMMV. ;)
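
A few of these replacements can be sketched as plain filters.  The
function names are invented for the example, and the sketch assumes
"ls -l" style lines with the size in field 5 and no blanks or newlines in
filenames.

```sh
# by_size: sort long-listing lines by size, largest first
# (stands in for -S; add or drop the r to get the effect of -r).
by_size() { sort -k5,5nr; }

# no_owner: drop the owner and group fields, the combined effect of
# -g and -o, by squeezing blanks and cutting fields 3 and 4.
no_owner() { tr -s ' ' | cut -d ' ' -f 1,2,5-; }

# quest: show every non-printable byte as "?", roughly what -q does.
quest() { LC_ALL=C tr -c '[:print:]\n' '?'; }

# usage: ls -l | by_size
#        ls -l | no_owner
#        ls | quest
```

None of these can recover information that ls never printed, so sorting
by time is not covered here.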

-uso.

* Re: [TUHS] Command line options and complexity
  2020-03-12  0:53       ` Steve Nickolas
@ 2020-03-12  3:09         ` Greg 'groggy' Lehey
  2020-03-12  3:34           ` Steve Nickolas
  2020-03-12  5:38         ` Dave Horsfall
  2020-03-12  6:48         ` Peter Jeremy
  2 siblings, 1 reply; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-12  3:09 UTC (permalink / raw)
  To: Steve Nickolas; +Cc: The Eunuchs Hysterical Society

On Wednesday, 11 March 2020 at 20:53:12 -0400, Steve Nickolas wrote:
> I went through all the switches defined by POSIX, and figured that
> those 26 could be cut down.

A brave man to defy POSIX!  I wasn't so brave, which is why we have
the -y option.

> My concept reduced the number of switches from 26 to 9 (FLRadfiln).
> Of course, the idea is to be more minimalist than POSIX, so some
> people's opinions on what is or isn't necessary may differ from
> mine.

OK, let's compare notes:

> I felt -A was a redundant "almost -a".

Arguably -a could go too.  The distinction seems arbitrary.

> I felt -C and -x were redundant because a tool like column(1) could be
> used to do the same job (even though column(1) isn't POSIX).

Neither would this ls(1) be.

> I felt -H was a redundant "almost -L".

No arguments, but I suspect that somebody had a good reason for this
distinction, and removing it could cause problems.

> I felt -S, -r and -t could be implemented in other ways using sort(1).

-S isn't POSIX.  And to implement it without an option would mean
removing -h.

As I mentioned earlier, -t can't be done by a filter without
significantly modifying the timestamp output.  That was my rationale
for the -D option, which allows sorting by an external filter.

-r could work.

> I felt -c and -u were meaningless, but that's because of the filesystems I
> usually work with that do not have functional equivalents.  -u for one is
> completely useless on VFAT even though it has such timestamps!  YMMV.

I think this says more about your file systems than about the options.
I find both incredibly useful, and there's no easy way to get the
information elsewhere.  stat(1) would be an option, but then that
could replace ls(1) completely.

> I felt -g and -o could be replaced by cut(1).

-g is already obsolete in FreeBSD (accepted and ignored).  -o has
already been repurposed (show file flags).

> I felt -k wasn't really all that important.  Just halve the numbers.

Agreed.

> I felt -m wasn't really all that important.  There's other ways to convert
> to that format, no doubt, through filters.

Possibly.  Certainly I wouldn't miss it.

> I felt -p was a redundant "almost -F".

OK.

> I felt -q could be done just fine with something like tr(1).

I think that it could be replaced by -b.  "?" isn't really very
helpful.

> I felt -s was a redundant "kindasorta -l".

I can't agree with that, but I've never used it.  The only sensible
use would appear to be talking about disk blocks, but on FreeBSD at
any rate it looks at the BLOCKSIZE environment variable, which I have
set to 1048576 (so that utilities will display in MB where
appropriate), and that's what -s does too:

   2079 -rw-r--r--  1 grog  wheel   2,178,735,915  4 Oct 11:15 Willkommen-bei-den-Honeckers---Spielfilm,-Deutschland-2016-20191003-125200.mp4

That makes it pretty useless.

So, any others?

-G: Colorized output.  I'd be *really* happy to get rid of this, but
    it's not easy to instate with a filter, so I suppose there are
    enough people who like it that it will have to stay.

-P: Seems only to be there to cancel a -H or -L.

-W: "Display whiteouts when scanning directories".  I don't even
    understand what that is.

-a: See discussion of -A.

--color: Again, no thanks.

-f: We haven't really discussed this one.  If you want to remove -S,
    -r and -t, then arguably -f should become the default and be
    removed.

-n: Make it the default and require a filter to convert numeric user
    and group IDs to names.

-y: If we get rid of all sorting, it will no longer be needed.

-,: Make the option standard: output numbers with commas every 3
    digits.  Then this option specification wouldn't be needed.

Of course, none of this will happen.  But it is interesting to think
about it.  In particular, options like -g and -o, which are no longer
modern.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

* Re: [TUHS] Command line options and complexity
  2020-03-12  3:09         ` Greg 'groggy' Lehey
@ 2020-03-12  3:34           ` Steve Nickolas
  2020-03-13  1:02             ` Greg 'groggy' Lehey
  0 siblings, 1 reply; 68+ messages in thread
From: Steve Nickolas @ 2020-03-12  3:34 UTC (permalink / raw)
  To: Greg 'groggy' Lehey; +Cc: The Eunuchs Hysterical Society

On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:

> On Wednesday, 11 March 2020 at 20:53:12 -0400, Steve Nickolas wrote:
>> I went through all the switches defined by POSIX, and figured that
>> those 26 could be cut down.
>
> A brave man to defy POSIX!  I wasn't so brave, which is why we have
> the -y option.

xD

>> My concept reduced the number of switches from 26 to 9 (FLRadfiln).
>> Of course, the idea is to be more minimalist than POSIX, so some
>> people's opinions on what is or isn't necessary may differ from
>> mine.
>
> OK, let's compare notes:
>
>> I felt -A was a redundant "almost -a".
>
> Arguably -a could go too.  The distinction seems arbitrary.

Well, I think one or the other would be desirable.  I figured -a was the 
better to keep - since it shows all dotfiles where -A leaves off "." and 
"..".

>> I felt -C and -x were redundant because a tool like column(1) could be
>> used to do the same job (even though column(1) isn't POSIX).
>
> Neither would this ls(1) be.

Of course. ;)

<snip>

> -S isn't POSIX.  And to implement it without an option would mean
> removing -h.

-h is a gnuism, isn't it?

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does 
specify the -S switch.  That's POSIX, isn't it?

> As I mentioned earlier, -t can't be done by a filter without
> significantly modifying the timestamp output.  That was my rationale
> for the -D option, which allows sorting by an external filter.

Understandable.

Honestly, if the date format weren't already standardized as it is, I 
would've standardized on "yyyy-mm-dd,hh:mm" - which wouldn't need special 
processing in order to pump into sort(1).
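
A small sketch of why such a layout helps: when the timestamp runs from
the most significant field to the least, a plain lexical sort is also a
chronological one.  The sample lines and the name oldest_first are made
up, and the year-month-day,hour:minute layout is an assumed reading of
the proposal.

```sh
# oldest_first: plain byte-wise sort; with a leading timestamp that
# runs from year down to minute this is also chronological order.
oldest_first() { LC_ALL=C sort; }

# Hypothetical listing lines: timestamp first, then the name.
printf '%s\n' \
    '2019-08-31,20:28 Sahara' \
    '2012-03-22,09:15 Casanova' \
    '2020-03-12,10:40 foo' |
oldest_first
```

No key options or month-name tables are needed, which is the point of
putting the fields in that order.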

>> I felt -c and -u were meaningless, but that's because of the filesystems I
>> usually work with that do not have functional equivalents.  -u for one is
>> completely useless on VFAT even though it has such timestamps!  YMMV.
>
> I think this says more about your file systems than about the options.
> I find both incredibly useful, and there's no easy way to get the
> information elsewhere.  stat(1) would be an option, but then that
> could replace ls(1) completely.

Perhaps true.

<snip>

> So, any others?
>
> -G: Colorized output.  I'd be *really* happy to get rid of this, but
>    it's not easy to instate with a filter, so I suppose there are
>    enough people who like it that it will have to stay.
>
> -P: Seems only to be there to cancel a -H or -L.
>
> -W: "Display whiteouts when scanning directories".  I don't even
>    understand what that is.

I was using the link I referenced as my "standard", which doesn't have any 
of those.

I can take or leave color ls.  I don't like the GNU defaults because dark 
blue is TOO dark on my default settings.  I think the flags are adequate 
to know what kind of file I'm dealing with.

> -f: We haven't really discussed this one.  If you want to remove -S,
>    -r and -t, then arguably -f should become the default and be
>    removed.

I used to use "dir|sort" a lot on PC DOS before it got "dir /o" in 5.0.  I 
wouldn't have a problem with removing sort from ls altogether.

<snip>

> Of course, none of this will happen.  But it is interesting to think
> about it.  In particular, options like -g and -o, which are no longer
> modern.

-uso.

* Re: [TUHS] Command line options and complexity
  2020-03-11 22:56     ` Greg 'groggy' Lehey
  2020-03-11 23:14       ` Dan Cross
  2020-03-12  0:53       ` Steve Nickolas
@ 2020-03-12  5:22       ` Dave Horsfall
  2020-03-12  5:35         ` Steve Nickolas
  2020-03-13  0:36         ` Greg 'groggy' Lehey
  2 siblings, 2 replies; 68+ messages in thread
From: Dave Horsfall @ 2020-03-12  5:22 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:

> A good example.  But you're not removing options, you're just redefining 
> them.  In fact I find the -h option particularly emetic, so a better 
> choice in removing options would be to remove -h and use a filter to 
> mutilate the sizes:
>
>  $ ls -l | humanize

I also had something like that in mind, except being British/Australian 
I'd spell it with an "s" :-)

> But that's a pain, isn't it?  That's why there's a -h option for people 
> who like it.  Note that you can't do it the other way round: you can't 
> get the exact size from -h output.

Which is why I suggested there be a means to turn it off; I'm becoming a 
fan of environment variables to modify the standard behaviour of tools 
(but I loathe the Penguin/OS default to use colours).

> And then there's the question why you don't like the standard output. 
> Because the number strings are too long and difficult to read, maybe? 
> That's the rationale for the -, option.

More than likely; as I approach age 68 I notice that I'm losing some 
cognitive facility...  I might start using "," and see if I like it, but I 
see that the Mac doesn't have it (my Penguin is off the air at the 
moment), and having it as an environment variable would be nice.

>> Quickly now, without looking: which option shows unprintable
>> characters in a filename?  Unless you use it regularly (in which
>> case you have real problems) you would have to look it up; I find
>> that "ls ... | od -bc" to be quicker, especially on filenames with
>> trailing blanks etc (which "-B" won't show).
>
> This is arguably a bug in the -B option.  I certainly don't think the 
> pipe notation is quicker.  But it's nice to have both alternatives.

Agreed; as for the bug I think it comes down to what is meant by an
unprintable character.  I certainly remember finding "hidden" set-uid
shells with the name of ".. " etc back when I was going after the
UNSW kiddies with an axe back in the late 70s...

-- Dave

* Re: [TUHS] Command line options and complexity
  2020-03-12  5:22       ` Dave Horsfall
@ 2020-03-12  5:35         ` Steve Nickolas
  2020-03-13  0:36         ` Greg 'groggy' Lehey
  1 sibling, 0 replies; 68+ messages in thread
From: Steve Nickolas @ 2020-03-12  5:35 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

On Thu, 12 Mar 2020, Dave Horsfall wrote:

> Which is why I suggested there be a means to turn it off; I'm becoming a fan 
> of environment variables to modify the standard behaviour of tools (but I 
> loathe the Penguin/OS default to use colours).

When I first used Linux, that wasn't the default.  Personally, I don't 
think it should be (actually I think there simply shouldn't be a color 
mode at all to ls).

> More than likely; as I approach age 68 I notice that I'm losing some 
> cognitive facility...  I might start using "," and see if I like it, but I 
> see that the Mac doesn't have it (my Penguin is off the air at the moment), 
> and having it as an environment variable would be nice.

GNU ls does not appear to have a -, switch.

IBM, interestingly, introduced an environment variable in PC DOS 6.3 that 
did the opposite thing.  If the NO_SEP variable existed, it suppressed 
commas in file sizes.

-uso.

* Re: [TUHS] Command line options and complexity
  2020-03-12  0:53       ` Steve Nickolas
  2020-03-12  3:09         ` Greg 'groggy' Lehey
@ 2020-03-12  5:38         ` Dave Horsfall
  2020-03-12  6:48         ` Peter Jeremy
  2 siblings, 0 replies; 68+ messages in thread
From: Dave Horsfall @ 2020-03-12  5:38 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Wed, 11 Mar 2020, Steve Nickolas wrote:

> I felt -c and -u were meaningless, but that's because of the filesystems 
> I usually work with that do not have functional equivalents.  -u for one 
> is completely useless on VFAT even though it has such timestamps! 
> YMMV.

I find those flags really useful when doing forensic analysis on a file 
system :-)  One particular instance was at $ORKPLACE some years back when 
a critical chunk of a file system had somehow disappeared overnight (it 
was our source base!).  I got to work by comparing login sessions with 
those someone-unknown "ls" flags and had just about nailed the perp who 
was online at the time when I was ordered off it in no uncertain terms.

Ummm, did I mention that my then $BOSS had a habit of working from home 
after a few (and quite a few) drinks?  As I said, I was this -><- far away 
from fingering him...  As it stood I knew who it was but wasn't able to 
prove it in time.

-- Dave

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  0:53       ` Steve Nickolas
  2020-03-12  3:09         ` Greg 'groggy' Lehey
  2020-03-12  5:38         ` Dave Horsfall
@ 2020-03-12  6:48         ` Peter Jeremy
  2020-03-12  7:37           ` Steve Nickolas
  2020-03-12 23:57           ` Greg 'groggy' Lehey
  2 siblings, 2 replies; 68+ messages in thread
From: Peter Jeremy @ 2020-03-12  6:48 UTC (permalink / raw)
  To: Steve Nickolas; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>> a better choice in removing options would be to remove -h and use a
>> filter to mutilate the sizes:
>>
>>  $ ls -l | humanize

How does humanize decide which column to work on?  If it only works on
"ls -l", then it's not useful if I want other columns as well.  Maybe
it could just humanize any large number it found, but you probably
don't want to "humanize" the inode number or filename.

>I felt -s was a redundant "kindasorta -l".

Except they are reporting completely different things - consider sparse
files or filesystems (like ZFS) that support compression.

-- 
Peter Jeremy

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 963 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  6:48         ` Peter Jeremy
@ 2020-03-12  7:37           ` Steve Nickolas
  2020-03-12  7:42             ` Warner Losh
  2020-03-12 23:57           ` Greg 'groggy' Lehey
  1 sibling, 1 reply; 68+ messages in thread
From: Steve Nickolas @ 2020-03-12  7:37 UTC (permalink / raw)
  To: Peter Jeremy; +Cc: The Eunuchs Hysterical Society

On Thu, 12 Mar 2020, Peter Jeremy wrote:

> On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>
>> I felt -s was a redundant "kindasorta -l".
>
> Except they are reporting completely different things - consider sparse
> files or filesystems (like ZFS) that support compression.

I was under the impression that -s simply showed the file size divided by 
512 and didn't account for sparseness or compression.

(Of the filesystems I frequently work with, one of them does actually 
support sparseness (ProDOS).)

-uso.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  7:37           ` Steve Nickolas
@ 2020-03-12  7:42             ` Warner Losh
  0 siblings, 0 replies; 68+ messages in thread
From: Warner Losh @ 2020-03-12  7:42 UTC (permalink / raw)
  To: Steve Nickolas; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

On Thu, Mar 12, 2020, 1:37 AM Steve Nickolas <usotsuki@buric.co> wrote:

> On Thu, 12 Mar 2020, Peter Jeremy wrote:
>
> > On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
> >
> >> I felt -s was a redundant "kindasorta -l".
> >
> > Except they are reporting completely different things - consider sparse
> > files or filesystems (like ZFS) that support compression.
>
> I was under the impression that -s simply showed the file size divided by
> 512 and didn't account for sparseness or compression.
>

Stat returns two values. The offset of the last byte and the number of
blocks allocated to the file. Useful if you have a sparse file too...
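
A minimal sketch of reading those two values with stat(2). The helper name
size_and_blocks is hypothetical, and the 512-byte unit for st_blocks is an
assumption that holds on the common BSD and Linux systems but is not
guaranteed everywhere.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: report the logical size of a file in bytes and the
 * number of blocks actually allocated to it.  Returns 0, or -1 on error. */
int size_and_blocks(const char *path, long long *size, long long *blocks)
{
    struct stat st;

    assert(path != NULL && size != NULL && blocks != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *size = (long long)st.st_size;      /* logical length in bytes */
    *blocks = (long long)st.st_blocks;  /* allocated blocks, commonly 512 bytes each */
    return 0;
}
```

For a sparse or compressed file, blocks * 512 can come out well below size,
which is the difference between what "ls -s" and "ls -l" report.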

Warner

> (Of the filesystems I frequently work with, one of them does actually
> support sparseness (ProDOS).)
>
> -uso.
>

[-- Attachment #2: Type: text/html, Size: 1514 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  6:48         ` Peter Jeremy
  2020-03-12  7:37           ` Steve Nickolas
@ 2020-03-12 23:57           ` Greg 'groggy' Lehey
  1 sibling, 0 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-12 23:57 UTC (permalink / raw)
  To: Peter Jeremy; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 1085 bytes --]

On Thursday, 12 March 2020 at 17:48:07 +1100, Peter Jeremy wrote:
> On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>> On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>>> a better choice in removing options would be to remove -h and use a
>>> filter to mutilate the sizes:
>>>
>>>  $ ls -l | humanize
>
> How does humanize decide which column to work on?

It knows.  It was written that way.

> If it only works on "ls -l", then it's not useful if I want other
> columns as well.

Right.  You'd have to change it.  Recall that this was just an
example.

> Maybe it could just humanize any large number it found, but you
> probably don't want to "humanize" the inode number or filename.

Yes, this is exactly the scenario I described in an earlier mail
message, where I called it

  $ ls -l | commafy 5
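
A filter of that kind could be sketched roughly as follows.  This is not the
code of any existing tool: the names commafy_digits and commafy_field are
hypothetical, fields are assumed to be whitespace-separated and numbered
from 1, and column alignment is not preserved.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy len digits from src into dst, inserting a comma
 * before each group of three.  Returns 0, or -1 if dst is too small. */
static int commafy_digits(const char *src, size_t len, char *dst, size_t dstsz)
{
    size_t commas = len ? (len - 1) / 3 : 0;
    size_t i, o = 0;

    if (len + commas + 1 > dstsz)
        return -1;
    for (i = 0; i < len; i++) {
        if (i > 0 && (len - i) % 3 == 0)
            dst[o++] = ',';
        dst[o++] = src[i];
    }
    dst[o] = '\0';
    return 0;
}

/* Copy line to out, adding commas to field number `field` if that field
 * consists only of digits.  Every other field is copied unchanged. */
int commafy_field(const char *line, int field, char *out, size_t outsz)
{
    const char *p = line;
    size_t o = 0;
    int n = 0;

    assert(line != NULL && out != NULL && outsz > 0);
    while (*p != '\0') {
        const char *start;
        size_t len;
        int all_digits = 1;

        if (isspace((unsigned char)*p)) {
            /* Separators are copied through as they are. */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *p++;
            continue;
        }
        /* Find the end of the current field. */
        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            if (!isdigit((unsigned char)*p))
                all_digits = 0;
            p++;
        }
        len = (size_t)(p - start);
        n++;
        if (n == field && all_digits) {
            if (commafy_digits(start, len, out + o, outsz - o) != 0)
                return -1;
            o += strlen(out + o);
        } else {
            if (o + len >= outsz)
                return -1;
            memcpy(out + o, start, len);
            o += len;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Because the column is named explicitly, the inode number and the filename are
left alone even when they happen to be large numbers.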

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  5:22       ` Dave Horsfall
  2020-03-12  5:35         ` Steve Nickolas
@ 2020-03-13  0:36         ` Greg 'groggy' Lehey
  2020-03-13 11:26           ` Dave Horsfall
  2020-03-14  2:13           ` Greg A. Woods
  1 sibling, 2 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-13  0:36 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 2396 bytes --]

On Thursday, 12 March 2020 at 16:22:01 +1100, Dave Horsfall wrote:
> On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>
>> A good example.  But you're not removing options, you're just redefining
>> them.  In fact I find the -h option particularly emetic, so a better
>> choice in removing options would be to remove -h and use a filter to
>> mutilate the sizes:
>>
>>  $ ls -l | humanize
>
> I also had something like that in mind, except being British/Australian
> I'd spell it with an "s" :-)

It's a common misconception that -ize is US English.  The Oxford
English Dictionary, normally not prescriptive, prefers it.  See
https://www.oed.com/page/faqs/Frequently+asked+questions#spell.  I
personally had -ise drummed out of me by my uncle, very much
Australian.

>> And then there's the question why you don't like the standard output.
>> Because the number strings are too long and difficult to read, maybe?
>> That's the rationale for the -, option.
>
> More than likely; as I approach age 68 I notice that I'm losing some
> cognitive facility...  I might start using "," and see if I like it, but I
> see that the Mac doesn't have it (my Penguin is off the air at the
> moment), and having it as an environment variable would be nice.

Yes, currently only FreeBSD has it.  But you have the sources.  Apart
from option handling, it's only:

--- print.c     (.../head/bin/ls/print.c)       (revision 241014)
+++ print.c     (.../stable/10/bin/ls/print.c)  (working copy)
@@ -606,6 +606,10 @@
                humanize_number(buf, sizeof(buf), (int64_t)bytes, "",
                    HN_AUTOSCALE, HN_B | HN_NOSPACE | HN_DECIMAL);
                (void)printf("%*s ", (u_int)width, buf);
+       } else if (f_thousands) {               /* with commas */
+               /* This format assignment needed to work round gcc bug. */
+               const char *format = "%*j'd ";
+               (void)printf(format, (u_int)width, bytes);
        } else
                (void)printf("%*jd ", (u_int)width, bytes);
 }

A quick and dirty fix would be simply to replace the format string.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  3:34           ` Steve Nickolas
@ 2020-03-13  1:02             ` Greg 'groggy' Lehey
  0 siblings, 0 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-13  1:02 UTC (permalink / raw)
  To: Steve Nickolas; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 1946 bytes --]

On Wednesday, 11 March 2020 at 23:34:46 -0400, Steve Nickolas wrote:
> On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>> -S isn't POSIX.  And to implement it without an option would mean
>> removing -h.
>
> -h is a gnuism, isn't it?

It might have originated there, but then I would expect it to be spelt
'--produce-human-readable-output'.  I haven't been able to establish
from the FreeBSD sources or commit logs when it was introduced.  It
would clearly have been a reimplementation.

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
> specify the -S switch.  That's POSIX, isn't it?

So it is!  This was the first option that I wanted to add, back when I
still had practice wheels.  I asked my mentor, and he said "not the
Unix way", so I let it be.  Then Wes Peters came up with the idea, and
I thought he committed it, but it seems that it ultimately came from
Kostas Blekos in 2005, based on the same feature on NetBSD and
OpenBSD.  I wonder when it made it to POSIX.

>> As I mentioned earlier, -t can't be done by a filter without
>> significantly modifying the timestamp output.  That was my rationale
>> for the -D option, which allows sorting by an external filter.
>
> Understandable.
>
> Honestly if the date format weren't standardized as it were, I would've
> standardized on "yyyy-mm-dd,mm:ss" - which wouldn't need special
> processing in order to pump into sort(1).

Yes, that was one of the possibilities I thought of.  Another obvious
one was time_t, which is even easier to process.  And then there's ISO
8601.  That's why it didn't take me long to decide "do it *your* way"
with the -D option.
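
As an illustration of why such formats sort easily, here is a small sketch
that renders a time_t in an ISO 8601 form.  The helper name format_iso8601
and the choice of UTC are assumptions made for the example; they say nothing
about how -D itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: render t as UTC in an ISO 8601 form whose lexical
 * order matches chronological order.  Returns 0, or -1 on failure. */
int format_iso8601(time_t t, char *buf, size_t bufsz)
{
    struct tm *tm;

    assert(buf != NULL);
    tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    /* strftime returns 0 when the result does not fit in buf. */
    return strftime(buf, bufsz, "%Y-%m-%dT%H:%M:%S", tm) > 0 ? 0 : -1;
}
```

Timestamps in this shape can go straight into sort(1) with no key options,
since the most significant part comes first and every part has fixed width.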

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-13  0:36         ` Greg 'groggy' Lehey
@ 2020-03-13 11:26           ` Dave Horsfall
  2020-03-14  2:13           ` Greg A. Woods
  1 sibling, 0 replies; 68+ messages in thread
From: Dave Horsfall @ 2020-03-13 11:26 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:

>>>  $ ls -l | humanize
>>
>> I also had something like that in mind, except being British/Australian
>> I'd spell it with an "s" :-)
>
> It's a common misconception that -ize is US English.  The Oxford English 
> Dictionary, normally not prescriptive, prefers it.  See 
> https://www.oed.com/page/faqs/Frequently+asked+questions#spell.  I 
> personally had -ise drummed out of me by my uncle, very much Australian.

I'm familiar with that (and also the fact that "aluminum" and "color" etc 
were British spelling).  Being born and bred British with pedantic parents 
I've always hated "American" spelling as we called it, and it's sad to see 
such noted media as the Sydney Morning Herald slowly adopting it over the 
past few years; Australia has used British spelling at least since I 
emigrated here in 1965.

Oh, it was meant to be a creat/create joke, BTW...

>> More than likely; as I approach age 68 I notice that I'm losing some 
>> cognitive facility...  I might start using "," and see if I like it, 
>> but I see that the Mac doesn't have it (my Penguin is off the air at 
>> the moment), and having it as an environment variable would be nice.
>
> Yes, currently only FreeBSD has it.  But you have the sources.  Apart 
> from option handling, it's only:

[...]

I don't like my chances with suggesting that to Apple; I'm not even sure 
if they even take user contributions (although back when I was on the dole 
and having delusions of grandeur I did register as an Apple developer, but 
I suspect that that's for non-Apple stuff i.e. it goes into the Apple 
Store).

> A quick and dirty fix would be simply to replace the format string.

I have done the odd binary patch (usually to reconfigure Unify database 
volumes back when I was with FGH)...  Not right now, though, as it's time 
for bed.

-- Dave

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-13  0:36         ` Greg 'groggy' Lehey
  2020-03-13 11:26           ` Dave Horsfall
@ 2020-03-14  2:13           ` Greg A. Woods
  2020-03-14  4:31             ` Greg 'groggy' Lehey
  1 sibling, 1 reply; 68+ messages in thread
From: Greg A. Woods @ 2020-03-14  2:13 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

At Fri, 13 Mar 2020 11:36:47 +1100, Greg 'groggy' Lehey <grog@lemis.com> wrote:
Subject: Re: [TUHS] Command line options and complexity
>
> On Thursday, 12 March 2020 at 16:22:01 +1100, Dave Horsfall wrote:
> > On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
> > >
> > > And then there's the question why you don't like the standard output.
> > > Because the number strings are too long and difficult to read, maybe?
> > > That's the rationale for the -, option.
> >
> > More than likely; as I approach age 68 I notice that I'm losing some
> > cognitive facility...  I might start using "," and see if I like it, but I
> > see that the Mac doesn't have it (my Penguin is off the air at the
> > moment), and having it as an environment variable would be nice.
>
> Yes, currently only FreeBSD has it.

Because of course NetBSD has chosen a different option letter:  'M'

Unfortunately on NetBSD and FreeBSD the appearance of commas (or
whatever is appropriate) depends on the locale being correctly
configured, and this is not always so easy to do!

--
					Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>

[-- Attachment #2: OpenPGP Digital Signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-14  2:13           ` Greg A. Woods
@ 2020-03-14  4:31             ` Greg 'groggy' Lehey
  0 siblings, 0 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-14  4:31 UTC (permalink / raw)
  To: The Unix Heritage Society mailing list

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]

On Friday, 13 March 2020 at 19:13:53 -0700, Greg A. Woods wrote:
> At Fri, 13 Mar 2020 11:36:47 +1100, Greg 'groggy' Lehey <grog@lemis.com> wrote:
>> Yes, currently only FreeBSD has it.
>
> Because of course NetBSD has chosen a different option letter:  'M'

Oh.  Somehow I missed that.  Damn.

> Unfortunately on NetBSD and FreeBSD the appearance of commas (or
> whatever is appropriate) depends on the locale being correctly
> configured, and this is not always so easy to do!

Agreed.  I've been meaning to default to , if the locale doesn't
specify a delimiter, but haven't got round to it.  Give me a problem
report (https://bugs.freebsd.org/bugzilla/) and I'll fix it.
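
The fallback could look something like this sketch.  It is not the FreeBSD
code: the function name format_grouped is hypothetical, and the ' flag of
printf is a POSIX extension rather than ISO C.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format value with thousands grouping.  If the current
 * locale defines a separator, printf's ' flag is used; otherwise a plain
 * comma is inserted by hand.  Returns 0, or -1 if buf is too small. */
int format_grouped(char *buf, size_t bufsz, intmax_t value)
{
    const struct lconv *lc = localeconv();
    char digits[32];
    size_t ndig, i, o = 0;
    int len, neg;

    assert(buf != NULL);
    if (lc->thousands_sep != NULL && lc->thousands_sep[0] != '\0') {
        /* Kept in a variable, in the manner of the ls patch. */
        const char *format = "%'jd";

        len = snprintf(buf, bufsz, format, value);
        return (len < 0 || (size_t)len >= bufsz) ? -1 : 0;
    }
    /* No separator in the locale: fall back to ",". */
    len = snprintf(digits, sizeof digits, "%jd", value);
    if (len < 0 || (size_t)len >= sizeof digits)
        return -1;
    neg = (digits[0] == '-');
    ndig = (size_t)len - (size_t)neg;
    if ((size_t)len + (ndig - 1) / 3 + 1 > bufsz)
        return -1;
    if (neg)
        buf[o++] = '-';
    for (i = 0; i < ndig; i++) {
        if (i > 0 && (ndig - i) % 3 == 0)
            buf[o++] = ',';
        buf[o++] = digits[(size_t)neg + i];
    }
    buf[o] = '\0';
    return 0;
}
```

A program that never calls setlocale() runs in the "C" locale, which has no
thousands separator, so it would always take the comma branch.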

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-14 19:52   ` John P. Linderman
@ 2020-03-14 20:25     ` Steffen Nurpmeso
  0 siblings, 0 replies; 68+ messages in thread
From: Steffen Nurpmeso @ 2020-03-14 20:25 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Eunuchs Hysterical Society

John P. Linderman wrote in
<CAC0cEp-dL2iPikiGvaQ_s9_6AS=mFO4RvbT423fNJ3gQiLdthQ@mail.gmail.com>:
 |Here's a command I wrote long ago using a different way to deal with \
 |options:
 |
 |  isee
 |Usage: isee format file ...
 |    Display specified inode information for files passed as arguments.
 |    Items of the form ``%X'' in format will be replaced for these X:
 |dev inode ino mode nlink uid gid rdev size atime
 |mtime ctime now filename
 |    Parenthesized printf-style format specifications can follow a %
 |    to override the default format for the various items.
 |    %filename is the name of the current file argument.
 |    %now is the time (in seconds) when the command started running.
 |    The other items are from the stat structure.
 |
 |    Example: isee "%(40s)filename: %mtime %mode" /dev/null
 |    Show file modification time and mode of /dev/null
 |
 |inode is just a synonym for ino.
 |
 |Instead of a kazillion options, the %-stat-field items identify what \
 |you want to see and the printf-style formats identify how you want \
 |them shown. Someone in the Murray Hill library added strftime 
 |formats for date fields, a fine addition, in my view. Adding readable \
 |user and group names rather than numerical ids would be worth considering. \
 |Maybe having a "rwx"-style form for mode. Sorting can be 
 |done by piping the output through sort. Don't get hung up on shortcomings \
 |of the command, just consider how a few familiar concepts and pipes \
 |can be combined to provide a large number of options.

When i switched to FreeBSD around 2001, the handbook was on the
CDs i had, and i stumbled upon a very impressive assembler
example.  It is still there[1], at least in parts(?).  Coming from
C64, then DOS/4DOS and <2 years Linux, aka kid games,
grey-industry, MS and xeyes background, i read

  [1] https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/x86-fpu.html

  Personally, I like to keep it simple. Something either is
  a number, so I process it. Or it is not a number, so I discard
  it. I do not like the computer complaining about me typing in an
  extra character when it is obvious that it is an extra
  character. Duh!

  Plus, it allows me to break up the monotony of computing and
  type in a query instead of just a number:

    What is the best pinhole diameter for the
          focal length of 150?

  There is no reason for the computer to spit out a number of complaints:

  Syntax error: What
  Syntax error: is
  Syntax error: the
  Syntax error: best

  Et cetera, et cetera, et cetera.

  Secondly, I like the # character to denote the start of
  a comment which extends to the end of the line. This does not
  take too much effort to code, and lets me treat input files for
  my software as executable scripts.

and it was like being warped from Chaplin's Modern Times to a rich
man's California style living!  And that in assembler!!

  % pinhole

  Computer,

  What size pinhole do I need for the focal length of 150?
  150	490	306	362	2930	12
  Hmmm... How about 160?
  160	506	316	362	3125	12
  Let's make it 155, please.
  155	498	311	362	3027	12
  Ah, let's try 157...
  157	501	313	362	3066	12
  156?
  156	500	312	362	3047	12
  That's it! Perfect! Thank you very much!
  ^D

Nonetheless: i never managed to create Hippie-proof programs in
real life.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-14  4:35 ` Greg 'groggy' Lehey
@ 2020-03-14 19:52   ` John P. Linderman
  2020-03-14 20:25     ` Steffen Nurpmeso
  0 siblings, 1 reply; 68+ messages in thread
From: John P. Linderman @ 2020-03-14 19:52 UTC (permalink / raw)
  To: Greg 'groggy' Lehey; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 2974 bytes --]

Here's a command I wrote long ago using a different way to deal with
options:

  *isee*
Usage: isee format file ...
    Display specified inode information for files passed as arguments.
    Items of the form ``%X'' in format will be replaced for these X:
dev inode ino mode nlink uid gid rdev size atime
mtime ctime now filename
    Parenthesized printf-style format specifications can follow a %
    to override the default format for the various items.
    %filename is the name of the current file argument.
    %now is the time (in seconds) when the command started running.
    The other items are from the stat structure.

    Example: isee "%(40s)filename: %mtime %mode" /dev/null
    Show file modification time and mode of /dev/null

inode is just a synonym for ino.

Instead of a kazillion options, the %-stat-field items identify *what* you
want to see and the printf-style formats identify *how* you want them
shown. Someone in the Murray Hill library added strftime formats for date
fields, a fine addition, in my view. Adding readable user and group names
rather than numerical ids would be worth considering. *Maybe* having a
"rwx"-style form for mode. Sorting can be done by piping the output through
sort. Don't get hung up on shortcomings of the command, just consider how a
few familiar concepts and pipes can be combined to provide a large number
of options.
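
For readers who want to see the idea in code, here is a rough sketch of the
%-item expansion.  It is not the isee source: the function name
expand_stat_format is hypothetical, only a handful of items are handled, the
parenthesized printf-style overrides and the time fields are left out, and
the default formats (octal for mode, for instance) are guesses.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: expand %-items in fmt from st and filename into out.
 * Returns 0, or -1 if out is too small. */
int expand_stat_format(const char *fmt, const struct stat *st,
                       const char *filename, char *out, size_t outsz)
{
    /* "inode" is listed before "ino" so the longer name is tried first. */
    static const char *const names[] = {
        "filename", "inode", "nlink", "size", "mode", "ino", "uid", "gid"
    };
    const size_t count = sizeof names / sizeof names[0];
    size_t o = 0;

    assert(fmt != NULL && st != NULL && filename != NULL);
    assert(out != NULL && outsz > 0);
    while (*fmt != '\0') {
        size_t k = count, nlen = 0;
        int n;

        if (*fmt == '%') {
            for (k = 0; k < count; k++) {
                nlen = strlen(names[k]);
                if (strncmp(fmt + 1, names[k], nlen) == 0)
                    break;
            }
        }
        if (k == count) {
            /* Ordinary character or unknown item: copy it through. */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *fmt++;
            continue;
        }
        switch (k) {
        case 0:
            n = snprintf(out + o, outsz - o, "%s", filename);
            break;
        case 1:
        case 5:
            n = snprintf(out + o, outsz - o, "%llu",
                         (unsigned long long)st->st_ino);
            break;
        case 2:
            n = snprintf(out + o, outsz - o, "%lu",
                         (unsigned long)st->st_nlink);
            break;
        case 3:
            n = snprintf(out + o, outsz - o, "%lld", (long long)st->st_size);
            break;
        case 4:
            n = snprintf(out + o, outsz - o, "%o", (unsigned int)st->st_mode);
            break;
        case 6:
            n = snprintf(out + o, outsz - o, "%lu", (unsigned long)st->st_uid);
            break;
        default:
            n = snprintf(out + o, outsz - o, "%lu", (unsigned long)st->st_gid);
            break;
        }
        if (n < 0 || (size_t)n >= outsz - o)
            return -1;
        o += (size_t)n;
        fmt += 1 + nlen;    /* skip the '%' and the item name */
    }
    out[o] = '\0';
    return 0;
}
```

One table of names stands in for what would otherwise be a separate option
letter per field, and the order of the output is simply the order of the
items in the format string.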

On Sat, Mar 14, 2020 at 12:35 AM Greg 'groggy' Lehey <grog@lemis.com> wrote:

> On Friday, 13 March 2020 at 21:45:21 +1100, Dave Horsfall wrote:
> > On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:
> >
> >>> -h is a gnuism, isn't it?
> >>
> >> It might have originated there, but then I would expect it to be spelt
> >> '--produce-human-readable-output'.  I haven't been able to establish from the
> >> FreeBSD sources or commit logs when it was introduced.  It would clearly have
> >> been a reimplementation.
> >
> > It's in "df" as well, praise Cthulhu:
> >
> >      aneurin# df -h
> >      Filesystem     Size    Used   Avail Capacity  Mounted on
> >      /dev/ad0s1a    496M    302M    154M    66%    /
> >      /dev/ad0s1d    2.9G    1.4G    1.2G    54%    /usr
> >      /dev/ad0s1e    989M    581M    329M    64%    /var
> ...
>
> It also has the , option:
>
>   === grog@eureka (/dev/pts/72) ~ 8 -> df -,
>   Filesystem  1048576-blocks      Used     Avail Capacity  Mounted on
>   /dev/ada0p4         39,662    21,918    14,571    60%    /
>   /dev/ada0p2         39,662    13,447    23,042    37%    /destdir
>   /dev/ada0p5      3,705,520 1,831,345 1,577,733    54%    /home
>   /dev/ada1p1      7,629,565 6,358,607 1,194,661    84%    /Photos
>
> I find it much easier to see the relative size like that.
>
> Greg
> --
> Sent from my desktop computer.
> Finger grog@lemis.com for PGP public key.
> See complete headers for address and phone numbers.
> This message is digitally signed.  If your Microsoft mail program
> reports problems, please read http://lemis.com/broken-MUA
>

[-- Attachment #2: Type: text/html, Size: 4297 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-13 10:45 Dave Horsfall
@ 2020-03-14  4:35 ` Greg 'groggy' Lehey
  2020-03-14 19:52   ` John P. Linderman
  0 siblings, 1 reply; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-14  4:35 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]

On Friday, 13 March 2020 at 21:45:21 +1100, Dave Horsfall wrote:
> On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:
>
>>> -h is a gnuism, isn't it?
>>
>> It might have originated there, but then I would expect it to be spelt
>> '--produce-human-readable-output'.  I haven't been able to establish from the
>> FreeBSD sources or commit logs when it was introduced.  It would clearly have
>> been a reimplementation.
>
> It's in "df" as well, praise Cthulhu:
>
>      aneurin# df -h
>      Filesystem     Size    Used   Avail Capacity  Mounted on
>      /dev/ad0s1a    496M    302M    154M    66%    /
>      /dev/ad0s1d    2.9G    1.4G    1.2G    54%    /usr
>      /dev/ad0s1e    989M    581M    329M    64%    /var
...

It also has the , option:

  === grog@eureka (/dev/pts/72) ~ 8 -> df -,
  Filesystem  1048576-blocks      Used     Avail Capacity  Mounted on
  /dev/ada0p4         39,662    21,918    14,571    60%    /
  /dev/ada0p2         39,662    13,447    23,042    37%    /destdir
  /dev/ada0p5      3,705,520 1,831,345 1,577,733    54%    /home
  /dev/ada1p1      7,629,565 6,358,607 1,194,661    84%    /Photos

I find it much easier to see the relative size like that.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
@ 2020-03-13 10:45 Dave Horsfall
  2020-03-14  4:35 ` Greg 'groggy' Lehey
  0 siblings, 1 reply; 68+ messages in thread
From: Dave Horsfall @ 2020-03-13 10:45 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

Meant for the list (and don't get me started on Reply All)...

-- Dave

---------- Forwarded message ----------
Date: Fri, 13 Mar 2020 21:43:51 +1100 (EST)
From: Dave Horsfall <dave@horsfall.org>
To: Greg 'groggy' Lehey <grog@lemis.com>
Subject: Re: [TUHS] Command line options and complexity

On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:

>> -h is a gnuism, isn't it?
> 
> It might have originated there, but then I would expect it to be spelt 
> '--produce-human-readable-output'.  I haven't been able to establish from the 
> FreeBSD sources or commit logs when it was introduced.  It would clearly have 
> been a reimplementation.

It's in "df" as well, praise Cthulhu:

     aneurin# df -h
     Filesystem     Size    Used   Avail Capacity  Mounted on
     /dev/ad0s1a    496M    302M    154M    66%    /
     devfs          1.0K    1.0K      0B   100%    /dev
     tmpfs          1000    272K    999M     0%    /tmp
     /dev/ad0s1d    2.9G    1.4G    1.2G    54%    /usr
     /dev/ad0s1e    989M    581M    329M    64%    /var
     /dev/ad0s1f    3.9G    2.2G    1.4G    62%    /home
     /dev/ad0s1g    8.9G    8.0G    127M    98%    /usr/local
     fdescfs        1.0K    1.0K      0B   100%    /dev/fd
     procfs         4.0K    4.0K      0B   100%    /proc

(Memo to self: see where all the room has gone in /usr/local, as that's where I 
assigned the leftover space after the other partitions.)

No, I've never liked stuffing everything under the root file system as both the 
Mac and Penguin do; fill the root file system and you're hosed (and I also have 
an itch about /tmp being there as it's a world-writable directory).

>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
>> specify the -S switch.  That's POSIX, isn't it?
> 
> So it is!  This was the first option that I wanted to add, back when I still 
> had practice wheels.  I asked my mentor, and he said "not the Unix way", so I 
> let it be.  Then Wes Peters came up with the idea, and I thought he committed 
> it, but it seems that it ultimately came from Kostas Blekos in 2005, based on 
> the same feature on NetBSD and OpenBSD. I wonder when it made it to POSIX.

Years ago I wrote a simple script "lss" which did the sort after being
howled down on one of the FreeBSD lists; what a surprise to see "-S"...

Heck, back in my UNSW days I suggested extending stty() to cover non-TTY 
devices and got trashed by the AGSM/ElecEng mob; well well, look at ioctl() 
when it appeared.

-- Dave

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12 12:57                             ` John P. Linderman
@ 2020-03-12 19:24                               ` Steffen Nurpmeso
  0 siblings, 0 replies; 68+ messages in thread
From: Steffen Nurpmeso @ 2020-03-12 19:24 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Unix Heritage Society

John P. Linderman wrote in
<CAC0cEp_fQsq6-EaG-nhvXTvZij+PSab5PNTEx7WhNjYwnFVnaw@mail.gmail.com>:
 |My error. I was looking at getopt(1) rather than getopt(3). Of course \
 |optind is documented, it's the way to find non-flag arguments.

 |I don't know why the Hancock authors chose to make rsort into a subroutine \
 |rather than just piping into the command. Perhaps something to do with \
 |the software release process?

I really like a lot of such old code, and reading it.  One can
only learn from it.  Even though i discovered all this in
(Free)BSD land, after coming over from Linux, I loved reading
those "old-hand" comment blocks, it was inspiration and kindled
something here.  For the few pieces of code that i am proud of, aka
that i thought were worth it, i followed their example.  This
rsort.c is however more verbose and spiritful than anything i ever
wrote.  I keep it in my box of precious things.

getopt(3) on the other hand is portable but terrible.  Just on the
10th i resorted a small SCSI MMC-3 cdda access tool (~50 KB
C source are necessary for that in 2020, missing Solaris and
MacOS, but including CD-TEXT and all that!!) to it because people
are used to option and/or argument joining etc, but it lost long
option support.

Not worth commenting a lot, but here is an option parser of 6359
bytes when development verification code and dump_doc() are not
counted, but it uses a carrier struct, supports long options, and
documentation strings as part of long option strings (one .RODATA
entry).  FreeBSD's standard compatible and thus naked
lib/libc/stdlib/getopt.c is 4312 bytes.  And GNU's getopt_long is
huge and even permutes arguments.

At least getopt(3) is predictable once a user gets it.  Things are
different for sed(1)s -i and some sccs commands i have forgotten.
I think it has even been tried to standardize optional arguments in
that respect, but i would argue this is not a good direction to
go; consider for example "sed -ie".  Isn't this asking for
trouble without accompanying comments?

 *   static char const a_sopts[] = "A:h#";
 *   static char const * const a_lopts[] = {
 *      "account:;A;" N_("execute an `account' command"),
..
 *      "long-help;\201;" N_("this listing"),
 *      NIL
 *   };
..
 *   struct su_avopt avo;
..
 *   su_avopt_setup(&avo, --argc, C(char const*const*,++argv),
 *      a_sopts, a_lopts);
 *   while((i = su_avopt_parse(&avo)) != su_AVOPT_STATE_DONE){
 *      switch(i){
 *      case 'A':
 *         "account_name" = avo.avo_current_arg;
 *         break;
 *      case 'h':
 *      case S(char,S(u8,'\201')):
 *         a_main_usage(n_stdout);
 *         if(i != 'h'){
 *            fprintf(n_stdout, "\nLong options:\n");
 *            su_avopt_dump_doc(&avo, &a_main_dump_doc, S(up,n_stdout));
 *         }
 *         exit(0);
..
 *   argc = avo.avo_argc;
 *   argv = C(char**,avo.avo_argv);

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-11 21:29                           ` Warner Losh
  2020-03-12  0:13                             ` John P. Linderman
@ 2020-03-12 12:57                             ` John P. Linderman
  2020-03-12 19:24                               ` Steffen Nurpmeso
  1 sibling, 1 reply; 68+ messages in thread
From: John P. Linderman @ 2020-03-12 12:57 UTC (permalink / raw)
  To: Warner Losh; +Cc: The Unix Heritage Society

[-- Attachment #1: Type: text/plain, Size: 1791 bytes --]

My error. I was looking at getopt(1) rather than getopt(3). Of course
optind is documented, it's the way to find non-flag arguments.

I don't know why the Hancock authors chose to make rsort into a subroutine
rather than just piping into the command. Perhaps something to do with the
software release process?

On Wed, Mar 11, 2020 at 5:29 PM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Wed, Mar 11, 2020 at 11:43 AM John P. Linderman <jpl.jpl@gmail.com>
> wrote:
>
>> This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
>> their version of my external sort, modified to be a subroutine. There's
>> some lessons to be learned about "software hygiene". I was cavalier about
>> freeing what I allocated dynamically. As a result, their version leaks like
>> a sieve if the subroutine is called repeatedly. Apropos of which, they came
>> to me having noted that only the first call was acting as expected. There's
>> a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
>> argument processing with getopt. The code has the following comment
>>
>> ** Use getopt() for portability.
>>
>> A few lines later, you see
>>
>>     optind = 1;  /* reset after use in Hancock program */
>>     while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {
>>
>> optind??? Seems getopt has an undocumented global flag to prevent
>> reprocessing the arguments. How portable:-)
>>
>
> It's documented:
>
>      The variables opterr and optind are both initialized to 1.  The optind
>      variable may be set to another value before a set of calls to
>      getopt() in order to skip over more or less argv entries.
>
> is what the FreeBSD man page has to say about it. So this just resets any
> scanning that had happened before this...
>
> Warner
>

[-- Attachment #2: Type: text/html, Size: 3143 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-12  0:13                             ` John P. Linderman
@ 2020-03-12  0:34                               ` Chet Ramey
  0 siblings, 0 replies; 68+ messages in thread
From: Chet Ramey @ 2020-03-12  0:34 UTC (permalink / raw)
  To: John P. Linderman, Warner Losh; +Cc: The Unix Heritage Society

On 3/11/20 8:13 PM, John P. Linderman wrote:
> I wasn't running FreeBSD. Linux has nothing to say about it. The wonderful
> thing about standards is that there are so many to choose from.

Did somebody mention ... standards?

https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html#tag_16_206

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-11 21:29                           ` Warner Losh
@ 2020-03-12  0:13                             ` John P. Linderman
  2020-03-12  0:34                               ` Chet Ramey
  2020-03-12 12:57                             ` John P. Linderman
  1 sibling, 1 reply; 68+ messages in thread
From: John P. Linderman @ 2020-03-12  0:13 UTC (permalink / raw)
  To: Warner Losh; +Cc: The Unix Heritage Society

[-- Attachment #1: Type: text/plain, Size: 1620 bytes --]

I wasn't running FreeBSD. Linux has nothing to say about it. The wonderful
thing about standards is that there are so many to choose from.

On Wed, Mar 11, 2020 at 5:29 PM Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Wed, Mar 11, 2020 at 11:43 AM John P. Linderman <jpl.jpl@gmail.com>
> wrote:
>
>> This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
>> their version of my external sort, modified to be a subroutine. There's
>> some lessons to be learned about "software hygiene". I was cavalier about
>> freeing what I allocated dynamically. As a result, their version leaks like
>> a sieve if the subroutine is called repeatedly. Apropos of which, they came
>> to me having noted that only the first call was acting as expected. There's
>> a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
>> argument processing with getopt. The code has the following comment
>>
>> ** Use getopt() for portability.
>>
>> A few lines later, you see
>>
>>     optind = 1;  /* reset after use in Hancock program */
>>     while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {
>>
>> optind??? Seems getopt has an undocumented global flag to prevent
>> reprocessing the arguments. How portable:-)
>>
>
> It's documented:
>
>      The variables opterr and optind are both initialized to 1.  The optind
>      variable may be set to another value before a set of calls to
>      getopt() in order to skip over more or less argv entries.
>
> is what the FreeBSD man page has to say about it. So this just resets any
> scanning that had happened before this...
>
> Warner
>

[-- Attachment #2: Type: text/html, Size: 2852 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-11 17:41                         ` John P. Linderman
@ 2020-03-11 21:29                           ` Warner Losh
  2020-03-12  0:13                             ` John P. Linderman
  2020-03-12 12:57                             ` John P. Linderman
  0 siblings, 2 replies; 68+ messages in thread
From: Warner Losh @ 2020-03-11 21:29 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Unix Heritage Society

[-- Attachment #1: Type: text/plain, Size: 1360 bytes --]

On Wed, Mar 11, 2020 at 11:43 AM John P. Linderman <jpl.jpl@gmail.com>
wrote:

> This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
> their version of my external sort, modified to be a subroutine. There's
> some lessons to be learned about "software hygiene". I was cavalier about
> freeing what I allocated dynamically. As a result, their version leaks like
> a sieve if the subroutine is called repeatedly. Apropos of which, they came
> to me having noted that only the first call was acting as expected. There's
> a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
> argument processing with getopt. The code has the following comment
>
> ** Use getopt() for portability.
>
> A few lines later, you see
>
>     optind = 1;  /* reset after use in Hancock program */
>     while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {
>
> optind??? Seems getopt has an undocumented global flag to prevent
> reprocessing the arguments. How portable:-)
>

It's documented:

     The variables opterr and optind are both initialized to 1.  The optind
     variable may be set to another value before a set of calls to getopt()
     in order to skip over more or less argv entries.

is what the FreeBSD man page has to say about it. So this just resets any
scanning that had happened before this...

Warner

[-- Attachment #2: Type: text/html, Size: 2292 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-09 21:22                       ` Kurt H Maier
@ 2020-03-11 17:41                         ` John P. Linderman
  2020-03-11 21:29                           ` Warner Losh
  0 siblings, 1 reply; 68+ messages in thread
From: John P. Linderman @ 2020-03-11 17:41 UTC (permalink / raw)
  To: John P. Linderman, Tyler Adams, The Unix Heritage Society

[-- Attachment #1: Type: text/plain, Size: 3399 bytes --]

This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
their version of my external sort, modified to be a subroutine. There's
some lessons to be learned about "software hygiene". I was cavalier about
freeing what I allocated dynamically. As a result, their version leaks like
a sieve if the subroutine is called repeatedly. Apropos of which, they came
to me having noted that only the first call was acting as expected. There's
a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
argument processing with getopt. The code has the following comment

** Use getopt() for portability.

A few lines later, you see

    optind = 1;  /* reset after use in Hancock program */
    while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {

optind??? Seems getopt has an undocumented global flag to prevent
reprocessing the arguments. How portable:-)

Anyway, it should be possible to turn rsort.c back into standalone code.
I'd be the obvious person to do it, but that would probably be a violation
of some agreement with AT&T. However, if somebody else wants to take on the
task (it would make a great summer intern project), I'd be happy to share
ideas I have had since retiring that would improve the code.

fc.c in the same directory is a library-ized version of a fixcut command I
wrote as a fixed-length counterpart to the cut command, for fixed-length
inputs (like native floats and integers, which can be tweaked to sort
lexicographically). Unlike rsort, I practiced good hygiene and kept track
of all allocated space so it could be freed. Too bad they didn't include
the man pages for rsort and fixcut. They'd make it easier to understand
them. Jon Bentley observed that "comments are love letters to your future
self", and I feel a lot of love from the heavily commented rsort code.
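
The remark about tweaking native integers and floats so that they sort
lexicographically can be sketched roughly as follows. This is an
illustration in Python, not code from rsort or fixcut: the function names
are made up here, and it assumes 32-bit two's-complement integers and
IEEE 754 doubles, with NaN left out of consideration.

```python
import struct


def int32_key(n):
    """Return a 4-byte key whose bytewise order matches signed integer order.

    Storing the value big-endian puts the most significant byte first;
    flipping the sign bit moves negative values below non-negative ones.
    """
    return struct.pack(">I", (n & 0xFFFFFFFF) ^ 0x80000000)


def float64_key(x):
    """Return an 8-byte key whose bytewise order matches float order.

    Non-negative values only need the sign bit set.  Negative values need
    every bit inverted, because a larger magnitude must sort earlier.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF
    else:
        bits |= 1 << 63
    return struct.pack(">Q", bits)
```

With keys built this way the sort itself only ever compares byte strings,
which is the division of labour between key building and sorting that is
described in this thread.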

This probably should move to coff, it's not really about UNIX history
(although rsort has vestigial traces of ancient days, like the code to
write checkpoint files after each output temp is closed... sorting a
million bytes once took hours, with slow processors and disks. It was
painful to have to start from scratch if an overnight sort got interrupted.
Now sorting a billion bytes is pretty quick, and the checkpoint stuff never
gets used. It's one of the things that could profitably disappear.)

On Mon, Mar 9, 2020 at 5:22 PM Kurt H Maier <khm@sciops.net> wrote:

> On Mon, Mar 09, 2020 at 05:06:20PM -0400, John P. Linderman wrote:
> > but the page is gone. It probably didn't help that Wired titled the
> article
> >
> > *AT&T Invents Programming Language for Mass Surveillance*
> >
> > That's horse-pucky, akin to "Pitchfork makers invent device for spearing
> > babies". I'm trying to track down a copy that was released publicly. I'm
> > not hopeful.
>
> There is a copy here:  https://github.com/mqudsi/hancock
>
> Not sure what other conclusion Wired was supposed to come to, given that
> the provided "Hello World" programs in the paper were all mass
> surveillance examples (tracking international calls to given numbers,
> tracking data streams to given IP addresses, and tracking specific
> connections to a given ISP).
>
> The license in the linked repository is different than the old
> password-gated NSL that was applied on the research.att.com pages.  I
> wonder how many licenses this code was released with, over the years.
>
>
> khm
>

[-- Attachment #2: Type: text/html, Size: 5106 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-10 18:42 Doug McIlroy
@ 2020-03-10 19:38 ` Dan Cross
  0 siblings, 0 replies; 68+ messages in thread
From: Dan Cross @ 2020-03-10 19:38 UTC (permalink / raw)
  To: Doug McIlroy; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]

On Tue, Mar 10, 2020 at 2:43 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:

> > This begs questions of stability
>
> Astute question. I had that in my original draft, but eliminated
> it for what I thought was clarity. Anyway, depending on implementation
> of sort, you may need sort -s. Of course it doesn't matter which copy
> among several equal lines uniq produces, nor does it matter in sort
> when there are no comparison options--they're all the same.
>

Thanks. That's interesting.

Did `sort -s` come later? The idea that you preferred clarity over
stability for `sort -u` would indicate so, otherwise one might imagine that
`-u` would just imply `-s` and that would be that.

> I don't know enough about the
> > internals of sed to know even what algorithm it uses
> > (... a disk-based merge sort?)
>
> sed is not a sorting program--basically it copies input to
> output, making line-by-line editing changes. That's the
> way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
> (I have added options to sort, hopefully for clarity).
> The argument to sed here means substitute the empty
> string for the nonkey fields (specified by a regular expression).
>

`sed` in my email was a typo, as you speculated below.

Interestingly, this `sed` construction prior to `sort` loses information,
which perhaps doesn't matter in any given specific case, but is
insufficient in general, which I gathered to be the entire reason you
implemented `sort -u`.

> If "sed" was a typo for "sort",

It was.

> all versions of sort that
> I know of use an internal sorting algorithm for big chunks
> of the file, then combines the chunks by merge. But internal
> sorting varies all over the map--variations on quicksort,
> radix sort, merge sort, ...
>

It's the details of the internal sorts that are most interesting in some
sense, as the merges are probably fairly straightforward but the internal
sorts will affect stability and have other interesting characteristics.

As an aside, one must imagine that, in this day and age, a "big chunk" is
probably big enough to hold the vast majority of files entirely in RAM, and
only exceptionally large files actually require merging multiple blocks.

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 3279 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
@ 2020-03-10 18:42 Doug McIlroy
  2020-03-10 19:38 ` Dan Cross
  0 siblings, 1 reply; 68+ messages in thread
From: Doug McIlroy @ 2020-03-10 18:42 UTC (permalink / raw)
  To: tuhs

> This begs questions of stability

Astute question. I had that in my original draft, but eliminated
it for what I thought was clarity. Anyway, depending on implementation
of sort, you may need sort -s. Of course it doesn't matter which copy
among several equal lines uniq produces, nor does it matter in sort 
when there are no comparison options--they're all the same.
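
Why stability matters once keys are involved can be sketched as follows.
This is a Python illustration, not the behaviour of any particular sort
implementation: the record layout and helper names are invented, and
breaking ties by comparing whole lines is only one way an unstable result
might come about.

```python
def key(line):
    """Sort key: the first whitespace-separated field."""
    return line.split()[0]


def keep_first_per_key(sorted_lines):
    """Keep only the first line of each run of equal keys."""
    kept = []
    previous = None
    for line in sorted_lines:
        k = key(line)
        if not kept or k != previous:
            kept.append(line)
        previous = k
    return kept


records = ["pear 3", "apple 9", "pear 1", "apple 2"]

# Stable: lines with equal keys stay in input order (Python's sort is stable).
stable = keep_first_per_key(sorted(records, key=key))

# Ties broken by whole-line comparison: a different line survives per key.
tie_broken = keep_first_per_key(sorted(records, key=lambda l: (key(l), l)))
```

Both results contain one line per key, but the surviving non-key fields
differ, which is the only case in which the question of stability can be
observed at all.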

> I don't know enough about the
> internals of sed to know even what algorithm it uses 
> (... a disk-based merge sort?)

sed is not a sorting program--basically it copies input to     
output, making line-by-line editing changes. That's the       
way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
(I have added options to sort, hopefully for clarity).
The argument to sed here means substitute the empty
string for the nonkey fields (specified by a regular expression).


If "sed" was a typo for "sort", all versions of sort that
I know of use an internal sorting algorithm for big chunks
of the file, then combines the chunks by merge. But internal
sorting varies all over the map--variations on quicksort,
radix sort, merge sort, ...
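
The chunk-then-merge structure described above can be sketched in a few
lines. This is a simplified Python illustration, not the code of any
actual sort(1): the runs are held in memory rather than written to
temporary files, the chunk size is an arbitrary parameter, and the
internal sort is whatever Python's list.sort() happens to use.

```python
import heapq
from itertools import islice


def chunked_sort(lines, chunk_size):
    """Sort fixed-size chunks internally, then merge the sorted runs."""
    source = iter(lines)
    runs = []
    while True:
        chunk = list(islice(source, chunk_size))
        if not chunk:
            break
        chunk.sort()          # the "internal" sort of one big chunk
        runs.append(chunk)    # a real external sort would write a temp file
    # heapq.merge performs a k-way merge of already sorted inputs.
    return list(heapq.merge(*runs))
```

When chunk_size is at least as large as the input there is a single run
and the merge step has nothing to do.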

Doug

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-10 17:44   ` Bakul Shah
@ 2020-03-10 18:09     ` Dan Cross
  0 siblings, 0 replies; 68+ messages in thread
From: Dan Cross @ 2020-03-10 18:09 UTC (permalink / raw)
  To: Bakul Shah; +Cc: The Eunuchs Hysterical Society, Doug McIlroy

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

On Tue, Mar 10, 2020 at 1:44 PM Bakul Shah <bakul@bitblocks.com> wrote:

> On Tue, 10 Mar 2020 13:38:23 -0400 Dan Cross <crossd@gmail.com> wrote:
> >
> > This begs questions of stability: in the event of non-unique keys and
> > non-key fields in the sortable data, which "records" (lines) are kept and
> > which are discarded? Surely the "first" is kept and subsequent entries
> with
> > the same key suppressed, but I confess I don't know enough about the
> > internals of sed to know even what algorithm it uses (I assume a
> disk-based
> > merge sort?), but I would imagine these details have changed over time.
>
> FreeBSD manpage for sort says that -u implies a stable sort,
> similar to -s.
>

Thanks; that makes sense. I'm still interested in historical data, though.
:-)

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 1255 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-10 17:38 ` Dan Cross
@ 2020-03-10 17:44   ` Bakul Shah
  2020-03-10 18:09     ` Dan Cross
  0 siblings, 1 reply; 68+ messages in thread
From: Bakul Shah @ 2020-03-10 17:44 UTC (permalink / raw)
  To: Dan Cross; +Cc: The Eunuchs Hysterical Society, Doug McIlroy

On Tue, 10 Mar 2020 13:38:23 -0400 Dan Cross <crossd@gmail.com> wrote:
>
> This begs questions of stability: in the event of non-unique keys and
> non-key fields in the sortable data, which "records" (lines) are kept and
> which are discarded? Surely the "first" is kept and subsequent entries with
> the same key suppressed, but I confess I don't know enough about the
> internals of sed to know even what algorithm it uses (I assume a disk-based
> merge sort?), but I would imagine these details have changed over time.

FreeBSD manpage for sort says that -u implies a stable sort,
similar to -s.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-10 16:15 Doug McIlroy
@ 2020-03-10 17:38 ` Dan Cross
  2020-03-10 17:44   ` Bakul Shah
  0 siblings, 1 reply; 68+ messages in thread
From: Dan Cross @ 2020-03-10 17:38 UTC (permalink / raw)
  To: Doug McIlroy; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

On Tue, Mar 10, 2020 at 12:16 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:

> > The idea of a simple rule is great, but the suggested rule fails on sort
> -u
> > which afaik came after sort | uniq for performance reasons.
>
> As the guilty party for most of sort's comparison options, I can
> attest that efficiency was not an objective of -u. It was invented
> precisely because uniq had proved useful, but not when one was
> interested in uniqueness only of some key aspect of the data.
>
> -u differs from uniq in that -u selects samples based on
> equality of keys, not equality of lines. In the default
> case of whole-line keys, sort -u of course does exactly
> what sort|uniq does.
>
> For many applications of -u with keys, the non-key fields
> are not of interest. Then sed s/nonkeys//|sort|uniq may
> suffice. But sed did not exist when -u was invented.
> And not all sort key specs are easily imitated in sed.
>

This begs questions of stability: in the event of non-unique keys and
non-key fields in the sortable data, which "records" (lines) are kept and
which are discarded? Surely the "first" is kept and subsequent entries with
the same key suppressed, but I confess I don't know enough about the
internals of sed to know even what algorithm it uses (I assume a disk-based
merge sort?), but I would imagine these details have changed over time.

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 1790 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
@ 2020-03-10 16:15 Doug McIlroy
  2020-03-10 17:38 ` Dan Cross
  0 siblings, 1 reply; 68+ messages in thread
From: Doug McIlroy @ 2020-03-10 16:15 UTC (permalink / raw)
  To: tuhs

> The idea of a simple rule is great, but the suggested rule fails on sort -u
> which afaik came after sort | uniq for performance reasons.

As the guilty party for most of sort's comparison options, I can
attest that efficiency was not an objective of -u. It was invented
precisely because uniq had proved useful, but not when one was
interested in uniqueness only of some key aspect of the data.

-u differs from uniq in that -u selects samples based on
equality of keys, not equality of lines. In the default
case of whole-line keys, sort -u of course does exactly
what sort|uniq does.

For many applications of -u with keys, the non-key fields
are not of interest. Then sed s/nonkeys//|sort|uniq may
suffice. But sed did not exist when -u was invented.
And not all sort key specs are easily imitated in sed.
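
The difference between the three approaches can be sketched like this.
It is a Python illustration under stated assumptions, not a
reimplementation of sort or uniq: the key is taken to be the first
whitespace-separated field, the sample lines are invented, and the keyed
variant keeps the first line of each key as a stable sort would.

```python
from itertools import groupby


def first_field(line):
    """Key: the first whitespace-separated field of the line."""
    return line.split()[0]


def sort_then_uniq(lines):
    """Like sort | uniq: duplicates are whole identical lines."""
    return [line for line, _ in groupby(sorted(lines))]


def sort_u(lines, key=first_field):
    """Like sort -u with a key: one complete line per distinct key."""
    ordered = sorted(lines, key=key)
    return [next(group) for _, group in groupby(ordered, key=key)]


def strip_sort_uniq(lines, key=first_field):
    """Like sed s/nonkeys// | sort | uniq: only the keys survive."""
    return sort_then_uniq([key(line) for line in lines])
```

For the lines "smith 2020-03-01", "jones 2020-03-02", "smith 2020-03-05"
and "jones 2020-03-02", the first function keeps both smith lines, the
second keeps one complete line per name, and the third keeps only the
names.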

Doug

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-09 21:06                     ` John P. Linderman
@ 2020-03-09 21:22                       ` Kurt H Maier
  2020-03-11 17:41                         ` John P. Linderman
  0 siblings, 1 reply; 68+ messages in thread
From: Kurt H Maier @ 2020-03-09 21:22 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Unix Heritage Society

On Mon, Mar 09, 2020 at 05:06:20PM -0400, John P. Linderman wrote:
> but the page is gone. It probably didn't help that Wired titled the article
> 
> *AT&T Invents Programming Language for Mass Surveillance*
> 
> That's horse-pucky, akin to "Pitchfork makers invent device for spearing
> babies". I'm trying to track down a copy that was released publicly. I'm
> not hopeful.

There is a copy here:  https://github.com/mqudsi/hancock

Not sure what other conclusion Wired was supposed to come to, given that
the provided "Hello World" programs in the paper were all mass
surveillance examples (tracking international calls to given numbers,
tracking data streams to given IP addresses, and tracking specific
connections to a given ISP).

The license in the linked repository is different than the old
password-gated NSL that was applied on the research.att.com pages.  I
wonder how many licenses this code was released with, over the years.


khm

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
       [not found]                   ` <CAEuQd1D7+dfap98AwPo2W41+06prrcVaAWk3Ve-ve0uQ0xBu3Q@mail.gmail.com>
@ 2020-03-09 21:06                     ` John P. Linderman
  2020-03-09 21:22                       ` Kurt H Maier
  0 siblings, 1 reply; 68+ messages in thread
From: John P. Linderman @ 2020-03-09 21:06 UTC (permalink / raw)
  To: Tyler Adams; +Cc: The Unix Heritage Society

[-- Attachment #1: Type: text/plain, Size: 3659 bytes --]

Nothing I'm aware of. I didn't mind throwing "tac" over the wall, because
it was trivial, probably a couple hours work for me, under a minute for
Ken. But the rsort source is not at all trivial, and still of potential
value to AT&T.

The source managed to get out as part of the "Hancock" project. I found a
link in

https://www.wired.com/2007/10/att-invents-pro/

but the page is gone. It probably didn't help that Wired titled the article

*AT&T Invents Programming Language for Mass Surveillance*

That's horse-pucky, akin to "Pitchfork makers invent device for spearing
babies". I'm trying to track down a copy that was released publicly. I'm
not hopeful.

On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams <coppero1237@gmail.com> wrote:

> Woah, this sounds really useful, is there anything like it today?
>
> On Sun, Mar 8, 2020, 16:32 John P. Linderman <jpl.jpl@gmail.com> wrote:
>
>> In the "UNIX SYSTEM" issue of the BSTJ back in October of 1984, I
>> suggested that it might be better, both for functionality *and*
>> performance, to have a sort that only worked on records with a *single*
>> key to be sorted *lexicographically*, and put all the complexity of
>> dealing with native integers, dates, case-mapping, etc into a key-building
>> front end. I wrote such a sort built around a radix sort. The sort
>> itself sported very few options relating to record format (fixed-length,
>> newline terminated, and header-based, where an ascii header identified
>> record length, and, optionally, key position and key length), where to find
>> the key in fixed-length and newline terminated records, merge-only, check
>> sort order only, unique, strip off the sort key (to avoid the need for a
>> post-process in many cases). Key-building was usually near-trivial using
>> awk or perl or a few commands for tweaking native integer and floating
>> point values so they would sort lexicographically. The sort was stable and
>> blazingly fast. Some summer students once complained to me that I was
>> messing up a paper they were writing because my external sort was faster
>> than an internal qsort... the kind of complaint that warms one's heart. At
>> the back of my mind was a generic key-building library that would
>> accommodate (decimal) numbers of arbitrary length, with or without "E"
>> exponents, dates in various formats, string collation for Unicode, etc. It
>> remains at the back of my mind.
>>
>> On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams <coppero1237@gmail.com> wrote:
>>
>>> The idea of a simple rule is great, but the suggested rule fails on sort
>>> -u which afaik came after sort | uniq for performance reasons.
>>>
>>> Another idea in the same vein is that a flag should be added only when
>>> the job can be done inside the program and not with stdin/stdout (or no
>>> flag can be added if one can reproduce the same behavior using pipelines).
>>>
>>> So, you need sort -u because only within sort can you get the
>>> performance needed to get the job done.
>>>
>>> But you don't need -h in ls -lh. All the information to render a human
>>> readable number is present on stdout of ls -l. You could easily have a
>>> filter which renders numbers with options like adding commas, dots,
>>> scientific notation, precision, money, units, etc.
>>>
>>> Tyler
>>>
>>> On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon@fourwinds.com> wrote:
>>>
>>>> After following this discussion, I guess that I have a simplistic way to
>>>> determine whether something should be a dash option or a filter.  In
>>>> general, I'd make a filter if whatever it was doing was applicable to
>>>> more than one command, a dash option otherwise.
>>>>
>>>> Jon
>>>>
>>>

[-- Attachment #2: Type: text/html, Size: 5838 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-08  5:26           ` Greg 'groggy' Lehey
  2020-03-08  5:32             ` Jon Steinhart
@ 2020-03-08  9:51             ` Michael Kjörling
  1 sibling, 0 replies; 68+ messages in thread
From: Michael Kjörling @ 2020-03-08  9:51 UTC (permalink / raw)
  To: tuhs

On 8 Mar 2020 16:26 +1100, from grog@lemis.com (Greg 'groggy' Lehey):
> FAT timestamps have a granularity of 1 second,

Not quite.

Last modified time is recorded to within two seconds (FAT squeezes the
seconds into a 5-bit field, which allows packing a time into two bytes).
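
The two-byte packing can be worked through as a small example. This is a
Python sketch of the usual FAT time field layout (5 bits for the hour,
6 bits for the minute, 5 bits for seconds divided by two); the function
names are invented for the illustration.

```python
def pack_fat_time(hour, minute, second):
    """Pack a time of day into a 16-bit FAT time value."""
    # Only second // 2 fits into the 5-bit field, so odd seconds are lost.
    return (hour << 11) | (minute << 5) | (second // 2)


def unpack_fat_time(value):
    """Unpack a 16-bit FAT time value into (hour, minute, second)."""
    hour = value >> 11
    minute = (value >> 5) & 0x3F
    second = (value & 0x1F) * 2
    return hour, minute, second
```

Packing 14:14:37 and unpacking it again gives 14:14:36, which is where
the two-second granularity comes from.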

Other times are recorded with different granularity, sometimes
depending on the OS/version used to make the change to the file
system.

And of course FAT has no concept of time zones; everything is local
time, all the time.

https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#Directory_entry
has some of the gory details.

-- 
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
 “Remember when, on the Internet, nobody cared that you were a dog?”


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-08  5:32             ` Jon Steinhart
@ 2020-03-08  9:30               ` Tyler Adams
       [not found]                 ` <CAC0cEp8eFRkkLTw88WVaKZoKy+qsrhuC8LkzmmsbqtdZgMf8eQ@mail.gmail.com>
  0 siblings, 1 reply; 68+ messages in thread
From: Tyler Adams @ 2020-03-08  9:30 UTC (permalink / raw)
  To: Jon Steinhart; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]

The idea of a simple rule is great, but the suggested rule fails on sort -u
which afaik came after sort | uniq for performance reasons.

Another idea in the same vein is that a flag should be added only when the
job can be done inside the program and not with stdin/stdout (or no flag
can be added if one can reproduce the same behavior using pipelines).

So, you need sort -u because only within sort can you get the performance
needed to get the job done.

But you don't need -h in ls -lh. All the information to render a human
readable number is present on stdout of ls -l. You could easily have a
filter which renders numbers with options like adding commas, dots,
scientific notation, precision, money, units, etc.
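
The core of such a filter might look like the sketch below. It is a
Python illustration only: the function name, the unit letters and the
one-decimal rounding are assumptions made here and are not copied from
any ls implementation.

```python
def human(n):
    """Render a byte count with a binary unit suffix, e.g. 2048 -> '2.0K'."""
    units = ["", "K", "M", "G", "T", "P"]
    value = float(n)
    unit = units[0]
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "":
        return str(int(value))
    return "{:.1f}{}".format(value, unit)
```

A complete filter would still have to know which column of its input
holds the size.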

Tyler

On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon@fourwinds.com> wrote:

> After following this discussion, I guess that I have a simplistic way to
> determine whether something should be a dash option or a filter.  In
> general, I'd make a filter if whatever it was doing was applicable to
> more than one command, a dash option otherwise.
>
> Jon
>

[-- Attachment #2: Type: text/html, Size: 1592 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-08  5:26           ` Greg 'groggy' Lehey
@ 2020-03-08  5:32             ` Jon Steinhart
  2020-03-08  9:30               ` Tyler Adams
  2020-03-08  9:51             ` Michael Kjörling
  1 sibling, 1 reply; 68+ messages in thread
From: Jon Steinhart @ 2020-03-08  5:32 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

After following this discussion, I guess that I have a simplistic way to
determine whether something should be a dash option or a filter.  In
general, I'd make a filter if whatever it was doing was applicable to
more than one command, a dash option otherwise.

Jon

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-05 21:56         ` Warner Losh
@ 2020-03-08  5:26           ` Greg 'groggy' Lehey
  2020-03-08  5:32             ` Jon Steinhart
  2020-03-08  9:51             ` Michael Kjörling
  0 siblings, 2 replies; 68+ messages in thread
From: Greg 'groggy' Lehey @ 2020-03-08  5:26 UTC (permalink / raw)
  To: Warner Losh; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 5818 bytes --]

On Thursday,  5 March 2020 at 14:56:58 -0700, Warner Losh wrote:
> On Thu, Mar 5, 2020 at 2:51 PM Dave Horsfall <dave@horsfall.org> wrote:
>> On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:
>>
>>> do i get a prize:
>>> ls -tj
>>> /bin/ls: illegal option -- j
>>> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>>
>> Another candidate for option-cleansing...  Interesting; I get different
>> options with the Mac and FreeBSD:
>>
>> Mac:
>>
>>      usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>>
>> FreeBSD:
>>
>>      usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format]
>> [file ...]
>>
>> So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...
>
> FreeBSD wouldn't need -, if there were a good filter to add , to large
> numbers...  Some of the proliferation of options has been due to a lack of
> proper building-blocks....

I wasn't going to join this discussion, but as the perpetrator of all
three of the options that Dave complains about, I think it's worth
explaining the rationale.

First: yes, filters are good.  They make for an extraordinarily
flexible system.  And many options are just bloat.

But on the other hand, let's follow on with your example and assume a
clever filter, say commafy, which would insert commas as needed in its
input:

  $ ls -l | commafy 5

You really need the 5 (column number), because you can't rely on all
large numeric values to require commas.  Consider:

  $ ls -l 939585975893478543543
  -rw-r--r--  2 grog  home  1719298048  8 Mar 14:14 939585975893478543543

The alternative would be to have the column number explicitly stated
in the filter, but that would make the filter more specific to ls.
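
A minimal sketch of what such a filter might look like, assuming a
shell function wrapped around awk.  The name commafy and its single
column-number argument come from the example above; the implementation
is only an illustration, not an existing tool.  It also shows a cost of
the filter approach: assigning to a field makes awk rebuild the line
with single spaces, so the column alignment of ls -l is lost.

```shell
# commafy COL: hypothetical filter that inserts thousands separators
# into field COL of each input line, if that field is all digits.
commafy() {
    awk -v col="$1" '
    # Build the result from the right, three digits at a time.
    function commas(s,    out) {
        out = ""
        while (length(s) > 3) {
            out = "," substr(s, length(s) - 2) out
            s = substr(s, 1, length(s) - 3)
        }
        return s out
    }
    $col ~ /^[0-9]+$/ { $col = commas($col) }
    { print }
    '
}
```

Used as "ls -l | commafy 5", this leaves a purely numeric file name in
the last column alone, because only the named column is rewritten.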

But do you really want to add that much input when typing
interactively into a shell?  How much easier it is just to write:

  $ ls -l, 939585975893478543543
  -rw-r--r--  2 grog  home  1,719,298,048  8 Mar 14:14 939585975893478543543

And then there are things that a filter can't easily do, the
rationales for -y and -D format.  -y is really a workaround for a bug
in the POSIX specification for ls(1).  From
https://pubs.opengroup.org/onlinepubs/009695399/utilities/ls.html:

 -t
    Sort with the primary key being time modified (most recently
    modified first) and the secondary key being filename in the
    collating sequence.

It's not immediately obvious, but these two keys sort in the opposite
order.  The file name is sorted alphabetically, but the modification
time is the other way round (*reverse* chronological).  This problem
bites you, for example, when you list files from two different cameras
that can take more than one image with the same time stamp.  FAT
modification timestamps have a granularity of 2 seconds, so images
taken in quick succession end up with exactly the same time stamp.

From a diary entry for 24 January 2009
(http://www.lemis.com/grog/diary-jan2009.php?subtitle=%E2%80%9CNot%20a%20bug,%20a%20feature%E2%80%9D:%20episode%204714&article=lsorder#lsorder):

  === grog@dereel (/dev/ttyp2) ~/Photos/20061223/orig 63 -> ls  -lTrt
  -rwxrwxrwx  1 grog  home  2478324 Dec 23 15:35:08 2006 DSCN1325.JPG
  -rwxr-xr-x  1 grog  home  1628592 Dec 23 17:11:00 2006 img_5504.jpg
  -rwxr-xr-x  1 grog  home  1621982 Dec 23 17:11:00 2006 img_5503.jpg
  -rwxrwxrwx  1 grog  home  2583242 Dec 23 17:27:30 2006 DSCN1326.JPG
  -rwxrwxrwx  1 grog  home  2476707 Dec 23 17:27:48 2006 DSCN1327.JPG

The file names for images with different timestamps are sorted
alphabetically.  The file names for images with the same timestamps
are sorted in reverse alphabetical order.  What to do?  Potentially
you could write a filter here too, though it wouldn't be simple,
because the timestamp representation depends on the age of the file.
And you can't just fix the bug, because it has been elevated to a
feature.  So -y does the right thing.
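
The mismatch between the two keys can be sketched with sort(1) on a
simplified listing.  Everything here is an assumption made for
illustration: the input format (a numeric timestamp, one space, a file
name), the small made-up timestamps, and the function names.  The
second function only approximates the effect described for -y; it is
not taken from the FreeBSD source.

```shell
# Hypothetical input: "<mtime> <file name>", one file per line.
listing() {
    printf '%s\n' \
        '100 img_5504.jpg' \
        '100 img_5503.jpg' \
        '200 DSCN1326.JPG' \
        '50 DSCN1325.JPG'
}

# Order as specified for ls -t: newest first, ties by name ascending.
posix_order() { LC_ALL=C sort -t ' ' -k1,1nr -k2,2; }

# Both keys descending, so that reversing the whole list gives a
# consistently ascending order (time first, then name).
consistent_order() { LC_ALL=C sort -t ' ' -k1,1nr -k2,2r; }
```

With posix_order the two files stamped 100 come out as img_5503 and
then img_5504, so reversing the list puts img_5504 first, as in the
listing above.  With consistent_order the reversed list has them in
alphabetical order.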

And that date.  There are three relatively arbitrary formats, two of
them depending on how long ago the timestamp was:

  -rw-r--r--  2 grog  home  1,719,298,048  8 Mar 14:14 939585975893478543543
  -rw-r--r--  1 grog  home              0 24 Sep  2012 foo

You can fix that (on FreeBSD and probably on macOS) with the equally
unsupported -T flag ("full timestamp"):

  $ ls -lT 939585975893478543543 foo
  -rw-r--r--  2 grog  home  1719298048  8 Mar 14:14:58 2020 939585975893478543543
  -rw-r--r--  1 grog  home           0 24 Sep 14:42:57 2012 foo

Do we need another format?  Maybe.  Certainly it would help to have a
different format if you want to pass the output to a filter that looks
at the timestamp.  What should it be?  Your guess is as good as mine,
but probably different.  Obvious choices are raw time_t and
YYYYMMDDhhmmss.  So I introduced the -D option to allow the user to
choose his own output format.

Is this a good idea?  I certainly had pangs of conscience every time,
and a non-standard option runs the risk of being incompatible with
other systems.  For example, Linux uses -T to define the tab size
(arguably a better choice for a filter) and -D to produce output for
Emacs dired mode.

In summary: there's a tradeoff between the elegance of filters and the
effort that they require.  Adding options has its disadvantages too.
You need to remember them, and they can easily become incompatible.
But these specific features make life considerably easier and add very
little to the size of the executable.  I'd be interested to hear of
alternative solutions to the issues.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed.  If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-05  4:57 Doug McIlroy
@ 2020-03-05 22:17 ` Diomidis Spinellis
  0 siblings, 0 replies; 68+ messages in thread
From: Diomidis Spinellis @ 2020-03-05 22:17 UTC (permalink / raw)
  To: tuhs

On 05-Mar-20 6:57, Doug McIlroy wrote:
>> These go all the way back to v7 unix, where ls has an option to reverse
> the sort order (which could have been done by passing the output to tac).
> 
> A cool idea, but tac was not in v7. And tail didn't get the -r
> option until v8.

Tail acquired a -r option between 3BSD [1] and 4BSD [2].

I remember using that option on SunOS in 1990 as part of a prank we 
played on a friend at the university.  On the Sun 3 workstations we were 
using at the time, one could enter the monitor/debugger program by 
pressing L1-A.  By remotely logging into a workstation and running a 
shell loop, one could ensure that when the monitor was entered the 
active program would be that shell.  It was then easy to modify the uid 
field for the active process (the loop-running shell) and set it to 
zero.  After exiting the monitor, a subshell launched from that shell 
would have full root privileges.  All we had to do was wait for the 
friend to lock his workstation when taking a break in order to obtain 
root privileges on his workstation and then change to his uid in order 
to modify his files via NFS on the university's Gould file server.

Based on this capability, I wrote the following script that would rename 
all our friend's files and directories to words from the dictionary. 
The script also created (via tail -r) another script that would undo 
this change.

#!/bin/sh
TMP=/tmp
DIR=$1
FILES=$TMP/f.$$
WORDS=$TMP/w.$$
CMD=$TMP/c.$$
REV=$TMP/r.$$
trap '' 0 1 2 3 15
find $DIR -depth -print >$FILES
head -`wc -l <$FILES|sed 's/[ 	]*//'` /usr/dict/words >$WORDS
paste $FILES $WORDS |
sed -e '
/^\.	/d
s/\(.*\)\/\(.*\)	\(.*\)/mv \1\/\2 \1\/\3/
' >$CMD
rm $FILES $WORDS
tail -r $CMD |
sed -e '
s/mv \(.*\) \(.*\)/mv \2 \1/
' >$REV
sh <$CMD
rm $CMD

Unfortunately, it turned out that tail -r had a limit on the number of 
lines it could reverse.  Although the script and its undo worked fine on 
a test set of a small number of files, when run on our friend's 
directory it created a faulty undo script.  Our friend ended up 
graduating with files named "abaca" and "abacinate".
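
For comparison, a sketch of building the undo script without tail -r,
assuming the list of commands fits in memory.  The function name
make_undo is hypothetical, and like the script above it assumes file
names without blanks.

```shell
# make_undo: read "mv OLD NEW" lines on stdin and write the undo
# script on stdout: same renames, swapped, in reverse order.
make_undo() {
    # Hold every line in an awk array, then print from last to first.
    awk '{ line[NR] = $0 }
         END { for (i = NR; i >= 1; i--) print line[i] }' |
    # Swap the two arguments of each mv.
    sed -e 's/^mv \(.*\) \(.*\)$/mv \2 \1/'
}
```

The order matters because find -depth lists a directory's contents
before the directory itself, so the undo has to restore the directory
name first and the names inside it afterwards.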


[1] 
https://dspinellis.github.io/manview/?src=https%3A%2F%2Fraw.githubusercontent.com%2Fdspinellis%2Funix-history-repo%2FBSD-3%2Fusr%2Fman%2Fman1%2Ftail.1&name=BSD%203%3A%20tail(1)&link=https%3A%2F%2Fgithub.com%2Fdspinellis%2Funix-history-repo%2Fblob%2FBSD-3%2Fusr%2Fman%2Fman1%2Ftail.1

[2] 
https://dspinellis.github.io/manview/?src=https%3A%2F%2Fraw.githubusercontent.com%2Fdspinellis%2Funix-history-repo%2FBSD-4%2Fusr%2Fman%2Fman1%2Ftail.1&name=BSD%204%3A%20tail(1)&link=https%3A%2F%2Fgithub.com%2Fdspinellis%2Funix-history-repo%2Fblob%2FBSD-4%2Fusr%2Fman%2Fman1%2Ftail.1

-- 
Diomidis Spinellis
Free edX MOOC on Unix Tools: Data, Software, and Production Engineering
https://www.spinellis.gr/unix?tuhs20200306

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-05 21:50       ` Dave Horsfall
@ 2020-03-05 21:56         ` Warner Losh
  2020-03-08  5:26           ` Greg 'groggy' Lehey
  0 siblings, 1 reply; 68+ messages in thread
From: Warner Losh @ 2020-03-05 21:56 UTC (permalink / raw)
  To: Dave Horsfall; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 814 bytes --]

On Thu, Mar 5, 2020 at 2:51 PM Dave Horsfall <dave@horsfall.org> wrote:

> On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:
>
> > do i get a prize:
> > ls -tj
> > /bin/ls: illegal option -- j
> > usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>
> Another candidate for option-cleansing...  Interesting; I get different
> options with the Mac and FreeBSD:
>
> Mac:
>
>      usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>
> FreeBSD:
>
>      usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format]
> [file ...]
>
> So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...
>

FreeBSD wouldn't need -, if there were a good filter to add , to large
numbers...  Some of the proliferation of options has been due to a lack of
proper building-blocks....

Warner

[-- Attachment #2: Type: text/html, Size: 1259 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-05  4:17     ` Ken Thompson via TUHS
  2020-03-05 14:53       ` Dan Cross
@ 2020-03-05 21:50       ` Dave Horsfall
  2020-03-05 21:56         ` Warner Losh
  1 sibling, 1 reply; 68+ messages in thread
From: Dave Horsfall @ 2020-03-05 21:50 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:

> do i get a prize:
> ls -tj
> /bin/ls: illegal option -- j
> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

Another candidate for option-cleansing...  Interesting; I get different 
options with the Mac and FreeBSD:

Mac:

     usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

FreeBSD:

     usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format] [file ...]

So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...

-- Dave

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-05  4:17     ` Ken Thompson via TUHS
@ 2020-03-05 14:53       ` Dan Cross
  2020-03-05 21:50       ` Dave Horsfall
  1 sibling, 0 replies; 68+ messages in thread
From: Dan Cross @ 2020-03-05 14:53 UTC (permalink / raw)
  To: Ken Thompson; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 510 bytes --]

On Wed, Mar 4, 2020 at 11:18 PM Ken Thompson via TUHS <tuhs@minnie.tuhs.org>
wrote:

> do i get a prize:
>

Depends on whether you do your grocery shopping at Trader Joe's.

ls -tj
> /bin/ls: illegal option -- j
> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>

Very nice. Wasn't there something in the fortune file at one point about
the "Monty Python and the Holy Grail" bridge crossing scene where the
question was, "what $n$ lower case letters are not options to ls(1)?"

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 1081 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
@ 2020-03-05  4:57 Doug McIlroy
  2020-03-05 22:17 ` Diomidis Spinellis
  0 siblings, 1 reply; 68+ messages in thread
From: Doug McIlroy @ 2020-03-05  4:57 UTC (permalink / raw)
  To: tuhs

> These go all the way back to v7 unix, where ls has an option to reverse
the sort order (which could have been done by passing the output to tac).

A cool idea, but tac was not in v7. And tail didn't get the -r
option until v8.

As for rev, I don't know why it was first written, but one
use was to examine suffixes--a kind of thing that several
word lovers in the Unix lab were prone to do.

Apropos of using rev to make rhyming dictionaries, Walker's
Rhyming Dictionary was published decades before Noah
Webster's dictionary appeared and stayed in print
for about 200 years. Notionally the relation between
webster and walker is
	rev <webster | sort | rev >walker
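
A small worked example of that pipeline, as a sketch: the function name
rhyme_order and the four-word list are made up for illustration, and
LC_ALL=C is an assumption added to make the sort order predictable.

```shell
# Sort words by their endings: reverse each line, sort, reverse back.
rhyme_order() { rev | LC_ALL=C sort | rev; }

printf '%s\n' station cat nation bat | rhyme_order
```

The words come out grouped by suffix: nation, station, bat, cat.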

Doug

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-05  2:05   ` Kurt H Maier
@ 2020-03-05  4:17     ` Ken Thompson via TUHS
  2020-03-05 14:53       ` Dan Cross
  2020-03-05 21:50       ` Dave Horsfall
  0 siblings, 2 replies; 68+ messages in thread
From: Ken Thompson via TUHS @ 2020-03-05  4:17 UTC (permalink / raw)
  To: Kurt H Maier; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 421 bytes --]

do i get a prize:

ls -tj
/bin/ls: illegal option -- j
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

On Wed, Mar 4, 2020 at 6:06 PM Kurt H Maier <khm@sciops.net> wrote:

> On Wed, Mar 04, 2020 at 11:17:46AM -0500, John P. Linderman wrote:
> > I think that was a useful option, but the irony of Rob
> > adding an option to "tac" was hard to overlook.
>
> tac came back from Jersey waving flags?
>
> khm
>

[-- Attachment #2: Type: text/html, Size: 766 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-04 16:17 ` John P. Linderman
  2020-03-04 17:25   ` Bakul Shah
  2020-03-05  0:55   ` Rob Pike
@ 2020-03-05  2:05   ` Kurt H Maier
  2020-03-05  4:17     ` Ken Thompson via TUHS
  2 siblings, 1 reply; 68+ messages in thread
From: Kurt H Maier @ 2020-03-05  2:05 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Eunuchs Hysterical Society

On Wed, Mar 04, 2020 at 11:17:46AM -0500, John P. Linderman wrote:
> I think that was a useful option, but the irony of Rob
> adding an option to "tac" was hard to overlook.
          
tac came back from Jersey waving flags?

khm

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-04 16:17 ` John P. Linderman
  2020-03-04 17:25   ` Bakul Shah
@ 2020-03-05  0:55   ` Rob Pike
  2020-03-05  2:05   ` Kurt H Maier
  2 siblings, 0 replies; 68+ messages in thread
From: Rob Pike @ 2020-03-05  0:55 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Eunuchs Hysterical Society

I have no memory of this, but that doesn't mean it's false.

Also in my defense, suggesting an option compared to actually adding
the code is a lesser crime. Or is it?

Anyway I removed all the options from research cat, including -u. That
counts for something.

-rob

On Thu, Mar 5, 2020 at 3:19 AM John P. Linderman <jpl.jpl@gmail.com> wrote:
>
> The "statute of limitations" must have passed long ago, so I confess to having been the author of the original tac (cat in reverse). I was working on a project that wrote log files, but the logs were very "bursty". Minutes might go by without any activity, followed by a burst of logging activity. We often wanted to see the most recent burst of activity, so "tail -f" wouldn't do the job. It would show the next burst of activity, which might not occur for quite some time. Somebody posted a functional equivalent on some netnews group, but it was ghastly. I think it did seeks of -1 characters at a time to accumulate each line. That would have been fast enough to feed our pathetic 1200 baud terminals, but it would have beat the system to death, and that would have been a disservice to other users. My version did reads of 512 bytes on 512-byte boundaries, so it put much less load on the system. I couldn't bear to see something like the netnews version
> get adopted. The software release process at the Labs was a bureaucratic nightmare, so I "tossed my version over the wall", into the arms of Andy Tanenbaum, as I recall. He made it public, attributed to "an unknown author".
>
> I don't know how Rob Pike got ahold of it, but he recognized that mailbox files had the same bursty growth. Unlike our log files, whose contents were acceptably understandable in reverse order, mail messages were hard to read in reverse order, so he proposed making it possible to recognize the headers at the start of each mail message, and put the entire message out in readable order. I think that was a useful option, but the irony of Rob adding an option to "tac" was hard to overlook.
>
> The version out there now was rewritten by Jay Lepreau, it seems:
>
> /*
>  * tac.c - Print file segments in reverse order
>  *
>  * Original line-only version by unknown author off the net.
>  * Rewritten in 1985 by Jay Lepreau, Univ of Utah, to allocate memory
>  * dynamically, handle string bounded segments (suggested by Rob Pike),
>  * and handle pipes.
>  */
>
> Dynamic buffer allocation rather than relying on the time-honored 512-bytes-is-enough assumption was a positive, as was supporting Rob's suggestion. Handling pipes strikes me as a waste of code, but hey, anything is better than that version I replaced.
>
> On Wed, Mar 4, 2020 at 9:15 AM Nelson H. F. Beebe <beebe@math.utah.edu> wrote:
>>
>> Arnold Robbins writes:
>>
>> >> There was no tac in V7 Unix. It was first posted to USENET, I don't
>> >> know by who, and picked up by Linux and *BSD.
>>
>> That brought back memories, and to verify them, I checked the tac.c
>> source code in the latest GNU coreutils test release.  It says
>>
>> /* Written by Jay Lepreau (lepreau@cs.utah.edu).
>>    GNU enhancements by David MacKenzie (djm@gnu.ai.mit.edu). */
>>
>> So my memory was right that my old friend Jay was the author.  Sadly,
>> we lost him in September 2008: see
>>
>>         https://www.legacy.com/obituaries/saltlaketribune/obituary.aspx?page=lifestory&pid=117597321
>>
>> Jay founded the influential Flux group in advanced networking research:
>>
>>         http://www.flux.utah.edu/profile/lepreau
>>
>> -------------------------------------------------------------------------------
>> - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
>> - University of Utah                    FAX: +1 801 581 4148                  -
>> - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
>> - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
>> - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
>> -------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-04 16:17 ` John P. Linderman
@ 2020-03-04 17:25   ` Bakul Shah
  2020-03-05  0:55   ` Rob Pike
  2020-03-05  2:05   ` Kurt H Maier
  2 siblings, 0 replies; 68+ messages in thread
From: Bakul Shah @ 2020-03-04 17:25 UTC (permalink / raw)
  To: John P. Linderman; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 4168 bytes --]

I missed knowing about tac till now. I’ve used tail -r since
1982 when Yost pointed out that tail -r|rev was equivalent
to a toy recursive C program I had written to reverse a file.
He was almost right!

rev(){int c=getchar();if(c==EOF)return;rev();putchar(c);}

> On Mar 4, 2020, at 8:19 AM, John P. Linderman <jpl.jpl@gmail.com> wrote:
> 
> 
> The "statute of limitations" must have passed long ago, so I confess to having been the author of the original tac (cat in reverse). I was working on a project that wrote log files, but the logs were very "bursty". Minutes might go by without any activity, followed by a burst of logging activity. We often wanted to see the most recent burst of activity, so "tail -f" wouldn't do the job. It would show the next burst of activity, which might not occur for quite some time. Somebody posted a functional equivalent on some netnews group, but it was ghastly. I think it did seeks of -1 characters at a time to accumulate each line. That would have been fast enough to feed our pathetic 1200 baud terminals, but it would have beat the system to death, and that would have been a disservice to other users. My version did reads of 512 bytes on 512-byte boundaries, so it put much less load on the system. I couldn't bear to see something like the netnews version
> get adopted. The software release process at the Labs was a bureaucratic nightmare, so I "tossed my version over the wall", into the arms of Andy Tanenbaum, as I recall. He made it public, attributed to "an unknown author".
> 
> I don't know how Rob Pike got ahold of it, but he recognized that mailbox files had the same bursty growth. Unlike our log files, whose contents were acceptably understandable in reverse order, mail messages were hard to read in reverse order, so he proposed making it possible to recognize the headers at the start of each mail message, and put the entire message out in readable order. I think that was a useful option, but the irony of Rob adding an option to "tac" was hard to overlook.
> 
> The version out there now was rewritten by Jay Lepreau, it seems:
> 
> /*
>  * tac.c - Print file segments in reverse order
>  *
>  * Original line-only version by unknown author off the net.
>  * Rewritten in 1985 by Jay Lepreau, Univ of Utah, to allocate memory
>  * dynamically, handle string bounded segments (suggested by Rob Pike),
>  * and handle pipes.
>  */
> 
> Dynamic buffer allocation rather than relying on the time-honored 512-bytes-is-enough assumption was a positive, as was supporting Rob's suggestion. Handling pipes strikes me as a waste of code, but hey, anything is better than that version I replaced.
> 
>> On Wed, Mar 4, 2020 at 9:15 AM Nelson H. F. Beebe <beebe@math.utah.edu> wrote:
>> Arnold Robbins writes:
>> 
>> >> There was no tac in V7 Unix. It was first posted to USENET, I don't
>> >> know by who, and picked up by Linux and *BSD.
>> 
>> That brought back memories, and to verify them, I checked the tac.c
>> source code in the latest GNU coreutils test release.  It says
>> 
>> /* Written by Jay Lepreau (lepreau@cs.utah.edu).
>>    GNU enhancements by David MacKenzie (djm@gnu.ai.mit.edu). */
>> 
>> So my memory was right that my old friend Jay was the author.  Sadly, 
>> we lost him in September 2008: see
>> 
>>         https://www.legacy.com/obituaries/saltlaketribune/obituary.aspx?page=lifestory&pid=117597321
>> 
>> Jay founded the influential Flux group in advanced networking research:
>> 
>>         http://www.flux.utah.edu/profile/lepreau
>> 
>> -------------------------------------------------------------------------------
>> - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
>> - University of Utah                    FAX: +1 801 581 4148                  -
>> - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
>> - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
>> - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
>> -------------------------------------------------------------------------------

[-- Attachment #2: Type: text/html, Size: 6405 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
  2020-03-04 14:06 Nelson H. F. Beebe
@ 2020-03-04 16:17 ` John P. Linderman
  2020-03-04 17:25   ` Bakul Shah
                     ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: John P. Linderman @ 2020-03-04 16:17 UTC (permalink / raw)
  To: Nelson H. F. Beebe; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/plain, Size: 3678 bytes --]

The "statute of limitations" must have passed long ago, so I confess to
having been the author of the original tac (cat in reverse). I was working
on a project that wrote log files, but the logs were very "bursty". Minutes
might go by without any activity, followed by a burst of logging activity.
We often wanted to see *the most recent* burst of activity, so "tail -f"
wouldn't do the job. It would show the *next* burst of activity, which
might not occur for quite some time. Somebody posted a functional
equivalent on some netnews group, but it was *ghastly*. I think it did
seeks of -1 characters at a time to accumulate each line. That would have
been fast enough to feed our pathetic 1200 baud terminals, but it would
have beat the system to death, and that would have been a disservice to
other users. My version did reads of 512 bytes on 512-byte boundaries, so
it put much less load on the system. I couldn't bear to see something like
the netnews version get adopted. The software release process at the Labs
was a bureaucratic nightmare, so I "tossed my version over the wall", into
the arms of Andy Tanenbaum, as I recall. He made it public, attributed to
"an unknown author".

I don't know how Rob Pike got ahold of it, but he recognized that mailbox
files had the same bursty growth. Unlike our log files, whose contents were
acceptably understandable in reverse order, mail messages were hard to read
in reverse order, so he proposed making it possible to recognize the
headers at the start of each mail message, and put the entire message out
in readable order. I think that was a useful option, but the irony of Rob
adding an option to "tac" was hard to overlook.

The version out there now was rewritten by Jay Lepreau, it seems:

/*
 * tac.c - Print file segments in reverse order
 *
 * Original line-only version by unknown author off the net.
 * Rewritten in 1985 by Jay Lepreau, Univ of Utah, to allocate memory
 * dynamically, handle string bounded segments (suggested by Rob Pike),
 * and handle pipes.
 */

Dynamic buffer allocation rather than relying on the time-honored
512-bytes-is-enough assumption was a positive, as was supporting Rob's
suggestion. Handling pipes strikes me as a waste of code, but hey, anything
is better than that version I replaced.

On Wed, Mar 4, 2020 at 9:15 AM Nelson H. F. Beebe <beebe@math.utah.edu>
wrote:

> Arnold Robbins writes:
>
> >> There was no tac in V7 Unix. It was first posted to USENET, I don't
> >> know by who, and picked up by Linux and *BSD.
>
> That brought back memories, and to verify them, I checked the tac.c
> source code in the latest GNU coreutils test release.  It says
>
> /* Written by Jay Lepreau (lepreau@cs.utah.edu).
>    GNU enhancements by David MacKenzie (djm@gnu.ai.mit.edu). */
>
> So my memory was right that my old friend Jay was the author.  Sadly,
> we lost him in September 2008: see
>
>
> https://www.legacy.com/obituaries/saltlaketribune/obituary.aspx?page=lifestory&pid=117597321
>
> Jay founded the influential Flux group in advanced networking research:
>
>         http://www.flux.utah.edu/profile/lepreau
>
>
> -------------------------------------------------------------------------------
> - Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
> - University of Utah                    FAX: +1 801 581 4148                  -
> - Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
> - 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
> - Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
> -------------------------------------------------------------------------------
>

[-- Attachment #2: Type: text/html, Size: 5471 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [TUHS] Command line options and complexity
@ 2020-03-04 14:06 Nelson H. F. Beebe
  2020-03-04 16:17 ` John P. Linderman
  0 siblings, 1 reply; 68+ messages in thread
From: Nelson H. F. Beebe @ 2020-03-04 14:06 UTC (permalink / raw)
  To: tuhs

Arnold Robbins writes:

>> There was no tac in V7 Unix. It was first posted to USENET, I don't
>> know by who, and picked up by Linux and *BSD.

That brought back memories, and to verify them, I checked the tac.c
source code in the latest GNU coreutils test release.  It says

/* Written by Jay Lepreau (lepreau@cs.utah.edu).
   GNU enhancements by David MacKenzie (djm@gnu.ai.mit.edu). */

So my memory was right that my old friend Jay was the author.  Sadly, 
we lost him in September 2008: see

	https://www.legacy.com/obituaries/saltlaketribune/obituary.aspx?page=lifestory&pid=117597321
	
Jay founded the influential Flux group in advanced networking research:

	http://www.flux.utah.edu/profile/lepreau

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, back to index

Thread overview: 68+ messages
2020-03-03 18:15 [TUHS] Command line options and complexity Jon Steinhart
2020-03-03 18:44 ` Adam Thornton
2020-03-04  4:11   ` Tyler Adams
2020-03-04  6:03     ` Dave Horsfall
2020-03-04  6:48       ` arnold
2020-03-04 21:17         ` Dave Horsfall
2020-03-05  0:49         ` Lyndon Nerenberg
2020-03-05 20:54           ` Dave Horsfall
2020-03-05 22:01             ` William Cheswick
2020-03-04 21:50   ` Random832
2020-03-04 23:19     ` Steffen Nurpmeso
2020-03-05  6:12     ` Alan D. Salewski
2020-03-04 22:03   ` Random832
2020-03-04 23:25     ` Terry Jones
2020-03-10 23:03 ` Dan Stromberg
2020-03-11  3:18   ` Dave Horsfall
2020-03-11  4:02     ` Steve Nickolas
2020-03-11 22:56     ` Greg 'groggy' Lehey
2020-03-11 23:14       ` Dan Cross
2020-03-12  0:42         ` Greg 'groggy' Lehey
2020-03-12  0:53       ` Steve Nickolas
2020-03-12  3:09         ` Greg 'groggy' Lehey
2020-03-12  3:34           ` Steve Nickolas
2020-03-13  1:02             ` Greg 'groggy' Lehey
2020-03-12  5:38         ` Dave Horsfall
2020-03-12  6:48         ` Peter Jeremy
2020-03-12  7:37           ` Steve Nickolas
2020-03-12  7:42             ` Warner Losh
2020-03-12 23:57           ` Greg 'groggy' Lehey
2020-03-12  5:22       ` Dave Horsfall
2020-03-12  5:35         ` Steve Nickolas
2020-03-13  0:36         ` Greg 'groggy' Lehey
2020-03-13 11:26           ` Dave Horsfall
2020-03-14  2:13           ` Greg A. Woods
2020-03-14  4:31             ` Greg 'groggy' Lehey
2020-03-04 14:06 Nelson H. F. Beebe
2020-03-04 16:17 ` John P. Linderman
2020-03-04 17:25   ` Bakul Shah
2020-03-05  0:55   ` Rob Pike
2020-03-05  2:05   ` Kurt H Maier
2020-03-05  4:17     ` Ken Thompson via TUHS
2020-03-05 14:53       ` Dan Cross
2020-03-05 21:50       ` Dave Horsfall
2020-03-05 21:56         ` Warner Losh
2020-03-08  5:26           ` Greg 'groggy' Lehey
2020-03-08  5:32             ` Jon Steinhart
2020-03-08  9:30               ` Tyler Adams
     [not found]                 ` <CAC0cEp8eFRkkLTw88WVaKZoKy+qsrhuC8LkzmmsbqtdZgMf8eQ@mail.gmail.com>
     [not found]                   ` <CAEuQd1D7+dfap98AwPo2W41+06prrcVaAWk3Ve-ve0uQ0xBu3Q@mail.gmail.com>
2020-03-09 21:06                     ` John P. Linderman
2020-03-09 21:22                       ` Kurt H Maier
2020-03-11 17:41                         ` John P. Linderman
2020-03-11 21:29                           ` Warner Losh
2020-03-12  0:13                             ` John P. Linderman
2020-03-12  0:34                               ` Chet Ramey
2020-03-12 12:57                             ` John P. Linderman
2020-03-12 19:24                               ` Steffen Nurpmeso
2020-03-08  9:51             ` Michael Kjörling
2020-03-05  4:57 Doug McIlroy
2020-03-05 22:17 ` Diomidis Spinellis
2020-03-10 16:15 Doug McIlroy
2020-03-10 17:38 ` Dan Cross
2020-03-10 17:44   ` Bakul Shah
2020-03-10 18:09     ` Dan Cross
2020-03-10 18:42 Doug McIlroy
2020-03-10 19:38 ` Dan Cross
2020-03-13 10:45 Dave Horsfall
2020-03-14  4:35 ` Greg 'groggy' Lehey
2020-03-14 19:52   ` John P. Linderman
2020-03-14 20:25     ` Steffen Nurpmeso

The Unix Heritage Society mailing list

Archives are clonable: git clone --mirror http://inbox.vuxu.org/tuhs

Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.tuhs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git