OK, this should be good for some conversation. A friend sent me this link today: http://danluu.com/cli-complexity/
I've heard people say that there isn't really any alternative to this kind
of complexity for command line tools, but people who say that have never
really tried the alternative, something like PowerShell. I have plenty of
complaints about PowerShell, but passing structured data around and easily
being able to operate on structured data without having to hold metadata
information in my head so that I can pass the appropriate metadata to the
right command line tools at that right places the pipeline isn't among my
complaints3 <https://danluu.com/cli-complexity/#fn:W>.

Somewhat disingenuous. I mean, yes, that's true, but on the other hand it
means that you have to keep the "what Powershell commands operate on what
structure" in your head instead, since you can no longer assume the
pipelines to be a universal interface.

Same basic problem as CMS Pipelines. Fantastically powerful, and nowhere
near as easy to compose good functionality as "it's just a byte stream."

Adam

On Tue, Mar 3, 2020 at 11:16 AM Jon Steinhart <jon@fourwinds.com> wrote:
> OK, this should be good for some conversation. A friend sent me this
> link today: http://danluu.com/cli-complexity/
> These go all the way back to v7 unix, where ls has an option to reverse
> the sort order (which could have been done by passing the output to tac).

Good point. Why was this done in v7 unix and why wasn't it thrown out?

Tyler
On Wed, 4 Mar 2020, Tyler Adams wrote:
> > These go all the way back to v7 unix, where ls has an option to
> > reverse the sort order (which could have been done by passing the
> > output to tac).
>
> Good point. Why was this done in v7 unix and why wasn't it thrown out?

I seem to recall that "sort -r" was in V6, or perhaps that was one of the
programs I'd back-ported from V7 (being stuck with 11/40-class boxes).

And speaking of "tac" (which I never saw), I couldn't think of a single
use for "rev" (although no doubt I'll now get told). Mind you, you get
some amusing output with the "man" command because of the way that the
underlining works...

-- Dave
Dave Horsfall <dave@horsfall.org> wrote:
> On Wed, 4 Mar 2020, Tyler Adams wrote:
>
> > > These go all the way back to v7 unix, where ls has an option to
> > > reverse the sort order (which could have been done by passing the
> > > output to tac).
> >
> > Good point. Why was this done in v7 unix and why wasn't it thrown out?

There was no tac in V7 Unix. It was first posted to USENET, I don't
know by who, and picked up by Linux and *BSD.

> And speaking of "tac" (which I never saw), I couldn't think of a single
> use for "rev" (although no doubt I'll now get told).

It's useful for reading Hebrew sent in plain text email :-). Hebrew is
read right to left but stored in physical order (left to right) in
files.

Arnold
Arnold Robbins writes:

>> There was no tac in V7 Unix. It was first posted to USENET, I don't
>> know by who, and picked up by Linux and *BSD.

That brought back memories, and to verify them, I checked the tac.c
source code in the latest GNU coreutils test release. It says

    /* Written by Jay Lepreau (lepreau@cs.utah.edu).
       GNU enhancements by David MacKenzie (djm@gnu.ai.mit.edu). */

So my memory was right that my old friend Jay was the author. Sadly,
we lost him in September 2008: see

    https://www.legacy.com/obituaries/saltlaketribune/obituary.aspx?page=lifestory&pid=117597321

Jay founded the influential Flux group in advanced networking research:

    http://www.flux.utah.edu/profile/lepreau

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@math.utah.edu  -
- 155 S 1400 E RM 233                       beebe@acm.org  beebe@computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
The "statute of limitations" must have passed long ago, so I confess to
having been the author of the original tac (cat in reverse). I was working
on a project that wrote log files, but the logs were very "bursty". Minutes
might go by without any activity, followed by a burst of logging activity.
We often wanted to see *the most recent* burst of activity, so "tail -f"
wouldn't do the job. It would show the *next* burst of activity, which might
not occur for quite some time.

Somebody posted a functional equivalent on some netnews group, but it was
*ghastly*. I think it did seeks of -1 characters at a time to accumulate
each line. That would have been fast enough to feed our pathetic 1200 baud
terminals, but it would have beat the system to death, and that would have
been a disservice to other users. My version did reads of 512 bytes on
512-byte boundaries, so it put much less load on the system. I couldn't bear
to see something like the netnews version get adopted. The software release
process at the Labs was a bureaucratic nightmare, so I "tossed my version
over the wall", into the arms of Andy Tanenbaum, as I recall. He made it
public, attributed to "an unknown author".

I don't know how Rob Pike got ahold of it, but he recognized that mailbox
files had the same bursty growth. Unlike our log files, whose contents were
acceptably understandable in reverse order, mail messages were hard to read
in reverse order, so he proposed making it possible to recognize the headers
at the start of each mail message, and put the entire message out in
readable order. I think that was a useful option, but the irony of Rob
adding an option to "tac" was hard to overlook.

The version out there now was rewritten by Jay Lepreau, it seems:

/*
 * tac.c - Print file segments in reverse order
 *
 * Original line-only version by unknown author off the net.
 * Rewritten in 1985 by Jay Lepreau, Univ of Utah, to allocate memory
 * dynamically, handle string bounded segments (suggested by Rob Pike),
 * and handle pipes.
 */

Dynamic buffer allocation rather than relying on the time-honored
512-bytes-is-enough assumption was a positive, as was supporting Rob's
suggestion. Handling pipes strikes me as a waste of code, but hey, anything
is better than that version I replaced.
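For readers who want to see the block-at-a-time idea concretely, here is a minimal Python sketch. It is not the original C code (which is not shown in this thread); the function name `tac_lines` and the `block_size` parameter are made up for illustration. It reads a file from the end in chunks of at most 512 bytes that start on 512-byte boundaries, and emits the lines last-first.

```python
import os
import re
import tempfile

def tac_lines(path, block_size=512):
    """Yield the lines of a file, last line first, reading aligned blocks from the end."""
    with open(path, "rb") as f:
        end = f.seek(0, os.SEEK_END)
        pending = b""  # bytes whose line start may lie in an earlier block
        while end > 0:
            # Start of the block containing byte end-1, aligned to block_size.
            start = (end - 1) // block_size * block_size
            f.seek(start)
            pending = f.read(end - start) + pending
            # Split into newline-terminated lines plus a possible unterminated tail.
            lines = re.findall(rb"[^\n]*\n|[^\n]+", pending)
            # The first piece may begin in an earlier block, so hold it back.
            pending = lines[0]
            yield from reversed(lines[1:])
            end = start
        if pending:
            yield pending

# Small demonstration on a temporary file (sample data is invented).
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"first\nsecond\nthird\n")
os.close(fd)
print(b"".join(tac_lines(demo_path)).decode())
os.remove(demo_path)
```

The sketch rescans a held-back partial line on each iteration, so a single very long line costs more than it should; a production version would avoid that.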
I missed knowing about tac till now. I’ve used tail -r since 1982 when Yost
pointed out that tail -r|rev was equivalent to a toy recursive C program I
had written to reverse a file. He was almost right!

    rev(){int c=getchar();if(c==EOF)return;rev();putchar(c);}
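One plausible reading of "almost right" (an assumption on my part, the message does not spell it out) is that the two approaches differ only in where the newlines end up. A small Python sketch modelling both, with invented helper names:

```python
def reverse_all(text):
    """Model of the recursive toy: emit every character in reverse order."""
    return text[::-1]

def tail_r_then_rev(text):
    """Model of `tail -r | rev`: reverse the line order, then reverse each line."""
    lines = text.splitlines()
    return "".join(line[::-1] + "\n" for line in reversed(lines))

sample = "ab\ncd\n"
print(repr(reverse_all(sample)))      # '\ndc\nba'
print(repr(tail_r_then_rev(sample)))  # 'dc\nba\n'
```

The visible characters come out in the same order; the full reversal moves the final newline to the front, while the pipeline keeps each line terminated.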
On Tue, 3 Mar 2020, arnold@skeeve.com wrote:
>> And speaking of "tac" (which I never saw), I couldn't think of a single
>> use for "rev" (although no doubt I'll now get told).
>
> It's useful for reading Hebrew sent in plain text email :-). Hebrew is
> read right to left but stored in physical order (left to right) in
> files.
Ah, of course :-) And Arabic too, as I recall.
-- Dave
On Tue, Mar 3, 2020, at 13:44, Adam Thornton wrote:
> I've heard people say that there isn't really any alternative to this
> kind of complexity for command line tools, but people who say that have
> never really tried the alternative, something like PowerShell. I have
> plenty of complaints about PowerShell, but passing structured data
> around and easily being able to operate on structured data without
> having to hold metadata information in my head so that I can pass the
> appropriate metadata to the right command line tools at that right
> places the pipeline isn't among my complaints3
> <https://danluu.com/cli-complexity/#fn:W>.
>
> Somewhat disingenuous. I mean, yes, that's true, but on the other hand
> it means that you have to keep the "what Powershell commands operate on
> what structure" in your head instead, since you can no longer assume
> the pipelines to be a universal interface.
Sure, but "stdin is a sequence of any type, and the argument is an expression that operates on that type or the name of a property that that type has" is universal enough.
The part that has to operate on a specific structure isn't the command, it's the arguments.
For example, a powershell pipeline to produce a list of files sorted by modified date is:
gci . | sort lastwritetime | select name
all three *commands* are universal - not all objects have a "lastwritetime" and "name" property, but sort and select can operate on any property that the sequence of objects passed into it has.
(gci is an alias for get-childitem... it also has aliases ls and dir, but I'm emphasizing that it's not exclusive to directories)
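To make the "universal commands, specific arguments" point concrete, here is a rough Python analogue. It is only a sketch: `FileInfo`, `sort_by` and `select` are hypothetical names, and the sample records are invented; nothing here is PowerShell's actual implementation.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class FileInfo:
    """Hypothetical stand-in for the objects a directory listing might emit."""
    name: str
    lastwritetime: float
    size: int

def sort_by(items, prop):
    """Sort any sequence of objects by a named property."""
    return sorted(items, key=attrgetter(prop))

def select(items, prop):
    """Project each object down to one named property."""
    return [getattr(item, prop) for item in items]

files = [
    FileInfo("b.txt", 300.0, 10),
    FileInfo("a file.txt", 100.0, 99),
    FileInfo("c.txt", 200.0, 5),
]
print(select(sort_by(files, "lastwritetime"), "name"))
# ['a file.txt', 'c.txt', 'b.txt']
```

Neither `sort_by` nor `select` knows anything about files; only the property names passed as arguments are specific to the data, and the embedded space in a name causes no trouble.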
*assuming that ls -t didn't exist*, to do this with unix tools that operate on text you would need:
ls -l | [somehow convert the date to a sortable format, probably in awk] | sort | [somehow pick the filename alone out of the output - possibly with cut or sed or awk again]
and it's very difficult to get tools like awk, sort, and cut to work on formats that contain more than one field that may contain embedded spaces (you can just about get away with it for ls output because the date is always three "words").
A significant portion of ls's options are related to sorting, because you can sort based on fields that are either not present in the output, or are not in a format that can be sorted textually.
Maybe it would be enough to have the universal interface be "tables" (i.e. text streams in some format that supports adequate escaping of embedded row and column delimiters)... or maybe even just table rows, and let the user deal with memorizing column numbers (or let each originating command support a fully general way to specify what columns are requested, as ps alone does on modern systems). Of course, this isn't *really* different from allowing any data structure - after all, the value for any field could itself be a fully escaped table in text format.
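As an illustration of a text table format with "adequate escaping", here is a short Python sketch using CSV via the standard `csv` module. CSV is just one possible choice, and the sample rows are invented.

```python
import csv
import io

# Fields containing spaces, a tab, a newline, a quote and a comma.
rows = [
    ["name", "note"],
    ["a file.txt", "tab\there"],
    ["multi\nline", 'quote " and, comma'],
]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
text = buf.getvalue()

# A CSV-aware reader recovers the original fields exactly.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed == rows)           # True

# A naive line-oriented tool sees one "row" too many.
print(len(text.splitlines()))   # 4
```

The escaping is what lets delimiters live inside values, and it is also why a tool that only understands "lines" can no longer be dropped into the pipeline unchanged.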
The benefit of having actual data structures with types is that when you *don't* end the pipeline with select, each object knows how to print itself [files print mode, mtime, size, and name in a human-readable format, more or less equivalent to ls -l] rather than just dumping out every single field that you might want sort or select to operate on.
I put a lot of thoughts in my previous message, but hit send before thinking of a good way to summarize my main point...
On Tue, Mar 3, 2020, at 13:44, Adam Thornton wrote:
> Somewhat disingenuous. I mean, yes, that's true, but on the other hand
> it means that you have to keep the "what Powershell commands operate on
> what structure" in your head instead, since you can no longer assume
> the pipelines to be a universal interface.
The thing is, each Unix command imposes an implied structure on its
input, so it's not *really* a universal interface. Some operate on
lines as free text, some operate on space-delimited fields [with no
good way to escape them, though some do support an IFS environment
variable to at least change the delimiter], some work best with
fixed-width fields. Few provide a way to embed delimiters [be they
newline/null for record separator, tab/comma/space field separators, or
a user-defined separator for commands that support that] within a
value. Sort requires all values to be comparable as either strings or
numbers. Most commands you might want to use as a source in a pipeline
also expect to be used directly for human-readable output, so they
produce output that can be difficult to use for further processing
(e.g. dates in ls, which not only can't be sorted directly, but also
are limited to minutes for dates in the past year, and days for dates
before that, and are in the local time zone)
Hardly *any* commands you'd use in a pipeline really operate on unstructured bytes. Compression, I suppose. But other than that, you have just as much need to know what commands operate on what structure in Unix as in Powershell - the only difference is that the serialization is explicitly part of the interface... and due to the typical inability to escape delimiters, leaky.
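The "leaky" serialization is easy to demonstrate. The Python sketch below uses made-up lines in the style of `ls -l` output (not captured from a real system) and shows how whitespace splitting copes with one space-containing field only when that field comes last.

```python
# Invented sample lines in the general shape of `ls -l` output.
line = "-rw-r--r-- 1 al staff 1181 Mar  3 11:16 my notes.txt"

# Plain whitespace splitting truncates the file name.
naive = line.split()
print(naive[8])    # my

# Limiting the number of splits works, because the name is the last field.
fields = line.split(None, 8)
print(fields[8])   # my notes.txt

# A second field with an embedded space shifts every later column.
line2 = "-rw-r--r-- 1 al domain users 1181 Mar  3 11:16 notes.txt"
print(line2.split(None, 8)[8])   # 11:16 notes.txt
```

The second case has no fix at the consumer's end: without an escaping convention, the producer's output simply does not say where the group name stops.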
Random832 wrote in
<5019a751-d69a-4839-9a56-b977b275070d@www.fastmail.com>:
 ...
 |*assuming that ls -t didn't exist*, to do this with unix tools that \
 |operate on text you would need:
 |
 |ls -l | [somehow convert the date to a sortable format, probably in \
 |awk] | sort | [somehow pick the filename alone out of the output - \
 |possibly with cut or sed or awk again]
 |
 |and it's very difficult to get tools like awk, sort, and cut to work \
 |on formats that contain more than one field that may contain embedded \
 |spaces (you can just about get away with it for ls output because the \
 |date is always three "words").

Yes, that is really bad, except that a lot of output has been pretty
portable for a very long time. FreeBSD started using libxo in many
base utilities, which can output in structured formats. This includes
CSV and even CBOR :), though I do not know how the latter integrates
with Unix text utilities.
(I think the format string syntax, which partly originates in Qt (??),
could have been shaped into something better, like Python's, plus
further extensions. But it is an improvement over what the standard
formats end up with when reordering etc. comes into play.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
On Wed, Mar 4, 2020 at 11:04 PM Random832 <random832@fastmail.com> wrote:

> Hardly *any* commands you'd use in a pipeline really operate on
> unstructured bytes. Compression, I suppose. But other than that, you have
> just as much need to know what commands operate on what structure in Unix
> as in Powershell - the only difference is that the serialization is
> explicitly part of the interface... and due to the typical inability to
> escape delimiters, leaky.

Another difference is that probably most people on this list are extremely
familiar with the various quirks and I/O nuances of the tools many have
been using every day for decades. Just as the native speakers of a natural
language can't so easily see/appreciate its complexity (e.g., pronunciation
in English!), I suspect many of us have internalized these idiosyncrasies.

I teach occasional shell/Python courses to absolute beginners (no computing
experience at all) and came to appreciate how weird the shell is (in the
sense of having baked-in historical accidents that cannot / will not /
should not be "corrected"). Some of my appreciation of that was due to
discussions on this list (e.g., regarding comment syntax, and the :
command) - so thanks!

I know what follows won't be to everyone's taste, but I like Python and I
love shell pipelines, so I tried to write a shell that gave you both and
which allowed fairly free mixing of invoking UNIX tools and running Python.
You can send anything down its pipelines - lines of text, atoms, numbers,
Python objects, whatever (in the Python _ variable). Of course the
receiving end of the pipeline needs to know (or figure out) what it's
getting. One advantage is that you have a carefully designed programming
language (no offence intended!) underlying the shell, so you can e.g.,
write shell functions in Python (and put them in a start-up file if you
want) and just pipe regular UNIX output into them and pipe their output
into whatever's next (more Python, another UNIX command, etc).

Probably almost no one would actually want to regularly do the following on
the command line, but you could:

>>> from os import stat
>>> def fd(): return [name for (time, name) in sorted((stat(f).st_mtime, f) for f in _)]
>>> ls | fd() | tail -n 3

Here I've stuck a simple (DSU - see [1]) Python function in between two
UNIX commands and use it to get the most recently modified files.

You probably wouldn't want to do this either, but you could:

>>> seq 0 9 | list(map(lambda x: 2 ** int(x), _)) | tee /tmp/powers-of-two | sum(map(int, _))
1023
>>> cat /tmp/powers-of-two
1
2
4
8
16
32
64
128
256
512

Of course it also lets you do things you *would* want to do :-) More at
https://github.com/terrycojones/daudin

Python has fairly nice tools for reading and evaluating Python code, which
meant that getting a first version of this implemented took only one
evening of playing around. It's pretty simple (and still has plenty of
rough edges).

Apologies if this seems like self-promotion, but I very much enjoy thinking
about things in this thread and about how we work with information. I'm
also constantly blown away by how elegant UNIX is and how the core ideas
have endured. Pipelines are really wonderful, as a "natural" alternative to
function composition as a mathematician or programmer would do it (see
point #1 at https://github.com/terrycojones/daudin#background--thanks), and
I wanted to build a shell that preserved that, while giving you Python.
The overview of their history on pages 67-70 of bwk's recent book [2] is
very interesting.

Terry

[1] https://en.wikipedia.org/wiki/Schwartzian_transform
[2] https://www.amazon.com/UNIX-History-Memoir-Brian-Kernighan/dp/1695978552
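The remark about pipelines as an alternative to function composition can be sketched in ordinary Python. This is not daudin's implementation, just a minimal illustration; the `pipeline` helper is a name invented here. It mirrors the powers-of-two example from the message above.

```python
from functools import reduce

def pipeline(value, *stages):
    """Feed value through the stages left to right, like cmd1 | cmd2 | cmd3."""
    return reduce(lambda acc, stage: stage(acc), stages, value)

def powers_of_two(exponents):
    return [2 ** int(x) for x in exponents]

# Pipeline style reads in the order the data flows...
print(pipeline(range(10), powers_of_two, sum))   # 1023

# ...whereas nested function calls read inside out.
print(sum(powers_of_two(range(10))))             # 1023
```

Both forms compute the same thing; the pipeline form simply lists the steps in the order they happen.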
> > And speaking of "tac" (which I never saw), I couldn't think of a single
> > use for "rev" (although no doubt I'll now get told).
It's handy for building rhyming dictionaries:
rev < /usr/share/dict/web2 | sort | rev > rhymes
--lyndon
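For anyone who wants to try the idea without a word list on disk, a small Python sketch of the same rev, sort, rev trick follows. The word list is invented, and Python compares strings by code point, which may order things differently from sort(1) under some locales.

```python
def rhyming_order(words):
    """Sort words by their reversed spelling, so shared endings end up adjacent."""
    return sorted(words, key=lambda w: w[::-1])

words = ["nation", "cat", "station", "hat", "ration", "bat"]
print(rhyming_order(words))
# ['nation', 'ration', 'station', 'bat', 'cat', 'hat']
```

Sorting on the reversed key and returning the original word does in one step what the pipeline does with two calls to rev.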
I have no memory of this, but that doesn't mean it's false.
Also in my defense, suggesting an option compared to actually adding
the code is a lesser crime. Or is it?
Anyway I removed all the options from research cat, including -u. That
counts for something.
-rob
On Wed, Mar 04, 2020 at 11:17:46AM -0500, John P. Linderman wrote:
> I think that was a useful option, but the irony of Rob
> adding an option to "tac" was hard to overlook.
tac came back from Jersey waving flags?
khm
do i get a prize:

ls -tj
/bin/ls: illegal option -- j
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

On Wed, Mar 4, 2020 at 6:06 PM Kurt H Maier <khm@sciops.net> wrote:
> On Wed, Mar 04, 2020 at 11:17:46AM -0500, John P. Linderman wrote:
> > I think that was a useful option, but the irony of Rob
> > adding an option to "tac" was hard to overlook.
>
> tac came back from Jersey waving flags?
>
> khm
> These go all the way back to v7 unix, where ls has an option to reverse
> the sort order (which could have been done by passing the output to tac).
A cool idea, but tac was not in v7. And tail didn't get the -r
option until v8.
As for rev, I don't know why it was first written, but one
use was to examine suffixes--a kind of thing that several
word lovers in the Unix lab were prone to do.
Apropos of using rev to make rhyming dictionaries, Walker's
Rhyming Dictionary was published decades before Noah
Webster's dictionary appeared and stayed in print
for about 200 years. Notionally the relation between
webster and walker is
rev <webster | sort | rev >walker
Doug
On 2020-03-04 16:50:34, Random832 spake thus:
[...]
> Sure, but "stdin is a sequence of any type, and the argument is an
> expression that operates on that type or the name of a property that
> that type has" is universal enough.
>
> The part that has to operate on a specific structure isn't the
> command, it's the arguments.
>
> For example, a powershell pipeline to produce a list of files sorted
> by modified date is:
>
> gci . | sort lastwritetime | select name
>
> all three *commands* are universal - not all objects have a
> "lastwritetime" and "name" property, but sort and select can operate
> on any property that the sequence of objects passed into it has.

There are some examples of that type of thing in widely used Unix
tools; my use of 'sort -k1,1n' further down is demonstrating such a use
case (the 'sort' command is being told that it is operating on
numbers).

But beyond some lowest common denominator types ("number", "string",
...) how many commands can really usefully operate on a large number of
types? For example, a program that can operate on IP addresses is
probably doing something different than a program that wants to operate
on email addresses.

I could see where named properties of some object can be used more
generally than types, but again there are widely used tools that do do
that (e.g., jq(1)). IMHO, though, they are more cumbersome to use than
most of the commands I need to use minute to minute.

> (gci is an alias for get-childitem... it also has aliases ls and dir,
> but I'm emphasizing that it's not exclusive to directories)
>
> *assuming that ls -t didn't exist*, to do this with unix tools that
> operate on text you would need:
>
> ls -l | [somehow convert the date to a sortable format, probably in
> awk] | sort | [somehow pick the filename alone out of the output -
> possibly with cut or sed or awk again]

(Just nit-picking at this particular example)

You could do it without ls[0]:

    $ stat -c '%Y %n' * | sort -k1,1n | xargs -L1 sh -c 'echo "$@"'

That doesn't seem so bad to me, but if it was something I needed
regularly I'd of course put it in an alias[1] or (more likely) a short
script file.

> and it's very difficult to get tools like awk, sort, and cut to work
> on formats that contain more than one field that may contain embedded
> spaces (you can just about get away with it for ls output because the
> date is always three "words").
[...]

Yes, that's often true. And when I encounter it I typically start out
by seeing if I can inject and remove tokens in the data at key places
in the pipeline. Beyond anything trivial, though, I then quickly start
reaching for tools to put the data into some form that more easily
allows for it (CSV, JSON, ...). But that invariably adds other
complications (such as the need to find or build tools to
marshal/unmarshal the data, and to deal with data-domain-specific
notions of null-vs-empty-string).

For the (more common (for me)) case where there is only one field that
contains embedded spaces, I just try to get 'em at the end of the line
and let the shell deal with it:

    $ some-command | while read -r first second rest; do ... ; done

> Maybe it would be enough to have the universal interface be "tables"
> (i.e. text streams in some format that supports adequate escaping of
> embedded row and column delimiters)... or maybe even just table rows,
> and let the user deal with memorizing column numbers (or let each
> originating command support a fully general way to specify what
> columns are requested, as ps alone does on modern systems)
>
> Of course, this isn't *really* different from allowing any data
> structure - after all, the value for any field could itself be a
> fully escaped table in text format.
[...]

Well, in some sense with byte streams you have a table of
newline-delimited bytes (rows), and byte subfields separated by
whitespace (columns). And anything on top of that could (in some
context, and with some syntax) be considered just further escaped
tables in text format. I think that's essentially the same thing that
you said, only with the outermost table syntax removed.

But like you said, this isn't really different from allowing any data
structure. Importantly, though, it doesn't impose any particular data
structure, either.

I've worked at a couple of different places that had in-house tools for
working with explicit table semantics in command line suites, and where
they fit the data domain, that was hugely useful. Generally speaking,
they were special purpose enough to warrant their own tools, but still
general purpose enough to be composable (were designed for use in shell
pipelines) and applicable in domains beyond the intentions of their
original authors.

Still, the burden of "thinking in tables" would make them too
heavyweight for a lot of common use cases. Sometimes my data structure
is "paragraphs of text":

    $ lorem -p 3 | perl -00 -wnle '2 == $. && print' | wc -w

Other times I want a tree (JSON, s-expressions, ...), or even a stream
of trees[2]. I consider it a feature that these more complex data
structures are not assumed or imposed in contexts where they are not
needed.

Take care,
-Al

[0] You could get 'ls' to do it, too, (without '-t') but here the use
    of TIME_STYLE is a presumably non-portable (but handy!) GNU-ism:

    $ TIME_STYLE='+%s' ls -l | tail -n +2 | sort -k6,6n | xargs -L1 sh -c 'shift 5; echo "$@"'

    It's different from the '-t' option, though, in that it forces a
    predictable date field format in the output of 'ls -l', so
    side-steps the need for downstream date parsing altogether and
    simply jumps into sorting (after chopping off the 'total N' header
    (groans all around)).

[1] E.g.,

    $ # read 'bmt' as: "by mtime"
    $ alias bmt='stat -c "%Y %n" * | sort -k1,1n | xargs -L1 sh -c '"'echo "'"$@"'"'"
    $ bmt

[2] Probably flattened.
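
A small sketch of the "put the space-containing field last" idiom from Al's message, assuming made-up input records of the form "mtime size name" (the function name split_fields and the sample data are hypothetical). read assigns whatever is left on the line to the final variable, so embedded blanks in the name survive:

```sh
# Split each record into two leading fields plus "the rest of the line".
# The last variable named to read receives all remaining text, so a
# file name with embedded blanks stays in one piece.
split_fields() {
    while read -r mtime size name; do
        printf '%s|%s|%s\n' "$mtime" "$size" "$name"
    done
}

printf '%s\n' \
    '1583400000 42 my file with spaces.txt' \
    '1583300000 7 notes.md' | split_fields
# 1583400000|42|my file with spaces.txt
# 1583300000|7|notes.md
```

This only helps when a single field can contain blanks; with two such fields the split is ambiguous again, which is the point made in the quoted text.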
On Wed, Mar 4, 2020 at 11:18 PM Ken Thompson via TUHS <tuhs@minnie.tuhs.org> wrote:

> do i get a prize:

Depends on whether you do your grocery shopping at Trader Joe's.

> ls -tj
> /bin/ls: illegal option -- j
> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]

Very nice. Wasn't there something in the fortune file at one point
about the "Monty Python and the Holy Grail" bridge crossing scene where
the question was, "what $n$ lower case letters are not options to
ls(1)?"

- Dan C.
On Wed, 4 Mar 2020, Lyndon Nerenberg wrote:
(Uses for "rev")
> It's handy for building rhyming dictionaries:
>
> rev < /usr/share/dict/web2 | sort | rev > rhymes
Neat!
-- Dave
On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:
> do i get a prize:
> ls -tj
> /bin/ls: illegal option -- j
> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
Another candidate for option-cleansing... Interesting; I get different
options with the Mac and FreeBSD:
Mac:
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
FreeBSD:
usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format] [file ...]
So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...
-- Dave
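
The two usage strings quoted above can also be compared mechanically. This is only a sketch: the option letters are copied from the usage lines in the message, with FreeBSD's separate "-D format" folded in as a plain D, and only_in is a hypothetical helper name:

```sh
# Option letters copied from the two usage messages quoted above.
mac='ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1'
fbsd='ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,D'

# Print the characters of $1 that do not occur anywhere in $2.
only_in() {
    printf '%s\n' "$1" | fold -w 1 | while IFS= read -r c; do
        case $2 in
            *"$c"*) ;;                  # also present in the other set
            *) printf '%s' "$c" ;;      # unique to the first set
        esac
    done
    echo
}

only_in "$fbsd" "$mac"   # IZy,D
only_in "$mac" "$fbsd"   # Oe
```

Going by these two strings alone, the difference is a little larger than "y,D:", since I and Z also appear only on the FreeBSD side and O and e only on the Mac side.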
On Thu, Mar 5, 2020 at 2:51 PM Dave Horsfall <dave@horsfall.org> wrote:

> On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:
>
> > do i get a prize:
> > ls -tj
> > /bin/ls: illegal option -- j
> > usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>
> Another candidate for option-cleansing... Interesting; I get different
> options with the Mac and FreeBSD:
>
> Mac:
>
> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>
> FreeBSD:
>
> usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format]
> [file ...]
>
> So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...

FreeBSD wouldn't need -, if there were a good filter to add , to large
numbers... Some of the proliferation of options has been due to a lack
of proper building-blocks....

Warner
My use for rev(1): uniq(1)'s -f <n> ignores the first <n> fields of a
line. If you want it to ignore the last <n> fields:

    rev | uniq -f <n> | rev

ches
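
A short sketch of that trick, assuming made-up records whose last field is a counter that should not take part in the comparison (uniq_ignoring_last is a hypothetical name). After rev, the last field becomes the first, which is exactly what uniq -f can skip:

```sh
# Drop adjacent lines that differ only in their last $1 fields.
# rev turns "last fields" into "first fields", which uniq -f can skip.
uniq_ignoring_last() {
    rev | uniq -f "$1" | rev
}

printf '%s\n' 'a b 1' 'a b 2' 'c d 3' | uniq_ignoring_last 1
# a b 1
# c d 3
```

As with plain uniq, only adjacent duplicates are collapsed, so unsorted input may need a sort first.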
On 05-Mar-20 6:57, Doug McIlroy wrote:
>> These go all the way back to v7 unix, where ls has an option to reverse
> the sort order (which could have been done by passing the output to tac).
>
> A cool idea, but tac was not in v7. And tail didn't get the -r
> option until v8.

Tail acquired a -r option between 3BSD [1] and 4BSD [2].

I remember using that option on SunOS in 1990 as part of a prank we
played on a friend at the university. On the Sun 3 workstations we
were using at the time, one could enter the monitor/debugger program by
pressing L1-A. By remotely logging into a workstation and running a
shell loop, one could ensure that when the monitor was entered the
active program would be that shell. It was then easy to modify the uid
field for the active process (the loop-running shell) and set it to
zero. After exiting the monitor, a subshell launched from that shell
would have full root privileges. All we had to do was wait for the
friend to lock his workstation when taking a break in order to obtain
root privileges on his workstation and then change to his uid in order
to modify his files via NFS on the university's Gould file server.

Based on this capability, I wrote the following script that would
rename all our friend's files and directories to words from the
dictionary. The script also created (via tail -r) another script that
would undo this change.

#!/bin/sh
TMP=/tmp
DIR=$1
FILES=$TMP/f.$$
WORDS=$TMP/w.$$
CMD=$TMP/c.$$
REV=$TMP/r.$$
trap '' 0 1 2 3 15
find $DIR -depth -print >$FILES
head -`wc -l <$FILES|sed 's/[ ]*//'` /usr/dict/words >$WORDS
paste $FILES $WORDS | sed -e '
/^\. /d
s/\(.*\)\/\(.*\) \(.*\)/mv \1\/\2 \1\/\3/
' >$CMD
rm $FILES $WORDS
tail -r $CMD | sed -e '
s/mv \(.*\) \(.*\)/mv \2 \1/
' >$REV
sh <$CMD
rm $CMD

Unfortunately, it turned out that tail -r had a limit on the number of
lines it could reverse. Although the script and its undo worked fine on
a test set of a small number of files, when run on our friend's
directory it created a faulty undo script. Our friend ended up
graduating with files named "abaca" and "abacinate".

[1] https://dspinellis.github.io/manview/?src=https%3A%2F%2Fraw.githubusercontent.com%2Fdspinellis%2Funix-history-repo%2FBSD-3%2Fusr%2Fman%2Fman1%2Ftail.1&name=BSD%203%3A%20tail(1)&link=https%3A%2F%2Fgithub.com%2Fdspinellis%2Funix-history-repo%2Fblob%2FBSD-3%2Fusr%2Fman%2Fman1%2Ftail.1
[2] https://dspinellis.github.io/manview/?src=https%3A%2F%2Fraw.githubusercontent.com%2Fdspinellis%2Funix-history-repo%2FBSD-4%2Fusr%2Fman%2Fman1%2Ftail.1&name=BSD%204%3A%20tail(1)&link=https%3A%2F%2Fgithub.com%2Fdspinellis%2Funix-history-repo%2Fblob%2FBSD-4%2Fusr%2Fman%2Fman1%2Ftail.1

--
Diomidis Spinellis
Free edX MOOC on Unix Tools: Data, Software, and Production Engineering
https://www.spinellis.gr/unix?tuhs20200306
On Thursday, 5 March 2020 at 14:56:58 -0700, Warner Losh wrote:
> On Thu, Mar 5, 2020 at 2:51 PM Dave Horsfall <dave@horsfall.org> wrote:
>> On Wed, 4 Mar 2020, Ken Thompson via TUHS wrote:
>>
>>> do i get a prize:
>>> ls -tj
>>> /bin/ls: illegal option -- j
>>> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>>
>> Another candidate for option-cleansing... Interesting; I get different
>> options with the Mac and FreeBSD:
>>
>> Mac:
>>
>> usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]
>>
>> FreeBSD:
>>
>> usage: ls [-ABCFGHILPRSTUWZabcdfghiklmnopqrstuwxy1,] [-D format]
>> [file ...]
>>
>> So FreeBSD has added up "y,D:" (in getopt(3)-speak); my eyes are burning...
>
> FreeBSD wouldn't need -, if there were a good filter to add , to large
> numbers... Some of the proliferation of options has been due to a lack of
> proper building-blocks....

I wasn't going to join this discussion, but as the perpetrator of all
three of the options that Dave complains about, I think it's worth
explaining the rationale.

First: yes, filters are good. They make for an extraordinarily
flexible system. And many options are just bloat. But on the other
hand, let's follow on with your example and assume a clever filter, say
commafy, which would insert commas as needed in its input:

  $ ls -l | commafy 5

You really need the 5 (column number), because you can't rely on all
large numeric values to require commas. Consider:

  $ ls -l 939585975893478543543
  -rw-r--r-- 2 grog home 1719298048 8 Mar 14:14 939585975893478543543

The alternative would be to have the column number explicitly stated in
the filter, but that would make the filter more specific to ls.

But do you really want to add that much input when typing interactively
into a shell? How much easier it is just to write:

  $ ls -l, 939585975893478543543
  -rw-r--r-- 2 grog home 1,719,298,048 8 Mar 14:14 939585975893478543543

And then there are things that a filter can't easily do, the rationales
for -y and -D format.

-y is really a workaround for a bug in the POSIX specification for
ls(1). From
https://pubs.opengroup.org/onlinepubs/009695399/utilities/ls.html:

  -t Sort with the primary key being time modified (most recently
     modified first) and the secondary key being filename in the
     collating sequence.

It's not immediately obvious, but these two keys sort in the opposite
order. The file name is sorted alphabetically, but the modification
time is the other way round (*reverse* chronological). This problem
bites you, for example, when you list files from two different cameras
that can take more than one image with the same time stamp. FAT
timestamps have a granularity of 1 second, so they all end up with
exactly the same time stamp. From a diary entry for 24 January 2009
(http://www.lemis.com/grog/diary-jan2009.php?subtitle=%E2%80%9CNot%20a%20bug,%20a%20feature%E2%80%9D:%20episode%204714&article=lsorder#lsorder):

  === grog@dereel (/dev/ttyp2) ~/Photos/20061223/orig 63 -> ls -lTrt
  -rwxrwxrwx 1 grog home 2478324 Dec 23 15:35:08 2006 DSCN1325.JPG
  -rwxr-xr-x 1 grog home 1628592 Dec 23 17:11:00 2006 img_5504.jpg
  -rwxr-xr-x 1 grog home 1621982 Dec 23 17:11:00 2006 img_5503.jpg
  -rwxrwxrwx 1 grog home 2583242 Dec 23 17:27:30 2006 DSCN1326.JPG
  -rwxrwxrwx 1 grog home 2476707 Dec 23 17:27:48 2006 DSCN1327.JPG

The file names for images with different timestamps are sorted
alphabetically. The file names for images with the same timestamps are
sorted in reverse alphabetical order.

What to do? Potentially you could write a filter here too, though it
wouldn't be simple, because the timestamp representation depends on the
age of the file. And you can't just fix the bug, because it has been
elevated to a feature. So -y does the right thing.

And that date. There are three relatively arbitrary formats, two of
them depending on how long ago the timestamp was:

  -rw-r--r-- 2 grog home 1,719,298,048 8 Mar 14:14 939585975893478543543
  -rw-r--r-- 1 grog home 0 24 Sep 2012 foo

You can fix that (on FreeBSD and probably on macOS) with the equally
unsupported -T flag ("full timestamp"):

  $ ls -lT 939585975893478543543 foo
  -rw-r--r-- 2 grog home 1719298048 8 Mar 14:14:58 2020 939585975893478543543
  -rw-r--r-- 1 grog home 0 24 Sep 14:42:57 2012 foo

Do we need another format? Maybe. Certainly it would help to have a
different format if you want to pass the output to a filter that looks
at the timestamp. What should it be? Your guess is as good as mine,
but probably different. Obvious choices are raw time_t and
YYYYMMDDhhmmss. So I introduced the -D option to allow the user to
choose his own output format.

Is this a good idea? I certainly had pangs of conscience every time,
and a non-standard option runs the risk of being incompatible with
other systems. For example, Linux uses -T to define the tab size
(arguably a better choice for a filter) and -D to produce output for
Emacs dired mode.

In summary: there's a tradeoff between the elegance of filters and the
effort that they require. Adding options has its disadvantages too.
You need to remember them, and they can easily become incompatible.
But these specific features make life considerably easier and add very
little to the size of the executable.

I'd be interested to hear of alternative solutions to the issues.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed. If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA
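
For concreteness, here is one way the hypothetical commafy filter from Greg's message might look. This is only a sketch under assumptions: commafy is not an existing tool, the column-number argument follows the usage shown in the message, and the awk implementation is just one possible choice:

```sh
# commafy COLUMN: insert thousands separators into the given
# whitespace-separated column of each input line (hypothetical filter).
commafy() {
    awk -v col="$1" '
    # Group the digits of s in threes, working from the right.
    function commas(s,    out) {
        out = ""
        while (length(s) > 3) {
            out = "," substr(s, length(s) - 2) out
            s = substr(s, 1, length(s) - 3)
        }
        return s out
    }
    {
        # Touch the column only if it is purely numeric.
        if ($col ~ /^[0-9]+$/)
            $col = commas($col)
        print
    }'
}

printf '%s\n' '-rw-r--r-- 2 grog home 1719298048 8 Mar 14:14 939585975893478543543' |
    commafy 5
# -rw-r--r-- 2 grog home 1,719,298,048 8 Mar 14:14 939585975893478543543
```

Assigning to a field makes awk rebuild the line with single blanks between fields, so the column alignment of real ls -l output would be lost. That is one more cost on the filter side of the tradeoff described above.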
After following this discussion, I guess that I have a simplistic way to
determine whether something should be a dash option or a filter. In
general, I'd make a filter if whatever it was doing was applicable to
more than one command, a dash option otherwise.

Jon
The idea of a simple rule is great, but the suggested rule fails on
sort -u, which afaik came after sort | uniq for performance reasons.

Another idea in the same vein is that a flag should be added only when
the job can be done inside the program and not with stdin/stdout (or no
flag can be added if one can reproduce the same behavior using
pipelines).

So, you need sort -u because only within sort can you get the
performance needed to get the job done.

But you don't need -h in ls -lh. All the information to render a human
readable number is present on stdout of ls -l. You could easily have a
filter which renders numbers with options like adding commas, dots,
scientific notation, precision, money, units, etc.

Tyler

On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon@fourwinds.com> wrote:

> After following this discussion, I guess that I have a simplistic way to
> determine whether something should be a dash option or a filter. In
> general, I'd make a filter if whatever it was doing was applicable to
> more than one command, a dash option otherwise.
>
> Jon
On 8 Mar 2020 16:26 +1100, from grog@lemis.com (Greg 'groggy' Lehey):
> FAT timestamps have a granularity of 1 second,

Not quite. Last modified time is recorded to within two seconds (FAT
squeezes the seconds into a 5-bit field, which allows packing a time
into two bytes). Other times are recorded with different granularity,
sometimes depending on the OS/version used to make the change to the
file system. And of course FAT has no concept of time zones;
everything is local time, all the time.

https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#Directory_entry
has some of the gory details.

--
Michael Kjörling • https://michael.kjorling.se • michael@kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
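
The two-second granularity falls straight out of the bit layout. A minimal sketch, assuming the usual FAT layout of 5 bits of hours, 6 bits of minutes and 5 bits holding seconds divided by two (the function names fat_time and fat_untime are made up for illustration):

```sh
# Pack hours, minutes, seconds into a 16-bit FAT time word:
# bits 15-11 hours, bits 10-5 minutes, bits 4-0 seconds / 2.
# Pass plain decimal numbers without leading zeros.
fat_time() {
    echo $(( ($1 << 11) | ($2 << 5) | ($3 / 2) ))
}

# Unpack a FAT time word; the odd second cannot be recovered.
fat_untime() {
    printf '%02d:%02d:%02d\n' \
        $(( $1 >> 11 )) $(( ($1 >> 5) & 63 )) $(( ($1 & 31) * 2 ))
}

fat_time 17 11 0                      # 35168
fat_time 17 11 1                      # 35168, same word as 17:11:00
fat_untime "$(fat_time 17 27 49)"     # 17:27:48
```

Two files written one second apart can therefore carry identical modification times, which is the situation behind the ls -t ordering complaint earlier in the thread.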
Nothing I'm aware of. I didn't mind throwing "tac" over the wall,
because it was trivial, probably a couple hours work for me, under a
minute for Ken. But the rsort source is not at all trivial, and still
of potential value to AT&T. The source managed to get out as part of
the "Hancock" project. I found a link in

https://www.wired.com/2007/10/att-invents-pro/

but the page is gone. It probably didn't help that Wired titled the
article

*AT&T Invents Programming Language for Mass Surveillance*

That's horse-pucky, akin to "Pitchfork makers invent device for
spearing babies". I'm trying to track down a copy that was released
publicly. I'm not hopeful.

On Mon, Mar 9, 2020 at 11:28 AM Tyler Adams <coppero1237@gmail.com> wrote:

> Woah, this sounds really useful, is there anything like it today?
>
> On Sun, Mar 8, 2020, 16:32 John P. Linderman <jpl.jpl@gmail.com> wrote:
>
>> In the "UNIX SYSTEM" issue of the BSTJ back in October of 1984, I
>> suggested that it might be better, both for functionality *and*
>> performance, to have a sort that only worked on records with a *single*
>> key to be sorted *lexicographically*, and put all the complexity of
>> dealing with native integers, dates, case-mapping, etc into a key-building
>> front end. I wrote such a sort built around a radix sort. The sort
>> itself sported very few options relating to record format (fixed-length,
>> newline terminated, and header-based, where an ascii header identified
>> record length, and, optionally, key position and key length), where to find
>> the key in fixed-length and newline terminated records, merge-only, check
>> sort order only, unique, strip off the sort key (to avoid the need for a
>> post-process in many cases). Key-building was usually near-trivial using
>> awk or perl or a few commands for tweaking native integer and floating
>> point values so they would sort lexicographically. The sort was stable and
>> blazingly fast. Some summer students once complained to me that I was
>> messing up a paper they were writing because my external sort was faster
>> than an internal qsort... the kind of complaint that warms one's heart. At
>> the back of my mind was a generic key-building library that would
>> accommodate (decimal) numbers of arbitrary length, with or without "E"
>> exponents, dates in various formats, string collation for Unicode, etc. It
>> remains at the back of my mind.
>>
>> On Sun, Mar 8, 2020 at 5:32 AM Tyler Adams <coppero1237@gmail.com> wrote:
>>
>>> The idea of a simple rule is great, but the suggested rule fails on sort
>>> -u which afaik came after sort | uniq for performance reasons.
>>>
>>> Another idea on the same vein is that a flag should be added only when
>>> the job can be done inside the program and not with stdin/stdout (or no
>>> flag can be added if one can reproduce the same behavior using pipelines).
>>>
>>> So, you need sort -u because only within sort can you get the
>>> performance needed to get the job done.
>>>
>>> But you don't need -h in ls -lh. All the information to render a human
>>> readable number is present on stdout of ls -l. You could easily have a
>>> filter which renders numbers with options like adding commas, dots,
>>> scientific notation, precision, money, units, etc.
>>>
>>> Tyler
>>>
>>> On Sun, Mar 8, 2020, 07:33 Jon Steinhart <jon@fourwinds.com> wrote:
>>>
>>>> After following this discussion, I guess that I have a simplistic way to
>>>> determine whether something should be a dash option or a filter. In
>>>> general, I'd make a filter if whatever it was doing was applicable to
>>>> more than one command, a dash option otherwise.
>>>>
>>>> Jon
>>>>
>>>
On Mon, Mar 09, 2020 at 05:06:20PM -0400, John P. Linderman wrote:
> but the page is gone. It probably didn't help that Wired titled the article
>
> *AT&T Invents Programming Language for Mass Surveillance*
>
> That's horse-pucky, akin to "Pitchfork makers invent device for spearing
> babies". I'm trying to track down a copy that was released publicly. I'm
> not hopeful.

There is a copy here: https://github.com/mqudsi/hancock

Not sure what other conclusion Wired was supposed to come to, given that
the provided "Hello World" programs in the paper were all mass
surveillance examples (tracking international calls to given numbers,
tracking data streams to given IP addresses, and tracking specific
connections to a given ISP).

The license in the linked repository is different than the old
password-gated NSL that was applied on the research.att.com pages. I
wonder how many licenses this code was released with, over the years.

khm
> The idea of a simple rule is great, but the suggested rule fails on sort -u
> which afaik came after sort | uniq for performance reasons.
As the guilty party for most of sort's comparison options, I can
attest that efficiency was not an objective of -u. It was invented
precisely because uniq had proved useful, but not when one was
interested in uniqueness only of some key aspect of the data.
-u differs from uniq in that -u selects samples based on
equality of keys, not equality of lines. In the default
case of whole-line keys, sort -u of course does exactly
what sort|uniq does.
For many applications of -u with keys, the non-key fields
are not of interest. Then sed s/nonkeys//|sort|uniq may
suffice. But sed did not exist when -u was invented.
And not all sort key specs are easily imitated in sed.
Doug
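
The difference between line equality and key equality shows up on a tiny example. This is a sketch with made-up data, using the first blank-separated field as the key; by_line and by_key are hypothetical names:

```sh
# Four records; the first field is the key, the second is non-key data.
data='b 2
a 1
a 3
a 1'

# Uniqueness of whole lines: "a 1" and "a 3" are different lines.
by_line() { LC_ALL=C sort | uniq; }

# Uniqueness of keys: only one record per first field survives.
by_key() { LC_ALL=C sort -u -k1,1; }

printf '%s\n' "$data" | by_line    # three lines: a 1, a 3, b 2
printf '%s\n' "$data" | by_key     # two lines: one "a ..." record and b 2
```

Which of the "a" records by_key keeps is left unstated on purpose, since that depends on the sort implementation.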
On Tue, Mar 10, 2020 at 12:16 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:

> > The idea of a simple rule is great, but the suggested rule fails on sort -u
> > which afaik came after sort | uniq for performance reasons.
>
> As the guilty party for most of sort's comparison options, I can
> attest that efficiency was not an objective of -u. It was invented
> precisely because uniq had proved useful, but not when one was
> interested in uniqueness only of some key aspect of the data.
>
> -u differs from uniq in that -u selects samples based on
> equality of keys, not equality of lines. In the default
> case of whole-line keys, sort -u of course does exactly
> what sort|uniq does.
>
> For many applications of -u with keys, the non-key fields
> are not of interest. Then sed s/nonkeys//|sort|uniq may
> suffice. But sed did not exist when -u was invented.
> And not all sort key specs are easily imitated in sed.

This begs questions of stability: in the event of non-unique keys and
non-key fields in the sortable data, which "records" (lines) are kept
and which are discarded? Surely the "first" is kept and subsequent
entries with the same key suppressed, but I confess I don't know enough
about the internals of sed to know even what algorithm it uses (I
assume a disk-based merge sort?), but I would imagine these details
have changed over time.

- Dan C.
On Tue, 10 Mar 2020 13:38:23 -0400 Dan Cross <crossd@gmail.com> wrote:
>
> This begs questions of stability: in the event of non-unique keys and
> non-key fields in the sortable data, which "records" (lines) are kept and
> which are discarded? Surely the "first" is kept and subsequent entries with
> the same key suppressed, but I confess I don't know enough about the
> internals of sed to know even what algorithm it uses (I assume a disk-based
> merge sort?), but I would imagine these details have changed over time.
FreeBSD manpage for sort says that -u implies a stable sort,
similar to -s.
On Tue, Mar 10, 2020 at 1:44 PM Bakul Shah <bakul@bitblocks.com> wrote:

> On Tue, 10 Mar 2020 13:38:23 -0400 Dan Cross <crossd@gmail.com> wrote:
> >
> > This begs questions of stability: in the event of non-unique keys and
> > non-key fields in the sortable data, which "records" (lines) are kept and
> > which are discarded? Surely the "first" is kept and subsequent entries with
> > the same key suppressed, but I confess I don't know enough about the
> > internals of sed to know even what algorithm it uses (I assume a disk-based
> > merge sort?), but I would imagine these details have changed over time.
>
> FreeBSD manpage for sort says that -u implies a stable sort,
> similar to -s.

Thanks; that makes sense. I'm still interested in historical data,
though. :-)

- Dan C.
> This begs questions of stability

Astute question. I had that in my original draft, but eliminated
it for what I thought was clarity. Anyway, depending on implementation
of sort, you may need sort -s. Of course it doesn't matter which copy
among several equal lines uniq produces, nor does it matter in sort
when there are no comparison options--they're all the same.

> I don't know enough about the
> internals of sed to know even what algorithm it uses
> (... a disk-based merge sort?)

sed is not a sorting program--basically it copies input to
output, making line-by-line editing changes. That's the
way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
(I have added options to sort, hopefully for clarity).
The argument to sed here means substitute the empty
string for the nonkey fields (specified by a regular expression).

If "sed" was a typo for "sort", all versions of sort that
I know of use an internal sorting algorithm for big chunks
of the file, then combine the chunks by merge. But internal
sorting varies all over the map--variations on quicksort,
radix sort, merge sort, ...

Doug
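
Both points can be tried on a small made-up input. This is a sketch under assumptions: the key is the first blank-separated field, the sed expression 's/ .*//' stands in for "s/nonkeys//", and -s is the stable-sort flag found in GNU and BSD sort (it is not in POSIX). The function names are hypothetical:

```sh
# Strip everything after the key, then fall back on whole-line uniqueness.
keys_only() { sed 's/ .*//' | LC_ALL=C sort | uniq; }

# Stable sort on the key: records with equal keys keep their input order.
stable_by_key() { LC_ALL=C sort -s -k1,1; }

printf '%s\n' 'b 2' 'a 3' 'a 1' | keys_only
# a
# b

printf '%s\n' 'b 2' 'a 3' 'a 1' | stable_by_key
# a 3
# a 1
# b 2
```

The first pipeline throws the non-key fields away for good, while the stable sort keeps them and preserves input order among equal keys, so "first record per key" has a well-defined meaning.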
On Tue, Mar 10, 2020 at 2:43 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:

> > This begs questions of stability
>
> Astute question. I had that in my original draft, but eliminated
> it for what I thought was clarity. Anyway, depending on implementation
> of sort, you may need sort -s. Of course it doesn't matter which copy
> among several equal lines uniq produces, nor does it matter in sort
> when there are no comparison options--they're all the same.

Thanks. That's interesting. Did `sort -s` come later? The idea that you
preferred clarity over stability for `sort -u` would indicate so,
otherwise one might imagine that `-u` would just imply `-s` and that
would be that.

> > I don't know enough about the
> > internals of sed to know even what algorithm it uses
> > (... a disk-based merge sort?)
>
> sed is not a sorting program--basically it copies input to
> output, making line-by-line editing changes. That's the
> way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
> (I have added options to sort, hopefully for clarity).
> The argument to sed here means substitute the empty
> string for the nonkey fields (specified by a regular expression).

`sed` in my email was a typo, as you speculated below.

Interestingly, this `sed` construction prior to `sort` loses
information, which perhaps doesn't matter in any given specific case,
but is insufficient in general, which I gathered to be the entire
reason you implemented `sort -u`.

> If "sed" was a typo for "sort",

It was.

> all versions of sort that
> I know of use an internal sorting algorithm for big chunks
> of the file, then combines the chunks by merge. But internal
> sorting varies all over the map--variations on quicksort,
> radix sort, merge sort, ...

It's the details of the internal sorts that are most interesting in
some sense, as the merges are probably fairly straightforward but the
internal sorts will affect stability and have other interesting
characteristics.

As an aside, one must imagine that, in this day and age, a "big chunk"
is probably big enough to hold the vast majority of files entirely in
RAM, and only exceptionally large files actually require merging
multiple blocks.

- Dan C.
When I took a comparative languages class in school, the teacher said
that the complexity of a programming language varies with the square of
its number of features.

I wonder if it's similar for command line options in shell-callables?

On the other hand, adding command line options was (at least at one
time) seen as a way of distinguishing GNU tools from Unix tools - that
is, they were seen as a way of avoiding the copyright lawsuits that
were nipping at BSD's heels.
On Tue, 10 Mar 2020, Dan Stromberg wrote:

> When I took a comparative languages class in school, the teacher said
> that the complexity of a programming language varies with the square of
> its number of features.

That sort of makes sense from a mathematical point of view, if you
regard it as a matrix of side effects. I hate to think about how it
affects Perl (my favourite language) though :-)

> I wonder if it's similar for command line options in shell-callables?

I'm starting to think that if a utility requires many options then
perhaps they ought to be split into filters (or at least environment
variables); I despair at how *ix is drifting from "one tool, one job"
to "one size fits all"...

The "ls" command for example really needs an option-ectomy; I find that
I don't really care about the exact number of bytes there are in a file
as the nearest KiB or MiB (or even GiB) is usually good enough, so I'd
be happy if "-h" was the default with some way to turn it off (yes, I
know that it's occasionally useful to add them all up in a column, but
that won't tell you how many media blocks are required).

Quickly now, without looking: which option shows unprintable characters
in a filename? Unless you use it regularly (in which case you have
real problems) you would have to look it up; I find "ls ... | od -bc"
to be quicker, especially on filenames with trailing blanks etc (which
"-B" won't show).

> On the other hand, adding command line options was (at least at one
> time) seen as a way of distinguishing GNU tools from Unix tools -
> that is, they were seen as a way of avoiding the copyright lawsuits that
> were nipping at BSD's heels.

I've never liked GNU's "--bloody-long-option" convention as you still
have to look up which one does what, but I've never thought about that
view; a lot of long options still accept a single character (subject to
feeping creaturism, of course).

-- Dave
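
A quick sketch of the "ls ... | od" trick, assuming a scratch directory created with mktemp -d (not POSIX, but widely available) and a made-up file name ending in a blank. It uses od -c alone, which is enough to make the blank visible:

```sh
# Create a scratch directory holding one file whose name ends in a blank.
dir=$(mktemp -d)
: > "$dir/report "

# od -c prints every byte of the listing, so the blank before the
# newline becomes visible instead of silently vanishing on screen.
listing=$(ls "$dir" | od -c)
printf '%s\n' "$listing"

rm -r "$dir"
```

In the dump the final "t" of the name is followed by a blank column before the "\n", which a plain ls listing would not reveal.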
On Wed, 11 Mar 2020, Dave Horsfall wrote:

> I'm starting to think that if a utility requires many options then perhaps
> they ought to be split into filters (or at least environment variables); I
> despair at how *ix is drifting from "one tool, one job" to "one size fits
> all"...
>
> The "ls" command for example really needs an option-ectomy; I find that I
> don't really care about the exact number of bytes there are in a file as the
> nearest KiB or MiB (or even GiB) is usually good enough, so I'd be happy if
> "-h" was the default with some way to turn it off (yes, I know that it's
> occasionally useful to add them all up in a column, but that won't tell you
> how many media blocks are required).
>
> Quickly now, without looking: which option shows unprintable characters in a
> filename? Unless you use it regularly (in which case you have real problems)
> you would have to look it up; I find "ls ... | od -bc" to be quicker,
> especially on filenames with trailing blanks etc (which "-B" won't show).

It would probably be interesting to define a simplified standard,
because yeesh, trying to implement even a command as basic as ls is
just torture (mainly because it basically requires putting all of
"column" and most of "sort" into it)!

> I've never liked GNU's "--bloody-long-option" convention as you still have to
> look up which one does what, but I've never thought about that view; a lot of
> long options still accept a single character (subject to feeping creaturism,
> of course).

I'm still into the one-character switch thing, personally.

-uso.
This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
their version of my external sort, modified to be a subroutine. There are
some lessons to be learned about "software hygiene". I was cavalier about
freeing what I allocated dynamically. As a result, their version leaks like
a sieve if the subroutine is called repeatedly. Apropos of which, they came
to me having noted that only the first call was acting as expected. There's
a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
argument processing with getopt. The code has the following comment

    ** Use getopt() for portability.

A few lines later, you see

    optind = 1; /* reset after use in Hancock program */
    while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {

optind??? Seems getopt has an undocumented global flag to prevent
reprocessing the arguments. How portable:-)

Anyway, it should be possible to turn rsort.c back into standalone code.
I'd be the obvious person to do it, but that would probably be a violation
of some agreement with AT&T. However, if somebody else wants to take on the
task (it would make a great summer intern project), I'd be happy to share
ideas I have had since retiring that would improve the code.

fc.c in the same directory is a library-ized version of a fixcut command I
wrote as a fixed-length counterpart to the cut command, for fixed-length
inputs (like native floats and integers, which can be tweaked to sort
lexicographically). Unlike rsort, I practiced good hygiene and kept track
of all allocated space so it could be freed.

Too bad they didn't include the man pages for rsort and fixcut. They'd make
it easier to understand them. Jon Bentley observed that "comments are love
letters to your future self", and I feel a lot of love from the heavily
commented rsort code.

This probably should move to coff, it's not really about UNIX history
(although rsort has vestigial traces of ancient days, like the code to
write checkpoint files after each output temp is closed... sorting a
million bytes once took hours, with slow processors and disks. It was
painful to have to start from scratch if an overnight sort got interrupted.
Now sorting a billion bytes is pretty quick, and the checkpoint stuff never
gets used. It's one of the things that could profitably disappear.)

On Mon, Mar 9, 2020 at 5:22 PM Kurt H Maier <khm@sciops.net> wrote:

> On Mon, Mar 09, 2020 at 05:06:20PM -0400, John P. Linderman wrote:
> > but the page is gone. It probably didn't help that Wired titled the article
> >
> > *AT&T Invents Programming Language for Mass Surveillance*
> >
> > That's horse-pucky, akin to "Pitchfork makers invent device for spearing
> > babies". I'm trying to track down a copy that was released publicly. I'm
> > not hopeful.
>
> There is a copy here: https://github.com/mqudsi/hancock
>
> Not sure what other conclusion Wired was supposed to come to, given that
> the provided "Hello World" programs in the paper were all mass
> surveillance examples (tracking international calls to given numbers,
> tracking data streams to given IP addresses, and tracking specific
> connections to a given ISP).
>
> The license in the linked repository is different than the old
> password-gated NSL that was applied on the research.att.com pages. I
> wonder how many licenses this code was released with, over the years.
>
> khm
On Wed, Mar 11, 2020 at 11:43 AM John P. Linderman <jpl.jpl@gmail.com> wrote:

> This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
> their version of my external sort, modified to be a subroutine. There's
> some lessons to be learned about "software hygiene". I was cavalier about
> freeing what I allocated dynamically. As a result, their version leaks like
> a sieve if the subroutine is called repeatedly. Apropos of which, they came
> to me having noted that only the first call was acting as expected. There's
> a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
> argument processing with getopt. The code has the following comment
>
> ** Use getopt() for portability.
>
> A few lines later, you see
>
> optind = 1; /* reset after use in Hancock program *
> while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {
>
> optind??? Seems getopt has an undocumented global flag to prevent
> reprocessing the arguments. How portable:-)

It's documented:

    The variables opterr and optind are both initialized to 1. The optind
    variable may be set to another value before a set of calls to getopt()
    in order to skip over more or less argv entries.

is what the FreeBSD man page has to say about it. So this just resets any
scanning that had happened before this...

Warner
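The exchange above is about C's getopt(), but the shell's getopts builtin keeps its scan position in a variable (OPTIND) in much the same way, so the "only the first call works" symptom is easy to reproduce there. The following is a minimal sketch, not the rsort code; `parse_flags` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether -r was given.
# getopts remembers its position in OPTIND between calls, so without the
# reset only the first call in a given shell would see any options.
parse_flags() {
    OPTIND=1          # restart the scan, like "optind = 1" in the C code
    reverse=no
    while getopts r opt; do
        case $opt in
            r) reverse=yes ;;
            *) return 2 ;;
        esac
    done
    printf '%s\n' "$reverse"
}

parse_flags -r   # prints "yes"
parse_flags -r   # prints "yes" again only because OPTIND was reset
```

Removing the `OPTIND=1` line should make the second call report "no", which is the same failure the subroutine version of rsort apparently ran into.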
On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
>
> The "ls" command for example really needs an option-ectomy; I find that I
> don't really care about the exact number of bytes there are in a file as
> the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
> happy if "-h" was the default with some way to turn it off (yes, I know
> that it's occasionally useful to add them all up in a column, but that
> won't tell you how many media blocks are required).

A good example. But you're not removing options, you're just
redefining them. In fact I find the -h option particularly emetic, so
a better choice in removing options would be to remove -h and use a
filter to mutilate the sizes:

  $ ls -l | humanize

But that's a pain, isn't it? That's why there's a -h option for
people who like it. Note that you can't do it the other way round:
you can't get the exact size from -h output.

And then there's the question why you don't like the standard output.
Because the number strings are too long and difficult to read, maybe?
That's the rationale for the -, option.

> Quickly now, without looking: which option shows unprintable
> characters in a filename? Unless you use it regularly (in which
> case you have real problems) you would have to look it up; I find
> that "ls ... | od -bc" to be quicker, especially on filenames with
> trailing blanks etc (which "-B" won't show).

This is arguably a bug in the -B option. I certainly don't think the
pipe notation is quicker. But it's nice to have both alternatives.

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed. If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA
On Wed, Mar 11, 2020 at 6:57 PM Greg 'groggy' Lehey <grog@lemis.com> wrote:

> On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
> >
> > The "ls" command for example really needs an option-ectomy; I find that I
> > don't really care about the exact number of bytes there are in a file as
> > the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
> > happy if "-h" was the default with some way to turn it off (yes, I know
> > that it's occasionally useful to add them all up in a column, but that
> > won't tell you how many media blocks are required).
>
> A good example. But you're not removing options, you're just
> redefining them. In fact I find the -h option particularly emetic, so
> a better choice in removing options would be to remove -h and use a
> filter to mutilate the sizes:
>
> $ ls -l | humanize
>
> But that's a pain, isn't it?

I don't know; that's subjective.

> That's why there's a -h option for
> people who like it.

That's incomplete, in that it implies that an option is the only way to
achieve the goal of reducing the perceived pain, but that's not the case.
(Note I'm not saying you intended that as an interpretation, but it's a
reasonable intuition for an intention.)

An interesting counterpoint to this argument is how columnized "ls" is
handled under Plan 9: there is no `-C` option to `ls` there; instead,
there's a general-purpose `mc` filter that figures out the size of the
window it's running in, reads its input, decides how many columns the input
will fit into, and emits it columnized.

But yes, it would be a pain to type `ls | mc` every time one wanted
columnized `ls` output, so this is wrapped up into a shell script called
`lc`. Note that this lets you do stuff like, `lc -l` and see multi-column
long listings if the window is wide enough. I got so used to this from
plan9 that I keep an approximation in $HOME/bin/lc: `exec ls -ACF "$@"`.

For the `humanize` thing, I don't see why one couldn't have an `lh` command
that generated "human-friendly long output from ls."

> Note that you can't do it the other way round:
> you can't get the exact size from -h output.

That's true, but now the logic is specialized to ls, and not applicable to
anything else (e.g., du? df? wc, perhaps?). Similarly with `-,`. It is not
general purpose, which is unfortunate.

Granted, combining these things would be a little challenging, but is it
likely that one would want `ls -l,h`? Optimize for the common case, etc....

> And then there's the question why you don't like the standard output.
> Because the number strings are too long and difficult to read, maybe?
> That's the rationale for the -, option.
>
> > Quickly now, without looking: which option shows unprintable
> > characters in a filename? Unless you use it regularly (in which
> > case you have real problems) you would have to look it up; I find
> > that "ls ... | od -bc" to be quicker, especially on filenames with
> > trailing blanks etc (which "-B" won't show).
>
> This is arguably a bug in the -B option. I certainly don't think the
> pipe notation is quicker. But it's nice to have both alternatives.

By default, plan9 would quote filenames that had characters that were
special to the shell (there wasn't really the concept of "non-printable
characters" in the Unix/TTY sense); this could be disabled by specifying
the `-Q` option.

- Dan C.
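The `ls | mc` arrangement described above can be approximated on a Unix system with standard tools. This is only a sketch under assumptions: the real Plan 9 `mc` asks the window for its width, whereas this one trusts `$COLUMNS` (defaulting to 80), and it leans on pr(1) for the layout, filling rows across rather than down. The names `mc` and `lc` are borrowed from the message; the implementation is not Plan 9's.

```shell
#!/bin/sh
# mc: a rough stand-in for the Plan 9 filter of the same name.
# Reads names on stdin, works out how many columns fit, and lets pr(1)
# do the layout. COLUMNS is used as the window width (80 if unset).
mc() {
    width=${COLUMNS:-80}
    input=$(cat)
    [ -n "$input" ] || return 0
    # Longest input line, in awk's idea of characters.
    longest=$(printf '%s\n' "$input" |
        awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
    ncols=$(( width / (longest + 2) ))
    if [ "$ncols" -le 1 ]; then
        printf '%s\n' "$input"      # too wide to columnate; pass through
        return 0
    fi
    # -t: no page headers, -a: fill rows across, -N: number of columns.
    printf '%s\n' "$input" | pr -t -a -"$ncols" -w "$width"
}

# lc: the wrapper, so nobody has to type "ls | mc" every time.
lc() {
    ls "$@" | mc
}
```

Because the columnizing lives in the filter, `lc -l` should also produce a multi-column long listing when the window is wide enough, and `mc` remains usable after any other command that emits one item per line.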
I wasn't running FreeBSD. Linux has nothing to say about it. The wonderful
thing about standards is that there are so many to choose from.

On Wed, Mar 11, 2020 at 5:29 PM Warner Losh <imp@bsdimp.com> wrote:

> On Wed, Mar 11, 2020 at 11:43 AM John P. Linderman <jpl.jpl@gmail.com>
> wrote:
>
>> This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
>> their version of my external sort, modified to be a subroutine. There's
>> some lessons to be learned about "software hygiene". I was cavalier about
>> freeing what I allocated dynamically. As a result, their version leaks like
>> a sieve if the subroutine is called repeatedly. Apropos of which, they came
>> to me having noted that only the first call was acting as expected. There's
>> a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
>> argument processing with getopt. The code has the following comment
>>
>> ** Use getopt() for portability.
>>
>> A few lines later, you see
>>
>> optind = 1; /* reset after use in Hancock program *
>> while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {
>>
>> optind??? Seems getopt has an undocumented global flag to prevent
>> reprocessing the arguments. How portable:-)
>
> It's documented:
>
> The variables opterr and optind are both initialized to 1. The optind
> variable may be set to another value before a set of calls to getopt()
> in order to skip over more or less argv entries.
>
> is what the FreeBSD man page has to say about it. So this just resets any
> scanning that had happened before this...
>
> Warner
On 3/11/20 8:13 PM, John P. Linderman wrote:

> I wasn't running FreeBSD. Linux has nothing to say about it. The wonderful
> thing about standards is that there are so many to choose from.

Did somebody mention ... standards?

https://pubs.opengroup.org/onlinepubs/9699919799/functions/getopt.html#tag_16_206

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/
On Wednesday, 11 March 2020 at 19:14:32 -0400, Dan Cross wrote:
> On Wed, Mar 11, 2020 at 6:57 PM Greg 'groggy' Lehey <grog@lemis.com> wrote:
>
>> On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
>>>
>>> The "ls" command for example really needs an option-ectomy; I find that I
>>> don't really care about the exact number of bytes there are in a file as
>>> the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
>>> happy if "-h" was the default with some way to turn it off (yes, I know
>>> that it's occasionally useful to add them all up in a column, but that
>>> won't tell you how many media blocks are required).
>>
>> A good example. But you're not removing options, you're just
>> redefining them. In fact I find the -h option particularly emetic, so
>> a better choice in removing options would be to remove -h and use a
>> filter to mutilate the sizes:
>>
>> $ ls -l | humanize
>>
>> But that's a pain, isn't it?
>
> I don't know; that's subjective.

It's certainly more work than -h.

>> That's why there's a -h option for people who like it.
>
> That's incomplete, in that it implies that an option is the only way
> to achieve the goal of reducing the perceived pain, but that's not
> the case. (Note I'm not saying you intended that as an
> interpretation, but it's a reasonable intuition for an intention.)

What I meant (and this is certainly my interpretation) was that
somebody added the -h option because of perceived pain with piping
output through another program. I didn't intend to imply that it was
the only alternative.

> An interesting counterpoint to this argument is how columnized "ls"
> is handled under Plan 9: there is no `-C` option to `ls` there;
> instead, there's a general-purpose `mc` filter that figures out the
> size of the window it's running in, reads its input, decides how
> many columns the input will fit into, and emits it columnized. But
> yes, it would be a pain to type `ls | mc` every time one wanted
> columnized `ls` output, so this is wrapped up into a shell script
> called `lc`. Note that this lets you do stuff like, `lc -l` and see
> multi-column long listings if the window is wide enough.

Yes, that sounds like an excellent method.

> For the `humanize` thing, I don't see why one couldn't have an `lh`
> command that generated "human-friendly long output from ls."

And yes, I deliberately didn't mention this option, though it occurred
to me. I have a couple of scripts like this, like:

  alias l="ls -lbL,"

>> Note that you can't do it the other way round: you can't get the
>> exact size from -h output.
>
> That's true, but now the logic is specialized to ls, and not
> applicable to anything else (e.g., du? df? wc, perhaps?). Similarly
> with `-,`. It is not general purpose, which is unfortunate.

Yes, this is an issue that I mentioned in an earlier message (I added
a positional parameter to work around it). But this is in the nature
of the output. mc doesn't have this issue.

> Granted, combining these things would be a little challenging, but is it
> likely that one would want `ls -l,h`? Optimize for the common case,
> etc....

Heh. Never thought of that. But since -h (apparently) never produces
output with 4 digits, the -, doesn't ever come into effect. I've just
tried it on some big files, and the -, is effectively ignored.

> And then there's the question why you don't like the standard
> output.

I don't like the standard output because things like this are hard to
read:

  -rw-r--r-- 1 grog lemis 8234010624 22 Mar 2012 Casanova-TV-1-5
  -rw-r--r-- 1 grog home 13225168900 31 Aug 2019 Movie:_Sahara_2005-2016-04-11-2028

I find this easier to read:

  -rw-r--r-- 1 grog lemis 8,234,010,624 22 Mar 2012 Casanova-TV-1-5
  -rw-r--r-- 1 grog home 13,225,168,900 31 Aug 2019 Movie:_Sahara_2005-2016-04-11-2028

I can't speak for Dave, but this is also less painful:

  -rw-r--r-- 1 grog lemis 7.7G 22 Mar 2012 Casanova-TV-1-5
  -rw-r--r-- 1 grog home 12G 31 Aug 2019 Movie:_Sahara_2005-2016-04-11-2028

The problem for me there is the difficulty comparing lengths, and the
implicit inaccuracy.

>> Because the number strings are too long and difficult to read, maybe?
>> That's the rationale for the -, option.
>>
>>> Quickly now, without looking: which option shows unprintable
>>> characters in a filename? Unless you use it regularly (in which
>>> case you have real problems) you would have to look it up; I find
>>> that "ls ... | od -bc" to be quicker, especially on filenames with
>>> trailing blanks etc (which "-B" won't show).
>>
>> This is arguably a bug in the -B option. I certainly don't think the
>> pipe notation is quicker. But it's nice to have both alternatives.
>
> By default, plan9 would quote filenames that had characters that
> were special to the shell (there wasn't really the concept of
> "non-printable characters" in the Unix/TTY sense); this could be
> disabled by specifying the `-Q` option.

Hmm. In this particular case, so does Linux:

  === grog@bilbo (/dev/pts/11) ~ 2 -> touch "foo "
  === grog@bilbo (/dev/pts/11) ~ 4 -> l foo*
  -rw-r--r-- 1 grog grog 1499570 Jun 30 2012 foo
  -rw-r--r-- 1 grog grog 0 Mar 12 10:40 'foo '

I wonder if that's something we should emulate in FreeBSD. At the
very least we should consider whether the lack of identification of
trailing blanks is a bug in the FreeBSD implementation of -B. This
option isn't in POSIX, and in Linux it means

  -B, --ignore-backups
         do not list implied entries ending with ~

So maybe it's a candidate for fixing.

Greg
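The comma-separated sizes shown above come from an `ls` option, but the same rewriting can be sketched as a filter. This is an illustration only: `commas` is a hypothetical name, it assumes the size is whitespace-separated field 5 as in `ls -l` output, and because awk rebuilds the line, runs of blanks (including column alignment and multiple blanks inside filenames) collapse to single spaces.

```shell
#!/bin/sh
# commas: insert a comma every three digits in field 5 of each line.
# Lines whose fifth field is not a plain number (e.g. "total 12") pass
# through untouched.
commas() {
    awk '
    function group(n,   out) {
        out = ""
        # Peel three digits at a time off the right-hand end.
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        return n out
    }
    $5 ~ /^[0-9]+$/ { $5 = group($5) }
    { print }
    '
}

printf '%s\n' '-rw-r--r-- 1 grog lemis 8234010624 22 Mar 2012 Casanova-TV-1-5' | commas
# -rw-r--r-- 1 grog lemis 8,234,010,624 22 Mar 2012 Casanova-TV-1-5
```

The digits are handled as a string throughout, so nothing is lost to rounding, and the original number can be recovered by deleting the commas, which is not true of -h style output.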
On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
> On Wednesday, 11 March 2020 at 14:18:08 +1100, Dave Horsfall wrote:
>>
>> The "ls" command for example really needs an option-ectomy; I find that I
>> don't really care about the exact number of bytes there are in a file as
>> the nearest KiB or MiB (or even GiB) is usually good enough, so I'd be
>> happy if "-h" was the default with some way to turn it off (yes, I know
>> that it's occasionally useful to add them all up in a column, but that
>> won't tell you how many media blocks are required).
>
> A good example. But you're not removing options, you're just
> redefining them. In fact I find the -h option particularly emetic, so
> a better choice in removing options would be to remove -h and use a
> filter to mutilate the sizes:
>
> $ ls -l | humanize
>
> But that's a pain, isn't it? That's why there's a -h option for
> people who like it. Note that you can't do it the other way round:
> you can't get the exact size from -h output.
>
> And then there's the question why you don't like the standard output.
> Because the number strings are too long and difficult to read, maybe?
> That's the rationale for the -, option.
>
>> Quickly now, without looking: which option shows unprintable
>> characters in a filename? Unless you use it regularly (in which
>> case you have real problems) you would have to look it up; I find
>> that "ls ... | od -bc" to be quicker, especially on filenames with
>> trailing blanks etc (which "-B" won't show).
>
> This is arguably a bug in the -B option. I certainly don't think the
> pipe notation is quicker. But it's nice to have both alternatives.
>
> Greg
> --
> Sent from my desktop computer.
> Finger grog@lemis.com for PGP public key.
> See complete headers for address and phone numbers.
> This message is digitally signed. If your Microsoft mail program
> reports problems, please read http://lemis.com/broken-MUA
>
I went through all the switches defined by POSIX, and figured that those
26 could be cut down. My concept reduced the number of switches from 26
to 9 (FLRadfiln). Of course, the idea is to be more minimalist than
POSIX, so some people's opinions on what is or isn't necessary may differ
from mine.
Of course, this changes the default behavior of ls because it no longer
would be able to do columnar listings (|column for that).
I felt -A was a redundant "almost -a".
I felt -C and -x were redundant because a tool like column(1) could be
used to do the same job (even though column(1) isn't POSIX).
I felt -H was a redundant "almost -L".
I felt -S, -r and -t could be implemented in other ways using sort(1).
I felt -c and -u were meaningless, but that's because of the filesystems I
usually work with that do not have functional equivalents. -u for one is
completely useless on VFAT even though it has such timestamps! YMMV.
I felt -g and -o could be replaced by cut(1).
I felt -k wasn't really all that important. Just halve the numbers.
I felt -m wasn't really all that important. There's other ways to convert
to that format, no doubt, through filters.
I felt -p was a redundant "almost -F".
I felt -q could be done just fine with something like tr(1).
I felt -s was a redundant "kindasorta -l".
And -1 becomes the new default, so it's redundant. ;)
Again, YMMV. ;)
-uso.
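Two of the replacements suggested in this list can be sketched as ordinary filters. These are illustrations under assumptions, not part of any ls implementation: `by_size` and `qmarks` are hypothetical names, and `by_size` assumes the size is field 5 of `ls -l` output.

```shell
#!/bin/sh
# Filter spellings of two ls options, for use on "ls -l" / "ls" output.

# Like -S: largest first, by numeric sort on field 5 (the size in "ls -l").
by_size() {
    sort -k 5,5nr
}

# Like -q: show each non-printable byte as "?". LC_ALL=C makes this
# byte-oriented, so non-ASCII names are also turned into question marks.
qmarks() {
    LC_ALL=C tr -c '[:print:]\n' '[?*]'
}
```

Typical use would be `ls -l | by_size` or `ls | qmarks`. The "total" line that `ls -l` emits has no fifth field, so it should sort as zero and end up at the bottom.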
On Wednesday, 11 March 2020 at 20:53:12 -0400, Steve Nickolas wrote:

> I went through all the switches defined by POSIX, and figured that
> those 26 could be cut down.

A brave man to defy POSIX! I wasn't so brave, which is why we have
the -y option.

> My concept reduced the number of switches from 26 to 9 (FLRadfiln).
> Of course, the idea is to be more minimalist than POSIX, so some
> people's opinions on what is or isn't necessary may differ from
> mine.

OK, let's compare notes:

> I felt -A was a redundant "almost -a".

Arguably -a could go too. The distinction seems arbitrary.

> I felt -C and -x were redundant because a tool like column(1) could be
> used to do the same job (even though column(1) isn't POSIX).

Neither would this ls(1) be.

> I felt -H was a redundant "almost -L".

No arguments, but I suspect that somebody had a good reason for this
distinction, and removing it could cause problems.

> I felt -S, -r and -t could be implemented in other ways using sort(1).

-S isn't POSIX. And to implement it without an option would mean
removing -h.

As I mentioned earlier, -t can't be done by a filter without
significantly modifying the timestamp output. That was my rationale
for the -D option, which allows sorting by an external filter.

-r could work.

> I felt -c and -u were meaningless, but that's because of the filesystems I
> usually work with that do not have functional equivalents. -u for one is
> completely useless on VFAT even though it has such timestamps! YMMV.

I think this says more about your file systems than about the options.
I find both incredibly useful, and there's no easy way to get the
information elsewhere. stat(1) would be an option, but then that
could replace ls(1) completely.

> I felt -g and -o could be replaced by cut(1).

-g is already obsolete in FreeBSD (accepted and ignored). -o has
already been repurposed (show file flags).

> I felt -k wasn't really all that important. Just halve the numbers.

Agreed.

> I felt -m wasn't really all that important. There's other ways to convert
> to that format, no doubt, through filters.

Possibly. Certainly I wouldn't miss it.

> I felt -p was a redundant "almost -F".

OK.

> I felt -q could be done just fine with something like tr(1).

I think that it could be replaced by -b. "?" isn't really very
helpful.

> I felt -s was a redundant "kindasorta -l".

I can't agree with that, but I've never used it. The only sensible
use would appear to be talking about disk blocks, but on FreeBSD at
any rate it looks at the BLOCKSIZE environment variable, which I have
set to 1048576 (so that utilities will display in MB where
appropriate), and that's what -s does too:

  2079 -rw-r--r-- 1 grog wheel 2,178,735,915 4 Oct 11:15 Willkommen-bei-den-Honeckers---Spielfilm,-Deutschland-2016-20191003-125200.mp4

That makes it pretty useless.

So, any others?

-G: Colorized output. I'd be *really* happy to get rid of this, but
    it's not easy to instate with a filter, so I suppose there are
    enough people who like it that it will have to stay.

-P: Seems only to be there to cancel a -H or -L.

-W: "Display whiteouts when scanning directories". I don't even
    understand what that is.

-a: See discussion of -A.

--color: Again, no thanks.

-f: We haven't really discussed this one. If you want to remove -S,
    -r and -t, then arguably -f should become the default and be
    removed.

-n: Make it the default and require a filter to convert numeric group
    and user IDs to names.

-y: If we get rid of all sorting, it will no longer be needed.

-,: Make the option standard: output numbers with commas every 3
    digits. Then this option specification wouldn't be needed.

Of course, none of this will happen. But it is interesting to think
about it. In particular, options like -g and -o, which are no longer
modern.

Greg
On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:

> On Wednesday, 11 March 2020 at 20:53:12 -0400, Steve Nickolas wrote:
>> I went through all the switches defined by POSIX, and figured that
>> those 26 could be cut down.
>
> A brave man to defy POSIX! I wasn't so brave, which is why we have
> the -y option.

xD

>> My concept reduced the number of switches from 26 to 9 (FLRadfiln).
>> Of course, the idea is to be more minimalist than POSIX, so some
>> people's opinions on what is or isn't necessary may differ from
>> mine.
>
> OK, let's compare notes:
>
>> I felt -A was a redundant "almost -a".
>
> Arguably -a could go too. The distinction seems arbitrary.

Well, I think one or the other would be desirable. I figured -a was the
better to keep - since it shows all dotfiles where -A leaves off . and .. .

>> I felt -C and -x were redundant because a tool like column(1) could be
>> used to do the same job (even though column(1) isn't POSIX).
>
> Neither would this ls(1) be.

Of course. ;)

<snip>

> -S isn't POSIX. And to implement it without an option would mean
> removing -h.

-h is a gnuism, isn't it?

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
specify the -S switch. That's POSIX, isn't it?

> As I mentioned earlier, -t can't be done by a filter without
> significantly modifying the timestamp output. That was my rationale
> for the -D option, which allows sorting by an external filter.

Understandable. Honestly if the date format weren't standardized as it
were, I would've standardized on "yyyy-mm-dd,mm:ss" - which wouldn't need
special processing in order to pump into sort(1).

>> I felt -c and -u were meaningless, but that's because of the filesystems I
>> usually work with that do not have functional equivalents. -u for one is
>> completely useless on VFAT even though it has such timestamps! YMMV.
>
> I think this says more about your file systems than about the options.
> I find both incredibly useful, and there's no easy way to get the
> information elsewhere. stat(1) would be an option, but then that
> could replace ls(1) completely.

Perhaps true.

<snip>

> So, any others?
>
> -G: Colorized output. I'd be *really* happy to get rid of this, but
> it's not easy to instate with a filter, so I suppose there are
> enough people who like it that it will have to stay.
>
> -P: Seems only to be there to cancel a -H or -L.
>
> -W: "Display whiteouts when scanning directories". I don't even
> understand what that is.

I was using the link I referenced as my "standard", which doesn't have any
of those.

I can take or leave color ls. I don't like the GNU defaults because dark
blue is TOO dark on my default settings. I think the flags are adequate to
know what kind of file I'm dealing with.

> -f: We haven't really discussed this one. If you want to remove -S,
> -r and -t, then arguably -f should become the default and be
> -removed.

I used to use "dir|sort" a lot on PC DOS before it got "dir /o" in 5.0. I
wouldn't have a problem with removing sort from ls altogether.

<snip>

> Of course, none of this will happen. But it is interesting to think
> about it. In particular, options like -g and -o, which are no longer
> modern.

-uso.
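The point about a sortable date format can be made concrete. The sketch below assumes timestamps of the form `2020-03-12,10:40:00`, which is a guess at the kind of format meant above (fixed width, most significant part first); no standard `ls` is claimed to print this, and `newest_first` is a hypothetical name.

```shell
#!/bin/sh
# newest_first [FIELD]: sort lines on a timestamp field, newest first.
# Assumes the field looks like 2020-03-12,10:40:00 (fixed width, most
# significant part first), so byte order is also chronological order.
newest_first() {
    field=${1:-1}
    LC_ALL=C sort -k "$field,${field}r"
}

printf '%s\n' \
    '2012-06-30,09:15:00 foo' \
    '2020-03-12,10:40:00 bar' \
    '2019-08-31,23:59:59 baz' | newest_first
# 2020-03-12,10:40:00 bar
# 2019-08-31,23:59:59 baz
# 2012-06-30,09:15:00 foo
```

No date parsing is involved: a plain reverse string sort does the job of -t, which is exactly what the conventional "Mar 12 10:40" style prevents.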
On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:

> A good example. But you're not removing options, you're just redefining
> them. In fact I find the -h option particularly emetic, so a better
> choice in removing options would be to remove -h and use a filter to
> mutilate the sizes:
>
> $ ls -l | humanize

I also had something like that in mind, except being British/Australian
I'd spell it with an "s" :-)

> But that's a pain, isn't it? That's why there's a -h option for people
> who like it. Note that you can't do it the other way round: you can't
> get the exact size from -h output.

Which is why I suggested there be a means to turn it off; I'm becoming a
fan of environment variables to modify the standard behaviour of tools
(but I loathe the Penguin/OS default to use colours).

> And then there's the question why you don't like the standard output.
> Because the number strings are too long and difficult to read, maybe?
> That's the rationale for the -, option.

More than likely; as I approach age 68 I notice that I'm losing some
cognitive facility... I might start using "," and see if I like it, but I
see that the Mac doesn't have it (my Penguin is off the air at the moment),
and having it as an environment variable would be nice.

>> Quickly now, without looking: which option shows unprintable
>> characters in a filename? Unless you use it regularly (in which
>> case you have real problems) you would have to look it up; I find
>> that "ls ... | od -bc" to be quicker, especially on filenames with
>> trailing blanks etc (which "-B" won't show).
>
> This is arguably a bug in the -B option. I certainly don't think the
> pipe notation is quicker. But it's nice to have both alternatives.

Agreed; as for the bug I think it comes down to what is meant by an
unprintable character. I certainly remember finding "hidden" set-uid
shells with the name of ".. " etc back when I was going after the UNSW
kiddies with an axe back in the late 70s...

-- Dave
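For the trailing-blank case, one more filter spelling besides `od -bc` is sed's `l` command, which writes each line in an unambiguous form and marks the end of the line with `$`. A small sketch follows; `visible` is a hypothetical name, and note that sed folds long lines at an implementation-chosen width.

```shell
#!/bin/sh
# visible: show each input line unambiguously. Non-printable characters
# come out as backslash escapes and "$" marks where the line really ends.
visible() {
    sed -n l
}

printf 'foo \n.. \n' | visible
# foo $
# .. $
```

With `ls | visible`, a name such as ".. " stands out because of the blank before the `$`.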
On Thu, 12 Mar 2020, Dave Horsfall wrote:

> Which is why I suggested there be a means to turn it off; I'm becoming a fan
> of environment variables to modify the standard behaviour of tools (but I
> loathe the Penguin/OS default to use colours).

When I first used Linux, that wasn't the default. Personally, I don't
think it should be (actually I think there simply shouldn't be a color
mode at all to ls).

> More than likely; as I approach age 68 I notice that I'm losing some
> cognitive facility... I might start using "," and see if I like it, but I
> see that the Mac doesn't have it (my Penguin is off the air at the moment),
> and having it as an environment variable would be nice.

GNU ls does not appear to have a -, switch.

IBM, interestingly, introduced an environment variable in PC DOS 6.3 that
did the opposite thing. If the NO_SEP variable existed, it suppressed
commas in file sizes.

-uso.
On Wed, 11 Mar 2020, Steve Nickolas wrote:
> I felt -c and -u were meaningless, but that's because of the filesystems
> I usually work with that do not have functional equivalents. -u for one
> is completely useless on VFAT even though it has such timestamps!
> YMMV.
I find those flags really useful when doing forensic analysis on a file
system :-) One particular instance was at $ORKPLACE some years back when
a critical chunk of a file system had somehow disappeared overnight (it
was our source base!). I got to work by comparing login sessions with
those someone-unknown "ls" flags and had just about nailed the perp who
was online at the time when I was ordered off it in no uncertain terms.
Ummm, did I mention that my then $BOSS had a habit of working from home
after a few (and quite a few) drinks? As I said, I was this -><- far away
from fingering him... As it stood I knew who it was but wasn't able to
prove it in time.
-- Dave
On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>> a better choice in removing options would be to remove -h and use a
>> filter to mutilate the sizes:
>>
>> $ ls -l | humanize

How does humanize decide which column to work on? If it only works on
"ls -l", then it's not useful if I want other columns as well. Maybe it
could just humanize any large number it found, but you probably don't want
to "humanize" the inode number or filename.

>I felt -s was a redundant "kindasorta -l".

Except they are reporting completely different things - consider sparse
files or filesystems (like ZFS) that support compression.

--
Peter Jeremy
On Thu, 12 Mar 2020, Peter Jeremy wrote:
> On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>
>> I felt -s was a redundant "kindasorta -l".
>
> Except they are reporting completely different things - consider sparse
> files or filesystems (like ZFS) that support compression.
I was under the impression that -s simply showed the file size divided by
512 and didn't account for sparseness or compression.
(Of the filesystems I frequently work with, one of them does actually
support sparseness (ProDOS).)
-uso.
On Thu, Mar 12, 2020, 1:37 AM Steve Nickolas <usotsuki@buric.co> wrote:

> On Thu, 12 Mar 2020, Peter Jeremy wrote:
>
> > On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
> >
> >> I felt -s was a redundant "kindasorta -l".
> >
> > Except they are reporting completely different things - consider sparse
> > files or filesystems (like ZFS) that support compression.
>
> I was under the impression that -s simply showed the file size divided by
> 512 and didn't account for sparseness or compression.

Stat returns two values. The offset of the last byte and the number of
blocks allocated to the file. Useful if you have a sparse file too...

Warner

> (Of the filesystems I frequently work with, one of them does actually
> support sparseness (ProDOS).)
>
> -uso.
My error. I was looking at getopt(1) rather than getopt(3). Of course optind
is documented, it's the way to find non-flag arguments.

I don't know why the Hancock authors chose to make rsort into a subroutine
rather than just piping into the command. Perhaps something to do with the
software release process?

On Wed, Mar 11, 2020 at 5:29 PM Warner Losh <imp@bsdimp.com> wrote:

> On Wed, Mar 11, 2020 at 11:43 AM John P. Linderman <jpl.jpl@gmail.com>
> wrote:
>
>> This is *great*, Kurt. The source in src/runtime/hrs/src for rsort.c is
>> their version of my external sort, modified to be a subroutine. There's
>> some lessons to be learned about "software hygiene". I was cavalier about
>> freeing what I allocated dynamically. As a result, their version leaks like
>> a sieve if the subroutine is called repeatedly. Apropos of which, they came
>> to me having noted that only the first call was acting as expected. There's
>> a wonderful irony (I'm big on irony). I had replaced my do-it-yourself
>> argument processing with getopt. The code has the following comment
>>
>>     ** Use getopt() for portability.
>>
>> A few lines later, you see
>>
>>     optind = 1; /* reset after use in Hancock program */
>>     while ((c = getopt(argc, argv, "cCiIjmrsSuvb:f:D:o:p:T:x:y:z:")) != EOF) {
>>
>> optind??? Seems getopt has an undocumented global flag to prevent
>> reprocessing the arguments. How portable:-)
>
> It's documented:
>
>     The variables opterr and optind are both initialized to 1. The optind
>     variable may be set to another value before a set of calls to getopt()
>     in order to skip over more or less argv entries.
>
> is what the FreeBSD man page has to say about it. So this just resets any
> scanning that had happened before this...
>
> Warner
John P. Linderman wrote in
<CAC0cEp_fQsq6-EaG-nhvXTvZij+PSab5PNTEx7WhNjYwnFVnaw@mail.gmail.com>:
 |My error. I was looking at getopt(1) rather than getopt(3). Of course \
 |optind is documented, it's the way to find non-flag arguments.
 |I don't know why the Hancock authors chose to make rsort into a subroutine \
 |rather than just piping into the command. Perhaps something to do with \
 |the software release process?

I really like a lot of such old code, and reading it. One can only learn
from it. Even though i discovered all this in (Free)BSD land, after coming
over from Linux, I loved reading those "old-hand" comment blocks, it was
inspiration and kindled something here. For the few pieces of code that i
am proud of, aka that i thought were worth it, i followed their example.
This rsort.c is however more verbose and spirited than anything i ever
wrote. I keep it in my box of precious things.

getopt(3) on the other hand is portable but terrible. Just on the 10th i
switched a small SCSI MMC-3 cdda access tool (~50 KB C source are necessary
for that in 2020, missing Solaris and MacOS, but including CD-TEXT and all
that!!) over to it because people are used to option and/or argument joining
etc, but it lost long option support.

Not worth commenting a lot, but here is an option parser of 6359 bytes when
development verification code and dump_doc() are not counted, but it uses a
carrier struct, supports long options, and documentation strings as part of
long option strings (one .RODATA entry). FreeBSD's standard compatible and
thus naked lib/libc/stdlib/getopt.c is 4312 bytes. And GNU's getopt_long is
huge and even permutes arguments.

At least getopt(3) is predictable once a user gets it. Things are different
for sed(1)s -i and some sccs commands i have forgotten. I think it has even
been tried to standardize optional arguments in that respect, but i would
argue this is not a good direction to go, consider for example "sed -ie".
Isn't this asking for trouble without accompanying comments.

 * static char const a_sopts[] = "A:h#";
 * static char const * const a_lopts[] = {
 *    "account:;A;" N_("execute an `account' command"),
 *    ..
 *    "long-help;\201;" N_("this listing"),
 *    NIL
 * };
 *    ..
 * struct su_avopt avo;
 *    ..
 * su_avopt_setup(&avo, --argc, C(char const*const*,++argv),
 *    a_sopts, a_lopts);
 * while((i = su_avopt_parse(&avo)) != su_AVOPT_STATE_DONE){
 *    switch(i){
 *    case 'A':
 *       "account_name" = avo.avo_current_arg;
 *       break;
 *    case 'h':
 *    case S(char,S(u8,'\201')):
 *       a_main_usage(n_stdout);
 *       if(i != 'h'){
 *          fprintf(n_stdout, "\nLong options:\n");
 *          su_avopt_dump_doc(&avo, &a_main_dump_doc, S(up,n_stdout));
 *       }
 *       exit(0);
 *    ..
 * argc = avo.avo_argc;
 * argv = C(char**,avo.avo_argv);

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
On Thursday, 12 March 2020 at 17:48:07 +1100, Peter Jeremy wrote:
> On 2020-Mar-11 20:53:12 -0400, Steve Nickolas <usotsuki@buric.co> wrote:
>> On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>>> a better choice in removing options would be to remove -h and use a
>>> filter to mutilate the sizes:
>>>
>>> $ ls -l | humanize
>
> How does humanize decide which column to work on?

It knows. It was written that way.

> If it only works on "ls -l", then it's not useful if I want other
> columns as well.

Right. You'd have to change it. Recall that this was just an example.

> Maybe it could just humanize any large number it found, but you
> probably don't want to "humanize" the inode number or filename.

Yes, this is exactly the scenario I described in an earlier mail
message, where I called it

  $ ls -l | commafy 5

Greg
--
Sent from my desktop computer.
Finger grog@lemis.com for PGP public key.
See complete headers for address and phone numbers.
This message is digitally signed. If your Microsoft mail program
reports problems, please read http://lemis.com/broken-MUA
On Thursday, 12 March 2020 at 16:22:01 +1100, Dave Horsfall wrote:
> On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>
>> A good example. But you're not removing options, you're just redefining
>> them. In fact I find the -h option particularly emetic, so a better
>> choice in removing options would be to remove -h and use a filter to
>> mutilate the sizes:
>>
>>   $ ls -l | humanize
>
> I also had something like that in mind, except being British/Australian
> I'd spell it with an "s" :-)

It's a common misconception that -ize is US English. The Oxford
English Dictionary, normally not prescriptive, prefers it. See
https://www.oed.com/page/faqs/Frequently+asked+questions#spell. I
personally had -ise drummed out of me by my uncle, very much
Australian.

>> And then there's the question why you don't like the standard output.
>> Because the number strings are too long and difficult to read, maybe?
>> That's the rationale for the -, option.
>
> More than likely; as I approach age 68 I notice that I'm losing some
> cognitive facility... I might start using "," and see if I like it, but I
> see that the Mac doesn't have it (my Penguin is off the air at the
> moment), and having it as an environment variable would be nice.

Yes, currently only FreeBSD has it. But you have the sources. Apart
from option handling, it's only:

--- print.c     (.../head/bin/ls/print.c)       (revision 241014)
+++ print.c     (.../stable/10/bin/ls/print.c)  (working copy)
@@ -606,6 +606,10 @@
                humanize_number(buf, sizeof(buf), (int64_t)bytes, "",
                    HN_AUTOSCALE, HN_B | HN_NOSPACE | HN_DECIMAL);
                (void)printf("%*s ", (u_int)width, buf);
+       } else if (f_thousands) {               /* with commas */
+               /* This format assignment needed to work round gcc bug. */
+               const char *format = "%*j'd ";
+               (void)printf(format, (u_int)width, bytes);
        } else
                (void)printf("%*jd ", (u_int)width, bytes);
 }

A quick and dirty fix would be simply to replace the format string.

Greg
On Wednesday, 11 March 2020 at 23:34:46 -0400, Steve Nickolas wrote:
> On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
>> -S isn't POSIX. And to implement it without an option would mean
>> removing -h.
>
> -h is a gnuism, isn't it?

It might have originated there, but then I would expect it to be spelt
'--produce-human-readable-output'. I haven't been able to establish
from the FreeBSD sources or commit logs when it was introduced. It
would clearly have been a reimplementation.

> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
> specify the -S switch. That's POSIX, isn't it?

So it is! This was the first option that I wanted to add, back when I
still had practice wheels. I asked my mentor, and he said "not the
Unix way", so I let it be. Then Wes Peters came up with the idea, and
I thought he committed it, but it seems that it ultimately came from
Kostas Blekos in 2005, based on the same feature on NetBSD and
OpenBSD. I wonder when it made it to POSIX.

>> As I mentioned earlier, -t can't be done by a filter without
>> significantly modifying the timestamp output. That was my rationale
>> for the -D option, which allows sorting by an external filter.
>
> Understandable.
>
> Honestly if the date format weren't standardized as it were, I would've
> standardized on "yyyy-mm-dd,mm:ss" - which wouldn't need special
> processing in order to pump into sort(1).

Yes, that was one of the possibilities I thought of. Another obvious
one was time_t, which is even easier to process. And then there's ISO
8601. That's why it didn't take me long to decide "do it *your* way"
with the -D option.

Greg
Meant for the list (and don't get me started on Reply All)...

-- Dave

---------- Forwarded message ----------
Date: Fri, 13 Mar 2020 21:43:51 +1100 (EST)
From: Dave Horsfall <dave@horsfall.org>
To: Greg 'groggy' Lehey <grog@lemis.com>
Subject: Re: [TUHS] Command line options and complexity

On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:

>> -h is a gnuism, isn't it?
>
> It might have originated there, but then I would expect it to be spelt
> '--produce-human-readable-output'. I haven't been able to establish from the
> FreeBSD sources or commit logs when it was introduced. It would clearly have
> been a reimplementation.

It's in "df" as well, praise Cthulhu:

    aneurin# df -h
    Filesystem     Size    Used   Avail Capacity  Mounted on
    /dev/ad0s1a    496M    302M    154M    66%    /
    devfs          1.0K    1.0K      0B   100%    /dev
    tmpfs          1000    272K    999M     0%    /tmp
    /dev/ad0s1d    2.9G    1.4G    1.2G    54%    /usr
    /dev/ad0s1e    989M    581M    329M    64%    /var
    /dev/ad0s1f    3.9G    2.2G    1.4G    62%    /home
    /dev/ad0s1g    8.9G    8.0G    127M    98%    /usr/local
    fdescfs        1.0K    1.0K      0B   100%    /dev/fd
    procfs         4.0K    4.0K      0B   100%    /proc

(Memo to self: see where all the room has gone in /usr/local, as that's
where I assigned the leftover space after the other partitions.)

No, I've never liked stuffing everything under the root file system as both
the Mac and Penguin do; fill the root file system and you're hosed (and I
also have an itch about /tmp being there as it's a world-writable
directory).

>> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ls.html does
>> specify the -S switch. That's POSIX, isn't it?
>
> So it is! This was the first option that I wanted to add, back when I still
> had practice wheels. I asked my mentor, and he said "not the Unix way", so I
> let it be. Then Wes Peters came up with the idea, and I thought he committed
> it, but it seems that it ultimately came from Kostas Blekos in 2005, based on
> the same feature on NetBSD and OpenBSD. I wonder when it made it to POSIX.

Years ago I wrote a simple script "lss" which did the sort after being
howled down on one of the FreeBSD lists; what a surprise to see "-S"...

Heck, back in my UNSW days I suggested extending stty() to cover non-TTY
devices and got trashed by the AGSM/ElecEng mob; well well, look at ioctl()
when it appeared.

-- Dave
On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:

>>>   $ ls -l | humanize
>>
>> I also had something like that in mind, except being British/Australian
>> I'd spell it with an "s" :-)
>
> It's a common misconception that -ize is US English. The Oxford English
> Dictionary, normally not prescriptive, prefers it. See
> https://www.oed.com/page/faqs/Frequently+asked+questions#spell. I
> personally had -ise drummed out of me by my uncle, very much Australian.

I'm familiar with that (and also the fact that "aluminum" and "color" etc
were British spelling). Being born and bred British with pedantic parents
I've always hated "American" spelling as we called it, and it's sad to see
such noted media as the Sydney Morning Herald slowly adopting it over the
past few years; Australia has used British spelling at least since I
emigrated here in 1965.

Oh, it was meant to be a creat/create joke, BTW...

>> More than likely; as I approach age 68 I notice that I'm losing some
>> cognitive facility... I might start using "," and see if I like it,
>> but I see that the Mac doesn't have it (my Penguin is off the air at
>> the moment), and having it as an environment variable would be nice.
>
> Yes, currently only FreeBSD has it. But you have the sources. Apart
> from option handling, it's only:

[...]

I don't like my chances with suggesting that to Apple; I'm not even sure if
they even take user contributions (although back when I was on the dole and
having delusions of grandeur I did register as an Apple developer, but I
suspect that that's for non-Apple stuff i.e. it goes into the Apple Store).

> A quick and dirty fix would be simply to replace the format string.

I have done the odd binary patch (usually to reconfigure Unify database
volumes back when I was with FGH)... Not right now, though, as it's time
for bed.

-- Dave
At Fri, 13 Mar 2020 11:36:47 +1100, Greg 'groggy' Lehey <grog@lemis.com> wrote:
Subject: Re: [TUHS] Command line options and complexity
>
> On Thursday, 12 March 2020 at 16:22:01 +1100, Dave Horsfall wrote:
> > On Thu, 12 Mar 2020, Greg 'groggy' Lehey wrote:
> > >
> > > And then there's the question why you don't like the standard output.
> > > Because the number strings are too long and difficult to read, maybe?
> > > That's the rationale for the -, option.
> >
> > More than likely; as I approach age 68 I notice that I'm losing some
> > cognitive facility... I might start using "," and see if I like it, but I
> > see that the Mac doesn't have it (my Penguin is off the air at the
> > moment), and having it as an environment variable would be nice.
>
> Yes, currently only FreeBSD has it.

Because of course NetBSD has chosen a different option letter: 'M'

Unfortunately on NetBSD and FreeBSD the appearance of commas (or
whatever is appropriate) depends on the locale being correctly
configured, and this is not always so easy to do!

--
Greg A. Woods <gwoods@acm.org>

Kelowna, BC     +1 250 762-7675           RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>     Avoncote Farms <woods@avoncote.ca>
On Friday, 13 March 2020 at 19:13:53 -0700, Greg A. Woods wrote:
> At Fri, 13 Mar 2020 11:36:47 +1100, Greg 'groggy' Lehey <grog@lemis.com> wrote:
>> Yes, currently only FreeBSD has it.
>
> Because of course NetBSD has chosen a different option letter: 'M'

Oh. Somehow I missed that. Damn.

> Unfortunately on NetBSD and FreeBSD the appearance of commas (or
> whatever is appropriate) depends on the locale being correctly
> configured, and this is not always so easy to do!

Agreed. I've been meaning to default to , if the locale doesn't
specify a delimiter, but haven't got round to it. Give me a problem
report (https://bugs.freebsd.org/bugzilla/) and I'll fix it.

Greg
On Friday, 13 March 2020 at 21:45:21 +1100, Dave Horsfall wrote:
> On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:
>
>>> -h is a gnuism, isn't it?
>>
>> It might have originated there, but then I would expect it to be spelt
>> '--produce-human-readable-output'. I haven't been able to establish from the
>> FreeBSD sources or commit logs when it was introduced. It would clearly have
>> been a reimplementation.
>
> It's in "df" as well, praise Cthulhu:
>
> aneurin# df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad0s1a    496M    302M    154M    66%    /
> /dev/ad0s1d    2.9G    1.4G    1.2G    54%    /usr
> /dev/ad0s1e    989M    581M    329M    64%    /var
...

It also has the , option:

  === grog@eureka (/dev/pts/72) ~ 8 -> df -,
  Filesystem   1048576-blocks      Used     Avail Capacity  Mounted on
  /dev/ada0p4          39,662    21,918    14,571    60%    /
  /dev/ada0p2          39,662    13,447    23,042    37%    /destdir
  /dev/ada0p5       3,705,520 1,831,345 1,577,733    54%    /home
  /dev/ada1p1       7,629,565 6,358,607 1,194,661    84%    /Photos

I find it much easier to see the relative size like that.

Greg
Here's a command I wrote long ago using a different way to deal with
options:

*isee*

Usage: isee format file ...
    Display specified inode information for files passed as arguments.
    Items of the form ``%X'' in format will be replaced for these X:
        dev inode ino mode nlink uid gid rdev size atime
        mtime ctime now filename
    Parenthesized printf-style format specifications can follow a %
    to override the default format for the various items.
    %filename is the name of the current file argument.
    %now is the time (in seconds) when the command started running.
    The other items are from the stat structure.

    Example: isee "%(40s)filename: %mtime %mode" /dev/null
        Show file modification time and mode of /dev/null

inode is just a synonym for ino.

Instead of a kazillion options, the %-stat-field items identify *what* you
want to see and the printf-style formats identify *how* you want them shown.
Someone in the Murray Hill library added strftime formats for date fields, a
fine addition, in my view. Adding readable user and group names rather than
numerical ids would be worth considering. *Maybe* having a "rwx"-style form
for mode. Sorting can be done by piping the output through sort. Don't get
hung up on shortcomings of the command, just consider how a few familiar
concepts and pipes can be combined to provide a large number of options.

On Sat, Mar 14, 2020 at 12:35 AM Greg 'groggy' Lehey <grog@lemis.com> wrote:

> On Friday, 13 March 2020 at 21:45:21 +1100, Dave Horsfall wrote:
> > On Fri, 13 Mar 2020, Greg 'groggy' Lehey wrote:
> >
> >>> -h is a gnuism, isn't it?
> >>
> >> It might have originated there, but then I would expect it to be spelt
> >> '--produce-human-readable-output'. I haven't been able to establish
> >> from the FreeBSD sources or commit logs when it was introduced. It
> >> would clearly have been a reimplementation.
> >
> > It's in "df" as well, praise Cthulhu:
> >
> > aneurin# df -h
> > Filesystem     Size    Used   Avail Capacity  Mounted on
> > /dev/ad0s1a    496M    302M    154M    66%    /
> > /dev/ad0s1d    2.9G    1.4G    1.2G    54%    /usr
> > /dev/ad0s1e    989M    581M    329M    64%    /var
> ...
>
> It also has the , option:
>
>   === grog@eureka (/dev/pts/72) ~ 8 -> df -,
>   Filesystem   1048576-blocks      Used     Avail Capacity  Mounted on
>   /dev/ada0p4          39,662    21,918    14,571    60%    /
>   /dev/ada0p2          39,662    13,447    23,042    37%    /destdir
>   /dev/ada0p5       3,705,520 1,831,345 1,577,733    54%    /home
>   /dev/ada1p1       7,629,565 6,358,607 1,194,661    84%    /Photos
>
> I find it much easier to see the relative size like that.
>
> Greg
> --
> Sent from my desktop computer.
> Finger grog@lemis.com for PGP public key.
> See complete headers for address and phone numbers.
> This message is digitally signed. If your Microsoft mail program
> reports problems, please read http://lemis.com/broken-MUA
John P. Linderman wrote in
<CAC0cEp-dL2iPikiGvaQ_s9_6AS=mFO4RvbT423fNJ3gQiLdthQ@mail.gmail.com>:
 |Here's a command I wrote long ago using a different way to deal with \
 |options:
 |
 | isee
 |Usage: isee format file ...
 |    Display specified inode information for files passed as arguments.
 |    Items of the form ``%X'' in format will be replaced for these X:
 |dev inode ino mode nlink uid gid rdev size atime
 |mtime ctime now filename
 |    Parenthesized printf-style format specifications can follow a %
 |    to override the default format for the various items.
 |    %filename is the name of the current file argument.
 |    %now is the time (in seconds) when the command started running.
 |    The other items are from the stat structure.
 |
 |    Example: isee "%(40s)filename: %mtime %mode" /dev/null
 |    Show file modification time and mode of /dev/null
 |
 |inode is just a synonym for ino.
 |
 |Instead of a kazillion options, the %-stat-field items identify what \
 |you want to see and the printf-style formats identify how you want \
 |them shown. Someone in the Murray Hill library added strftime
 |formats for date fields, a fine addition, in my view. Adding readable \
 |user and group names rather than numerical ids would be worth considering. \
 |Maybe having a "rwx"-style form for mode. Sorting can be
 |done by piping the output through sort. Don't get hung up on shortcomings \
 |of the command, just consider how a few familiar concepts and pipes \
 |can be combined to provide a large number of options.

When i switched to FreeBSD around 2001, the handbook was on the CDs i had,
and i stumbled upon a very impressive assembler example. It is still
there[1], at least in parts(?). Coming from C64, then DOS/4DOS and <2 years
Linux, aka kid games, grey-industry, MS and xeyes background, i read

  [1] https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/x86-fpu.html

  Personally, I like to keep it simple. Something either is a number, so
  I process it. Or it is not a number, so I discard it. I do not like the
  computer complaining about me typing in an extra character when it is
  obvious that it is an extra character. Duh!

  Plus, it allows me to break up the monotony of computing and type in a
  query instead of just a number:

    What is the best pinhole diameter for the focal length of 150?

  There is no reason for the computer to spit out a number of complaints:

    Syntax error: What
    Syntax error: is
    Syntax error: the
    Syntax error: best

  Et cetera, et cetera, et cetera.

  Secondly, I like the # character to denote the start of a comment which
  extends to the end of the line. This does not take too much effort to
  code, and lets me treat input files for my software as executable
  scripts.

and it was like being warped from Chaplin's Modern Times to a rich man's
California style living! And that in assembler!!

  % pinhole

  Computer,

  What size pinhole do I need for the focal length of 150?
  150     490     306     362     2930    12
  Hmmm... How about 160?
  160     506     316     362     3125    12
  Let's make it 155, please.
  155     498     311     362     3027    12
  Ah, let's try 157...
  157     501     313     362     3066    12
  156?
  156     500     312     362     3047    12
  That's it! Perfect! Thank you very much!
  ^D

Nonetheless: i never managed to create Hippie-proof programs in real life.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)