The Unix Heritage Society mailing list
From: Ralph Corderoy <ralph@inputplus.co.uk>
Subject: [TUHS] Re: Likely a one-liner in Unix
Date: Tue, 11 Jun 2024 09:05:06 +0100
Message-ID: <20240611080506.73D7B21309@orac.inputplus.co.uk>
In-Reply-To: <a5ddb4f9-e72f-4e0e-ac65-48aadcaed458@ucsb.edu>

Hi James,

> > >    "Show me the last 5 files read in a directory tree"

Given sort(1) gained -u for efficiency, I've often wondered why, in
those constrained times, it didn't have a ‘-m n’ to output only the
n ‘minimums’, as ‘sort | sed ${n}q’ does.  With ‘-m 5’, sort could track
the current fifth entry and discard input which was bigger, so avoiding
both storing many unwanted lines and finding the current line's location
within them.
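
A rough sketch of that idea, written as an awk filter rather than as
a change to sort(1).  The name min_n is hypothetical, no such ‘-m’
option exists, and the comparison is a plain string one in the C locale.
It holds at most n lines and drops anything that can't displace the
current n-th entry.

```shell
# min_n N: print the N smallest input lines, smallest first.
# Hypothetical helper; it only illustrates the bounded-memory idea.
min_n() {
    LC_ALL=C awk -v n="$1" '
    {
        # Input no smaller than the current n-th entry is discarded unseen.
        if (count == n && ($0 "") >= (keep[n] "")) next
        # Otherwise open a slot: grow the list, or overwrite the n-th entry.
        i = (count < n) ? ++count : n
        # Shift larger kept lines down until the gap sits where $0 belongs.
        while (i > 1 && (keep[i - 1] "") > ($0 "")) {
            keep[i] = keep[i - 1]
            i--
        }
        keep[i] = $0
    }
    END { for (i = 1; i <= count; i++) print keep[i] }'
}

printf '%s\n' d b e a c | min_n 3
```

For this input it should agree with ‘LC_ALL=C sort | sed 3q’, while
never holding more than three lines.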


> OK, I'll bite (NB: using GNU find):

I think the POSIX way of getting the atime would be ‘LC_TIME=C ls -lu’
and then parsing the two possible date formats.  So non-POSIX find is
simpler.  Also, GNU find shows me the sub-second part but ls doesn't.
Neither does GNU ‘stat -c '%X %n'’.

> find "$directory_tree" -type f -printf "%A+ %p\n" | sort -r | cut -d' ' -f2 | head -5

- I'd switch the atime format to seconds since epoch for easier
  formatting given it's discarded.
- When atimes tie, sort's -r will give file Z before A so I'd add some
  -k's so A comes first.
- I'd move the head to before the cut so cut processes fewer lines...
- But on so few lines, I'd just use sed to do both in one.

    find "$@" -type f -printf '%A@ %p\n' |
    sort -k1,1nr -k2 |
    sed 's/^[^ ]* //; 5q'
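
To see what the extra keys buy, here is the sort and sed stage run on
canned input.  The times and paths are made up and stand in for find's
output; sample and newest_first are hypothetical names, and LC_ALL=C is
added only to make the ordering reproducible.

```shell
# Made-up "atime path" lines; two of them tie on the atime.
sample() {
    printf '%s\n' \
        '1718092800.5 ./b' \
        '1718092800.5 ./a' \
        '1718000000.0 ./old' \
        '1718099999.0 ./new'
}

# Newest atime first; ties broken by path, ascending.
newest_first() {
    LC_ALL=C sort -k1,1nr -k2 | sed 's/^[^ ]* //; 5q'
}

sample | newest_first
```

This lists ./new, ./a, ./b, ./old.  A plain ‘sort -r’ on the same input
puts ./b ahead of ./a.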


Remaining issues...

If tied entries bridge the top-five border then this isn't shown.
Is the real requirement to show files with the five most recent distinct
atimes?

    awk '{t += !s[$0]; s[$0] = 1; print} t == 5 {exit}'

Though this might give many lines.  Instead, an ellipsis could show
a tie bridged the cut-off.

    awk 't {if ($0 == l) print "..."; exit} NR == 5 {l = $0; t = 1} 1'
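
Both awk fragments compare whole lines, which suits a stream of bare
atimes.  A sketch of the same two ideas for the sorted ‘atime path’
lines of the earlier pipeline follows, keyed on the first field and
stripping it on output.  The function names are hypothetical, and the
atimes are compared as strings so sub-second digits aren't lost to
floating point.

```shell
# Print every path whose atime is among the first N distinct atimes seen.
distinct_atimes() {
    awk -v n="${1:-5}" '
    NR == 1 || ($1 "") != (last "") { if (++t > n) exit; last = $1 }
    { sub(/^[^ ]* /, ""); print }'
}

# Print the first N paths, then "..." if line N+1 has the same atime as line N.
top_n_ellipsis() {
    awk -v n="${1:-5}" '
    NR > n { if (($1 "") == (last "")) print "..."; exit }
    { last = $1; sub(/^[^ ]* /, ""); print }'
}
```

Either would take the place of the sed at the end of the pipeline, e.g.
‘... | sort -k1,1nr -k2 | top_n_ellipsis 5’.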

Paths can contain linefeeds, and some versions of these tools allow
NUL-terminated records to be, tediously, employed instead.

    find "$@" -type f -printf '%A@ %p\0' |
    sort -z -k1,1nr -k2 |
    sed -z 's/[^ ]* //; 5q' |
    tr \\0 \\n
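
A small check of the NUL variant on a scratch tree, assuming GNU find,
sort, and sed.  The file names and atimes are invented, and nul_demo and
top2 are hypothetical names.  It counts the records while they are still
NUL-terminated and again after the tr, to show what the final step gives
up.

```shell
nul_demo() {
    dir=$(mktemp -d) || return 1
    nl='
'
    # Three files with known atimes; the middle name contains a linefeed.
    touch -a -t 202401010000 "$dir/old"
    touch -a -t 202402010000 "$dir/mid${nl}dle"
    touch -a -t 202403010000 "$dir/new"

    # The two most recently read files, as NUL-terminated paths.
    top2() {
        find "$dir" -type f -printf '%A@ %p\0' |
        sort -z -k1,1nr -k2 |
        sed -z 's/^[^ ]* //; 2q'
    }

    records=$(top2 | tr -cd '\0' | wc -c)     # count the NUL terminators
    lines=$(top2 | tr '\0' '\n' | wc -l)      # count lines after the tr
    rm -rf "$dir"
    echo "$((records)) records, $((lines)) lines"
}

nul_demo
```

Two records become three lines once the NULs are turned into linefeeds,
so the tr is fine for reading by eye, but anything that acts on the paths
would want to keep the NULs, e.g. with ‘xargs -0’.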


David Wheeler has a nice article he maintains on unusual characters in
filenames: how to cope, and what other systems do, e.g. Plan 9.

    Fixing Unix/Linux/POSIX filenames: control characters (such as
        newline), leading dashes, and other problems
    David A. Wheeler, 2023-08-22 (originally 2009-03-24)
    https://dwheeler.com/essays/fixing-unix-linux-filenames.html

As he writes, Linux already returns EINVAL for some paths on some
filesystem types.  A mount option which had a syscall return an error on
meeting an insensible path would be useful.  It avoids any attempt at
escapement and its greater risk of implementation errors.  I could
always re-mount some old volume without the option to list the directory
and fix up its entries.  The second-best day to plant a tree is today.

-- 
Cheers, Ralph.
