The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Dan Cross <crossd@gmail.com>
To: Doug McIlroy <doug@cs.dartmouth.edu>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: Re: [TUHS] Command line options and complexity
Date: Tue, 10 Mar 2020 13:38:23 -0400	[thread overview]
Message-ID: <CAEoi9W4nNbUT6_NnJGsgWFPj4GHTeupmDVpZAMZRw8-70wnk+w@mail.gmail.com> (raw)
In-Reply-To: <202003101615.02AGFLgS075920@coolidge.cs.dartmouth.edu>

[-- Attachment #1: Type: text/plain, Size: 1381 bytes --]

On Tue, Mar 10, 2020 at 12:16 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:

> > The idea of a simple rule is great, but the suggested rule fails on sort
> -u
> > which afaik came after sort | uniq for performance reasons.
>
> As the guilty party for most of sort's comparison options, I can
> attest that efficiency was not an objective of -u. It was invented
> precisely because uniq had proved useful, but not when one was
> interested in uniqueness only of some key aspect of the data.
>
> -u differs from uniq in that -u selects samples based on
> equality of keys, not equality of lines. In the default
> case of whole-line keys, sort -u of course does exactly
> what sort|uniq does.
>
> For many applications of -u with keys, the non-key fields
> are not of interest. Then sed s/nonkeys//|sort|uniq may
> suffice. But sed did not exist when -u was invented.
> And not all sort key specs are easily imitated in sed.
>

This begs questions of stability: in the event of non-unique keys and
non-key fields in the sortable data, which "records" (lines) are kept and
which are discarded? Surely the "first" is kept and subsequent entries with
the same key suppressed, but I confess I don't know enough about the
internals of sed to know even what algorithm it uses (I assume a disk-based
merge sort?), but I would imagine these details have changed over time.

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 1790 bytes --]

  reply	other threads:[~2020-03-10 17:40 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 16:15 Doug McIlroy
2020-03-10 17:38 ` Dan Cross [this message]
2020-03-10 17:44   ` Bakul Shah
2020-03-10 18:09     ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2020-03-13 10:45 Dave Horsfall
2020-03-14  4:35 ` Greg 'groggy' Lehey
2020-03-14 19:52   ` John P. Linderman
2020-03-14 20:25     ` Steffen Nurpmeso
2020-03-10 18:42 Doug McIlroy
2020-03-10 19:38 ` Dan Cross
2020-03-05  4:57 Doug McIlroy
2020-03-05 22:17 ` Diomidis Spinellis
2020-03-04 14:06 Nelson H. F. Beebe
2020-03-04 16:17 ` John P. Linderman
2020-03-04 17:25   ` Bakul Shah
2020-03-05  0:55   ` Rob Pike
2020-03-05  2:05   ` Kurt H Maier
2020-03-05  4:17     ` Ken Thompson via TUHS
2020-03-05 14:53       ` Dan Cross
2020-03-05 21:50       ` Dave Horsfall
2020-03-05 21:56         ` Warner Losh
2020-03-08  5:26           ` Greg 'groggy' Lehey
2020-03-08  5:32             ` Jon Steinhart
2020-03-08  9:30               ` Tyler Adams
     [not found]                 ` <CAC0cEp8eFRkkLTw88WVaKZoKy+qsrhuC8LkzmmsbqtdZgMf8eQ@mail.gmail.com>
     [not found]                   ` <CAEuQd1D7+dfap98AwPo2W41+06prrcVaAWk3Ve-ve0uQ0xBu3Q@mail.gmail.com>
2020-03-09 21:06                     ` John P. Linderman
2020-03-09 21:22                       ` Kurt H Maier
2020-03-11 17:41                         ` John P. Linderman
2020-03-11 21:29                           ` Warner Losh
2020-03-12  0:13                             ` John P. Linderman
2020-03-12  0:34                               ` Chet Ramey
2020-03-12 12:57                             ` John P. Linderman
2020-03-12 19:24                               ` Steffen Nurpmeso
2020-03-08  9:51             ` Michael Kjörling
2020-03-03 18:15 Jon Steinhart
2020-03-03 18:44 ` Adam Thornton
2020-03-04  4:11   ` Tyler Adams
2020-03-04  6:03     ` Dave Horsfall
2020-03-04  6:48       ` arnold
2020-03-04 21:17         ` Dave Horsfall
2020-03-05  0:49         ` Lyndon Nerenberg
2020-03-05 20:54           ` Dave Horsfall
2020-03-05 22:01             ` William Cheswick
2020-03-04 21:50   ` Random832
2020-03-04 23:19     ` Steffen Nurpmeso
2020-03-05  6:12     ` Alan D. Salewski
2020-03-04 22:03   ` Random832
2020-03-04 23:25     ` Terry Jones
2020-03-10 23:03 ` Dan Stromberg
2020-03-11  3:18   ` Dave Horsfall
2020-03-11  4:02     ` Steve Nickolas
2020-03-11 22:56     ` Greg 'groggy' Lehey
2020-03-11 23:14       ` Dan Cross
2020-03-12  0:42         ` Greg 'groggy' Lehey
2020-03-12  0:53       ` Steve Nickolas
2020-03-12  3:09         ` Greg 'groggy' Lehey
2020-03-12  3:34           ` Steve Nickolas
2020-03-13  1:02             ` Greg 'groggy' Lehey
2020-03-12  5:38         ` Dave Horsfall
2020-03-12  6:48         ` Peter Jeremy
2020-03-12  7:37           ` Steve Nickolas
2020-03-12  7:42             ` Warner Losh
2020-03-12 23:57           ` Greg 'groggy' Lehey
2020-03-12  5:22       ` Dave Horsfall
2020-03-12  5:35         ` Steve Nickolas
2020-03-13  0:36         ` Greg 'groggy' Lehey
2020-03-13 11:26           ` Dave Horsfall
2020-03-14  2:13           ` Greg A. Woods
2020-03-14  4:31             ` Greg 'groggy' Lehey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEoi9W4nNbUT6_NnJGsgWFPj4GHTeupmDVpZAMZRw8-70wnk+w@mail.gmail.com \
    --to=crossd@gmail.com \
    --cc=doug@cs.dartmouth.edu \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).