The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Dan Cross <crossd@gmail.com>
To: Doug McIlroy <doug@cs.dartmouth.edu>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: Re: [TUHS] Command line options and complexity
Date: Tue, 10 Mar 2020 15:38:50 -0400	[thread overview]
Message-ID: <CAEoi9W5+FiuMxjq6dD16tBkq+2oSfxE3Nw4SpEn1bTOdZwrNZQ@mail.gmail.com> (raw)
In-Reply-To: <202003101842.02AIgr2x082209@coolidge.cs.dartmouth.edu>

[-- Attachment #1: Type: text/plain, Size: 2210 bytes --]

On Tue, Mar 10, 2020 at 2:43 PM Doug McIlroy <doug@cs.dartmouth.edu> wrote:

> > This begs questions of stability
>
> Astute question. I had that in my original draft, but eliminited
> it for what I thought was clarity. Anyway, depending on implementation
> of sort, you may need sort -s. Of course it doesn't matter which copy
> among several equal lines uniq produces, nor does it matter in sort
> when there are no comparison options--they're all the same.
>

Thanks. That's interesting.

Did `sort -s` come later? The idea that you preferred clarity over
stability for `sort -u` would indicate so, otherwise one might imagine that
`-u` would just imply `-s` and that would be that.

> I don't know enough about the
> > internals of sed to know even what algorithm it uses
> > (... a disk-based merge sort?)
>
> sed is not a sorting program--basically it copies input to
> output, making line-by-line editing changes. That's the
> way I meant to use it in sed s/nonkeys//|sort -keys|uniq.
> (I have added options to sort, hopefully for clarity).
> The argument to sed here means substitute the empty
> string for the nonkey fields (specified by a regular expression).
>

`sed` in my email was a typo, as you speculated below.

Interestingly, this `sed` construction prior to `sort` loses information,
which perhaps doesn't matter in any given specific case, but is
insufficient in general, which I gathered to be the entire reason you
implemented `sort -u`.

If "sed" was a typo for "sort",


It was.

all versions of sort that
> I know of use an internal sorting algorithm for big chunks
> of the file, then combines the chunks by merge. But internal
> sorting varies all over the map--variations on quicksort,
> radix sort, merge sort, ...
>

It's the details of the internal sorts that are most interesting in some
sense, as the merges are probably fairly straight forward but the internal
sorts will affect stability and have other interesting characteristics.

As an aside, one must imagine that, in this day and age, a "big chunk" is
probably big enough to hold the vast majority of files entirely in RAM, and
only exceptionally large files actually require merging multiple blocks.

        - Dan C.

[-- Attachment #2: Type: text/html, Size: 3279 bytes --]

  reply	other threads:[~2020-03-10 19:39 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 18:42 Doug McIlroy
2020-03-10 19:38 ` Dan Cross [this message]
  -- strict thread matches above, loose matches on Subject: below --
2020-03-13 10:45 Dave Horsfall
2020-03-14  4:35 ` Greg 'groggy' Lehey
2020-03-14 19:52   ` John P. Linderman
2020-03-14 20:25     ` Steffen Nurpmeso
2020-03-10 16:15 Doug McIlroy
2020-03-10 17:38 ` Dan Cross
2020-03-10 17:44   ` Bakul Shah
2020-03-10 18:09     ` Dan Cross
2020-03-05  4:57 Doug McIlroy
2020-03-05 22:17 ` Diomidis Spinellis
2020-03-04 14:06 Nelson H. F. Beebe
2020-03-04 16:17 ` John P. Linderman
2020-03-04 17:25   ` Bakul Shah
2020-03-05  0:55   ` Rob Pike
2020-03-05  2:05   ` Kurt H Maier
2020-03-05  4:17     ` Ken Thompson via TUHS
2020-03-05 14:53       ` Dan Cross
2020-03-05 21:50       ` Dave Horsfall
2020-03-05 21:56         ` Warner Losh
2020-03-08  5:26           ` Greg 'groggy' Lehey
2020-03-08  5:32             ` Jon Steinhart
2020-03-08  9:30               ` Tyler Adams
     [not found]                 ` <CAC0cEp8eFRkkLTw88WVaKZoKy+qsrhuC8LkzmmsbqtdZgMf8eQ@mail.gmail.com>
     [not found]                   ` <CAEuQd1D7+dfap98AwPo2W41+06prrcVaAWk3Ve-ve0uQ0xBu3Q@mail.gmail.com>
2020-03-09 21:06                     ` John P. Linderman
2020-03-09 21:22                       ` Kurt H Maier
2020-03-11 17:41                         ` John P. Linderman
2020-03-11 21:29                           ` Warner Losh
2020-03-12  0:13                             ` John P. Linderman
2020-03-12  0:34                               ` Chet Ramey
2020-03-12 12:57                             ` John P. Linderman
2020-03-12 19:24                               ` Steffen Nurpmeso
2020-03-08  9:51             ` Michael Kjörling
2020-03-03 18:15 Jon Steinhart
2020-03-03 18:44 ` Adam Thornton
2020-03-04  4:11   ` Tyler Adams
2020-03-04  6:03     ` Dave Horsfall
2020-03-04  6:48       ` arnold
2020-03-04 21:17         ` Dave Horsfall
2020-03-05  0:49         ` Lyndon Nerenberg
2020-03-05 20:54           ` Dave Horsfall
2020-03-05 22:01             ` William Cheswick
2020-03-04 21:50   ` Random832
2020-03-04 23:19     ` Steffen Nurpmeso
2020-03-05  6:12     ` Alan D. Salewski
2020-03-04 22:03   ` Random832
2020-03-04 23:25     ` Terry Jones
2020-03-10 23:03 ` Dan Stromberg
2020-03-11  3:18   ` Dave Horsfall
2020-03-11  4:02     ` Steve Nickolas
2020-03-11 22:56     ` Greg 'groggy' Lehey
2020-03-11 23:14       ` Dan Cross
2020-03-12  0:42         ` Greg 'groggy' Lehey
2020-03-12  0:53       ` Steve Nickolas
2020-03-12  3:09         ` Greg 'groggy' Lehey
2020-03-12  3:34           ` Steve Nickolas
2020-03-13  1:02             ` Greg 'groggy' Lehey
2020-03-12  5:38         ` Dave Horsfall
2020-03-12  6:48         ` Peter Jeremy
2020-03-12  7:37           ` Steve Nickolas
2020-03-12  7:42             ` Warner Losh
2020-03-12 23:57           ` Greg 'groggy' Lehey
2020-03-12  5:22       ` Dave Horsfall
2020-03-12  5:35         ` Steve Nickolas
2020-03-13  0:36         ` Greg 'groggy' Lehey
2020-03-13 11:26           ` Dave Horsfall
2020-03-14  2:13           ` Greg A. Woods
2020-03-14  4:31             ` Greg 'groggy' Lehey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEoi9W5+FiuMxjq6dD16tBkq+2oSfxE3Nw4SpEn1bTOdZwrNZQ@mail.gmail.com \
    --to=crossd@gmail.com \
    --cc=doug@cs.dartmouth.edu \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).