From: Stephane Chazelas <stephane.chazelas@gmail.com>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: Surprising behaviour with numeric glob sort
Date: Sat, 3 Jun 2017 22:16:46 +0100
Message-ID: <20170603211645.GA17785@chaz.gmail.com>
In-Reply-To: <170602161905.ZM10488@torch.brasslantern.com>

2017-06-02 16:19:05 -0700, Bart Schaefer:
> On Jun 2, 10:03am, Stephane Chazelas wrote:
> }
> } $ echo *(n)
> } zsh-10 zsh2 zsh10 zsh-3
> } 
> } (here in my en_GB.UTF-8 GNU locale)
> } 
> } is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
> } I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
> } the basis of my proposal. In any case, zsh-3 should come before
> } zsh-10, nobody can argue against that.
> 
> Well, one could argue that "-10" should be treated as negative ten
> and therefore should sort before negative three, but I'm not sure
> we want to get into that.

The main use of *(n) (mine at least) is to sort version numbers
like zsh-3.0, zsh-3.1, zsh-4, so handling negative numbers
wouldn't help in those cases.

[...]
> That is, "zsh-3" is never
> compared numerically to "zsh2" because "zsh2" and "zsh-" are
> considered already to differ.
[...]
> So I think what you propose is that when "zsh1" is found to have a
> difference with "zsh-", the algorithm should look forward across
> "zsh-" to find "3" and at that point end up comparing "10" to "3"?
> That would lead to the order in your example becoming
>     zsh2 zsh-3 zsh10 zsh-10.
[...]

No, what I propose is very simple.

When comparing "zsh-3" with "zsh2", we first compare the
non-numeric prefixes: "zsh-" and "zsh". Already at that point,
"zsh" is less than "zsh-", so we stop there (zsh2 < zsh-3).

If it were

zsh-3.1 vs zsh-3

["zsh-", 3, ".", 1] vs ["zsh-", 3]

- strcoll("zsh-", "zsh-") => 0
- 3 == 3
- strcoll(".", "") > 0 => zsh-3 < zsh-3.1
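
As a rough illustration only, the chunk-wise comparison described
above could be sketched in Python (not zsh's C) as follows. The
names chunks() and compare() are hypothetical, and so is the rule
that a name lacking a number at some position sorts first. Python
starts with LC_COLLATE set to "C", so call
locale.setlocale(locale.LC_COLLATE, "") first to use the user's
locale.

```python
import re
from functools import cmp_to_key
from locale import strcoll


def chunks(name):
    """Split "zsh-3.1" into ["zsh-", 3, ".", 1].

    Even indices hold text (possibly ""), odd indices hold integers.
    """
    parts = re.split(r"([0-9]+)", name)
    if len(parts) > 1 and parts[-1] == "":
        parts.pop()  # drop the empty text chunk after a trailing number
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def compare(a, b):
    """Return <0, 0 or >0, like strcoll(), comparing chunk by chunk."""
    ca, cb = chunks(a), chunks(b)
    for i in range(max(len(ca), len(cb))):
        if i % 2 == 0:
            # Text chunk: a missing chunk is compared as "".
            x = ca[i] if i < len(ca) else ""
            y = cb[i] if i < len(cb) else ""
            r = strcoll(x, y)
        elif i >= len(ca) or i >= len(cb):
            # Assumption: a name with no number at this position sorts first.
            r = -1 if i >= len(ca) else 1
        else:
            # Numeric chunk: plain integer comparison.
            r = (ca[i] > cb[i]) - (ca[i] < cb[i])
        if r:
            return r
    return 0


names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
print(sorted(names, key=cmp_to_key(compare)))
```

In the C locale this should give zsh2 zsh10 zsh-3 zsh-10, since
"zsh" collates before "zsh-" there as well.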

Now there are some aspects of the current implementation that
one might find useful like:

$ echo *
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *)
a a-3+1 a-3+2 a-3.1 a-3.2
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *(n))
a a-3+1 a-3+2 a-3.1 a-3.2


The fact that those "-" and "." are ignored in the first
strcoll() pass in some locales makes for a more "numerical"
sort. Though again, that is easily broken:

$ touch a-3.10
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2

Ideally, we'd want to hook into the strcoll() algorithm to
introduce the numerical comparisons in there. Maybe that can be
done with zero-padding: for the above, just do a strcoll()
comparison after a transformation (a sort of pre-strxfrm()) of
the strings from:

a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2

to:

a
a-03.01
a-03+01
a-03.02
a-03.10
a-03+02
adjusting the length of the padding as needed.
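
A minimal sketch of that padding idea, again in Python and only as
an illustration: pad_numbers() and padded_sort() are hypothetical
names, and the padding width is simply the longest digit run found
in the list. Python starts with LC_COLLATE set to "C"; call
locale.setlocale(locale.LC_COLLATE, "") beforehand to sort in the
user's locale.

```python
import re
from locale import strxfrm

DIGITS = re.compile(r"[0-9]+")


def pad_numbers(names):
    """Zero-pad every digit run to the width of the longest run in names."""
    width = max((len(m) for n in names for m in DIGITS.findall(n)), default=0)
    return [DIGITS.sub(lambda m: m.group().zfill(width), n) for n in names]


def padded_sort(names):
    """Sort names by the locale collation order of their padded forms."""
    padded = pad_numbers(names)
    order = sorted(range(len(names)), key=lambda i: strxfrm(padded[i]))
    return [names[i] for i in order]


names = ["a", "a-3.1", "a-3+1", "a-3.2", "a-3.10", "a-3+2"]
print(pad_numbers(names))
print(padded_sort(names))
```

One limitation of this sketch: names that differ only in leading
zeros (a-3 and a-03) get the same padded form, so their relative
order is left to the stability of the sort.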

The above would sort to

a
a-03.01
a-03+01
a-03.02
a-03+02
a-03.10

in my GNU British locale, and

a
a-03+01
a-03+02
a-03.01
a-03.02
a-03.10

in the C locale.

-- 
Stephane

