From: Bart Schaefer <schaefer@brasslantern.com>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: Surprising behaviour with numeric glob sort
Date: Fri, 2 Jun 2017 16:19:05 -0700
Message-ID: <170602161905.ZM10488@torch.brasslantern.com>
In-Reply-To: <20170602090332.GA6574@chaz.gmail.com>

On Jun 2, 10:03am, Stephane Chazelas wrote:
}
} $ echo *(n)
} zsh-10 zsh2 zsh10 zsh-3
} 
} (here in my en_GB.UTF-8 GNU locale)
} 
} is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
} I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
} the basis of my proposal. In any case, zsh-3 should come before
} zsh-10, nobody can argue against that.

Well, one could argue that "-10" should be treated as negative ten
and therefore should sort before negative three, but I'm not sure
we want to get into that.

The reason you get the result above is of course because most sort
algorithms assume there is a total order and therefore that it is
not necessary to compare every possible pairing of elements.
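
To make the missing total order concrete, here is a toy sketch in Python.
It is not zsh's code: toy_cmp, digit_run and collation_key are hypothetical
names, and collation_key only imitates a locale in which "-" is ignored on
the first pass. Numbers are compared only when both strings have a digit at
the first position where they differ; otherwise the collation stand-in
decides.

```python
def collation_key(s):
    # Hypothetical stand-in for strcoll(): ignore "-" on the first pass,
    # then fall back to the raw string to break ties.
    return (s.replace("-", ""), s)

def digit_run(s, i):
    # Integer value of the whole run of digits that contains index i.
    start = i
    while start > 0 and s[start - 1].isdigit():
        start -= 1
    end = i
    while end < len(s) and s[end].isdigit():
        end += 1
    return int(s[start:end])

def toy_cmp(a, b):
    # Locate the first position where the two strings differ.
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    # Compare numerically only if both strings have a digit there.
    if i < len(a) and i < len(b) and a[i].isdigit() and b[i].isdigit():
        na, nb = digit_run(a, i), digit_run(b, i)
        if na != nb:
            return -1 if na < nb else 1
    ka, kb = collation_key(a), collation_key(b)
    return (ka > kb) - (ka < kb)

# Each name compares "less than" the next one, and the last compares
# less than the first, so no consistent ordering exists.
cycle = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
for a, b in zip(cycle, cycle[1:] + cycle[:1]):
    print(a, "<", b, toy_cmp(a, b) < 0)
```

Every adjacent pair in "zsh-10 zsh2 zsh10 zsh-3" is in order under this toy
comparison, yet "zsh-3" still compares less than "zsh-10". A sort that only
checks some of the pairs can therefore stop at that arrangement.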

Your proposal was
> } break down the strings
> } between non-numeric and numeric parts and use strcoll() on the
> } non-numeric and number comparison on the numeric parts

As far as I can tell that's exactly what zstrbcmp in zle_tricky.c does.
zstrcmp in sort.c on the other hand first attempts strcoll and
only compares numeric parts if it can find corresponding numeric
substrings in both input strings.  That is, "zsh-3" is never
compared numerically to "zsh2" because "zsh2" and "zsh-" are
considered already to differ.
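
For reference, the quoted proposal (split each string into non-numeric and
numeric parts, collate the former, compare the latter as numbers) can be
sketched as below. This is a rough Python illustration, not the code of
zstrbcmp or zstrcmp. The names chunks, is_num and chunk_cmp are made up
here, and the rule for a numeric part meeting a non-numeric part is an
assumption the proposal does not spell out.

```python
import locale
import re
from functools import cmp_to_key

def chunks(s):
    # Split into alternating runs of ASCII digits and non-digits.
    return re.findall(r"[0-9]+|[^0-9]+", s)

def is_num(part):
    return "0" <= part[0] <= "9"

def chunk_cmp(a, b, collate=locale.strcoll):
    ca, cb = chunks(a), chunks(b)
    for x, y in zip(ca, cb):
        if is_num(x) and is_num(y):
            # Numeric parts: compare by value.
            nx, ny = int(x), int(y)
            if nx != ny:
                return -1 if nx < ny else 1
        elif is_num(x) != is_num(y):
            # Assumption: a numeric part sorts before a non-numeric one.
            return -1 if is_num(x) else 1
        else:
            # Non-numeric parts: defer to the collation function.
            c = collate(x, y)
            if c != 0:
                return -1 if c < 0 else 1
    # All shared parts tie: fewer parts first, then collate whole strings.
    if len(ca) != len(cb):
        return -1 if len(ca) < len(cb) else 1
    c = collate(a, b)
    return (c > 0) - (c < 0)

names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
print(sorted(names, key=cmp_to_key(chunk_cmp)))
```

With any collation that puts "zsh" before "zsh-", this places zsh2 and
zsh10 ahead of zsh-3 and zsh-10, and zsh-3 ahead of zsh-10.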

In either case, if zstrcmp or zstrbcmp finds a digit, it consumes
more digits until it hits a non-digit or two unequal digits, and
then looks both backward and forward for digits to calculate the
numeric value for comparison.

So I think what you propose is that when "zsh1" is found to have a
difference with "zsh-", the algorithm should look forward across
"zsh-" to find "3" and at that point end up comparing "10" to "3"?
That would lead to the order in your example becoming
    zsh2 zsh-3 zsh10 zsh-10.

However, that would also mean that in strings with different sets
of numeric substrings the numeric comparisons might be "detected"
after different prefixes for different pairs of strings; I think
the result there might be even more confusing, but I can't come up
with a specific example.

It also means having to copy non-numeric substrings during every
comparison, so as to be able to call strcoll without modifying the
input strings.  (What's the alternative?)  This would probably make
sorting prohibitively slow.
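
One conceivable alternative, sketched in Python purely as an assumption and
not as anything zsh does: split and transform each string once before
sorting, in the spirit of strxfrm(), so the copying happens once per string
rather than once per comparison. The name sort_key and the key layout are
hypothetical.

```python
import locale
import re

def sort_key(s, xfrm=locale.strxfrm):
    # Build the key once per string. Numeric parts become (0, value, "")
    # and non-numeric parts become (1, 0, transformed text), so tuple
    # comparison never mixes an int with a str.
    key = []
    for part in re.findall(r"[0-9]+|[^0-9]+", s):
        if "0" <= part[0] <= "9":
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, xfrm(part)))
    return tuple(key)

names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
print(sorted(names, key=sort_key))
```

The trade-off is memory: every string carries a transformed copy for the
duration of the sort.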

