From mboxrd@z Thu Jan 1 00:00:00 1970
List-Id: Zsh Workers List
X-Seq: 41208
Date: Sat, 3 Jun 2017 22:16:46 +0100
From: Stephane Chazelas
To: Bart Schaefer
Cc: Zsh hackers list
Subject: Re: Surprising behaviour with numeric glob sort
Message-ID: <20170603211645.GA17785@chaz.gmail.com>
References: <20170531212453.GA31563@chaz.gmail.com>
 <170601152943.ZM4783@torch.brasslantern.com>
 <20170602090332.GA6574@chaz.gmail.com>
 <170602161905.ZM10488@torch.brasslantern.com>
In-Reply-To: <170602161905.ZM10488@torch.brasslantern.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.24 (2015-08-30)

2017-06-02 16:19:05 -0700, Bart Schaefer:
> On Jun 2, 10:03am, Stephane Chazelas wrote:
> }
> } $ echo *(n)
> } zsh-10 zsh2 zsh10 zsh-3
> }
> } (here in my en_GB.UTF-8 GNU locale)
> }
> } is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
> } I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
> } the basis of my proposal. In any case, zsh-3 should come before
> } zsh-10, nobody can argue against that.
>
> Well, one could argue that "-10" should be treated as negative ten
> and therefore should sort before negative three, but I'm not sure
> we want to get into that.

The (my at least) main usage for *(n) is to sort version numbers
like zsh-3.0, zsh-3.1, zsh-4. So handling negative numbers
wouldn't help in those cases.

[...]
> That is, "zsh-3" is never
> compared numerically to "zsh2" because "zsh2" and "zsh-" are
> considered already to differ.
[...]
> So I think what you propose is that when "zsh1" is found to have a
> difference with "zsh-", the algorithm should look forward across
> "zsh-" to find "3" and at that point end up comparing "10" to "3"?
> That would lead to the order in your example becoming
>    zsh2 zsh-3 zsh10 zsh-10.
[...]

No, what I propose is very simple.

When comparing "zsh-3" with "zsh2", we compare the non-numeric
prefix: "zsh-" and "zsh". And already, at that point, "zsh" is
less than "zsh-", so we stop here (zsh2 < zsh-3).

If it was zsh-3.1 vs zsh-3:

["zsh-", 3, ".", 1] vs ["zsh-", 3]

- strcoll("zsh-", "zsh-") => 0
- 3 == 3
- strcoll(".", "") => zsh-3 < zsh-3.1

Now there are some aspects of the current implementation that
one might find useful like:

$ echo *
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *)
a a-3+1 a-3+2 a-3.1 a-3.2
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *(n))
a a-3+1 a-3+2 a-3.1 a-3.2

The fact that those "-" and "." are ignored in the first
strcoll() pass in some locales makes for a more "numerical"
sort. Though again, it's easily broken with:

$ touch a-3.10
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2

Ideally, we'd want to hook into the strcoll() algorithm to
introduce the numerical comparisons in there.

Maybe that can be done using zero-padding like for the above:
just do a strcoll() comparison after a transformation (a sort of
pre-strxfrm()) of the strings from:

a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2

to:

a a-03.01 a-03.01 a-03+01 a-03.02 a-03.10 a-03+02

adjusting the length of the padding as needed.

The above would sort to

a a-03.01 a-03.01 a-03+01 a-03.02 a-03+02 a-03.10

in my GNU British locale and

a a-03+01 a-03+02 a-03.01 a-03.01 a-03.02 a-03.10

in the C locale.

-- 
Stephane
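
The two ideas above (comparing alternating text/number chunks, and
zero-padding the digit runs before a plain strcoll() sort) can be
sketched in Python roughly as follows. This is only an illustration
under stated assumptions, not zsh's implementation: the function
names split_chunks, compare_chunks and sort_padded are made up for
the example, a missing text chunk is treated as "" and a missing
number as smaller than any number, and the padding width is simply
the longest digit run in the list being sorted.

```python
import re
from functools import cmp_to_key
from itertools import zip_longest
from locale import strcoll

_DIGITS = re.compile(r"([0-9]+)")


def split_chunks(name):
    """Split "zsh-3.1" into ["zsh-", 3, ".", 1, ""].

    re.split with a capturing group alternates text, digits, text, ...
    so even positions are always text and odd positions always numbers.
    """
    parts = _DIGITS.split(name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def compare_chunks(a, b):
    """Compare chunk by chunk: strcoll() for text, numerically for digits."""
    for i, (x, y) in enumerate(zip_longest(split_chunks(a), split_chunks(b))):
        if i % 2 == 0:
            # Text chunk; assumption: a missing chunk compares as "".
            r = strcoll(x or "", y or "")
        else:
            # Numeric chunk; assumption: a missing number sorts first.
            x = -1 if x is None else x
            y = -1 if y is None else y
            r = (x > y) - (x < y)
        if r:
            return r  # stop at the first chunk that differs
    # All chunks equal (e.g. "a01" vs "a1"): fall back to the raw strings.
    return strcoll(a, b)


def sort_padded(names):
    """Sort by strcoll() on copies whose digit runs are zero-padded."""
    runs = [run for n in names for run in _DIGITS.findall(n)]
    width = max((len(run) for run in runs), default=0)

    def padded(name):
        return _DIGITS.sub(lambda m: m.group().zfill(width), name)

    return sorted(names, key=lambda n: cmp_to_key(strcoll)(padded(n)))
```

To try it, call locale.setlocale(locale.LC_COLLATE, "") first so that
strcoll() follows the environment's locale, then use
sorted(names, key=cmp_to_key(compare_chunks)) or sort_padded(names).
Without that call Python collates as in the C locale.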