From: Bart Schaefer
Message-Id: <170602161905.ZM10488@torch.brasslantern.com>
Date: Fri, 2 Jun 2017 16:19:05 -0700
In-Reply-To: <20170602090332.GA6574@chaz.gmail.com>
References: <20170531212453.GA31563@chaz.gmail.com> <170601152943.ZM4783@torch.brasslantern.com> <20170602090332.GA6574@chaz.gmail.com>
To: Zsh hackers list
Subject: Re: Surprising behaviour with numeric glob sort

On Jun 2, 10:03am, Stephane Chazelas wrote:
}
} $ echo *(n)
} zsh-10 zsh2 zsh10 zsh-3
}
} (here in my en_GB.UTF-8 GNU locale)
}
} is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
} I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
} the basis of my proposal. In any case, zsh-3 should come before
} zsh-10, nobody can argue against that.

Well, one could argue that "-10" should be treated as negative ten and
therefore should sort before negative three, but I'm not sure we want
to get into that.

The reason you get the result above is of course because most sort
algorithms assume there is a total order and therefore that it is not
necessary to compare every possible pairing of elements.
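
To make the total-order point concrete, here is a minimal Python sketch.
It is not zsh's code: toy_coll is a hypothetical stand-in for strcoll()
that ignores "-" except as a tie-breaker (an assumption about how the
locale behaves), and toy_numeric_cmp only loosely imitates a comparison
that goes numeric when both strings first differ inside a run of digits.

```python
def is_digit(c):
    """ASCII digits only, to keep the sketch simple."""
    return "0" <= c <= "9"

def toy_coll(a, b):
    """Hypothetical stand-in for strcoll(): '-' is ignored at the first
    level and only breaks ties. Real locale data is more involved."""
    ka, kb = (a.replace("-", ""), a), (b.replace("-", ""), b)
    return (ka > kb) - (ka < kb)

def toy_numeric_cmp(a, b):
    """Compare numerically only when both strings first differ inside a
    run of digits; otherwise fall back to toy_coll on the whole strings."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    if i < len(a) and i < len(b) and is_digit(a[i]) and is_digit(b[i]):
        # Back up over the shared prefix to the start of the digit run.
        start = i
        while start > 0 and is_digit(a[start - 1]):
            start -= 1
        ea = eb = i
        while ea < len(a) and is_digit(a[ea]):
            ea += 1
        while eb < len(b) and is_digit(b[eb]):
            eb += 1
        na, nb = int(a[start:ea]), int(b[start:eb])
        if na != nb:
            return (na > nb) - (na < nb)
    return toy_coll(a, b)

def adjacent_pairs_ordered(names):
    """True if every neighbouring pair is in order under toy_numeric_cmp."""
    return all(toy_numeric_cmp(x, y) <= 0 for x, y in zip(names, names[1:]))

observed = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
print(adjacent_pairs_ordered(observed))    # True: neighbours all look sorted
print(toy_numeric_cmp("zsh-3", "zsh-10"))  # -1: yet zsh-3 sorts before zsh-10
```

Under these assumptions the comparisons form a cycle (zsh-3 < zsh-10 <
zsh2 < zsh10 < zsh-3), so a sort that only checks some of the pairs can
stop at the observed order without ever noticing the contradiction.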

Your proposal was

> } break down the strings
> } between non-numeric and numeric parts and use strcoll() on the
> } non-numeric and number comparison on the numeric parts

As far as I can tell that's exactly what zstrbcmp in zle_tricky.c does.
zstrcmp in sort.c, on the other hand, first attempts strcoll and only
compares numeric parts if it can find corresponding numeric substrings
in both input strings. That is, "zsh-3" is never compared numerically
to "zsh2" because "zsh2" and "zsh-" are considered already to differ.

In either case, if zstrcmp or zstrbcmp find a digit, they consume more
digits until they hit a non-digit or two not-equal digits, and then look
both backward and forward for digits to calculate the numeric value for
comparison.

So I think what you propose is that when "zsh1" is found to have a
difference with "zsh-", the algorithm should look forward across "zsh-"
to find "3" and at that point end up comparing "10" to "3"? That would
lead to the order in your example becoming zsh2 zsh-3 zsh10 zsh-10.

However, that would also mean that in strings with different sets of
numeric substrings the numeric comparisons might be "detected" after
different prefixes for different pairs of strings; I think the result
there might be even more confusing, but I can't come up with a specific
example.

It also means having to copy non-numeric substrings during every
comparison, so as to be able to call strcoll without modifying the
input strings. (What's the alternative?) This would probably make
sorting prohibitively slow.
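
For reference, the quoted proposal (split into non-numeric and numeric
parts, collate the former, compare the latter as numbers) can be
sketched in Python as below. This is only an illustration under stated
assumptions, not what zstrcmp or zstrbcmp do: chunk_cmp and plain_coll
are hypothetical names, and plain_coll is a codepoint comparison used in
place of strcoll() so the result does not depend on the locale.

```python
import re
from functools import cmp_to_key

# Alternating runs of ASCII digits and non-digits.
_CHUNK = re.compile(r"[0-9]+|[^0-9]+")

def plain_coll(a, b):
    """Codepoint comparison; a placeholder for strcoll() in this sketch."""
    return (a > b) - (a < b)

def chunk_cmp(a, b, coll=plain_coll):
    """Compare chunk by chunk: numbers numerically, everything else
    with the supplied collation function."""
    ca, cb = _CHUNK.findall(a), _CHUNK.findall(b)  # copies every chunk
    for x, y in zip(ca, cb):
        if "0" <= x[0] <= "9" and "0" <= y[0] <= "9":
            r = (int(x) > int(y)) - (int(x) < int(y))
        else:
            r = coll(x, y)
        if r:
            return r
    # All shared chunks equal: fewer chunks first, then whole-string tie-break.
    return (len(ca) > len(cb)) - (len(ca) < len(cb)) or coll(a, b)

names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
print(sorted(names, key=cmp_to_key(chunk_cmp)))
# ['zsh2', 'zsh10', 'zsh-3', 'zsh-10']
```

Here "zsh" sorts before "zsh-" simply because it is a prefix, which
gives the order Stephane said he expected. Note that findall() builds
new substrings on every call, which is the per-comparison copying cost
mentioned above.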