From: Bart Schaefer
Message-Id: <170605201354.ZM16693@torch.brasslantern.com>
Date: Mon, 5 Jun 2017 20:13:54 -0700
In-Reply-To: <20170605115439.GA15325@chaz.gmail.com>
Comments: In reply to Stephane Chazelas
        "Re: Surprising behaviour with numeric glob sort" (Jun 5, 12:54pm)
To: Zsh hackers list
Subject: Re: Surprising behaviour with numeric glob sort
X-Seq: 41231

On Jun 5, 12:54pm, Stephane Chazelas wrote:
}
} > Zsh has the additional complication of needing to deal with strings
} > having embedded '\0' bytes, which neither strcoll nor strxfrm is able
} > to deal with.  I'm not 100% confident that zsh's current algorithm
} > deals correct with this either.
}
} From what I can see (by using ltrace -e strcoll), zsh removes
} the NUL bytes before calling strcoll, so $'\0\0x' sorts the same
} as $'\0x' or 'x'.

Like I said, I think it does this wrong.
If I'm reading the code correctly, it first compares the strings for
absolute identity while searching for embedded nuls, and if they are
identical up to the nul it then orders the shorter string before the
longer one; otherwise it skips past the last nul and then relies on
strcoll() for the rest of both strings.  It would seem to me that the
collation order should be checked before any nul as well as after,
otherwise the first loop might conclude the strings differ when
strcoll() would order them the same.  (However, read below.)

} You mean a Schwartzian transform

Yes, much like that.  Src/sort.c already has a SortElt structure that
is used to sort metafied strings by comparing their unmetafied forms.
We only [*] need to add strxfrm() of the unmetafied strings in front
and remove strcoll() of transformed strings at comparison, and then
we're in business.

For example, following strxfrm() the assumptions about absolute
identity for nul handling suddenly become valid, so we don't have to
fix that separately -- we just have to strxfrm() all the nul-separated
substrings.

} With a comparison function that does memcmp() on the "string"
} parts and a number comparison on the "num" parts?

Equivalent to that, yes.  (I don't think zero-padding will work as we
don't know how many zeroes are needed to make the strings be the same
number of digits.)

} > For globbing, we'd have to rely on something else such as
} > whether MULTIBYTE is set.
}
} Note that for globbing, the "numeric" sort applies after the
} "o+myfunc" or "oe:...:" transformation, so the strings to sort
} on may still contain all sorts of things

Whether there have been other globbing transforms turns out not to
matter.  The point about MULTIBYTE is that we have no glob flag we can
push around to indicate that the shell should assume there are [not]
wide characters in the comparison strings.
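
To make the "memcmp() on the string parts, number comparison on the num
parts" idea concrete, here is a minimal sketch.  It is not the code in
Src/sort.c; the function name numeric_aware_cmp is hypothetical, and it
assumes plain nul-terminated byte strings that have already been
transformed so that byte order is the desired order.  Digit runs are
compared by value without any zero-padding: leading zeroes are skipped,
a longer run of remaining digits is the larger number, and equal-length
runs are compared bytewise.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns <0, 0 or >0 like strcmp(), but treats runs of ASCII digits
 * as unsigned integers of arbitrary length.
 */
static int numeric_aware_cmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            size_t la = 0, lb = 0;

            /* Leading zeroes do not change the value. */
            while (*a == '0') a++;
            while (*b == '0') b++;

            /* Measure what is left of each digit run. */
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;

            /* More significant digits means a larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;

            /* Same length: bytewise order is numeric order. */
            int c = memcmp(a, b, la);
            if (c)
                return c < 0 ? -1 : 1;

            a += la;
            b += lb;
        } else {
            /* "String" part: plain byte comparison. */
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* Whichever string still has bytes left sorts last. */
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}
```

With this sketch "file9" orders before "file10", and "a007" compares
equal to "a7"; a real implementation would presumably want a tie-break
for that last case.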
[*] That "only" is deceptive; it's actually a fairly hefty ask, for reasons such as needing to handle case-insensitive comparisons too (currently everything is forced through towlower() in that event).
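
A rough sketch of the decorate-then-memcmp() approach described above,
ignoring the case-insensitive complication entirely.  The struct
xfrm_key type and the make_key/key_cmp functions are hypothetical names
for illustration and are not the SortElt code in Src/sort.c.  Each
nul-separated substring is passed through strxfrm() once, the results
are concatenated with their terminating nuls, and comparison is then a
plain memcmp() of the keys.  This assumes strxfrm() output never
contains an embedded nul, which holds because its result is a C string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decorated sort key. */
struct xfrm_key {
    char *key;      /* strxfrm() output of each segment, each nul-terminated */
    size_t keylen;  /* total bytes in key, terminators included */
};

/*
 * Build a key for the len bytes at s, which may contain embedded nuls.
 * Returns 0 on success, -1 if memory runs out.
 */
static int make_key(const char *s, size_t len, struct xfrm_key *out)
{
    assert(s != NULL && out != NULL);

    /* Copy with a trailing nul so that every segment is a C string. */
    char *copy = malloc(len + 1);
    if (!copy)
        return -1;
    memcpy(copy, s, len);
    copy[len] = '\0';

    out->key = NULL;
    out->keylen = 0;

    size_t pos = 0;
    for (;;) {
        const char *seg = copy + pos;
        size_t seglen = strlen(seg);

        /* Ask strxfrm() how much room the transformed segment needs. */
        size_t need = strxfrm(NULL, seg, 0) + 1;
        char *grown = realloc(out->key, out->keylen + need);
        if (!grown) {
            free(out->key);
            free(copy);
            out->key = NULL;
            return -1;
        }
        out->key = grown;
        strxfrm(out->key + out->keylen, seg, need);
        out->keylen += need;

        /* Step over the segment and the nul that ended it. */
        pos += seglen + 1;
        if (pos > len)
            break;
    }
    free(copy);
    return 0;
}

/* Compare two keys bytewise; a key that is a prefix of the other sorts first. */
static int key_cmp(const struct xfrm_key *a, const struct xfrm_key *b)
{
    size_t n = a->keylen < b->keylen ? a->keylen : b->keylen;
    int c = memcmp(a->key, b->key, n);
    if (c)
        return c;
    return (a->keylen > b->keylen) - (a->keylen < b->keylen);
}
```

Because the separators are kept in the key, $'\0x' and 'x' get
different keys here rather than sorting the same.  The caller owns
out->key and should free() it after sorting.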