zsh-workers
From: Bart Schaefer <schaefer@brasslantern.com>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: Surprising behaviour with numeric glob sort
Date: Mon, 5 Jun 2017 20:13:54 -0700
Message-ID: <170605201354.ZM16693@torch.brasslantern.com>
In-Reply-To: <20170605115439.GA15325@chaz.gmail.com>

On Jun 5, 12:54pm, Stephane Chazelas wrote:
}
} > Zsh has the additional complication of needing to deal with strings
} > having embedded '\0' bytes, which neither strcoll nor strxfrm is able
} > to deal with.  I'm not 100% confident that zsh's current algorithm
} > deals correctly with this either.
} 
} From what I can see (by using ltrace -e strcoll), zsh removes
} the NUL bytes before calling strcoll, so $'\0\0x' sorts the same
} as $'\0x' or 'x'.

Like I said, I think it does this wrong.  If I'm reading the code
correctly, it first compares the strings for absolute identity while
searching for embedded nuls, and if they are identical up to the nul
it then orders the shorter string before the longer one; otherwise
it skips past the last nul and then relies on strcoll() for the rest
of both strings.  It would seem to me that the collation order should
be checked before any nul as well as after; otherwise the first loop
might conclude the strings differ when strcoll() would order them the
same.  (However, see below.)

} You mean a Schwartzian transform?

Yes, much like that.  Src/sort.c already has a SortElt structure that
is used to sort metafied strings by comparing their unmetafied forms.
We only [*] need to add a strxfrm() pass over the unmetafied strings
up front and replace the strcoll() calls at comparison time with a
plain comparison of the transformed strings, and then we're in
business.  For example, following strxfrm() the assumptions about
absolute identity for nul handling suddenly become valid, so we don't
have to fix that separately; we just have to strxfrm() all the
nul-separated substrings.
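To make the decorate-then-sort idea concrete, here is a minimal sketch in C of what strxfrm()-ing each nul-separated substring might look like. It is not zsh's Src/sort.c: xfrm_elt, xfrm_init(), xfrm_cmp() and xfrm_free() are hypothetical names, and the sketch assumes each input buffer is followed by a terminating NUL, ignores metafication, and leaves numeric and case-insensitive ordering out. Pieces are compared in order with strcmp(); when every compared piece matches, the element with fewer pieces sorts first, which is one reading of "orders the shorter string before the longer one".

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort element (not zsh's SortElt): the original bytes,
 * which may contain embedded NULs, plus one precomputed strxfrm() key
 * per NUL-separated piece.  Callers pick the collation order with
 * setlocale(LC_COLLATE, ...) before building keys. */
typedef struct {
    const char *orig;   /* original bytes; orig[len] must be '\0' */
    size_t len;         /* length of orig, counting embedded NULs */
    char **keys;        /* strxfrm() output, one per piece */
    size_t nkeys;       /* number of pieces = embedded NULs + 1 */
} xfrm_elt;

static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (!p) {
        perror("malloc");
        exit(1);
    }
    return p;
}

/* Transform one NUL-terminated piece; strxfrm(NULL, s, 0) reports the
 * key length without writing anything. */
static char *xfrm_piece(const char *s)
{
    size_t n = strxfrm(NULL, s, 0) + 1;
    char *key = xmalloc(n);
    strxfrm(key, s, n);
    return key;
}

/* Done once per element, before sorting (the "decorate" step). */
static void xfrm_init(xfrm_elt *e, const char *s, size_t len)
{
    size_t i, p = 0;

    assert(s[len] == '\0');   /* so the last piece is terminated too */
    e->orig = s;
    e->len = len;
    e->nkeys = 1;
    for (i = 0; i < len; i++)
        if (s[i] == '\0')
            e->nkeys++;
    e->keys = xmalloc(e->nkeys * sizeof *e->keys);
    for (i = 0; i < e->nkeys; i++) {
        e->keys[i] = xfrm_piece(s + p);
        p += strlen(s + p) + 1;   /* step past this piece and its NUL */
    }
}

static void xfrm_free(xfrm_elt *e)
{
    size_t i;
    for (i = 0; i < e->nkeys; i++)
        free(e->keys[i]);
    free(e->keys);
}

/* qsort() comparator: strcmp() on the keys stands in for strcoll() on
 * the originals.  Pieces are compared in order; if all compared pieces
 * match, the element with fewer pieces sorts first. */
static int xfrm_cmp(const void *va, const void *vb)
{
    const xfrm_elt *a = va, *b = vb;
    size_t i, n = a->nkeys < b->nkeys ? a->nkeys : b->nkeys;

    for (i = 0; i < n; i++) {
        int c = strcmp(a->keys[i], b->keys[i]);
        if (c != 0)
            return c;
    }
    return (a->nkeys > b->nkeys) - (a->nkeys < b->nkeys);
}
```

Because strxfrm() runs once per element instead of inside every comparison, qsort()'s comparisons reduce to strcmp() on the precomputed keys.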

} With a comparison function that does memcmp() on the "string"
} parts and a number comparison on the "num" parts?

Equivalent to that, yes.  (I don't think zero-padding will work, as
we don't know how many zeroes are needed to give every number the
same number of digits.)
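One way to get the effect of a number comparison without zero-padding is to skip leading zeros, treat the digit run with more remaining digits as the larger number, and fall back to memcmp() only when the lengths match. A minimal sketch, assuming plain byte strings (no strxfrm(), no multibyte handling); numeric_cmp() is a hypothetical name, and strings that differ only in leading zeros compare equal rather than being tie-broken.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: compare two NUL-terminated byte strings, treating
 * each maximal run of decimal digits as an unsigned integer of any size.
 * Non-digit bytes are compared as unsigned char, as memcmp() would. */
static int numeric_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            const char *da, *db;
            size_t la, lb;
            int c;

            /* Skip leading zeros so "007" and "7" have equal magnitude. */
            while (*a == '0')
                a++;
            while (*b == '0')
                b++;
            da = a;
            db = b;
            while (isdigit((unsigned char)*a))
                a++;
            while (isdigit((unsigned char)*b))
                b++;
            la = (size_t)(a - da);
            lb = (size_t)(b - db);
            /* More significant digits means a larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;
            /* Same length: digit-by-digit order equals numeric order. */
            c = memcmp(da, db, la);
            if (c != 0)
                return c < 0 ? -1 : 1;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* One string is a prefix of the other, or both ended together. */
    return (unsigned char)*a < (unsigned char)*b ? -1 : (*a != *b);
}
```

Since digit runs are compared by length and then digit by digit, values wider than any integer type, such as 18446744073709551616, still order correctly.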

} > For globbing, we'd have to rely on something else such as
} > whether MULTIBYTE is set.
} 
} Note that for globbing, the "numeric" sort applies after the
} "o+myfunc" or "oe:...:" transformation, so the strings to sort
} on may still contain all sorts of things.

Whether there have been other globbing transforms turns out not to
matter.  The point about MULTIBYTE is that we have no glob flag we
can push around to indicate whether the shell should assume there
are wide characters in the strings being compared.

[*] That "only" is deceptive; it's actually a fairly hefty ask, for
reasons such as needing to handle case-insensitive comparisons too
(currently everything is forced through towlower() in that event).
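The case-insensitive path would need its own transformed keys. A minimal sketch, assuming the strings are valid multibyte text in the current locale and that folding with towlower() before transforming is acceptable; icase_key() is a hypothetical name, and it handles neither embedded nuls nor metafied input. Because towlower() maps one wide character to exactly one wide character, this approach cannot express case mappings that change a string's length.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not zsh code): build a case-insensitive
 * collation key for one multibyte string.  The caller selects the
 * locale first with setlocale(); mbstowcs() and towlower() follow
 * LC_CTYPE, wcsxfrm() follows LC_COLLATE.  Returns a malloc'd key to be
 * compared with wcscmp(), or NULL on an invalid multibyte sequence or
 * allocation failure. */
static wchar_t *icase_key(const char *s)
{
    size_t n, m, i;
    wchar_t *w, *key;

    assert(s != NULL);
    n = mbstowcs(NULL, s, 0);       /* POSIX: count only, store nothing */
    if (n == (size_t)-1)
        return NULL;
    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);
    /* Fold case one wide character at a time before transforming. */
    for (i = 0; i < n; i++)
        w[i] = (wchar_t)towlower((wint_t)w[i]);
    m = wcsxfrm(NULL, w, 0) + 1;    /* key length plus terminator */
    key = malloc(m * sizeof *key);
    if (key != NULL)
        wcsxfrm(key, w, m);
    free(w);
    return key;
}
```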

