From: Peter Stephenson <p.stephenson@samsung.com>
To: zsh-workers@zsh.org
Subject: Re: Incorrect sorting of Polish characters
Date: Mon, 18 Jul 2016 11:17:35 +0100
Message-ID: <20160718111735.6adea125@pwslap01u.europe.root.pri>
In-Reply-To: <20160718103329.7acbb1b1@pwslap01u.europe.root.pri>
On Mon, 18 Jul 2016 10:33:29 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> On Sat, 16 Jul 2016 13:07:18 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > On Jul 16, 7:17pm, M. Bartoszkiewicz wrote:
> > } I have noticed that some Polish characters
> > } are sorted incorrectly in glob expansion (but
> > } correctly in other contexts).
>
> A simple-minded change to pass strcoll() unmetafied versions of the
> strings does seem to fix the problem, so it looks like this is the
> case. However, that's not the right fix as we only want to unmetafy
> once per input string, not once per comparison, and below the call to
> qsort() there's quite a lot of internal string handling. An equally
> simple-minded fix around the call to qsort() (saving and restoring the
> strings) didn't seem to work. So this needs a bit more thought.
Adding an unmetafied entry to the glob match that only gets used for
sorting seems to do the trick. I think an additional single pass
through the array of matches isn't a big deal. Possibly the sort code
needs checking through to confirm it really is unmeta-friendly for
globbing, as there are different ways in. Any other suggestions?
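
For anyone following along without the zsh tree to hand, here is a rough
standalone sketch of the idea: unmetafy each name once, keep the result
alongside the original, and let the comparison function see only the
unmetafied copy. It is not the zsh code. The names META, struct match,
unmeta_once() and sort_matches() are made up for illustration; the sketch
assumes the usual zsh encoding of a marker byte 0x83 followed by the
original byte XOR 32, uses malloc() rather than the zsh heap, and calls
strcoll() directly instead of zstrcmp().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors zsh's Meta marker byte. */
#define META ((char) 0x83)

/* Hypothetical cut-down stand-in for struct gmatch. */
struct match {
    char *name;   /* metafied name, as stored internally */
    char *uname;  /* unmetafied name, used only for sorting */
};

/*
 * Return an unmetafied version of s.  If s contains no META byte the
 * original pointer is returned, so no copy is made in the common case.
 * Returns NULL only if allocation fails.
 */
static char *unmeta_once(char *s)
{
    char *src, *dst, *copy;

    assert(s != NULL);
    if (!strchr(s, META))
        return s;
    copy = malloc(strlen(s) + 1);
    if (!copy)
        return NULL;
    for (src = s, dst = copy; *src; ) {
        if (*src == META && src[1]) {
            *dst++ = src[1] ^ 32;   /* undo the XOR applied when metafying */
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return copy;
}

/* Compare on the unmetafied names so the locale sees real characters. */
static int match_cmp(const void *a, const void *b)
{
    const struct match *ma = a, *mb = b;

    return strcoll(ma->uname, mb->uname);
}

/* One pass to fill in uname, then a single qsort().  Returns 0 on success. */
static int sort_matches(struct match *m, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        m[i].uname = unmeta_once(m[i].name);
        if (!m[i].uname)
            return -1;
    }
    qsort(m, n, sizeof *m, match_cmp);
    return 0;
}
```

Unlike the patch, this sorts in ascending order; the glob code compares
b against a because the result is inserted into the list in reverse. A
caller would also need setlocale(LC_COLLATE, "") for strcoll() to pick
up the Polish collation rules.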
pws
diff --git a/Src/glob.c b/Src/glob.c
index 2051016..146b4db 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -41,7 +41,10 @@
typedef struct gmatch *Gmatch;
struct gmatch {
+ /* Metafied file name */
char *name;
+ /* Unmetafied file name; embedded nulls can't occur in file names */
+ char *uname;
/*
* Array of sort strings: one for each GS_EXEC sort type in
* the glob qualifiers.
@@ -911,7 +914,8 @@ gmatchcmp(Gmatch a, Gmatch b)
for (i = gf_nsorts, s = gf_sortlist; i; i--, s++) {
switch (s->tp & ~GS_DESC) {
case GS_NAME:
- r = zstrcmp(b->name, a->name, gf_numsort ? SORTIT_NUMERICALLY : 0);
+ r = zstrcmp(b->uname, a->uname,
+ gf_numsort ? SORTIT_NUMERICALLY : 0);
break;
case GS_DEPTH:
{
@@ -1859,6 +1863,7 @@ zglob(LinkList list, LinkNode np, int nountok)
int nexecs = 0;
struct globsort *sortp;
struct globsort *lastsortp = gf_sortlist + gf_nsorts;
+ Gmatch gmptr;
/* First find out if there are any GS_EXECs, counting them. */
for (sortp = gf_sortlist; sortp < lastsortp; sortp++)
@@ -1910,6 +1915,29 @@ zglob(LinkList list, LinkNode np, int nountok)
}
}
+ /*
+ * Where necessary, create unmetafied version of names
+ * for comparison. If no Meta characters just point
+ * to original string. All on heap.
+ */
+ for (gmptr = matchbuf; gmptr < matchptr; gmptr++)
+ {
+ char *nptr;
+ for (nptr = gmptr->name; *nptr; nptr++)
+ {
+ if (*nptr == Meta)
+ break;
+ }
+ if (*nptr == Meta)
+ {
+ int dummy;
+ gmptr->uname = dupstring(gmptr->name);
+ unmetafy(gmptr->uname, &dummy);
+ } else {
+ gmptr->uname = gmptr->name;
+ }
+ }
+
/* Sort arguments in to lexical (and possibly numeric) order. *
* This is reversed to facilitate insertion into the list. */
qsort((void *) & matchbuf[0], matchct, sizeof(struct gmatch),
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index dedf241..1b1d042 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -562,3 +562,20 @@
}
: $functions)
0:Multibtye handled of functions parameter
+
+ if [[ -n ${$(locale -a 2>/dev/null)[(R)pl_PL.utf8]} ]]; then
+ (
+ export LC_ALL=pl_PL.UTF-8
+ local -a names=(a b c d e f $'\u0105' $'\u0107' $'\u0119')
+ print -o $names
+ mkdir -p plchars
+ cd plchars
+ touch $names
+ print ?
+ )
+ else
+ ZTST_skip="No Polish UTF-8 local found, skipping sort test"
+ fi
+0:Sorting of metafied Polish characters
+>a ą b c ć d e ę f
+>a ą b c ć d e ę f