zsh-workers
From: Peter Stephenson <p.stephenson@samsung.com>
To: zsh-workers@zsh.org
Subject: Re: Incorrect sorting of Polish characters
Date: Mon, 18 Jul 2016 11:17:35 +0100
Message-ID: <20160718111735.6adea125@pwslap01u.europe.root.pri>
In-Reply-To: <20160718103329.7acbb1b1@pwslap01u.europe.root.pri>

On Mon, 18 Jul 2016 10:33:29 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> On Sat, 16 Jul 2016 13:07:18 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > On Jul 16,  7:17pm, M. Bartoszkiewicz wrote:
> > } I have noticed that some Polish characters
> > } are sorted incorrectly in glob expansion (but
> > } correctly in other contexts).
> 
> A simple-minded change to pass strcoll() unmetafied versions of the
> strings does seem to fix the problem, so it looks like this is the
> case.  However, that's not the right fix as we only want to unmetafy
> once per input string, not once per comparison, and below the call to
> qsort() there's quite a lot of internal string handling.  An equally
> simple-minded fix around the call to qsort() (saving and restoring the
> strings) didn't seem to work.  So this needs a bit more thought.

Adding an unmetafied entry to the glob match, used only for sorting,
seems to do the trick.  I think one additional pass through the array
of matches isn't a big deal.  The sort code possibly needs checking
through to confirm it really is unmeta-friendly for globbing, as there
are different ways in.  Any other suggestions?

pws

diff --git a/Src/glob.c b/Src/glob.c
index 2051016..146b4db 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -41,7 +41,10 @@
 typedef struct gmatch *Gmatch;
 
 struct gmatch {
+    /* Metafied file name */
     char *name;
+    /* Unmetafied file name; embedded nulls can't occur in file names */
+    char *uname;
     /*
      * Array of sort strings:  one for each GS_EXEC sort type in
      * the glob qualifiers.
@@ -911,7 +914,8 @@ gmatchcmp(Gmatch a, Gmatch b)
     for (i = gf_nsorts, s = gf_sortlist; i; i--, s++) {
 	switch (s->tp & ~GS_DESC) {
 	case GS_NAME:
-	    r = zstrcmp(b->name, a->name, gf_numsort ? SORTIT_NUMERICALLY : 0);
+	    r = zstrcmp(b->uname, a->uname,
+			gf_numsort ? SORTIT_NUMERICALLY : 0);
 	    break;
 	case GS_DEPTH:
 	    {
@@ -1859,6 +1863,7 @@ zglob(LinkList list, LinkNode np, int nountok)
 	int nexecs = 0;
 	struct globsort *sortp;
 	struct globsort *lastsortp = gf_sortlist + gf_nsorts;
+	Gmatch gmptr;
 
 	/* First find out if there are any GS_EXECs, counting them. */
 	for (sortp = gf_sortlist; sortp < lastsortp; sortp++)
@@ -1910,6 +1915,29 @@ zglob(LinkList list, LinkNode np, int nountok)
 	    }
 	}
 
+	/*
+	 * Where necessary, create unmetafied version of names
+	 * for comparison.  If no Meta characters just point
+	 * to original string.  All on heap.
+	 */
+	for (gmptr = matchbuf; gmptr < matchptr; gmptr++)
+	{
+	    char *nptr;
+	    for (nptr = gmptr->name; *nptr; nptr++)
+	    {
+		if (*nptr == Meta)
+		    break;
+	    }
+	    if (*nptr == Meta)
+	    {
+		int dummy;
+		gmptr->uname = dupstring(gmptr->name);
+		unmetafy(gmptr->uname, &dummy);
+	    } else {
+		gmptr->uname = gmptr->name;
+	    }
+	}
+
 	/* Sort arguments in to lexical (and possibly numeric) order. *
 	 * This is reversed to facilitate insertion into the list.    */
 	qsort((void *) & matchbuf[0], matchct, sizeof(struct gmatch),
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index dedf241..1b1d042 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -562,3 +562,20 @@
   }
   : $functions)
 0:Multibtye handled of functions parameter
+
+  if [[ -n ${$(locale -a 2>/dev/null)[(R)pl_PL.utf8]} ]]; then
+  (
+    export LC_ALL=pl_PL.UTF-8
+    local -a names=(a b c d e f $'\u0105' $'\u0107' $'\u0119')
+    print -o $names
+    mkdir -p plchars
+    cd plchars
+    touch $names
+    print ?
+  )
+  else
+    ZTST_skip="No Polish UTF-8 locale found, skipping sort test"
+  fi
+0:Sorting of metafied Polish characters
+>a ą b c ć d e ę f
+>a ą b c ć d e ę f

