From mboxrd@z Thu Jan  1 00:00:00 1970
List-Id: Zsh Workers List
X-Seq: 38879
Date: Mon, 18 Jul 2016 11:17:35 +0100
From: Peter Stephenson
To: zsh-workers@zsh.org
Subject: Re: Incorrect sorting of Polish characters
Message-id: <20160718111735.6adea125@pwslap01u.europe.root.pri>
In-reply-to: <20160718103329.7acbb1b1@pwslap01u.europe.root.pri>
References: <160716130718.ZM4513@torch.brasslantern.com> <20160718103329.7acbb1b1@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
MIME-version: 1.0
Content-type: text/plain; charset=UTF-8

On Mon, 18 Jul 2016 10:33:29 +0100
Peter Stephenson wrote:
> On Sat, 16 Jul 2016 13:07:18 -0700
> Bart Schaefer wrote:
> > On Jul 16,  7:17pm, M. Bartoszkiewicz wrote:
> > } I have noticed that some Polish characters
> > } are sorted incorrectly in glob expansion (but
> > } correctly in other contexts).
>
> A simple-minded change to pass strcoll() unmetafied versions of the
> strings does seem to fix the problem, so it looks like this is the
> case.  However, that's not the right fix as we only want to unmetafy
> once per input string, not once per comparison, and below the call to
> qsort() there's quite a lot of internal string handling.  An equally
> simple-minded fix around the call to qsort() (saving and restoring the
> strings) didn't seem to work.  So this needs a bit more thought.

Adding an unmetafied entry to the glob match that only gets used for
sorting seems to do the trick.  I think an additional single pass
through the array of matches isn't a big deal.

Possibly the sort code needs checking through to confirm it really is
unmeta-friendly for globbing, as there are different ways in.

Any other suggestions?

pws

diff --git a/Src/glob.c b/Src/glob.c
index 2051016..146b4db 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -41,7 +41,10 @@ typedef struct gmatch *Gmatch;
 
 struct gmatch {
+    /* Metafied file name */
     char *name;
+    /* Unmetafied file name; embedded nulls can't occur in file names */
+    char *uname;
     /*
      * Array of sort strings:  one for each GS_EXEC sort type in
      * the glob qualifiers.
@@ -911,7 +914,8 @@ gmatchcmp(Gmatch a, Gmatch b)
     for (i = gf_nsorts, s = gf_sortlist; i; i--, s++) {
         switch (s->tp & ~GS_DESC) {
         case GS_NAME:
-            r = zstrcmp(b->name, a->name, gf_numsort ? SORTIT_NUMERICALLY : 0);
+            r = zstrcmp(b->uname, a->uname,
+                        gf_numsort ? SORTIT_NUMERICALLY : 0);
             break;
         case GS_DEPTH:
             {
@@ -1859,6 +1863,7 @@ zglob(LinkList list, LinkNode np, int nountok)
         int nexecs = 0;
         struct globsort *sortp;
         struct globsort *lastsortp = gf_sortlist + gf_nsorts;
+        Gmatch gmptr;
 
         /* First find out if there are any GS_EXECs, counting them. */
         for (sortp = gf_sortlist; sortp < lastsortp; sortp++)
@@ -1910,6 +1915,29 @@ zglob(LinkList list, LinkNode np, int nountok)
             }
         }
 
+        /*
+         * Where necessary, create unmetafied version of names
+         * for comparison.  If no Meta characters just point
+         * to original string.  All on heap.
+         */
+        for (gmptr = matchbuf; gmptr < matchptr; gmptr++)
+        {
+            char *nptr;
+            for (nptr = gmptr->name; *nptr; nptr++)
+            {
+                if (*nptr == Meta)
+                    break;
+            }
+            if (*nptr == Meta)
+            {
+                int dummy;
+                gmptr->uname = dupstring(gmptr->name);
+                unmetafy(gmptr->uname, &dummy);
+            } else {
+                gmptr->uname = gmptr->name;
+            }
+        }
+
         /* Sort arguments in to lexical (and possibly numeric) order. *
          * This is reversed to facilitate insertion into the list.    */
         qsort((void *) & matchbuf[0], matchct, sizeof(struct gmatch),
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index dedf241..1b1d042 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -562,3 +562,20 @@
   }
   : $functions)
 0:Multibtye handled of functions parameter
+
+  if [[ -n ${$(locale -a 2>/dev/null)[(R)pl_PL.utf8]} ]]; then
+    (
+      export LC_ALL=pl_PL.UTF-8
+      local -a names=(a b c d e f $'\u0105' $'\u0107' $'\u0119')
+      print -o $names
+      mkdir -p plchars
+      cd plchars
+      touch $names
+      print ?
+    )
+  else
+    ZTST_skip="No Polish UTF-8 local found, skipping sort test"
+  fi
+0:Sorting of metafied Polish characters
+>a ą b c ć d e ę f
+>a ą b c ć d e ę f
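
For readers without the zsh tree to hand, the idea in the patch (build an
unmetafied sort key once per match, then let qsort() compare only the keys)
can be sketched in isolation roughly as follows.  This is not the zsh code:
the demo_* names are hypothetical, malloc() stands in for zsh's heap
allocation, and the encoding assumed here (a marker byte 0x83 followed by the
original byte XOR 32) is a simplified stand-in for what unmetafy() undoes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: marker byte 0x83; a metafied byte is stored as the
 * marker followed by the original byte XOR 32. */
#define DEMO_META ((char) 0x83)

/* Hypothetical stand-in for struct gmatch, reduced to the two names. */
struct demo_match {
    char *name;   /* metafied name, as stored internally */
    char *uname;  /* unmetafied sort key; may alias name */
};

/* Return a malloc'd unmetafied copy of s, or NULL if malloc fails. */
static char *demo_unmetafy_copy(const char *s)
{
    char *out = malloc(strlen(s) + 1);
    char *t = out;

    if (!out)
        return NULL;
    while (*s) {
        if (*s == DEMO_META && s[1]) {
            *t++ = (char) (s[1] ^ 32);
            s += 2;
        } else {
            *t++ = *s++;
        }
    }
    *t = '\0';
    return out;
}

/* Single pass before sorting: each key is built once, not once per
 * comparison.  Names without a marker byte just share the original
 * string.  Returns 0 on success, -1 on allocation failure. */
static int demo_fill_keys(struct demo_match *m, size_t n)
{
    size_t i;

    assert(m != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (strchr(m[i].name, DEMO_META)) {
            m[i].uname = demo_unmetafy_copy(m[i].name);
            if (!m[i].uname)
                return -1;
        } else {
            m[i].uname = m[i].name;
        }
    }
    return 0;
}

/* The comparison sees only the unmetafied keys, so strcoll() gets
 * valid multibyte strings for the current locale. */
static int demo_cmp(const void *a, const void *b)
{
    const struct demo_match *ma = a;
    const struct demo_match *mb = b;

    return strcoll(ma->uname, mb->uname);
}

static void demo_sort(struct demo_match *m, size_t n)
{
    qsort(m, n, sizeof *m, demo_cmp);
}
```

A caller would run demo_fill_keys() over the array and then demo_sort().  The
cost is one extra linear pass and a copy only for names that contain the
marker byte, which matches the "single pass through the array of matches"
trade-off described in the message.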