List-Id: Zsh Workers List
X-Seq: 40470
Subject: Re: UTF-8 locales on BSDs do not support collation correctly
From: "Jun T."
Date: Tue, 31 Jan 2017 19:09:58 +0900
To: zsh-workers@zsh.org

On 2017/01/30, at 12:59, Bart Schaefer wrote:

> Is this
> just an issue with the test or is there a real problem here?

I believe there is no problem in zsh. The problem is that macOS
does not support UTF-8 collation at all. For example, on macOS,
/usr/share/locale/pl_PL.UTF-8/LC_COLLATE
is a symlink to
/usr/share/locale/la_LN.US-ASCII/LC_COLLATE
so strcoll(3) always uses ASCII collation.

Commit 72e5fe7 modifies glob.c so that unmetafied file names
are (correctly) used for glob sorting.
In order to test this on both Linux and macOS, we need two characters
(or strings) c1 and c2 which satisfy
    c1 < c2 and metafy(c1) > metafy(c2)
in both UTF-8 and ASCII collations.

It seems the following two characters can be used:

        Unicode   UTF-8    metafied
---------------------------------------
c1  Ą   U+0104    c4 84    c4 83 a4
c2  Ġ   U+0120    c4 a0    c4 83 80

So how about the following patch?
With this patch, the test fails without the commit 72e5fe7
but succeeds with it, on both Linux and macOS.


diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 0ff65c7..e203153 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -551,22 +551,20 @@
   : $functions)
 0:Multibyte handling of functions parameter
 
-  if [[ -n ${$(locale -a 2>/dev/null)[(R)pl_PL.(utf8|UTF-8)]} ]]; then
-    (
-      export LC_ALL=pl_PL.UTF-8
-      local -a names=(a b c d e f $'\u0105' $'\u0107' $'\u0119')
-      print -o $names
-      mkdir -p plchars
-      cd plchars
-      touch $names
-      print ?
-    )
-  else
-    ZTST_skip="No Polish UTF-8 locale found, skipping sort test"
-  fi
-0:Sorting of metafied Polish characters
->a ą b c ć d e ę f
->a ą b c ć d e ę f
+# c1=U+0104 (Ą) and c2=U+0120 (Ġ) are chosen so that
+#   u1 = utf8(c1) = c4 84  <  u2 = utf8(c2) = c4 a0
+#   metafy(u1) = c4 83 a4  >  metafy(u2) = c4 83 80
+# in both UTF-8 and ASCII collations (the latter is used in macOS
+# and some versions of BSDs).
+  local -a names=( $'\u0104' $'\u0120' )
+  print -o $names
+  mkdir -p colltest
+  cd colltest
+  touch $names
+  print ?
+0:Sorting of metafied characters
+>Ą Ġ
+>Ą Ġ
 
   printf '%q%q\n' 你你
 0:printf %q and quotestring and general metafy / token madness
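
For anyone who wants to double-check the byte values quoted above without building zsh, here is a rough Python sketch. It is not the code in zsh itself: the `metafy` function and the `ESCAPED` set are illustrative assumptions. It assumes that a metafied byte is written as Meta (0x83) followed by the byte XORed with 0x20, and that the escaped bytes are 0x00 and 0x83 through 0xa2. Only the results for 0x84 and 0xa0 are confirmed by the table in this message, and the comparison below is a plain byte-wise one, which models the ASCII-collation case only.

```python
# Sketch of zsh-style metafication, for checking the table in this message.
# ASSUMPTION: the set of escaped bytes below is illustrative; only 0x84 and
# 0xa0 are confirmed by the values quoted in the message.
META = 0x83
ESCAPED = {0x00} | set(range(0x83, 0xA3))


def metafy(raw: bytes) -> bytes:
    """Replace each escaped byte b with the pair (META, b ^ 0x20)."""
    out = bytearray()
    for b in raw:
        if b in ESCAPED:
            out += bytes((META, b ^ 0x20))
        else:
            out.append(b)
    return bytes(out)


c1 = "\u0104".encode("utf-8")  # U+0104, UTF-8 bytes c4 84
c2 = "\u0120".encode("utf-8")  # U+0120, UTF-8 bytes c4 a0

if __name__ == "__main__":
    for name, raw in (("c1", c1), ("c2", c2)):
        print(name, raw.hex(" "), "->", metafy(raw).hex(" "))
    # Byte-wise comparison, i.e. what an ASCII (C-like) collation would do.
    print("unmetafied: c1 < c2 is", c1 < c2)
    print("metafied:   c1 < c2 is", metafy(c1) < metafy(c2))
```

Under these assumptions the unmetafied comparison puts c1 first and the metafied one puts c2 first, which is the property the new test relies on.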