From: "Jun T." <takimoto-j@kba.biglobe.ne.jp>
To: zsh-workers@zsh.org
Subject: Re: UTF-8 locales on BSDs do not support collation correctly
Date: Tue, 31 Jan 2017 19:09:58 +0900
Message-ID: <E5967E7D-7FB8-4888-959D-E411358EF839@kba.biglobe.ne.jp>
In-Reply-To: <CAH+w=7amXAO1S7GjT35WA9_KcQsHG838t4CaTnWmaYHv4ARHmg@mail.gmail.com>


On 2017/01/30, at 12:59, Bart Schaefer <schaefer@brasslantern.com> wrote:

> Is this
> just an issue with the test or is there a real problem here?

I believe there is no problem in zsh.
The problem is that macOS does not support UTF-8 collation at all.
For example, on macOS,
  /usr/share/locale/pl_PL.UTF-8/LC_COLLATE
is a symlink to
  /usr/share/locale/la_LN.US-ASCII/LC_COLLATE
so strcoll(3) always uses ASCII collation, whatever UTF-8 locale is selected.

Commit 72e5fe7 modifies glob.c so that unmetafied file names are
(correctly) used for glob sorting. In order to test this on both Linux
and macOS, we need two characters (or strings) c1 and c2 which satisfy

  c1 < c2    and    metafy(c1) > metafy(c2)

in both UTF-8 and ASCII collations. It seems the following two
characters can be used:

       Unicode      UTF-8      metafied
---------------------------------------
c1  Ą   U+0104      c4 84      c4 83 a4
c2  Ġ   U+0120      c4 a0      c4 83 80
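
The reversal in the table can be checked outside zsh. The following is only a rough sketch, not part of the patch: it hard-codes the four byte strings from the table as POSIX printf octal escapes and compares them bytewise with sort(1) under LC_ALL=C, which stands in for the ASCII collation that macOS falls back to. The helper names first_in_c_locale and hex are made up for this illustration.

```shell
#!/bin/sh
# Byte strings copied from the table above (octal escapes for portability).
c1_raw=$(printf '\304\204')        # c4 84     UTF-8 for U+0104
c2_raw=$(printf '\304\240')        # c4 a0     UTF-8 for U+0120
c1_meta=$(printf '\304\203\244')   # c4 83 a4  metafied form of c1
c2_meta=$(printf '\304\203\200')   # c4 83 80  metafied form of c2

# Hypothetical helper: print whichever argument sorts first when the
# arguments are compared byte by byte (LC_ALL=C).
first_in_c_locale() {
  printf '%s\n' "$@" | LC_ALL=C sort | head -n 1
}

# Hypothetical helper: show a string as a run of hex bytes.
hex() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

echo "raw, sorts first:      $(hex "$(first_in_c_locale "$c2_raw" "$c1_raw")")"
echo "metafied, sorts first: $(hex "$(first_in_c_locale "$c1_meta" "$c2_meta")")"
```

The first line should report the bytes of c1 and the second the bytes of metafied c2, i.e. the order flips once the strings are metafied. That flip is what lets the test tell sorting on unmetafied names apart from sorting on metafied ones.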

So how about the following patch? With this patch, the test fails
without commit 72e5fe7 but succeeds with it, on both Linux and macOS.


diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 0ff65c7..e203153 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -551,22 +551,20 @@
   : $functions)
 0:Multibyte handling of functions parameter
 
-  if [[ -n ${$(locale -a 2>/dev/null)[(R)pl_PL.(utf8|UTF-8)]} ]]; then
-  (
-    export LC_ALL=pl_PL.UTF-8
-    local -a names=(a b c d e f $'\u0105' $'\u0107' $'\u0119')
-    print -o $names
-    mkdir -p plchars
-    cd plchars
-    touch $names
-    print ?
-  )
-  else
-    ZTST_skip="No Polish UTF-8 locale found, skipping sort test"
-  fi
-0:Sorting of metafied Polish characters
->a ą b c ć d e ę f
->a ą b c ć d e ę f
+# c1=U+0104 (Ą) and c2=U+0120 (Ġ) are chosen so that
+#   u1 = utf8(c1) = c4 84  <  u2 = utf8(c2) = c4 a0
+#   metafy(u1) = c4 83 a4  >  metafy(u2) = c4 83 80
+# in both UTF-8 and ASCII collations (the latter is used in macOS
+# and some versions of BSDs).
+  local -a names=( $'\u0104' $'\u0120' )
+  print -o $names
+  mkdir -p colltest
+  cd colltest
+  touch $names
+  print ?
+0:Sorting of metafied characters
+>Ą Ġ
+>Ą Ġ
 
   printf '%q%q\n' 你你
 0:printf %q and quotestring and general metafy / token madness



