Earlier today an Alpine dev posted on IRC that an E02 test is failing for them
on 5.9:

https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/34456

It's an issue with the way the function name ヌ is printed:

-$'\M-c\M-\C-C\M-\C-L' () {
+$'\udfe3\udf83\udf8c' () {

I assume it's to do with this:

> Starting with version 1.1.11, musl provides a special C locale where bytes
> 0x80-0xff are treated as abstract single-byte-character units with no actual
> character identity (they’re mapped into wchar_t values that occupy the
> Unicode surrogates range).

( https://wiki.musl-libc.org/functional-differences-from-glibc.html )

dana
> On 16 May 2022 at 08:14 dana <dana@dana.is> wrote:
> Earlier today an Alpine dev posted on IRC that an E02 test is failing for them
> on 5.9:
>
> https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/34456
>
> It's an issue with the way the function name ヌ is printed:
>
> -$'\M-c\M-\C-C\M-\C-L' () {
> +$'\udfe3\udf83\udf8c' () {
This probably isn't a big deal for this test, which isn't even a multibyte
test; it's just there to check that we've got some consistent representation
for strange output with a meta bit. So arguably we could just pick a simpler
string to test, for an easier life.
The character output isn't right, though (try passing both versions to print),
so I suppose it's a real bug somewhere. The question is whether it's our job
to chase it.
pws
> 2022/05/16 16:14, dana <dana@dana.is> wrote:
>
> I assume it's to do with this:
>
>> Starting with version 1.1.11, musl provides a special C locale where bytes
>> 0x80-0xff are treated as abstract single-byte-character units with no actual
>> character identity (they’re mapped into wchar_t values that occupy the
>> Unicode surrogates range).
I tried Alpine for the first time, and found that E02 and two other tests
(see below) failed due to this "special" C locale.
In this "special" C locale,
    str[0] = 0xXX;  /* any value in the range 0x80-0xff */
    mbrtowc(&wc, str, 1, &mbs);
sets wc to 0xdfXX (not just 0xXX).
For example, if 0xXX is 0x83 then wc is set to 0xdf83.
This is indeed "special", but it seems globbing etc. works without problem.
So I think we need not (and should not) "fix" this, because 0xdfXX (or \udfXX)
is the correct representation in their "special" C locale.
If they want, they can just change (in their package) the expected outputs
of the tests to their correct values.
These are the two tests that fail due to the same reason:
./A03quoting.ztst: starting.
--- /tmp/zsh.ztst.13004/ztst.out
+++ /tmp/zsh.ztst.13004/ztst.tout
@@ -4,4 +4,4 @@
16#4D
16#42
16#53
-16#DC
+16#DFDC
Test ./A03quoting.ztst failed: output differs from expected as shown above for:
chars=$(print -r $'BS\\MBS\M-\\')
for (( i = 1; i <= $#chars; i++ )); do
char=$chars[$i]
print $(( [#16] #char ))
done
Was testing: $'-style quote with metafied backslash
./B03print.ztst: starting.
--- /tmp/zsh.ztst.20798/ztst.out
+++ /tmp/zsh.ztst.20798/ztst.tout
@@ -1 +1 @@
-f0
+dff0
Test ./B03print.ztst failed: output differs from expected as shown above for:
printf '%x\n' $(printf '"\xf0')
Was testing: numeric value of high numbered character
On Mon 16 May 2022, at 21:33, Jun T wrote:
> So I think we need not (and should not) "fix" this, because 0xdfXX (or \udfXX)
> is the correct representation in their "special" C locale.
I think i see the argument for not trying to do any 'special' accounting
of this locale in the shell. As far as the tests go, i guess we are
technically making assumptions about the wchar values of non-'portable'
characters that POSIX says we can't actually make, but not making those
assumptions seems annoying
For the E02 test in particular, as Peter says, it isn't a multi-byte test.
If there's not anything special about the code path for xtrace
preservation that's sensitive to weird function names maybe that aspect of
the test belongs in B13, C04, or D07...?
Here is some additional context/history behind these failing tests, in
case anyone's ever looking for it later. Don't read this, you probably
don't care:
The A03 and B03 tests that Jun mentioned here have been failing on musl
since at least zsh-5.5 — probably longer (despite workers/48578 indicating
that it'd only started 'recently'), since the """special""" (lol) locale
was introduced to musl in August 2015, and made its way into Alpine very
shortly afterwards
The LC_ALL=C in the failing E02 test was introduced by me and Jun in
workers/45537+45550 to fix a similar issue i was seeing with the way the
function name ヌ was being printed by `which` on macOS Mojave. I bet i was
having this problem because i had explicitly set LC_CTYPE to a UTF-8
locale, and Jun had not yet made the change in workers/49908 to have ztst
reset that back to C like it did with LANG and LC_ALL. It does now reset
it with the others so the LC_ALL=C is probably superfluous in that respect
However, if you don't have *any* LANG/LC_* variables set, on some systems,
including Alpine, where the 'implementation-defined default locale' is
UTF-8, you can get the same behaviour i was seeing where `which` just
prints ヌ back out without any escaping
I mention that because there are basically only two possibilities on a
typical musl system (either the 'special' POSIX locale or a UTF-8 one)
and both of them will cause the test to fail as written. And also because
there might be other systems that have a UTF-8 default locale where this
test and others could fail without an explicit LC_ALL=C because ztst only
resets the locale to C if we're *not* using the default one (which i don't
think i understand the reasoning for)
dana