zsh-workers
* E02 failing on Alpine / musl libc
@ 2022-05-16  7:14 dana
  2022-05-16 10:54 ` Peter Stephenson
  2022-05-17  2:33 ` Jun T
  0 siblings, 2 replies; 4+ messages in thread
From: dana @ 2022-05-16  7:14 UTC (permalink / raw)
  To: Zsh hackers list

Earlier today an Alpine dev posted on IRC that an E02 test is failing for them
on 5.9:

  https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/34456

It's an issue with the way the function name ヌ is printed:

  -$'\M-c\M-\C-C\M-\C-L' () {
  +$'\udfe3\udf83\udf8c' () {

I assume it's to do with this:

> Starting with version 1.1.11, musl provides a special C locale where bytes
> 0x80-0xff are treated as abstract single-byte-character units with no actual
> character identity (they’re mapped into wchar_t values that occupy the
> Unicode surrogates range).

( https://wiki.musl-libc.org/functional-differences-from-glibc.html )

dana



* Re: E02 failing on Alpine / musl libc
  2022-05-16  7:14 E02 failing on Alpine / musl libc dana
@ 2022-05-16 10:54 ` Peter Stephenson
  2022-05-17  2:33 ` Jun T
  1 sibling, 0 replies; 4+ messages in thread
From: Peter Stephenson @ 2022-05-16 10:54 UTC (permalink / raw)
  To: Zsh hackers list

> On 16 May 2022 at 08:14 dana <dana@dana.is> wrote:
> Earlier today an Alpine dev posted on IRC that an E02 test is failing for them
> on 5.9:
> 
>   https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/34456
> 
> It's an issue with the way the function name ヌ is printed:
> 
>   -$'\M-c\M-\C-C\M-\C-L' () {
>   +$'\udfe3\udf83\udf8c' () {

This probably isn't a big deal for this test, which isn't even a multibyte
test, it's just to check we've got some consistent representation for strange
output with a meta bit.  So arguably we could just pick a simpler string to
test, for an easier life.

The character output isn't right, though (try passing both versions to print),
so I suppose it's a real bug somewhere.  The question is whether it's our job
to chase it.

pws



* Re: E02 failing on Alpine / musl libc
  2022-05-16  7:14 E02 failing on Alpine / musl libc dana
  2022-05-16 10:54 ` Peter Stephenson
@ 2022-05-17  2:33 ` Jun T
  2022-05-19  3:27   ` dana
  1 sibling, 1 reply; 4+ messages in thread
From: Jun T @ 2022-05-17  2:33 UTC (permalink / raw)
  To: zsh-workers


> 2022/05/16 16:14, dana <dana@dana.is> wrote:
> 
> I assume it's to do with this:
> 
>> Starting with version 1.1.11, musl provides a special C locale where bytes
>> 0x80-0xff are treated as abstract single-byte-character units with no actual
>> character identity (they’re mapped into wchar_t values that occupy the
>> Unicode surrogates range).

I tried Alpine for the first time, and found that E02 and two other tests
(see below) failed due to this "special" C locale.

In this "special" C locale,
   str[0] = 0xXX;  /* any value in the range 0x80-0xff */
   mbrtowc(&wc, str, 1, &mbs);
sets wc to 0xdfXX (not just 0xXX).
For example, if 0xXX is 0x83 then wc is set to 0xdf83.
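
As a rough way to reproduce the behaviour described above, here is a minimal
sketch that feeds single bytes to mbrtowc() under the C locale. The helper
names decode_byte and show_bytes are made up for illustration and are not
part of zsh. The sample bytes e3 83 8c are the UTF-8 encoding of the function
name ヌ from the E02 test.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: decode one byte with mbrtowc() in the current
 * LC_CTYPE locale. Returns the wchar_t value as a long, or -1 if the
 * byte is rejected or is only a partial multibyte sequence.
 */
long decode_byte(unsigned char byte)
{
    char buf[1];
    wchar_t wc = 0;
    mbstate_t mbs;
    size_t ret;

    buf[0] = (char)byte;
    memset(&mbs, 0, sizeof mbs);        /* fresh conversion state each time */
    ret = mbrtowc(&wc, buf, 1, &mbs);
    if (ret == (size_t)-1 || ret == (size_t)-2)
        return -1;
    return (long)wc;
}

/* Hypothetical helper: print the decoded value of each byte in turn. */
void show_bytes(const unsigned char *bytes, size_t n)
{
    size_t i;

    assert(n == 0 || bytes != NULL);
    for (i = 0; i < n; i++) {
        long v = decode_byte(bytes[i]);
        if (v < 0)
            printf("0x%02x -> rejected\n", (unsigned)bytes[i]);
        else
            printf("0x%02x -> 0x%lx\n", (unsigned)bytes[i], (unsigned long)v);
    }
}

#ifdef DEMO_MAIN   /* build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    static const unsigned char nu[] = { 0xe3, 0x83, 0x8c };

    setlocale(LC_CTYPE, "C");           /* the locale the failing tests use */
    show_bytes(nu, sizeof nu);
    return 0;
}
#endif
```

Going by the description above, a musl build should report 0xdfe3, 0xdf83 and
0xdf8c for these bytes. Other C libraries may return the plain byte value, a
different mapped value, or reject the byte outright, so the output is not
portable.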

This is indeed "special", but it seems globbing etc. works without problem.
So I think we need not (and should not) "fix" this, because 0xdfXX (or \udfXX)
is the correct representation in their "special" C locale.

If they want, they can just change (in their package) the expected outputs
of the tests to their correct values.


These are the two tests that fail due to the same reason:

./A03quoting.ztst: starting.
--- /tmp/zsh.ztst.13004/ztst.out
+++ /tmp/zsh.ztst.13004/ztst.tout
@@ -4,4 +4,4 @@
 16#4D
 16#42
 16#53
-16#DC
+16#DFDC
Test ./A03quoting.ztst failed: output differs from expected as shown above for:
  chars=$(print -r $'BS\\MBS\M-\\')
  for (( i = 1; i <= $#chars; i++ )); do
    char=$chars[$i]
    print $(( [#16] #char ))
  done
Was testing: $'-style quote with metafied backslash


./B03print.ztst: starting.
--- /tmp/zsh.ztst.20798/ztst.out
+++ /tmp/zsh.ztst.20798/ztst.tout
@@ -1 +1 @@
-f0
+dff0
Test ./B03print.ztst failed: output differs from expected as shown above for:
 printf '%x\n' $(printf '"\xf0')
Was testing: numeric value of high numbered character





* Re: E02 failing on Alpine / musl libc
  2022-05-17  2:33 ` Jun T
@ 2022-05-19  3:27   ` dana
  0 siblings, 0 replies; 4+ messages in thread
From: dana @ 2022-05-19  3:27 UTC (permalink / raw)
  To: Jun T; +Cc: Zsh hackers list

On Mon 16 May 2022, at 21:33, Jun T wrote:
> So I think we need not (and should not) "fix" this, because 0xdfXX (or
> \udfXX) is the correct representation in their "special" C locale.

I think i see the argument for not trying to do any 'special' accounting
of this locale in the shell. As far as the tests go, i guess we are
technically making assumptions about the wchar values of non-'portable'
characters that POSIX says we can't actually make, but not making those
assumptions seems annoying

For the E02 test in particular, as Peter says, it isn't a multi-byte test.
If there's not anything special about the code path for xtrace
preservation that's sensitive to weird function names maybe that aspect of
the test belongs in B13, C04, or D07...?


Here is some additional context/history behind these failing tests, in
case anyone's ever looking for it later. Don't read this, you probably
don't care:

The A03 and B03 tests that Jun mentioned here have been failing on musl
since at least zsh-5.5 — probably longer (despite workers/48578 indicating
that it'd only started 'recently'), since the """special""" (lol) locale
was introduced to musl in August 2015, and made its way into Alpine very
shortly afterwards

The LC_ALL=C in the failing E02 test was introduced by me and Jun in
workers/45537+45550 to fix a similar issue i was seeing with the way the
function name ヌ was being printed by `which` on macOS Mojave. I bet i was
having this problem because i had explicitly set LC_CTYPE to a UTF-8
locale, and Jun had not yet made the change in workers/49908 to have ztst
reset that back to C like it did with LANG and LC_ALL. It does now reset
it with the others so the LC_ALL=C is probably superfluous in that respect

However, if you don't have *any* LANG/LC_* variables set, on some systems,
including Alpine, where the 'implementation-defined default locale' is
UTF-8, you can get the same behaviour i was seeing where `which` just
prints ヌ back out without any escaping

I mention that because there are basically only two possibilities on a
typical musl system (either the 'special' POSIX locale or a UTF-8 one)
and both of them will cause the test to fail as written. And also because
there might be other systems that have a UTF-8 default locale where this
test and others could fail without an explicit LC_ALL=C because ztst only
resets the locale to C if we're *not* using the default one (which i don't
think i understand the reasoning for)
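
To check which case a given system falls into, one option is to ask the C
library which codeset the environment's locale selects. This is only a
sketch for probing libc and is not how ztst works (ztst is a zsh script).
The function name env_locale_codeset is invented for this example.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: switch LC_CTYPE to whatever the environment
 * (LC_ALL, LC_CTYPE, LANG, or the implementation default when none of
 * them is set) selects, and return the name of its codeset.
 * Note that this changes the process-wide LC_CTYPE setting. If
 * setlocale() fails, the codeset of the unchanged locale is returned.
 */
const char *env_locale_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}

#ifdef DEMO_MAIN   /* build with -DDEMO_MAIN for a standalone program */
int main(void)
{
    const char *codeset = env_locale_codeset();

    printf("codeset: %s\n", codeset);
    if (strcmp(codeset, "UTF-8") == 0)
        puts("default locale is UTF-8; high bytes are multibyte sequences");
    else
        puts("default locale is not UTF-8");
    return 0;
}
#endif
```

Running the standalone build with LANG and all LC_* variables unset shows
what the 'implementation-defined default locale' is on that system.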


dana


