On Mon, May 24, 2021 at 12:39:35AM -0400, Konstantin Isakov wrote:
> Hi,
>
> The following program:
>
> ===================================
> #include <stdio.h>
> #include <wchar.h>
>
> int main()
> {
>     wchar_t buf[ 32 ];
>
>     swprintf( buf, sizeof( buf ) / sizeof( *buf ), L"ab\u00E1c" );
>
>     for ( wchar_t * p = buf; *p; ++p )
>         printf( "%u\n", ( unsigned ) *p );
>
>     return 0;
> }
> ===================================
>
> With musl 1.2.2 produces the following output:
> 97
> 98
>
> The expected output is:
> 97
> 98
> 225
> 99
>
> With musl, only the first two characters ('a' and 'b') are processed, and
> the string ends on a Unicode character (U+00E1, which is an 'a' with acute
> accent), instead of outputting it and the last character, 'c'.
>
> Please CC me when replying. Thanks!

You need to call setlocale(LC_CTYPE, "") first. Until you call
setlocale you are in the C locale, which POSIX requires to be
single-byte. In that locale the character \u00e1 is unrepresentable,
so swprintf hits an encoding error (EILSEQ) when it reaches it.
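
A minimal sketch of the fix, assuming the environment (LC_ALL,
LC_CTYPE or LANG) selects a UTF-8 locale. The helper names
format_sample and dump_codepoints are hypothetical, introduced only
for illustration; the locale name is a parameter so that "" (take it
from the environment) can be swapped for an explicit name.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not from the original report): select the
 * LC_CTYPE locale, then format the test string into buf.
 * Returns swprintf's result: the number of wide characters written,
 * or a negative value on failure (for example an encoding error). */
int format_sample(const char *locale, wchar_t *buf, size_t n)
{
    /* "" means: take the locale from LC_ALL / LC_CTYPE / LANG. */
    if (!setlocale(LC_CTYPE, locale))
        fprintf(stderr, "setlocale failed; locale left unchanged\n");

    return swprintf(buf, n, L"ab\u00E1c");
}

/* Print each wide character's value, one per line, as in the report. */
void dump_codepoints(const wchar_t *s)
{
    for (; *s; ++s)
        printf("%u\n", (unsigned)*s);
}
```

Calling format_sample("", buf, 32) followed by dump_codepoints(buf)
from main should then print all four values (97, 98, 225, 99). It is
also worth checking the return value: a negative result means the
conversion failed and the contents of buf should not be trusted.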
Rich