Thanks, Rich, that was very informative!

On Mon, May 24, 2021 at 9:09 PM Rich Felker wrote:
> On Mon, May 24, 2021 at 08:46:01PM -0400, Konstantin Isakov wrote:
> > Is swprintf() a form of fwprintf() though?
>
> As specified, it is. They're all covered together under
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/swprintf.html
>
> and "all forms" is in contrast to just "fwprintf() and wprintf()" (the
> other 2/3) mentioned above which can fail for any of the fputwc
> reasons (which would already cover EILSEQ anyway).
>
> > fwprintf() and wprintf() output
> > to single-byte streams, so the conversion is necessary there, while
> > swprintf() outputs to a wide buffer. Performing double conversion (to
> > single chars and back) seems like unnecessary work in that case (though,
> > of course, it's less work to implement swprintf() like that).
>
> It's what gives consistent behavior, and it's what you get
> automatically if you don't want either a completely independent
> implementation of swprintf (that behaves surprisingly unlike fwprintf)
> or the wide-mode buffering glibc does.
>
> (Note: the original reason they did separate wide-mode buffering was
> that gconv is very slow for individual character conversions and was
> designed only for bulk conversion calls, which would happen at flush
> time. Making individual conversions fast was one of the original
> design goals of musl before there even was a whole libc around it.)
>
> Rich
>
> > On Mon, May 24, 2021 at 8:30 PM Rich Felker wrote:
> >
> > > On Mon, May 24, 2021 at 08:04:04PM -0400, Konstantin Isakov wrote:
> > > > Thanks for replying!
> > > >
> > > > That fixed it.
> > > >
> > > > I'm surprised, however, that this is required given that in this case
> > > > swprintf() operates on wchars exclusively -- taking wchar arguments and
> > > > producing wchar output.
> > > > I'd expect that in the worst case scenario it would
> > > > have to convert from single chars to wide chars, but never the other way
> > > > around, so the representation requirement seems strange. That setlocale()
> > > > step also doesn't seem to be needed with glibc.
> > >
> > > Yes, it's not clear to me whether the glibc behavior is conforming or
> > > not. As specified,
> > >
> > >     In addition, all forms of fwprintf() shall fail if:
> > >
> > >     [EILSEQ]
> > >         A wide-character code that does not correspond
> > >         to a valid character has been detected.
> > >
> > >     ...
> > >
> > > The "has been detected" wording may allow for the possibility of
> > > ignoring the error, as glibc does, if the function is implemented such
> > > that no conversion takes place (or, for fwprintf, such that conversion
> > > is deferred until flush time) and thus no "detection" takes place. But
> > > it's wrong to assume the operation will succeed.
> > >
> > > In musl, there is no separate wide stdio buffering mode; conversion to
> > > a multibyte sequence happens at (logical) fputwc time, and in the case
> > > of swprintf, conversion (in this case, conversion back) to a wchar_t[]
> > > string occurs at flush time.
> > >
> > > Rich
> > >
> > > > On Mon, May 24, 2021 at 5:50 PM Rich Felker wrote:
> > > >
> > > > > On Mon, May 24, 2021 at 12:39:35AM -0400, Konstantin Isakov wrote:
> > > > > > Hi,
> > > > > >
> > > > > > The following program:
> > > > > >
> > > > > > ===================================
> > > > > > #include <stdio.h>
> > > > > > #include <wchar.h>
> > > > > >
> > > > > > int main()
> > > > > > {
> > > > > >   wchar_t buf[ 32 ];
> > > > > >
> > > > > >   swprintf( buf, sizeof( buf ) / sizeof( *buf ), L"ab\u00E1c" );
> > > > > >
> > > > > >   for ( wchar_t * p = buf; *p; ++p )
> > > > > >     printf( "%u\n", ( unsigned ) *p );
> > > > > >
> > > > > >   return 0;
> > > > > > }
> > > > > > ===================================
> > > > > >
> > > > > > With musl 1.2.2, it produces the following output:
> > > > > > 97
> > > > > > 98
> > > > > >
> > > > > > The expected output is:
> > > > > > 97
> > > > > > 98
> > > > > > 225
> > > > > > 99
> > > > > >
> > > > > > With musl, only the first two characters ('a' and 'b') are processed,
> > > > > > and the string ends on a Unicode character (U+00E1, which is an 'a'
> > > > > > with acute accent), instead of outputting it and the last character, 'c'.
> > > > > >
> > > > > > Please CC me when replying. Thanks!
> > > > >
> > > > > You need to call setlocale(LC_CTYPE, ""). Otherwise the character
> > > > > \u00e1 is unrepresentable, because POSIX requires the C locale be
> > > > > single-byte and you're in the C locale until you call setlocale, and
> > > > > thus produces an encoding error (EILSEQ).
> > > > >
> > > > > Rich