From: Rich Felker
To: Konstantin Isakov
Cc: musl@lists.openwall.com
Date: Mon, 24 May 2021 17:50:22 -0400
Message-ID: <20210524215021.GC2546@brightrain.aerifal.cx>
Subject: Re: [musl] [BUG] swprintf() doesn't handle Unicode characters correctly

On Mon, May 24, 2021 at 12:39:35AM -0400, Konstantin Isakov wrote:
> Hi,
>
> The following program:
>
> ===================================
> #include <stdio.h>
> #include <wchar.h>
>
> int main()
> {
>   wchar_t buf[ 32 ];
>
>   swprintf( buf, sizeof( buf ) / sizeof( *buf ), L"ab\u00E1c" );
>
>   for ( wchar_t * p = buf; *p; ++p )
>     printf( "%u\n", ( unsigned ) *p );
>
>   return 0;
> }
> ===================================
>
> With musl 1.2.2 produces the following output:
> 97
> 98
>
> The expected output is:
> 97
> 98
> 225
> 99
>
> With musl, only the first two characters ('a' and 'b') are processed, and
> the string ends on a Unicode character (U+00E1, which is an 'a' with acute
> accent), instead of outputting it and the last character, 'c'.
>
> Please CC me when replying. Thanks!
You need to call setlocale(LC_CTYPE, "") first. Until you call setlocale
you're in the C locale, and POSIX requires the C locale to be single-byte,
so the character \u00e1 is unrepresentable there and swprintf fails with
an encoding error (EILSEQ).

Rich