From: Érico Nogueira
To: "Dong Brett"
Date: Wed, 09 Dec 2020 11:35:57 -0300
In-Reply-To: <20201130145126.GO534@brightrain.aerifal.cx>
Subject: Re: [musl] Question on C++ locale

On Mon Nov 30, 2020 at 11:51 AM -03, Rich Felker wrote:
> On Mon, Nov 30, 2020 at 06:41:33PM +0800, Dong Brett wrote:
> > Hi all,
> >
> > I am troubleshooting a locale related issue of our C++ software when
> > building with musl.
> > With some efforts I narrowed our problem down to the inability of
> > setting a UTF-8 locale in C++ standard library.
> >
> > The following C code prints UTF-8 characters correctly:
> > #include <locale.h>
> > #include <langinfo.h>
> > #include <curses.h>
> >
> > int main()
> > {
> >     setlocale(LC_ALL, "");
> >     initscr();
> >     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> >     printw("CODESET: %s\n", nl_langinfo(CODESET));
> >     printw("Hello, world!\n");
> >     printw("你好，世界!\n");
> >     refresh();
> >     getch();
> >     endwin();
> >     return 0;
> > }
> >
> > Giving the output of
> > LC_ALL: C.UTF-8;C;C;C;C;C
> > CODESET: UTF-8
> > Hello, world!
> > 你好，世界!
> >
> > However, the following C++ code does not work (our software uses
> > std::locale in C++ standard library for locale related stuff):
> > #include <locale>
> > #include <langinfo.h>
> > #include <curses.h>
> > using namespace std;
> > int main()
> > {
> >     std::locale::global(locale(""));
> >     initscr();
> >     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> >     printw("C++ locale: %s\n", locale().name().c_str());
> >     printw("CODESET: %s\n", nl_langinfo(CODESET));
> >     printw("Hello, world!\n");
> >     printw("你好，世界!\n");
> >     refresh();
> >     getch();
> >     endwin();
> >     return 0;
> > }
> >
> > Giving a corrupted output:
> > LC_ALL: C
> > C++ locale: C
> > CODESET: ASCII
> > Hello, world!
> > 你好?~L?~V?~U~L!
> >
> > Seems only ASCII C locale is available in C++. If I run the above C++
> > code with LANG="C.UTF-8", an exception is thrown and the program is
> > aborted:
> > terminate called after throwing an instance of 'std::runtime_error'
> >   what():  locale::facet::_S_create_c_locale name not valid
> > Aborted
> >
> > I also tried LANG="UTF-8", LANG="en_US.UTF-8" but none of those
> > works. Only LANG="C" could make the program run but then only ASCII
> > characters are supported.
> >
> > My question is that is there a way to make locale in C++ standard
> > library work with musl? Or had I done anything wrong with it?
>
> Thanks for raising this. Indeed you've uncovered a (pile of) bug(s) in
> libstdc++, but they don't seem to be relevant to your usage with
> ncurses. Being a C library, not a C++ one, curses behavior depends on
> the locale as set through the C/POSIX mechanisms, setlocale and/or
> newlocale/uselocale. You shouldn't be using C++'s locale framework for
> this. Any program using ncurses should start with either
> setlocale(LC_ALL,"") or setlocale(LC_CTYPE,"") (depending on whether
> you want the behavior of the other categories).
>
> I'll try to figure out what we need to do to get this fixed in
> libstdc++. Since it's never been reported before, I suspect just very
> few programs are using the C++ locale API so hopefully at least the
> problem is low-impact.

As another data point for an application that uses C++ locales, there is
snapper. From [1]:

    try
    {
        locale::global(locale(""));
    }
    catch (const runtime_error& e)
    {
        cerr << _("Failed to set locale. Fix your system.") << endl;
    }

Fortunately, they have a try-catch around the call, which will also
catch other errors like bad LANG values, if I understand correctly. I
wonder if other applications that make use of the API usually have this
block, which can mask the error for the user.

That said, I don't think the project can be built on musl without any
external patches yet (some pieces relied heavily on glibc extensions),
so having locale issues isn't the biggest problem with snapper on musl.

- [1] https://github.com/openSUSE/snapper/blob/9e795ed4f0d87e6afcd5065f26c1350942f8ab38/client/snapper.cc#L126

>
> Rich

Érico
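
A minimal sketch of the pattern discussed above, combining the guarded
locale::global call seen in snapper with the setlocale(LC_ALL,"") start-up
recommended for ncurses programs. The helper names try_set_cxx_locale and
init_locales are hypothetical and do not come from snapper, musl or
libstdc++. The ordering is an assumption worth checking for a given
program: std::locale::global() with a named locale also calls setlocale()
itself, so the sketch makes the C-level call last.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <stdexcept>

// Hypothetical helper: try to install `name` as the global C++ locale.
// Returns false instead of letting the std::runtime_error thrown by
// std::locale's constructor terminate the program.
bool try_set_cxx_locale(const char* name)
{
    assert(name != nullptr);
    try {
        std::locale::global(std::locale(name));
        return true;
    } catch (const std::runtime_error&) {
        return false;  // the global C++ locale is left unchanged
    }
}

// Hypothetical start-up routine for a curses-style program.
// Returns whether the C++ library accepted the environment's locale.
bool init_locales()
{
    // C++ side first. On success with a named locale, global() also calls
    // setlocale(LC_ALL, <that name>), which may be plain "C".
    const bool cxx_ok = try_set_cxx_locale("");

    // C/POSIX side last, so that C libraries such as ncurses see the
    // locale from the environment regardless of what happened above.
    std::setlocale(LC_ALL, "");
    return cxx_ok;
}
```

A program would call init_locales() once, before initscr(), and could
report a false return value to the user instead of aborting.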