Date: Wed, 9 Dec 2020 11:41:03 -0500
From: Rich Felker
To: Érico Nogueira
Cc: musl@lists.openwall.com, Dong Brett
Message-ID: <20201209164102.GM534@brightrain.aerifal.cx>
References: <20201130145126.GO534@brightrain.aerifal.cx>
Subject: Re: [musl] Question on C++ locale

On Wed, Dec 09, 2020 at 11:35:57AM -0300, Érico Nogueira wrote:
> On Mon Nov 30, 2020 at 11:51 AM -03, Rich Felker wrote:
> > On Mon, Nov 30, 2020 at 06:41:33PM +0800, Dong Brett wrote:
> > > Hi all,
> > >
> > > I am troubleshooting a locale related issue of our C++ software when
> > > building with musl. With some efforts I narrowed our problem down to
> > > the inability of setting a UTF-8 locale in C++ standard library.
> > >
> > > The following C code prints UTF-8 characters correctly:
> > > #include <locale.h>
> > > #include <langinfo.h>
> > > #include <ncurses.h>
> > >
> > > int main()
> > > {
> > >     setlocale(LC_ALL, "");
> > >     initscr();
> > >     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> > >     printw("CODESET: %s\n", nl_langinfo(CODESET));
> > >     printw("Hello, world!\n");
> > >     printw("你好,世界!\n");
> > >     refresh();
> > >     getch();
> > >     endwin();
> > >     return 0;
> > > }
> > >
> > > Giving the output of
> > > LC_ALL: C.UTF-8;C;C;C;C;C
> > > CODESET: UTF-8
> > > Hello, world!
> > > 你好,世界!
> > >
> > > However, the following C++ code does not work (our software uses
> > > std::locale in C++ standard library for locale related stuff):
> > > #include <locale>
> > > #include <langinfo.h>
> > > #include <ncurses.h>
> > > using namespace std;
> > > int main()
> > > {
> > >     std::locale::global(locale(""));
> > >     initscr();
> > >     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> > >     printw("C++ locale: %s\n", locale().name().c_str());
> > >     printw("CODESET: %s\n", nl_langinfo(CODESET));
> > >     printw("Hello, world!\n");
> > >     printw("你好,世界!\n");
> > >     refresh();
> > >     getch();
> > >     endwin();
> > >     return 0;
> > > }
> > >
> > > Giving a corrupted output:
> > > LC_ALL: C
> > > C++ locale: C
> > > CODESET: ASCII
> > > Hello, world!
> > > 你好?~L?~V?~U~L!
> > >
> > > Seems only ASCII C locale is available in C++. If I run the above C++
> > > code with LANG="C.UTF-8", an exception is thrown and the program is
> > > aborted:
> > > terminate called after throwing an instance of 'std::runtime_error'
> > >   what():  locale::facet::_S_create_c_locale name not valid
> > > Aborted
> > >
> > > I also tried LANG="UTF-8", LANG="en_US.UTF-8" but none of those
> > > works. Only LANG="C" could make the program run but then only ASCII
> > > characters are supported.
> > >
> > > My question is that is there a way to make locale in C++ standard
> > > library work with musl? Or had I done anything wrong with it?
> >
> > Thanks for raising this.
> > Indeed you've uncovered a (pile of) bug(s) in
> > libstdc++, but they don't seem to be relevant to your usage with
> > ncurses. Being a C library, not a C++ one, curses behavior depends on
> > the locale as set through the C/POSIX mechanisms, setlocale and/or
> > newlocale/uselocale. You shouldn't be using C++'s locale framework for
> > this. Any program using ncurses should start with either
> > setlocale(LC_ALL,"") or setlocale(LC_CTYPE,"") (depending on whether
> > you want the behavior of the other categories).
> >
> > I'll try to figure out what we need to do to get this fixed in
> > libstdc++. Since it's never been reported before, I suspect just very
> > few programs are using the C++ locale API so hopefully at least the
> > problem is low-impact.
>
> As another data point for an application that uses C++ locales, there is
> snapper. From [1]:
>
>     try
>     {
>         locale::global(locale(""));
>     }
>     catch (const runtime_error& e)
>     {
>         cerr << _("Failed to set locale. Fix your system.") << endl;
>     }
>
> Fortunately, they have a try-catch around the call, which will also
> catch other errors like bad LANG values, if I understand correctly. I
> wonder if other applications that make use of the API usually have this
> block, which can mask the error for the user.

On musl there are no bad LANG values. setlocale to "" (and likewise
newlocale for LC_ALL/"" or LC_CTYPE/"") can never fail. But this could
matter on other systems.

Rich
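
Editorial sketch, not part of the original message: one way a C++ program might combine the two pieces of advice in this thread is to guard the C++ locale call with try/catch and then set the C locale with setlocale. The names LocaleSetup and setup_locales are hypothetical and appear nowhere in the thread. The ordering is a deliberate assumption: std::locale::global() with a named locale also calls setlocale(LC_ALL, name), so calling setlocale(LC_ALL, "") last is what leaves the C locale taken from the environment.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <stdexcept>
#include <string>
#include <langinfo.h>

// Hypothetical result type (not from the thread): records what happened.
struct LocaleSetup {
    bool cxx_locale_ok;   // std::locale("") was constructed and installed
    bool c_locale_ok;     // setlocale(LC_ALL, "") returned non-null
    std::string codeset;  // nl_langinfo(CODESET) after both steps
};

// Hypothetical helper: intended to run before initscr() or any other C
// library call whose behavior depends on the C locale.
inline LocaleSetup setup_locales()
{
    LocaleSetup r{};

    // Step 1: try the C++ locale. std::locale("") throws
    // std::runtime_error when the environment's locale name is rejected,
    // as in the report quoted above.
    try {
        std::locale::global(std::locale(""));
        r.cxx_locale_ok = true;
    } catch (const std::runtime_error&) {
        r.cxx_locale_ok = false;  // the C++ global locale stays unchanged
    }

    // Step 2: set the C locale last. A successful std::locale::global()
    // with a named locale has already called setlocale(LC_ALL, name), so
    // this call decides what C libraries such as curses will see.
    r.c_locale_ok = std::setlocale(LC_ALL, "") != nullptr;

    const char* cs = nl_langinfo(CODESET);
    assert(cs != nullptr);  // POSIX: nl_langinfo never returns a null pointer
    r.codeset = cs;
    return r;
}
```

A caller could check r.codeset against "UTF-8" before printing non-ASCII text and report r.cxx_locale_ok separately. After a failed step 1 the C and C++ locales may differ, which is acceptable only if the program does its locale-dependent I/O through C interfaces.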