Date: Sat, 11 May 2024 17:33:55 -0400
From: Rich Felker
To: Petr Pisar
Cc: musl@lists.openwall.com
Message-ID: <20240511213354.GT10433@brightrain.aerifal.cx>
Subject: Re: [musl] nl_langinfo(CODESET) does not match locale

On Sat, May 11, 2024 at 07:58:09PM +0200, Petr Pisar wrote:
> When debugging test failures in libisds on Gentoo with musl, I found that
> nl_langinfo(CODESET) does not match current locale.
>
> A reproducer:
>
> #include <langinfo.h>
> #include <locale.h>
> #include <stdio.h>
>
> int main(void) {
>     char *old_locale = setlocale(LC_ALL, "cs_CZ.ISO8859-2");
>     if (old_locale == NULL) {
>         perror("setlocale() set failed");
>         return 1;
>     }
>     old_locale = setlocale(LC_ALL, NULL);
>     if (old_locale == NULL) {
>         perror("setlocale() query failed");
>         return 1;
>     }
>     printf("Current LC_ALL=%s\n", old_locale);
>     printf("CODESET=%s\n", nl_langinfo(CODESET));
>     return 0;
> }
>
> # gcc test.c && ./a.out
> Current LC_ALL=cs_CZ.ISO8859-2
> CODESET=UTF-8
>
> While on glibc:
>
> $ gcc test.c && ./a.out
> Current LC_ALL=cs_CZ.ISO8859-2
> CODESET=ISO-8859-2

Yes it does match. The encoding on musl is *always* UTF-8. The only
weirdness here is that, presently, all locale names exist, and in the
absence of a translation file, are just aliases for C.UTF-8.

> I can see that for cs_CZ.UTF8 locale, nl_langinfo() correctly reports UTF-8,
> and for C it reports ASCII. However, for any other character set it always
> returns UTF-8.
>
> I found a notice that
> musl does not implement non-UTF-8 locales. If that is true, then setlocale() for
> "cs_CZ.ISO8859-2" should fail, instead of accepting the locale.

It's an open issue that users/applications would like to be able to
know "no such locale is installed" when attempting to set an explicit
locale by name. It will probably be resolved by making
setlocale(...,"explicit_name.bad_encoding") fail (and likewise, any
explicit name not matching a file), while setlocale(...,"") where the
environment contains a bad locale name would still succeed and produce
a default UTF-8 locale. This is part of the big pending
locale-overhaul project.

Rich
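
As long as setlocale() accepts explicit names whose encoding is not actually provided, an application that cares can compare the encoding it asked for against nl_langinfo(CODESET) itself. The following is only a minimal sketch of that idea, not anything musl provides: the helper names codeset_equal, codeset_matches and set_locale_checked are made up for illustration, and the comparison rule (ignore case and the '-' and '_' separators) is an assumption about which spellings should count as the same codeset.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two codeset names, ignoring case and the
 * '-' and '_' separators, so "ISO8859-2" equals "ISO-8859-2" and "UTF8"
 * equals "UTF-8". a_len bounds the first name, which need not be
 * NUL-terminated. */
static int codeset_equal(const char *a, size_t a_len, const char *b)
{
    size_t i = 0;
    for (;;) {
        while (i < a_len && (a[i] == '-' || a[i] == '_'))
            i++;
        while (*b == '-' || *b == '_')
            b++;
        if (i == a_len || *b == '\0')
            return i == a_len && *b == '\0';
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)*b))
            return 0;
        i++;
        b++;
    }
}

/* Hypothetical helper: return 1 if the encoding part of locale_name (the
 * text between '.' and an optional '@modifier') names the same codeset as
 * actual. A name with no explicit encoding cannot mismatch, so it
 * returns 1. */
int codeset_matches(const char *locale_name, const char *actual)
{
    assert(locale_name != NULL && actual != NULL);
    const char *dot = strchr(locale_name, '.');
    if (dot == NULL)
        return 1;
    const char *enc = dot + 1;
    size_t len = strcspn(enc, "@");
    return codeset_equal(enc, len, actual);
}

/* Hypothetical wrapper: -1 if setlocale() fails, 1 if it succeeds but the
 * resulting codeset differs from the one requested, 0 otherwise. The
 * locale stays set in the mismatch case; the caller decides what to do. */
int set_locale_checked(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    return codeset_matches(locale_name, nl_langinfo(CODESET)) ? 0 : 1;
}
```

Going by the output quoted above, set_locale_checked("cs_CZ.ISO8859-2") would be expected to return 1 on musl (CODESET is UTF-8) and 0 on a glibc system with that locale installed (CODESET is ISO-8859-2).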