mailing list of musl libc
* [musl] nl_langinfo(CODESET) does not match locale
@ 2024-05-11 17:58 Petr Pisar
  2024-05-11 21:33 ` Rich Felker
From: Petr Pisar @ 2024-05-11 17:58 UTC
  To: musl

When debugging test failures in libisds on Gentoo with musl
<https://bugs.gentoo.org/show_bug.cgi?id=928107>, I found that
nl_langinfo(CODESET) does not match the current locale.

A reproducer:

#include <locale.h>
#include <stdio.h>
#include <langinfo.h>

int main(void) {
    char *old_locale = setlocale(LC_ALL, "cs_CZ.ISO8859-2");
    if (old_locale == NULL) {
        perror("setlocale() set failed");
        return 1;
    }
    old_locale = setlocale(LC_ALL, NULL);
    if (old_locale == NULL) {
        perror("setlocale() query failed");
        return 1;
    }
    printf("Current LC_ALL=%s\n", old_locale);
    printf("CODESET=%s\n", nl_langinfo(CODESET));
    return 0;
}

# gcc test.c && ./a.out
Current LC_ALL=cs_CZ.ISO8859-2
CODESET=UTF-8

While on glibc:

$ gcc test.c && ./a.out
Current LC_ALL=cs_CZ.ISO8859-2
CODESET=ISO-8859-2

I can see that for the cs_CZ.UTF8 locale, nl_langinfo() correctly reports UTF-8,
and for C it reports ASCII. However, for any other character set it always
returns UTF-8.

I found a notice <https://wiki.gentoo.org/wiki/Musl_usage_guide#Locales> that
musl does not implement non-UTF-8 locales. If that is true, then setlocale() for
"cs_CZ.ISO8859-2" should fail instead of accepting the locale.

I observe this behavior with musl-1.2.5.

-- Petr

* Re: [musl] nl_langinfo(CODESET) does not match locale
  2024-05-11 17:58 [musl] nl_langinfo(CODESET) does not match locale Petr Pisar
@ 2024-05-11 21:33 ` Rich Felker
From: Rich Felker @ 2024-05-11 21:33 UTC
  To: Petr Pisar; +Cc: musl

On Sat, May 11, 2024 at 07:58:09PM +0200, Petr Pisar wrote:
> When debugging test failures in libisds on Gentoo with musl
> <https://bugs.gentoo.org/show_bug.cgi?id=928107>, I found that
> nl_langinfo(CODESET) does not match the current locale.
> 
> A reproducer:
> 
> #include <locale.h>
> #include <stdio.h>
> #include <langinfo.h>
> 
> int main(void) {
>     char *old_locale = setlocale(LC_ALL, "cs_CZ.ISO8859-2");
>     if (old_locale == NULL) {
>         perror("setlocale() set failed");
>         return 1;
>     }
>     old_locale = setlocale(LC_ALL, NULL);
>     if (old_locale == NULL) {
>         perror("setlocale() query failed");
>         return 1;
>     }
>     printf("Current LC_ALL=%s\n", old_locale);
>     printf("CODESET=%s\n", nl_langinfo(CODESET));
>     return 0;
> }
> 
> # gcc test.c && ./a.out
> Current LC_ALL=cs_CZ.ISO8859-2
> CODESET=UTF-8
> 
> While on glibc:
> 
> $ gcc test.c && ./a.out
> Current LC_ALL=cs_CZ.ISO8859-2
> CODESET=ISO-8859-2

Yes it does match. The encoding on musl is *always* UTF-8. The only
weirdness here is that, presently, all locale names exist, and in the
absence of a translation file, are just aliases for C.UTF-8.

> I can see that for the cs_CZ.UTF8 locale, nl_langinfo() correctly reports UTF-8,
> and for C it reports ASCII. However, for any other character set it always
> returns UTF-8.
> 
> I found a notice <https://wiki.gentoo.org/wiki/Musl_usage_guide#Locales> that
> musl does not implement non-UTF-8 locales. If that is true, then setlocale() for
> "cs_CZ.ISO8859-2" should fail instead of accepting the locale.

It's an open issue that users/applications would like to be able to
know "no such locale is installed" when attempting to set an explicit
locale by name, and it will probably be resolved by making
setlocale(...,"explicit_name.bad_encoding") fail (and likewise, any
explicit name not matching a file fail) but setlocale(...,"") where
the environment contains a bad locale name succeed and produce a
default UTF-8 locale. This is part of the big pending locale-overhaul
project.

Rich
