mailing list of musl libc
* [musl] setlocale() behaviour
From: Alastair Houghton @ 2023-07-19  8:30 UTC
  To: musl

Hi there,

Presently, musl’s setlocale() function essentially always succeeds, even if it doesn’t actually have data for the requested locale. I note the previous message to the list in 2017

<https://www.openwall.com/lists/musl/2017/11/08/1>

discussing potential solutions, but unless I’m much mistaken nothing has really changed in the code?

This has come up because the test rig for libc++ tries to detect which locale data is installed so that it can run its own locale support tests (it’s trying to test the C++ locale support that it has constructed atop the C library’s underlying locale support).  If, for instance, you don’t have data for fr_FR installed, libc++ won’t run test cases that rely on that data.  On other C library implementations, that’s easy because setlocale() will return NULL in such a case, but musl doesn’t do that - instead, it sets up a copy of C.UTF-8, names it fr_FR and sets that as the current locale :-(
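For illustration, the kind of probe described above might look like the following. This is a minimal sketch, not the libc++ test rig's actual code; the function name locale_is_accepted is made up for this example. It relies only on setlocale() returning a null pointer for a locale it cannot provide.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if setlocale() accepts `name`, else 0.
 * The locale in effect on entry is restored before returning. */
int locale_is_accepted(const char *name)
{
    /* setlocale(cat, NULL) queries the current locale without changing it.
     * The returned string may be overwritten by later calls, so copy it. */
    const char *cur = setlocale(LC_ALL, NULL);
    char *saved = NULL;
    if (cur) {
        saved = malloc(strlen(cur) + 1);
        if (saved)
            strcpy(saved, cur);
    }

    /* A null return means the request could not be honoured. */
    int ok = setlocale(LC_ALL, name) != NULL;

    if (saved) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return ok;
}
```

With the musl behaviour described above, locale_is_accepted("fr_FR") would report success whether or not any French locale data is installed, which is exactly the problem.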

Kind regards,

Alastair.



* Re: [musl] setlocale() behaviour
From: Markus Wichmann @ 2023-07-19 16:51 UTC
  To: musl

On Wed, Jul 19, 2023 at 09:30:08AM +0100, Alastair Houghton wrote:
> Hi there,
>
> Presently, musl’s setlocale() function essentially always succeeds, even if it doesn’t actually have data for the requested locale. I note the previous message to the list in 2017
>
> <https://www.openwall.com/lists/musl/2017/11/08/1>
>
> discussing potential solutions, but unless I’m much mistaken nothing has really changed in the code?
>
> This has come up because the test rig for libc++ tries to detect which locale data is installed so that it can run its own locale support tests (it’s trying to test the C++ locale support that it has constructed atop the C library’s underlying locale support).  If, for instance, you don’t have data for fr_FR installed, libc++ won’t run test cases that rely on that data.  On other C library implementations, that’s easy because setlocale() will return NULL in such a case, but musl doesn’t do that - instead, it sets up a copy of C.UTF-8, names it fr_FR and sets that as the current locale :-(
>
> Kind regards,
>
> Alastair.
>

Well, you must not depend on implementation internals. According to
POSIX, the form of the locale environment variables and the strings to
be plugged into setlocale() (except for "POSIX", "C", "", and the null
pointer) are implementation-defined, and musl defines that absolutely
any name is supported and is a copy of C.UTF-8 (again, except for
"POSIX" and "C"). The name handed in must be returned back out again for
gettext to work.

POSIX talks about the form of those variables in the XSI extension, but
only such that it allows variables to have that form (that being
lang_COUNTRY.codeset@modifier), and the precise meaning is again left to
the implementation.
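
To make the syntactic form concrete, here is a small sketch that splits a name of the shape lang_COUNTRY.codeset@modifier into its parts. The struct and function names are invented for this example, and the sketch says nothing about what any libc does with the parts; it only shows the shape of the string, whose meaning remains implementation-defined.

```c
#include <string.h>

/* Hypothetical container for the four syntactic parts of a locale name. */
struct locale_parts {
    char language[32];
    char territory[32];
    char codeset[32];
    char modifier[32];
};

/* Copy at most dstsz-1 bytes starting at s into dst and NUL-terminate. */
static void copy_span(char *dst, size_t dstsz, const char *s, size_t n)
{
    if (n >= dstsz)
        n = dstsz - 1;
    memcpy(dst, s, n);
    dst[n] = '\0';
}

/* Split `name` as language[_territory][.codeset][@modifier].
 * Parts that are absent are left as empty strings. */
void split_locale_name(const char *name, struct locale_parts *out)
{
    memset(out, 0, sizeof *out);
    const char *end = name + strlen(name);

    /* The modifier is everything after the first '@'. */
    const char *at = strchr(name, '@');
    if (at) {
        copy_span(out->modifier, sizeof out->modifier, at + 1,
                  (size_t)(end - (at + 1)));
        end = at;
    }

    /* The codeset is everything after the first '.' before the modifier. */
    const char *dot = memchr(name, '.', (size_t)(end - name));
    if (dot) {
        copy_span(out->codeset, sizeof out->codeset, dot + 1,
                  (size_t)(end - (dot + 1)));
        end = dot;
    }

    /* The territory is everything after the first '_' before the codeset. */
    const char *us = memchr(name, '_', (size_t)(end - name));
    if (us) {
        copy_span(out->territory, sizeof out->territory, us + 1,
                  (size_t)(end - (us + 1)));
        end = us;
    }

    copy_span(out->language, sizeof out->language, name, (size_t)(end - name));
}
```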

What do the test cases for libc++ depend upon that is not fulfilled
without the localization data?

Ciao,
Markus


* Re: [musl] setlocale() behaviour
From: Alastair Houghton @ 2023-07-19 17:10 UTC
  To: musl

On 19 Jul 2023, at 17:51, Markus Wichmann <nullplan@gmx.net> wrote:
> 
> Well, you must not depend on implementation internals. According to
> POSIX, the form of the locale environment variables and the strings to
> be plugged into setlocale() (except for "POSIX", "C", "", and the null
> pointer) are implementation-defined, and musl defines that absolutely
> any name is supported and is a copy of C.UTF-8 (again, except for
> "POSIX" and "C"). The name handed in must be returned back out again for
> gettext to work.

As was pointed out in the previous thread, POSIX does say, in the Rationale section:

> If the string does not correspond to a valid locale, setlocale() shall return a null pointer and the international environment is not changed.

And while I think musl probably does get out of it on the technicality that the “contents of this string are implementation-defined” so it can claim that any old string “correspond[s] to a valid locale” that just happens to be C.UTF-8 unless there’s a data file installed, I think the current behaviour is very much not in the spirit of what was intended here.

But I don’t actually care about the standards lawyering here.  As Rich Felker noted previously about this behaviour:

> Unfortunately this turns out to have been something of a tradeoff,
> since there's no way for applications (and, as it turns out,
> especially tests/test suites) to query whether a particular locale is
> "really" available. I've been asked to change the behavior to fail on
> unknown locale names, but of course that's not a working option in
> light of the above.


He then went on in a subsequent message to the list to suggest

> 1. setlocale(cat, explicit_locale_name) - succeeds if the locale
>    actually has a definition file, fails and returns a null pointer
>    otherwise.
> 
> 2. setlocale(cat, "") - always succeeds, honoring the environment
>    variable for the category if a locale definition file by that name
>    exists, but otherwise (the unspecified behavior) treating it as if
>    it were C.UTF-8.

Which would work just fine for libc++.  What it’s trying to do is to check whether e.g. fr_FR is supported, then it can enable additional tests that rely on the French localisation being present.  I appreciate that, per the POSIX standard, you don’t *technically* know what fr_FR means, but in practice that isn’t really true: an implementation that had a locale installed called fr_FR that *wasn’t* French would be pretty silly.  Unfortunately, because of musl’s current behaviour, it *looks like* such an implementation, even though it actually isn’t (and it totally could have a genuine fr_FR locale if you had the right data in the right place).
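
The two cases quoted above can be sketched as a pure decision function. This is only an illustration of the proposal as I read it, not musl code: the names proposed_result and is_builtin are invented, the has_definition flag stands in for "a locale definition file by that name exists", and treating "C", "POSIX" and "C.UTF-8" as always available is my assumption.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: these names need no definition file. */
static int is_builtin(const char *name)
{
    return strcmp(name, "C") == 0 ||
           strcmp(name, "POSIX") == 0 ||
           strcmp(name, "C.UTF-8") == 0;
}

/* Returns the name of the locale that would take effect under the
 * proposal, or NULL where setlocale() would fail.
 *   requested      - the string passed to setlocale()
 *   env_value      - the environment's value for the category (may be NULL)
 *   has_definition - nonzero if a definition file exists for the name
 *                    being considered (hypothetical stand-in) */
const char *proposed_result(const char *requested, const char *env_value,
                            int has_definition)
{
    if (requested[0] != '\0') {
        /* Case 1: explicit name; fail unless the locale really exists. */
        if (is_builtin(requested) || has_definition)
            return requested;
        return NULL;
    }

    /* Case 2: "" always succeeds; honour the environment only if the
     * named locale exists, otherwise behave as C.UTF-8. */
    if (env_value && env_value[0] != '\0' &&
        (is_builtin(env_value) || has_definition))
        return env_value;
    return "C.UTF-8";
}
```

Under this sketch, an explicit request for fr_FR fails when no data is installed, which is the signal a test suite needs, while a program that calls setlocale(LC_ALL, "") with LANG=fr_FR still starts up normally.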

I can, of course, check whether setlocale(LC_ALL, "something ridiculous") succeeds and returns "something ridiculous", then disable all locales except for POSIX, C and C.UTF-8.  That will work around the current musl behaviour without causing trouble with other C libraries, but libc++’s maintainer isn’t terribly keen on it and would rather we explore the possibility of musl changing its implementation.
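
Roughly, that workaround would look like the sketch below. The function names are invented for this example, and for simplicity it resets the locale to "C" after each probe rather than restoring whatever was in effect before, which assumes the caller starts in the "C" locale.

```c
#include <locale.h>
#include <string.h>

/* Returns 1 if the C library accepts a nonsense locale name and echoes
 * it back, i.e. setlocale() success says nothing about installed data. */
int libc_accepts_any_name(void)
{
    static const char bogus[] = "something ridiculous";
    const char *r = setlocale(LC_ALL, bogus);
    int accepts = r != NULL && strcmp(r, bogus) == 0;
    setlocale(LC_ALL, "C"); /* simplification: assumes we started in "C" */
    return accepts;
}

/* The only names still enabled when the library accepts anything. */
static int is_baseline(const char *name)
{
    return strcmp(name, "POSIX") == 0 ||
           strcmp(name, "C") == 0 ||
           strcmp(name, "C.UTF-8") == 0;
}

/* Returns 1 if tests that depend on locale `name` should be enabled. */
int locale_usable(const char *name)
{
    int usable;
    if (libc_accepts_any_name()) {
        /* setlocale() cannot be trusted; keep only the baseline names. */
        usable = is_baseline(name);
    } else {
        /* setlocale() fails for unavailable locales; trust its answer. */
        usable = setlocale(LC_ALL, name) != NULL;
        setlocale(LC_ALL, "C");
    }
    return usable;
}
```

The cost is that a musl system which genuinely has fr_FR data installed would still have its fr_FR tests disabled, since the probe cannot tell the two situations apart.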

TL;DR: There was discussion about changing this to work in a more useful manner, but nothing changed, and I don’t see anything in that email thread explaining that it was decided not to make the proposed change.  What I’m really enquiring about is why that is (but maybe I missed it?)

Kind regards,

Alastair.





* Re: [musl] setlocale() behaviour
From: Alastair Houghton @ 2023-07-21 10:48 UTC
  To: musl; +Cc: Rich Felker

Rich, any chance of some input from you on this, since it was you that wrote about it previously?

Kind regards,

Alastair.


