mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] setlocale() again
@ 2023-08-10 15:41 Alastair Houghton
  2023-08-10 15:51 ` Rich Felker
  0 siblings, 1 reply; 13+ messages in thread
From: Alastair Houghton @ 2023-08-10 15:41 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

Hi again,

I spent some time today looking at the setlocale() problem and thought I’d put some notes down in an email.

1. Musl wishes to support UTF-8 “out of the box”.

2. At the same time, it needs to be 8-bit-safe, so the default locale, C, is NOT UTF-8.

3. POSIX, and the C standard, specify that setlocale() should fail if the locale name isn’t a valid locale, but don’t really say what that means precisely.  A program that wants UTF-8 support and that does `setlocale(LC_ALL, “”)` can therefore find itself in the C locale if the one specified in the environment happens to be invalid.

4. This seemed undesirable, so setlocale() presently accepts any locale name as valid; if it doesn’t have a definition file for a locale, it will copy the C.UTF-8 locale, giving it the name passed in and return that.  This avoids the problem in (3), and also means that gettext() will work for any language without installing locale data for Musl.  Unfortunately it also means that there is no way for a program (notably a test suite) to determine the presence of data for a locale, because setlocale() will always succeed, even if we don’t have the data.

5. Back in 2017 (https://www.openwall.com/lists/musl/2017/11/08/2) Rich was proposing to change things so that `setlocale(cat, “”)` always succeeds, but if the environment specifies an unknown locale, treats it as C.UTF-8, while `setlocale(cat, explicit_name)` will fail unless a valid definition file is installed for that locale name.  This would also avoid the problem in (3), although it will mean that gettext() will not work unless a valid locale definition is installed for the C library (BTW, this is exactly the situation Glibc is in here; if Glibc doesn’t have locale data, it will fail setlocale() and then gettext() will find itself in the C locale).  On the other hand, it does mean that programs can detect whether or not a given locale is present.

Why do I care?  Because I’m trying to make libc++ work with Musl and right now it has failing tests because it expects (not entirely unreasonably) that if e.g. `setlocale(LC_ALL, “fr_FR”)` succeeds, then the C library will localise things into French.  While I can test for the unusual behaviour of Musl detailed in (4), the libc++ maintainer understandably doesn’t like it and we would both far rather Musl were fixed to behave similarly to other implementations.

It seems to me that Rich’s proposal (5) was sensible.  Programs that use gettext(), and users relying on it for localization, must already cope with the fact that the C library must have locale data for their chosen locale in order for gettext() to work; that is how things work on Glibc.  It so happened that (4) meant that such programs would work with partial localization on Musl without there being any locale data installed for Musl, but that isn’t really right (e.g. you might get a mix of localized strings from gettext() but with numeric formatting that didn’t match - for French, for instance, numbers would have “.”s instead of “,”s as a decimal separator).

Looking at the 2017 thread, it appears it didn’t go anywhere for whatever reason, so I’d like to understand the status of the proposed change.  Was it nixed for some reason?  Is it likely to happen in the future?  If it’s a matter of resource, if I were to raise a patch for it, would it be accepted, in principle?

Kind regards,

Alastair.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-08-10 15:41 [musl] setlocale() again Alastair Houghton
@ 2023-08-10 15:51 ` Rich Felker
  2023-09-05 12:57   ` Alastair Houghton
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2023-08-10 15:51 UTC (permalink / raw)
  To: Alastair Houghton; +Cc: musl

On Thu, Aug 10, 2023 at 04:41:38PM +0100, Alastair Houghton wrote:
> Hi again,
> 
> I spent some time today looking at the setlocale() problem and
> thought I’d put some notes down in an email.
> 
> 1. Musl wishes to support UTF-8 “out of the box”.
> 
> 2. At the same time, it needs to be 8-bit-safe, so the default
> locale, C, is NOT UTF-8.
> 
> 3. POSIX, and the C standard, specify that setlocale() should fail
> if the locale name isn’t a valid locale, but don’t really say what
> that means precisely. A program that wants UTF-8 support and that
> does `setlocale(LC_ALL, “”)` can therefore find itself in the C
> locale if the one specified in the environment happens to be
> invalid.
> 
> 4. This seemed undesirable, so setlocale() presently accepts any
> locale name as valid; if it doesn’t have a definition file for a
> locale, it will copy the C.UTF-8 locale, giving it the name passed
> in and return that. This avoids the problem in (3), and also means
> that gettext() will work for any language without installing locale
> data for Musl. Unfortunately it also means that there is no way for
> a program (notably a test suite) to determine the presence of data
> for a locale, because setlocale() will always succeed, even if we
> don’t have the data.
> 
> 5. Back in 2017 (https://www.openwall.com/lists/musl/2017/11/08/2)
> Rich was proposing to change things so that `setlocale(cat, “”)`
> always succeeds, but if the environment specifies an unknown locale,
> treats it as C.UTF-8, while `setlocale(cat, explicit_name)` will
> fail unless a valid definition file is installed for that locale
> name. This would also avoid the problem in (3), although it will
> mean that gettext() will not work unless a valid locale definition
> is installed for the C library (BTW, this is exactly the situation
> Glibc is in here; if Glibc doesn’t have locale data, it will fail
> setlocale() and then gettext() will find itself in the C locale). On
> the other hand, it does mean that programs can detect whether or not
> a given locale is present.
> 
> Why do I care? Because I’m trying to make libc++ work with Musl and
> right now it has failing tests because it expects (not entirely
> unreasonably) that if e.g. `setlocale(LC_ALL, “fr_FR”)` succeeds,
> then the C library will localise things into French. While I can
> test for the unusual behaviour of Musl detailed in (4), the libc++
> maintainer understandably doesn’t like it and we would both far
> rather Musl were fixed to behave similarly to other implementations.
> 
> It seems to me that Rich’s proposal (5) was sensible. Programs that
> use gettext(), and users relying on it for localization, must
> already cope with the fact that the C library must have locale data
> for their chosen locale in order for gettext() to work; that is how
> things work on Glibc. It so happened that (4) meant that such
> programs would work with partial localization on Musl without there
> being any locale data installed for Musl, but that isn’t really
> right (e.g. you might get a mix of localized strings from gettext()
> but with numeric formatting that didn’t match - for French, for
> instance, numbers would have “.”s instead of “,”s as a decimal
> separator).
> 
> Looking at the 2017 thread, it appears it didn’t go anywhere for
> whatever reason, so I’d like to understand the status of the
> proposed change. Was it nixed for some reason? Is it likely to
> happen in the future? If it’s a matter of resource, if I were to
> raise a patch for it, would it be accepted, in principle?

Thank you for following up on this! The main reason it didn't go
anywhere was lack of feedback/engagement from anyone who cares about
locale behavior. I want whatever steps we take to be informed by what
folks actually need, not just my guesses at that. So in that sense,
your bumping of the issue is helpful in itself!

At this point, it's been quite a while since I looked at the
mechanisms. If you'd like to help move this forward, rather than
starting with a patch, writing a high-level natural language
description of how you'd make the changes (in terms of musl's current
internal representation for locale state) would be the most helpful.
If I'm forgetting and there's already such a good description, just
digging it up and citing it might be fine.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-08-10 15:51 ` Rich Felker
@ 2023-09-05 12:57   ` Alastair Houghton
  2023-09-18 14:18     ` Alastair Houghton
  0 siblings, 1 reply; 13+ messages in thread
From: Alastair Houghton @ 2023-09-05 12:57 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1080 bytes --]

On 10 Aug 2023, at 16:51, Rich Felker <dalias@libc.org> wrote:
> 
> On Thu, Aug 10, 2023 at 04:41:38PM +0100, Alastair Houghton wrote:
>> I spent some time today looking at the setlocale() problem and
>> thought I’d put some notes down in an email.

[snip]

> At this point, it's been quite a while since I looked at the
> mechanisms. If you'd like to help move this forward, rather than
> starting with a patch, writing a high-level natural language
> description of how you'd make the changes (in terms of musl's current
> internal representation for locale state) would be the most helpful.
> If I'm forgetting and there's already such a good description, just
> digging it up and citing it might be fine.

I wrote something up here:

https://gist.github.com/al45tair/15c3ade52b09d0cad67074176ad43e4a

Let me know what you think; I can update the document there until we’re happy that we’ve got the right solution, then I should be able to create a patch and get the relevant permission from my employer to submit it.

Kind regards,

Alastair.


[-- Attachment #2: Type: text/html, Size: 7636 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-09-05 12:57   ` Alastair Houghton
@ 2023-09-18 14:18     ` Alastair Houghton
  2023-10-27 20:15       ` Pablo Correa Gomez
  2023-11-28 17:32       ` Alastair Houghton
  0 siblings, 2 replies; 13+ messages in thread
From: Alastair Houghton @ 2023-09-18 14:18 UTC (permalink / raw)
  To: musl, Rich Felker

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

Hi all (Rich especially though :-))

Has anyone had time to take a look at this? I’d like to make some progress on this front if possible.

Kind regards,

Alastair.

> On 5 Sep 2023, at 13:57, Alastair Houghton <ahoughton@apple.com> wrote:
> 
> On 10 Aug 2023, at 16:51, Rich Felker <dalias@libc.org> wrote:
>> 
>> On Thu, Aug 10, 2023 at 04:41:38PM +0100, Alastair Houghton wrote:
>>> I spent some time today looking at the setlocale() problem and
>>> thought I’d put some notes down in an email.
> 
> [snip]
> 
>> At this point, it's been quite a while since I looked at the
>> mechanisms. If you'd like to help move this forward, rather than
>> starting with a patch, writing a high-level natural language
>> description of how you'd make the changes (in terms of musl's current
>> internal representation for locale state) would be the most helpful.
>> If I'm forgetting and there's already such a good description, just
>> digging it up and citing it might be fine.
> 
> I wrote something up here:
> 
> https://gist.github.com/al45tair/15c3ade52b09d0cad67074176ad43e4a
> 
> Let me know what you think; I can update the document there until we’re happy that we’ve got the right solution, then I should be able to create a patch and get the relevant permission from my employer to submit it.
> 
> Kind regards,
> 
> Alastair.
> 


[-- Attachment #2: Type: text/html, Size: 8276 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-09-18 14:18     ` Alastair Houghton
@ 2023-10-27 20:15       ` Pablo Correa Gomez
  2023-11-28 16:27         ` Alastair Houghton
  2023-11-28 17:32       ` Alastair Houghton
  1 sibling, 1 reply; 13+ messages in thread
From: Pablo Correa Gomez @ 2023-10-27 20:15 UTC (permalink / raw)
  To: Alastair Houghton; +Cc: musl, Rich Felker

Hi Alastair,

I'm Pablo, one of postmarketOS maintainers, and by chance bumped into
this thread. I am super happy that you sent these messages, since I am
in the process of adding support for musl into Glib test suite, and
bumped into exactly the same problem as you did.

From the developer perspective, I really like your proposal. However,
from the user and distro maintainer perspective, I am a bit more
hesitant. As daily user of an alpine system localized in Spanish, I
acknowledge and identify the problem that you present of "1,024" vs
"1.024". However, I would argue that such thing is a small problem,
compared to completely losing localization in Spanish. In practice, I
would say both these examples are only theoric, since both Spanish and
French are supported musl-locales. 

So my real concern, is what could happen to all those users of
languages not supported by musl locales and that currently (to some
extent) have their systems localized? We did a call some months ago for
users with a use in localization to communicate with us to help us
forward and validate possible changes in musl. Unfortunately we only
really had one to two users show up, so possibly the breakage would not
be that big, considering a big chunk of European languages are
currently supported by musl-locales. Do you have any thoughts about
this situation? If this change ever happened, we could of course notify
users both in alpine and postmarketOS in our release notes. But maybe
it would also be helpful if we could have some guidelines or process
for users interested in localization to raise the lack of support for
their language, and how they can fix it?

It's very encouraging to find other people that care about
localization. Thanks a lot, and kind regards,
Pablo.


On lun, 2023-09-18 at 15:18 +0100, Alastair Houghton wrote:
> Hi all (Rich especially though :-))
> 
> Has anyone had time to take a look at this? I’d like to make some
> progress on this front if possible.
> 
> Kind regards,
> 
> Alastair.
> 
> > On 5 Sep 2023, at 13:57, Alastair Houghton <ahoughton@apple.com>
> > wrote:
> > 
> > On 10 Aug 2023, at 16:51, Rich Felker <dalias@libc.org> wrote:
> > > 
> > > On Thu, Aug 10, 2023 at 04:41:38PM +0100, Alastair Houghton
> > > wrote:
> > > > I spent some time today looking at the setlocale() problem and
> > > > thought I’d put some notes down in an email.
> > 
> > 
> > [snip]
> > 
> > > At this point, it's been quite a while since I looked at the
> > > mechanisms. If you'd like to help move this forward, rather than
> > > starting with a patch, writing a high-level natural language
> > > description of how you'd make the changes (in terms of musl's
> > > current
> > > internal representation for locale state) would be the most
> > > helpful.
> > > If I'm forgetting and there's already such a good description,
> > > just
> > > digging it up and citing it might be fine.
> > 
> > 
> > I wrote something up here:
> > 
> > https://gist.github.com/al45tair/15c3ade52b09d0cad67074176ad43e4a
> > 
> > Let me know what you think; I can update the document there until
> > we’re happy that we’ve got the right solution, then I should be
> > able to create a patch and get the relevant permission from my
> > employer to submit it.
> > 
> > Kind regards,
> > 
> > Alastair.
> > 
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-10-27 20:15       ` Pablo Correa Gomez
@ 2023-11-28 16:27         ` Alastair Houghton
  2023-11-28 23:15           ` Pablo Correa Gomez
  0 siblings, 1 reply; 13+ messages in thread
From: Alastair Houghton @ 2023-11-28 16:27 UTC (permalink / raw)
  To: Pablo Correa Gomez; +Cc: musl, Rich Felker

On 27 Oct 2023, at 21:15, Pablo Correa Gomez <pabloyoyoista@postmarketos.org> wrote:
> 
> 
> 
> I'm Pablo, one of postmarketOS maintainers, and by chance bumped into
> this thread. I am super happy that you sent these messages, since I am
> in the process of adding support for musl into Glib test suite, and
> bumped into exactly the same problem as you did.

Hi Pablo. Sorry it’s taken me a while to get back to you. I’ve been somewhat busy with other things.

> From the developer perspective, I really like your proposal. However,
> from the user and distro maintainer perspective, I am a bit more
> hesitant. As daily user of an alpine system localized in Spanish, I
> acknowledge and identify the problem that you present of "1,024" vs
> "1.024". However, I would argue that such thing is a small problem,
> compared to completely losing localization in Spanish. In practice, I
> would say both these examples are only theoric, since both Spanish and
> French are supported musl-locales. 

Indeed. I’d add to that though that with the present behaviour, there’s very little incentive for people to add support for their native language to Musl, since they can tolerate the unusual behaviour wherein Musl pretends it has locale data for their language (you yourself say “such thing is a small problem”, which illustrates the point, I think).

> So my real concern, is what could happen to all those users of
> languages not supported by musl locales and that currently (to some
> extent) have their systems localized?

The exact same thing that happens to people in a similar position on Glibc systems, namely they don’t get their programs localized until they install locale data for the C library.  This is not especially unreasonable behaviour, honestly.  I think the community of people using Musl-based distributions is of above average competence and many users will be entirely capable of submitting a patch to musl-locales for their language if it isn’t supported (I’d say the situation on Glibc-based distributions is different; there are definitely less technically able users running desktop Linux these days).

Kind regards,

Alastair.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-09-18 14:18     ` Alastair Houghton
  2023-10-27 20:15       ` Pablo Correa Gomez
@ 2023-11-28 17:32       ` Alastair Houghton
  2023-11-28 23:21         ` Pablo Correa Gomez
  2023-12-05 15:19         ` Alastair Houghton
  1 sibling, 2 replies; 13+ messages in thread
From: Alastair Houghton @ 2023-11-28 17:32 UTC (permalink / raw)
  To: musl, Rich Felker

> On 18 Sep 2023, at 15:18, Alastair Houghton <ahoughton@apple.com> wrote:
> 
> Hi all (Rich especially though :-))
> 
> Has anyone had time to take a look at this? I’d like to make some progress on this front if possible.
> 
> Kind regards,
> 
> Alastair.


Maybe I’ve missed a reply somewhere along the lines; here’s a tentative patch that just does the simple thing of making setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find the locale specified in the environment.

This will mean that setlocale(LC_ALL, "non-existent locale") will return NULL, which does have an impact on users if they don’t have a locale installed for Musl but *do* have locale data installed for some other software; in that case, their other software won’t be localized until they also install data for Musl.

(This is the same as current Glibc behaviour.)

Kind regards,

Alastair

---- snip ----

diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
index da61f7fc..bd11f2ca 100644
--- a/src/locale/locale_map.c
+++ b/src/locale/locale_map.c
@@ -31,7 +31,7 @@ static const char envvars[][12] = {
 volatile int __locale_lock[1];
 volatile int *const __locale_lockptr = __locale_lock;
 
-const struct __locale_map *__get_locale(int cat, const char *val)
+const struct __locale_map *__get_locale(int cat, const char *locale)
 {
        static void *volatile loc_head;
        const struct __locale_map *p;
@@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int cat, const char *val)
        const char *path = 0, *z;
        char buf[256];
        size_t l, n;
+       const char *val = locale;
 
        if (!*val) {
                (val = getenv("LC_ALL")) && *val ||
@@ -92,18 +93,9 @@ const struct __locale_map *__get_locale(int cat, const char *val)
                }
        }
 
-       /* If no locale definition was found, make a locale map
-        * object anyway to store the name, which is kept for the
-        * sake of being able to do message translations at the
-        * application level. */
-       if (!new && (new = malloc(sizeof *new))) {
-               new->map = __c_dot_utf8.map;
-               new->map_size = __c_dot_utf8.map_size;
-               memcpy(new->name, val, n);
-               new->name[n] = 0;
-               new->next = loc_head;
-               loc_head = new;
-       }
+       /* If no locale definition was found, and we specified a
+        * locale name of "", return the C.UTF-8 locale. */
+       if (!new && !*locale) new = (void *)&__c_dot_utf8;
 
        /* For LC_CTYPE, never return a null pointer unless the
         * requested name was "C" or "POSIX". */


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-11-28 16:27         ` Alastair Houghton
@ 2023-11-28 23:15           ` Pablo Correa Gomez
  0 siblings, 0 replies; 13+ messages in thread
From: Pablo Correa Gomez @ 2023-11-28 23:15 UTC (permalink / raw)
  To: Alastair Houghton; +Cc: musl, Rich Felker

On mar, 2023-11-28 at 16:27 +0000, Alastair Houghton wrote:
> On 27 Oct 2023, at 21:15, Pablo Correa Gomez
> <pabloyoyoista@postmarketos.org> wrote:
> > 
> > 
> > 
> > I'm Pablo, one of postmarketOS maintainers, and by chance bumped
> > into
> > this thread. I am super happy that you sent these messages, since I
> > am
> > in the process of adding support for musl into Glib test suite, and
> > bumped into exactly the same problem as you did.
> 
> Hi Pablo. Sorry it’s taken me a while to get back to you. I’ve been
> somewhat busy with other things.

Happy to hear back, time catches up on all of us.

> 
> > From the developer perspective, I really like your proposal.
> > However,
> > from the user and distro maintainer perspective, I am a bit more
> > hesitant. As daily user of an alpine system localized in Spanish, I
> > acknowledge and identify the problem that you present of "1,024" vs
> > "1.024". However, I would argue that such thing is a small problem,
> > compared to completely losing localization in Spanish. In practice,
> > I
> > would say both these examples are only theoric, since both Spanish
> > and
> > French are supported musl-locales. 
> 
> Indeed. I’d add to that though that with the present behaviour,
> there’s very little incentive for people to add support for their
> native language to Musl, since they can tolerate the unusual
> behaviour wherein Musl pretends it has locale data for their language
> (you yourself say “such thing is a small problem”, which illustrates
> the point, I think).

Right, this is actually a great point for this change.

> 
> > So my real concern, is what could happen to all those users of
> > languages not supported by musl locales and that currently (to some
> > extent) have their systems localized?
> 
> The exact same thing that happens to people in a similar position on
> Glibc systems, namely they don’t get their programs localized until
> they install locale data for the C library.  This is not especially
> unreasonable behaviour, honestly.  I think the community of people
> using Musl-based distributions is of above average competence and
> many users will be entirely capable of submitting a patch to musl-
> locales for their language if it isn’t supported (I’d say the
> situation on Glibc-based distributions is different; there are
> definitely less technically able users running desktop Linux these
> days).

We at postmarketOS certainly have less technical users. We could solve
it with an announcement though. So as long as we have some time to
adapt, and this is not queued right before a release, I'm happy with it
:)

> 
> Kind regards,
> 
> Alastair.
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-11-28 17:32       ` Alastair Houghton
@ 2023-11-28 23:21         ` Pablo Correa Gomez
  2023-12-05 15:19         ` Alastair Houghton
  1 sibling, 0 replies; 13+ messages in thread
From: Pablo Correa Gomez @ 2023-11-28 23:21 UTC (permalink / raw)
  To: Alastair Houghton, musl, Rich Felker

On mar, 2023-11-28 at 17:32 +0000, Alastair Houghton wrote:
> > On 18 Sep 2023, at 15:18, Alastair Houghton <ahoughton@apple.com>
> > wrote:
> > 
> > Hi all (Rich especially though :-))
> > 
> > Has anyone had time to take a look at this? I’d like to make some
> > progress on this front if possible.
> > 
> > Kind regards,
> > 
> > Alastair.
> 
> 
> Maybe I’ve missed a reply somewhere along the lines; here’s a
> tentative patch that just does the simple thing of making
> setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find
> the locale specified in the environment.

Hi Alastair,

From the logic of the change, and from the postmarketOS perspective,
would be fine be me if Rich does not commit this right before a
release, as otherwise we would not have much time to inform and help
less technical users get their localization back. Now is probably a
great time, since the new alpine release is around the corner.

However, this also increases the importance of musl-locales for
localization to work at all. Before, its existence as a completely
independent project was totally fine, users would not need it to get a
localized system. However, after this change, that's no longer the
case. So I wonder if Rich or the musl community in general would like
to host/take care for it? Maybe even deciding on a standar locale path
that does not need to be set through an environment variable, though
keeping that as a fallback? I'd love to see that come together with
this change.

Best,
Pablo.

> 
> This will mean that setlocale(LC_ALL, "non-existent locale") will
> return NULL, which does have an impact on users if they don’t have a
> locale installed for Musl but *do* have locale data installed for
> some other software; in that case, their other software won’t be
> localized until they also install data for Musl.
> 
> (This is the same as current Glibc behaviour.)
> 
> Kind regards,
> 
> Alastair
> 
> ---- snip ----
> 
> diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> index da61f7fc..bd11f2ca 100644
> --- a/src/locale/locale_map.c
> +++ b/src/locale/locale_map.c
> @@ -31,7 +31,7 @@ static const char envvars[][12] = {
>  volatile int __locale_lock[1];
>  volatile int *const __locale_lockptr = __locale_lock;
>  
> -const struct __locale_map *__get_locale(int cat, const char *val)
> +const struct __locale_map *__get_locale(int cat, const char *locale)
>  {
>         static void *volatile loc_head;
>         const struct __locale_map *p;
> @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int cat,
> const char *val)
>         const char *path = 0, *z;
>         char buf[256];
>         size_t l, n;
> +       const char *val = locale;
>  
>         if (!*val) {
>                 (val = getenv("LC_ALL")) && *val ||
> @@ -92,18 +93,9 @@ const struct __locale_map *__get_locale(int cat,
> const char *val)
>                 }
>         }
>  
> -       /* If no locale definition was found, make a locale map
> -        * object anyway to store the name, which is kept for the
> -        * sake of being able to do message translations at the
> -        * application level. */
> -       if (!new && (new = malloc(sizeof *new))) {
> -               new->map = __c_dot_utf8.map;
> -               new->map_size = __c_dot_utf8.map_size;
> -               memcpy(new->name, val, n);
> -               new->name[n] = 0;
> -               new->next = loc_head;
> -               loc_head = new;
> -       }
> +       /* If no locale definition was found, and we specified a
> +        * locale name of "", return the C.UTF-8 locale. */
> +       if (!new && !*locale) new = (void *)&__c_dot_utf8;
>  
>         /* For LC_CTYPE, never return a null pointer unless the
>          * requested name was "C" or "POSIX". */
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-11-28 17:32       ` Alastair Houghton
  2023-11-28 23:21         ` Pablo Correa Gomez
@ 2023-12-05 15:19         ` Alastair Houghton
  2023-12-08 10:46           ` Alastair Houghton
  1 sibling, 1 reply; 13+ messages in thread
From: Alastair Houghton @ 2023-12-05 15:19 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

> Maybe I’ve missed a reply somewhere along the lines; here’s a tentative patch that just does the simple thing of making setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find the locale specified in the environment.

[snip]

Hah. So, testing that patch, having removed my hacks to avoid using Musl’s locale support, I find it doesn’t actually work (for two reasons; one, NULL doesn’t mean not found, it means “use ‘C’”; and two, there is some very odd code in setlocale.c that causes things to go wrong if the specified name is longer than LOCALE_NAME_MAX).

I’ll come back with an updated patch in a bit.

Kind regards,

Alastair.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-12-05 15:19         ` Alastair Houghton
@ 2023-12-08 10:46           ` Alastair Houghton
  2023-12-08 23:59             ` Rich Felker
  0 siblings, 1 reply; 13+ messages in thread
From: Alastair Houghton @ 2023-12-08 10:46 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On 5 Dec 2023, at 15:19, Alastair Houghton <ahoughton@apple.com> wrote:
> 
>> Maybe I’ve missed a reply somewhere along the lines; here’s a tentative patch that just does the simple thing of making setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find the locale specified in the environment.
> 
> [snip]
> 
> Hah. So, testing that patch, having removed my hacks to avoid using Musl’s locale support, I find it doesn’t actually work (for two reasons; one, NULL doesn’t mean not found, it means “use ‘C’”; and two, there is some very odd code in setlocale.c that causes things to go wrong if the specified name is longer than LOCALE_NAME_MAX).
> 
> I’ll come back with an updated patch in a bit.

Updated patch:

==== Cut here ====
diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
index da61f7fc..097da1ad 100644
--- a/src/locale/locale_map.c
+++ b/src/locale/locale_map.c
@@ -31,7 +31,7 @@ static const char envvars[][12] = {
 volatile int __locale_lock[1];
 volatile int *const __locale_lockptr = __locale_lock;
 
-const struct __locale_map *__get_locale(int cat, const char *val)
+const struct __locale_map *__get_locale(int cat, const char *locale)
 {
 	static void *volatile loc_head;
 	const struct __locale_map *p;
@@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int cat, const char *val)
 	const char *path = 0, *z;
 	char buf[256];
 	size_t l, n;
+	const char *val = locale;
 
 	if (!*val) {
 		(val = getenv("LC_ALL")) && *val ||
@@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int cat, const char *val)
 		}
 	}
 
-	/* If no locale definition was found, make a locale map
-	 * object anyway to store the name, which is kept for the
-	 * sake of being able to do message translations at the
-	 * application level. */
-	if (!new && (new = malloc(sizeof *new))) {
-		new->map = __c_dot_utf8.map;
-		new->map_size = __c_dot_utf8.map_size;
-		memcpy(new->name, val, n);
-		new->name[n] = 0;
-		new->next = loc_head;
-		loc_head = new;
-	}
+	/* If no locale definition was found, and we specified a
+	 * locale name of "", return the C.UTF-8 locale. */
+	if (!new && !*locale) new = (void *)&__c_dot_utf8;
 
 	/* For LC_CTYPE, never return a null pointer unless the
 	 * requested name was "C" or "POSIX". */
 	if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
 
+	/* Returning NULL means "C locale"; if we get here and
+	 * there's no locale, return failure instead. */
+	if (!new)
+		return LOC_MAP_FAILED;
+
 	return new;
 }
diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
index 360c4437..9842d95d 100644
--- a/src/locale/setlocale.c
+++ b/src/locale/setlocale.c
@@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
 			const char *p = name;
 			for (i=0; i<LC_ALL; i++) {
 				const char *z = __strchrnul(p, ';');
-				if (z-p <= LOCALE_NAME_MAX) {
+				if (z-p > LOCALE_NAME_MAX)
+					lm = LOC_MAP_FAILED;
+				else {
 					memcpy(part, p, z-p);
 					part[z-p] = 0;
 					if (*z) p = z+1;
+					lm = __get_locale(i, part);
 				}
-				lm = __get_locale(i, part);
 				if (lm == LOC_MAP_FAILED) {
 					UNLOCK(__locale_lock);
 					return 0;
==== Cut here ====


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-12-08 10:46           ` Alastair Houghton
@ 2023-12-08 23:59             ` Rich Felker
  2023-12-09 18:44               ` Pablo Correa Gomez
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2023-12-08 23:59 UTC (permalink / raw)
  To: Alastair Houghton; +Cc: musl

On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote:
> On 5 Dec 2023, at 15:19, Alastair Houghton <ahoughton@apple.com> wrote:
> > 
> >> Maybe I’ve missed a reply somewhere along the lines; here’s a tentative patch that just does the simple thing of making setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find the locale specified in the environment.
> > 
> > [snip]
> > 
> > Hah. So, testing that patch, having removed my hacks to avoid using Musl’s locale support, I find it doesn’t actually work (for two reasons; one, NULL doesn’t mean not found, it means “use ‘C’”; and two, there is some very odd code in setlocale.c that causes things to go wrong if the specified name is longer than LOCALE_NAME_MAX).
> > 
> > I’ll come back with an updated patch in a bit.
> 
> Updated patch:
> 
> ==== Cut here ====
> diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> index da61f7fc..097da1ad 100644
> --- a/src/locale/locale_map.c
> +++ b/src/locale/locale_map.c
> @@ -31,7 +31,7 @@ static const char envvars[][12] = {
>  volatile int __locale_lock[1];
>  volatile int *const __locale_lockptr = __locale_lock;
>  
> -const struct __locale_map *__get_locale(int cat, const char *val)
> +const struct __locale_map *__get_locale(int cat, const char *locale)
>  {
>  	static void *volatile loc_head;
>  	const struct __locale_map *p;
> @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int cat, const char *val)
>  	const char *path = 0, *z;
>  	char buf[256];
>  	size_t l, n;
> +	const char *val = locale;
>  
>  	if (!*val) {
>  		(val = getenv("LC_ALL")) && *val ||
> @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int cat, const char *val)
>  		}
>  	}
>  
> -	/* If no locale definition was found, make a locale map
> -	 * object anyway to store the name, which is kept for the
> -	 * sake of being able to do message translations at the
> -	 * application level. */
> -	if (!new && (new = malloc(sizeof *new))) {
> -		new->map = __c_dot_utf8.map;
> -		new->map_size = __c_dot_utf8.map_size;
> -		memcpy(new->name, val, n);
> -		new->name[n] = 0;
> -		new->next = loc_head;
> -		loc_head = new;
> -	}
> +	/* If no locale definition was found, and we specified a
> +	 * locale name of "", return the C.UTF-8 locale. */
> +	if (!new && !*locale) new = (void *)&__c_dot_utf8;
>  
>  	/* For LC_CTYPE, never return a null pointer unless the
>  	 * requested name was "C" or "POSIX". */
>  	if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
>  
> +	/* Returning NULL means "C locale"; if we get here and
> +	 * there's no locale, return failure instead. */
> +	if (!new)
> +		return LOC_MAP_FAILED;
> +
>  	return new;
>  }
> diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
> index 360c4437..9842d95d 100644
> --- a/src/locale/setlocale.c
> +++ b/src/locale/setlocale.c
> @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
>  			const char *p = name;
>  			for (i=0; i<LC_ALL; i++) {
>  				const char *z = __strchrnul(p, ';');
> -				if (z-p <= LOCALE_NAME_MAX) {
> +				if (z-p > LOCALE_NAME_MAX)
> +					lm = LOC_MAP_FAILED;
> +				else {
>  					memcpy(part, p, z-p);
>  					part[z-p] = 0;
>  					if (*z) p = z+1;
> +					lm = __get_locale(i, part);
>  				}
> -				lm = __get_locale(i, part);
>  				if (lm == LOC_MAP_FAILED) {
>  					UNLOCK(__locale_lock);
>  					return 0;
> ==== Cut here ====

Sorry to be late chiming in here. There's something I've been meaning
to ask: back when this was first proposed, I recall there being two
variants we considered: one where setlocale to "" where the env vars
don't resolve to any real locale file produces as its
implementation-defined result "C.UTF-8", and another where it produces
a ghost locale with the requested name but the behavior of "C.UTF-8".

Is there a reason you think the former is a better choice than the
latter? The latter would avoid breaking things for users with
application translations but no libc locale files. However it requires
more complex logic for consistency I think, and I'm not sure we ever
worked out if that could be done in a reasonable way.

Another option that wasn't raised before but that might be worth
considering is keeping the existing behavior if MUSL_LOCPATH is not
set (all names are valid and are aliases for "C.UTF-8" but doing as in
your patch if it's set.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] setlocale() again
  2023-12-08 23:59             ` Rich Felker
@ 2023-12-09 18:44               ` Pablo Correa Gomez
  0 siblings, 0 replies; 13+ messages in thread
From: Pablo Correa Gomez @ 2023-12-09 18:44 UTC (permalink / raw)
  To: Rich Felker, Alastair Houghton; +Cc: musl

On vie, 2023-12-08 at 18:59 -0500, Rich Felker wrote:
> > On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote:
> > > > On 5 Dec 2023, at 15:19, Alastair Houghton
> > > > <ahoughton@apple.com>
> > > > wrote:
> > > > > > 
> > > > > > > > Maybe I’ve missed a reply somewhere along the lines;
> > > > > > > > here’s a
> > > > > > > > tentative patch that just does the simple thing of
> > > > > > > > making
> > > > > > > > setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s
> > > > > > > > unable to
> > > > > > > > find the locale specified in the environment.
> > > > > > 
> > > > > > [snip]
> > > > > > 
> > > > > > Hah. So, testing that patch, having removed my hacks to
> > > > > > avoid
> > > > > > using Musl’s locale support, I find it doesn’t actually
> > > > > > work (for
> > > > > > two reasons; one, NULL doesn’t mean not found, it means
> > > > > > “use
> > > > > > ‘C’”; and two, there is some very odd code in setlocale.c
> > > > > > that
> > > > > > causes things to go wrong if the specified name is longer
> > > > > > than
> > > > > > LOCALE_NAME_MAX).
> > > > > > 
> > > > > > I’ll come back with an updated patch in a bit.
> > > > 
> > > > Updated patch:
> > > > 
> > > > ==== Cut here ====
> > > > diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> > > > index da61f7fc..097da1ad 100644
> > > > --- a/src/locale/locale_map.c
> > > > +++ b/src/locale/locale_map.c
> > > > @@ -31,7 +31,7 @@ static const char envvars[][12] = {
> > > >  volatile int __locale_lock[1];
> > > >  volatile int *const __locale_lockptr = __locale_lock;
> > > >  
> > > > -const struct __locale_map *__get_locale(int cat, const char
> > > > *val)
> > > > +const struct __locale_map *__get_locale(int cat, const char
> > > > *locale)
> > > >  {
> > > >   static void *volatile loc_head;
> > > >   const struct __locale_map *p;
> > > > @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int
> > > > cat,
> > > > const char *val)
> > > >   const char *path = 0, *z;
> > > >   char buf[256];
> > > >   size_t l, n;
> > > > + const char *val = locale;
> > > >  
> > > >   if (!*val) {
> > > >   (val = getenv("LC_ALL")) && *val ||
> > > > @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int
> > > > cat, const char *val)
> > > >   }
> > > >   }
> > > >  
> > > > - /* If no locale definition was found, make a locale map
> > > > - * object anyway to store the name, which is kept for the
> > > > - * sake of being able to do message translations at the
> > > > - * application level. */
> > > > - if (!new && (new = malloc(sizeof *new))) {
> > > > - new->map = __c_dot_utf8.map;
> > > > - new->map_size = __c_dot_utf8.map_size;
> > > > - memcpy(new->name, val, n);
> > > > - new->name[n] = 0;
> > > > - new->next = loc_head;
> > > > - loc_head = new;
> > > > - }
> > > > + /* If no locale definition was found, and we specified a
> > > > + * locale name of "", return the C.UTF-8 locale. */
> > > > + if (!new && !*locale) new = (void *)&__c_dot_utf8;
> > > >  
> > > >   /* For LC_CTYPE, never return a null pointer unless the
> > > >   * requested name was "C" or "POSIX". */
> > > >   if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
> > > >  
> > > > + /* Returning NULL means "C locale"; if we get here and
> > > > + * there's no locale, return failure instead. */
> > > > + if (!new)
> > > > + return LOC_MAP_FAILED;
> > > > +
> > > >   return new;
> > > >  }
> > > > diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
> > > > index 360c4437..9842d95d 100644
> > > > --- a/src/locale/setlocale.c
> > > > +++ b/src/locale/setlocale.c
> > > > @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
> > > >   const char *p = name;
> > > >   for (i=0; i<LC_ALL; i++) {
> > > >   const char *z = __strchrnul(p, ';');
> > > > - if (z-p <= LOCALE_NAME_MAX) {
> > > > + if (z-p > LOCALE_NAME_MAX)
> > > > + lm = LOC_MAP_FAILED;
> > > > + else {
> > > >   memcpy(part, p, z-p);
> > > >   part[z-p] = 0;
> > > >   if (*z) p = z+1;
> > > > + lm = __get_locale(i, part);
> > > >   }
> > > > - lm = __get_locale(i, part);
> > > >   if (lm == LOC_MAP_FAILED) {
> > > >   UNLOCK(__locale_lock);
> > > >   return 0;
> > > > ==== Cut here ====
> > 
> > Sorry to be late chiming in here. There's something I've been
> > meaning
> > to ask: back when this was first proposed, I recall there being two
> > variants we considered: one where setlocale to "" where the env
> > vars
> > don't resolve to any real locale file produces as its
> > implementation-defined result "C.UTF-8", and another where it
> > produces
> > a ghost locale with the requested name but the behavior of "C.UTF-
> > 8".
> > 
> > Is there a reason you think the former is a better choice than the
> > latter? The latter would avoid breaking things for users with
> > application translations but no libc locale files. However it
> > requires
> > more complex logic for consistency I think, and I'm not sure we
> > ever
> > worked out if that could be done in a reasonable way.

Thanks for chimming in again Rich. I think we have discussed in this
thread, that we want to avoid the second option, because it provides an
inconsistent experience to users. As a distro maintainer, my life will
be much easier if users that lack translations get affected by this. We
have 6 months for this to be included in alpine, and for us to work
together with the users to add their language to musl-locales. If we
keep some sort of inconsistent behavior (between what we claim we
support by returning the name, and what we actually support, which is
C), then I'm sure we'll bump into multiple corner-cases, where I'll
have a very hard time supporting users, and they will have less
motivation to work with us provide translations for their language.

> > 
> > Another option that wasn't raised before but that might be worth
> > considering is keeping the existing behavior if MUSL_LOCPATH is not
> > set (all names are valid and are aliases for "C.UTF-8" but doing as
> > in
> > your patch if it's set.

Same here, although probably less critical, since installing musl-
locales in alpine sets the variable by default. However, if we ever
want to have a default path that does not depend on an environment
variable (something I'd really like to have), this might get trickier?

I've also tested the patch on my GNOME setup with Spanish localization,
and everything seems to work as expected.

Best,
Pablo.

> > 
> > Rich


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-12-09 18:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-10 15:41 [musl] setlocale() again Alastair Houghton
2023-08-10 15:51 ` Rich Felker
2023-09-05 12:57   ` Alastair Houghton
2023-09-18 14:18     ` Alastair Houghton
2023-10-27 20:15       ` Pablo Correa Gomez
2023-11-28 16:27         ` Alastair Houghton
2023-11-28 23:15           ` Pablo Correa Gomez
2023-11-28 17:32       ` Alastair Houghton
2023-11-28 23:21         ` Pablo Correa Gomez
2023-12-05 15:19         ` Alastair Houghton
2023-12-08 10:46           ` Alastair Houghton
2023-12-08 23:59             ` Rich Felker
2023-12-09 18:44               ` Pablo Correa Gomez

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).