mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Pablo Correa Gomez <pabloyoyoista@postmarketos.org>
To: Rich Felker <dalias@libc.org>, Alastair Houghton <ahoughton@apple.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] setlocale() again
Date: Sat, 09 Dec 2023 19:44:30 +0100	[thread overview]
Message-ID: <0b2978eb06c7bf6ce78436edc42b132d1797fc2a.camel@postmarketos.org> (raw)
In-Reply-To: <20231208235920.GE4163@brightrain.aerifal.cx>

On vie, 2023-12-08 at 18:59 -0500, Rich Felker wrote:
> > On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote:
> > > > On 5 Dec 2023, at 15:19, Alastair Houghton
> > > > <ahoughton@apple.com>
> > > > wrote:
> > > > > > 
> > > > > > > > Maybe I’ve missed a reply somewhere along the lines;
> > > > > > > > here’s a
> > > > > > > > tentative patch that just does the simple thing of
> > > > > > > > making
> > > > > > > > setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s
> > > > > > > > unable to
> > > > > > > > find the locale specified in the environment.
> > > > > > 
> > > > > > [snip]
> > > > > > 
> > > > > > Hah. So, testing that patch, having removed my hacks to
> > > > > > avoid
> > > > > > using Musl’s locale support, I find it doesn’t actually
> > > > > > work (for
> > > > > > two reasons; one, NULL doesn’t mean not found, it means
> > > > > > “use
> > > > > > ‘C’”; and two, there is some very odd code in setlocale.c
> > > > > > that
> > > > > > causes things to go wrong if the specified name is longer
> > > > > > than
> > > > > > LOCALE_NAME_MAX).
> > > > > > 
> > > > > > I’ll come back with an updated patch in a bit.
> > > > 
> > > > Updated patch:
> > > > 
> > > > ==== Cut here ====
> > > > diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> > > > index da61f7fc..097da1ad 100644
> > > > --- a/src/locale/locale_map.c
> > > > +++ b/src/locale/locale_map.c
> > > > @@ -31,7 +31,7 @@ static const char envvars[][12] = {
> > > >  volatile int __locale_lock[1];
> > > >  volatile int *const __locale_lockptr = __locale_lock;
> > > >  
> > > > -const struct __locale_map *__get_locale(int cat, const char
> > > > *val)
> > > > +const struct __locale_map *__get_locale(int cat, const char
> > > > *locale)
> > > >  {
> > > >   static void *volatile loc_head;
> > > >   const struct __locale_map *p;
> > > > @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int
> > > > cat,
> > > > const char *val)
> > > >   const char *path = 0, *z;
> > > >   char buf[256];
> > > >   size_t l, n;
> > > > + const char *val = locale;
> > > >  
> > > >   if (!*val) {
> > > >   (val = getenv("LC_ALL")) && *val ||
> > > > @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int
> > > > cat, const char *val)
> > > >   }
> > > >   }
> > > >  
> > > > - /* If no locale definition was found, make a locale map
> > > > - * object anyway to store the name, which is kept for the
> > > > - * sake of being able to do message translations at the
> > > > - * application level. */
> > > > - if (!new && (new = malloc(sizeof *new))) {
> > > > - new->map = __c_dot_utf8.map;
> > > > - new->map_size = __c_dot_utf8.map_size;
> > > > - memcpy(new->name, val, n);
> > > > - new->name[n] = 0;
> > > > - new->next = loc_head;
> > > > - loc_head = new;
> > > > - }
> > > > + /* If no locale definition was found, and we specified a
> > > > + * locale name of "", return the C.UTF-8 locale. */
> > > > + if (!new && !*locale) new = (void *)&__c_dot_utf8;
> > > >  
> > > >   /* For LC_CTYPE, never return a null pointer unless the
> > > >   * requested name was "C" or "POSIX". */
> > > >   if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
> > > >  
> > > > + /* Returning NULL means "C locale"; if we get here and
> > > > + * there's no locale, return failure instead. */
> > > > + if (!new)
> > > > + return LOC_MAP_FAILED;
> > > > +
> > > >   return new;
> > > >  }
> > > > diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
> > > > index 360c4437..9842d95d 100644
> > > > --- a/src/locale/setlocale.c
> > > > +++ b/src/locale/setlocale.c
> > > > @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
> > > >   const char *p = name;
> > > >   for (i=0; i<LC_ALL; i++) {
> > > >   const char *z = __strchrnul(p, ';');
> > > > - if (z-p <= LOCALE_NAME_MAX) {
> > > > + if (z-p > LOCALE_NAME_MAX)
> > > > + lm = LOC_MAP_FAILED;
> > > > + else {
> > > >   memcpy(part, p, z-p);
> > > >   part[z-p] = 0;
> > > >   if (*z) p = z+1;
> > > > + lm = __get_locale(i, part);
> > > >   }
> > > > - lm = __get_locale(i, part);
> > > >   if (lm == LOC_MAP_FAILED) {
> > > >   UNLOCK(__locale_lock);
> > > >   return 0;
> > > > ==== Cut here ====
> > 
> > Sorry to be late chiming in here. There's something I've been
> > meaning
> > to ask: back when this was first proposed, I recall there being two
> > variants we considered: one where setlocale to "" where the env
> > vars
> > don't resolve to any real locale file produces as its
> > implementation-defined result "C.UTF-8", and another where it
> > produces
> > a ghost locale with the requested name but the behavior of "C.UTF-
> > 8".
> > 
> > Is there a reason you think the former is a better choice than the
> > latter? The latter would avoid breaking things for users with
> > application translations but no libc locale files. However it
> > requires
> > more complex logic for consistency I think, and I'm not sure we
> > ever
> > worked out if that could be done in a reasonable way.

Thanks for chimming in again Rich. I think we have discussed in this
thread, that we want to avoid the second option, because it provides an
inconsistent experience to users. As a distro maintainer, my life will
be much easier if users that lack translations get affected by this. We
have 6 months for this to be included in alpine, and for us to work
together with the users to add their language to musl-locales. If we
keep some sort of inconsistent behavior (between what we claim we
support by returning the name, and what we actually support, which is
C), then I'm sure we'll bump into multiple corner-cases, where I'll
have a very hard time supporting users, and they will have less
motivation to work with us provide translations for their language.

> > 
> > Another option that wasn't raised before but that might be worth
> > considering is keeping the existing behavior if MUSL_LOCPATH is not
> > set (all names are valid and are aliases for "C.UTF-8" but doing as
> > in
> > your patch if it's set.

Same here, although probably less critical, since installing musl-
locales in alpine sets the variable by default. However, if we ever
want to have a default path that does not depend on an environment
variable (something I'd really like to have), this might get trickier?

I've also tested the patch on my GNOME setup with Spanish localization,
and everything seems to work as expected.

Best,
Pablo.

> > 
> > Rich


      reply	other threads:[~2023-12-09 18:44 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10 15:41 Alastair Houghton
2023-08-10 15:51 ` Rich Felker
2023-09-05 12:57   ` Alastair Houghton
2023-09-18 14:18     ` Alastair Houghton
2023-10-27 20:15       ` Pablo Correa Gomez
2023-11-28 16:27         ` Alastair Houghton
2023-11-28 23:15           ` Pablo Correa Gomez
2023-11-28 17:32       ` Alastair Houghton
2023-11-28 23:21         ` Pablo Correa Gomez
2023-12-05 15:19         ` Alastair Houghton
2023-12-08 10:46           ` Alastair Houghton
2023-12-08 23:59             ` Rich Felker
2023-12-09 18:44               ` Pablo Correa Gomez [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0b2978eb06c7bf6ce78436edc42b132d1797fc2a.camel@postmarketos.org \
    --to=pabloyoyoista@postmarketos.org \
    --cc=ahoughton@apple.com \
    --cc=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).