From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED, HK_RANDOM_FROM,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 9423 invoked from network); 9 Dec 2023 18:44:47 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 9 Dec 2023 18:44:47 -0000 Received: (qmail 26088 invoked by uid 550); 9 Dec 2023 18:44:32 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 26056 invoked from network); 9 Dec 2023 18:44:32 -0000 Message-ID: <0b2978eb06c7bf6ce78436edc42b132d1797fc2a.camel@postmarketos.org> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=postmarketos.org; s=donut; t=1702147472; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=T3/sdP1qDEH+bGl+0jD5r699mECQYCemsZyFpC90s3A=; b=QHbmYugRqKfm3w6f+t4UbTJhBA6b3Nij58Yvs/PtGow1qP5kgJRyCSmv3OAMYlDfqSU8tw 5XIeepmc7wT98pBmFrT94lcaLoNOFvwVDht9mXRV4i1YxL88y2jaE+ttHp92OUiy+EfqBB Km0iMeyJCac9ngBWKdbTn5egLZBduNw= From: Pablo Correa Gomez To: Rich Felker , Alastair Houghton Cc: musl@lists.openwall.com Date: Sat, 09 Dec 2023 19:44:30 +0100 In-Reply-To: <20231208235920.GE4163@brightrain.aerifal.cx> References: <1390B046-C845-406F-8AED-620F2DD16BC0@apple.com> <20230810155115.GT4163@brightrain.aerifal.cx> <267261EB-1DFA-4072-89F0-B62F5DDE5F09@apple.com> <3DD8D02A-0802-494E-B9E8-F00B457B86F6@apple.com> <25283A51-7FB1-4CB1-9C26-DF06F69922BC@apple.com> <9AF32F3B-1889-4799-9379-EF860BE3E85F@apple.com> <20231208235920.GE4163@brightrain.aerifal.cx> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [musl] setlocale() again On vie, 2023-12-08 at 18:59 -0500, Rich Felker wrote: > > On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote: > > > > On 5 Dec 2023, at 15:19, Alastair Houghton > > > > > > > > wrote: > > > > > >=20 > > > > > > > > Maybe I=E2=80=99ve missed a reply somewhere along the lines= ; > > > > > > > > here=E2=80=99s a > > > > > > > > tentative patch that just does the simple thing of > > > > > > > > making > > > > > > > > setlocale(LC_ALL, "") pick the C.UTF-8 locale if it=E2=80= =99s > > > > > > > > unable to > > > > > > > > find the locale specified in the environment. > > > > > >=20 > > > > > > [snip] > > > > > >=20 > > > > > > Hah. So, testing that patch, having removed my hacks to > > > > > > avoid > > > > > > using Musl=E2=80=99s locale support, I find it doesn=E2=80=99t = actually > > > > > > work (for > > > > > > two reasons; one, NULL doesn=E2=80=99t mean not found, it means > > > > > > =E2=80=9Cuse > > > > > > =E2=80=98C=E2=80=99=E2=80=9D; and two, there is some very odd c= ode in setlocale.c > > > > > > that > > > > > > causes things to go wrong if the specified name is longer > > > > > > than > > > > > > LOCALE_NAME_MAX). > > > > > >=20 > > > > > > I=E2=80=99ll come back with an updated patch in a bit. > > > >=20 > > > > Updated patch: > > > >=20 > > > > =3D=3D=3D=3D Cut here =3D=3D=3D=3D > > > > diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c > > > > index da61f7fc..097da1ad 100644 > > > > --- a/src/locale/locale_map.c > > > > +++ b/src/locale/locale_map.c > > > > @@ -31,7 +31,7 @@ static const char envvars[][12] =3D { > > > > =C2=A0volatile int __locale_lock[1]; > > > > =C2=A0volatile int *const __locale_lockptr =3D __locale_lock; > > > > =C2=A0 > > > > -const struct __locale_map *__get_locale(int cat, const char > > > > *val) > > > > +const struct __locale_map *__get_locale(int cat, const char > > > > *locale) > > > > =C2=A0{ > > > > =C2=A0 static void *volatile loc_head; > > > > =C2=A0 const struct __locale_map *p; > > > > @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int > > > > cat, > > > > const char *val) > > > > =C2=A0 const char *path =3D 0, *z; > > > > =C2=A0 char buf[256]; > > > > =C2=A0 size_t l, n; > > > > + const char *val =3D locale; > > > > =C2=A0 > > > > =C2=A0 if (!*val) { > > > > =C2=A0 (val =3D getenv("LC_ALL")) && *val || > > > > @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int > > > > cat, const char *val) > > > > =C2=A0 } > > > > =C2=A0 } > > > > =C2=A0 > > > > - /* If no locale definition was found, make a locale map > > > > - * object anyway to store the name, which is kept for the > > > > - * sake of being able to do message translations at the > > > > - * application level. */ > > > > - if (!new && (new =3D malloc(sizeof *new))) { > > > > - new->map =3D __c_dot_utf8.map; > > > > - new->map_size =3D __c_dot_utf8.map_size; > > > > - memcpy(new->name, val, n); > > > > - new->name[n] =3D 0; > > > > - new->next =3D loc_head; > > > > - loc_head =3D new; > > > > - } > > > > + /* If no locale definition was found, and we specified a > > > > + * locale name of "", return the C.UTF-8 locale. */ > > > > + if (!new && !*locale) new =3D (void *)&__c_dot_utf8; > > > > =C2=A0 > > > > =C2=A0 /* For LC_CTYPE, never return a null pointer unless the > > > > =C2=A0 * requested name was "C" or "POSIX". */ > > > > =C2=A0 if (!new && cat =3D=3D LC_CTYPE) new =3D (void *)&__c_dot_ut= f8; > > > > =C2=A0 > > > > + /* Returning NULL means "C locale"; if we get here and > > > > + * there's no locale, return failure instead. */ > > > > + if (!new) > > > > + return LOC_MAP_FAILED; > > > > + > > > > =C2=A0 return new; > > > > =C2=A0} > > > > diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c > > > > index 360c4437..9842d95d 100644 > > > > --- a/src/locale/setlocale.c > > > > +++ b/src/locale/setlocale.c > > > > @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name) > > > > =C2=A0 const char *p =3D name; > > > > =C2=A0 for (i=3D0; i > > > =C2=A0 const char *z =3D __strchrnul(p, ';'); > > > > - if (z-p <=3D LOCALE_NAME_MAX) { > > > > + if (z-p > LOCALE_NAME_MAX) > > > > + lm =3D LOC_MAP_FAILED; > > > > + else { > > > > =C2=A0 memcpy(part, p, z-p); > > > > =C2=A0 part[z-p] =3D 0; > > > > =C2=A0 if (*z) p =3D z+1; > > > > + lm =3D __get_locale(i, part); > > > > =C2=A0 } > > > > - lm =3D __get_locale(i, part); > > > > =C2=A0 if (lm =3D=3D LOC_MAP_FAILED) { > > > > =C2=A0 UNLOCK(__locale_lock); > > > > =C2=A0 return 0; > > > > =3D=3D=3D=3D Cut here =3D=3D=3D=3D > >=20 > > Sorry to be late chiming in here. There's something I've been > > meaning > > to ask: back when this was first proposed, I recall there being two > > variants we considered: one where setlocale to "" where the env > > vars > > don't resolve to any real locale file produces as its > > implementation-defined result "C.UTF-8", and another where it > > produces > > a ghost locale with the requested name but the behavior of "C.UTF- > > 8". > >=20 > > Is there a reason you think the former is a better choice than the > > latter? The latter would avoid breaking things for users with > > application translations but no libc locale files. However it > > requires > > more complex logic for consistency I think, and I'm not sure we > > ever > > worked out if that could be done in a reasonable way. Thanks for chimming in again Rich. I think we have discussed in this thread, that we want to avoid the second option, because it provides an inconsistent experience to users. As a distro maintainer, my life will be much easier if users that lack translations get affected by this. We have 6 months for this to be included in alpine, and for us to work together with the users to add their language to musl-locales. If we keep some sort of inconsistent behavior (between what we claim we support by returning the name, and what we actually support, which is C), then I'm sure we'll bump into multiple corner-cases, where I'll have a very hard time supporting users, and they will have less motivation to work with us provide translations for their language. > >=20 > > Another option that wasn't raised before but that might be worth > > considering is keeping the existing behavior if MUSL_LOCPATH is not > > set (all names are valid and are aliases for "C.UTF-8" but doing as > > in > > your patch if it's set. Same here, although probably less critical, since installing musl- locales in alpine sets the variable by default. However, if we ever want to have a default path that does not depend on an environment variable (something I'd really like to have), this might get trickier? I've also tested the patch on my GNOME setup with Spanish localization, and everything seems to work as expected. Best, Pablo. > >=20 > > Rich