Date: Fri, 8 Dec 2023 18:59:21 -0500
From: Rich Felker
To: Alastair Houghton
Cc: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Message-ID: <20231208235920.GE4163@brightrain.aerifal.cx>
In-Reply-To: <9AF32F3B-1889-4799-9379-EF860BE3E85F@apple.com>
References: <1390B046-C845-406F-8AED-620F2DD16BC0@apple.com>
 <20230810155115.GT4163@brightrain.aerifal.cx>
 <267261EB-1DFA-4072-89F0-B62F5DDE5F09@apple.com>
 <3DD8D02A-0802-494E-B9E8-F00B457B86F6@apple.com>
 <25283A51-7FB1-4CB1-9C26-DF06F69922BC@apple.com>
 <9AF32F3B-1889-4799-9379-EF860BE3E85F@apple.com>
Subject: Re: [musl] setlocale() again

On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote:
> On 5 Dec 2023, at 15:19, Alastair Houghton wrote:
> >
> >> Maybe I’ve missed a reply somewhere along the lines; here’s a tentative patch that just does the simple thing of making setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find the locale specified in the environment.
> >
> > [snip]
> >
> > Hah.  So, testing that patch, having removed my hacks to avoid using Musl’s locale support, I find it doesn’t actually work (for two reasons; one, NULL doesn’t mean not found, it means “use ‘C’”; and two, there is some very odd code in setlocale.c that causes things to go wrong if the specified name is longer than LOCALE_NAME_MAX).
> >
> > I’ll come back with an updated patch in a bit.
>
> Updated patch:
>
> ==== Cut here ====
> diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> index da61f7fc..097da1ad 100644
> --- a/src/locale/locale_map.c
> +++ b/src/locale/locale_map.c
> @@ -31,7 +31,7 @@ static const char envvars[][12] = {
>  volatile int __locale_lock[1];
>  volatile int *const __locale_lockptr = __locale_lock;
>  
> -const struct __locale_map *__get_locale(int cat, const char *val)
> +const struct __locale_map *__get_locale(int cat, const char *locale)
>  {
>  	static void *volatile loc_head;
>  	const struct __locale_map *p;
> @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int cat, const char *val)
>  	const char *path = 0, *z;
>  	char buf[256];
>  	size_t l, n;
> +	const char *val = locale;
>  
>  	if (!*val) {
>  		(val = getenv("LC_ALL")) && *val ||
> @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int cat, const char *val)
>  		}
>  	}
>  
> -	/* If no locale definition was found, make a locale map
> -	 * object anyway to store the name, which is kept for the
> -	 * sake of being able to do message translations at the
> -	 * application level. */
> -	if (!new && (new = malloc(sizeof *new))) {
> -		new->map = __c_dot_utf8.map;
> -		new->map_size = __c_dot_utf8.map_size;
> -		memcpy(new->name, val, n);
> -		new->name[n] = 0;
> -		new->next = loc_head;
> -		loc_head = new;
> -	}
> +	/* If no locale definition was found, and we specified a
> +	 * locale name of "", return the C.UTF-8 locale. */
> +	if (!new && !*locale) new = (void *)&__c_dot_utf8;
>  
>  	/* For LC_CTYPE, never return a null pointer unless the
>  	 * requested name was "C" or "POSIX". */
>  	if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
>  
> +	/* Returning NULL means "C locale"; if we get here and
> +	 * there's no locale, return failure instead. */
> +	if (!new)
> +		return LOC_MAP_FAILED;
> +
>  	return new;
>  }
> diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
> index 360c4437..9842d95d 100644
> --- a/src/locale/setlocale.c
> +++ b/src/locale/setlocale.c
> @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
>  			const char *p = name;
>  			for (i=0; i<LC_ALL; i++) {
>  				const char *z = __strchrnul(p, ';');
> -				if (z-p <= LOCALE_NAME_MAX) {
> +				if (z-p > LOCALE_NAME_MAX)
> +					lm = LOC_MAP_FAILED;
> +				else {
>  					memcpy(part, p, z-p);
>  					part[z-p] = 0;
>  					if (*z) p = z+1;
> +					lm = __get_locale(i, part);
>  				}
> -				lm = __get_locale(i, part);
>  				if (lm == LOC_MAP_FAILED) {
>  					UNLOCK(__locale_lock);
>  					return 0;
> ==== Cut here ====

Sorry to be late chiming in here. There's something I've been meaning
to ask: back when this was first proposed, I recall there being two
variants we considered: one where setlocale to "", when the env vars
don't resolve to any real locale file, produces "C.UTF-8" as its
implementation-defined result, and another where it produces a ghost
locale with the requested name but the behavior of "C.UTF-8".

Is there a reason you think the former is a better choice than the
latter? The latter would avoid breaking things for users with
application translations but no libc locale files. However, I think
it requires more complex logic for consistency, and I'm not sure we
ever worked out whether that could be done in a reasonable way.

Another option that wasn't raised before but that might be worth
considering is keeping the existing behavior if MUSL_LOCPATH is not
set (all names are valid and are aliases for "C.UTF-8"), but doing as
in your patch if it's set.

Rich
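
For reference, the three options discussed in the reply above can be
laid side by side in a small sketch. This is not musl code: the enum,
the function effective_name(), and its parameters are hypothetical
names made up for illustration, and the sketch models only the
setlocale(LC_ALL, "") case, where the name comes from the environment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the three options; none of these names
 * exist in musl. */
enum fallback_policy {
	FALLBACK_C_UTF8,        /* unresolvable name -> report "C.UTF-8" */
	FALLBACK_GHOST,         /* keep the requested name, C.UTF-8 behavior */
	FALLBACK_LOCPATH_GATED  /* ghost if MUSL_LOCPATH is unset, else C.UTF-8 */
};

/* Returns the locale name an application would see after
 * setlocale(LC_ALL, "") when the environment asked for env_name.
 * have_locale_file: nonzero if a locale file for env_name was found.
 * locpath_set: nonzero if MUSL_LOCPATH is set in the environment. */
const char *effective_name(enum fallback_policy policy,
                           const char *env_name,
                           int have_locale_file,
                           int locpath_set)
{
	/* A real locale file always wins, whatever the policy. */
	if (have_locale_file) return env_name;

	switch (policy) {
	case FALLBACK_C_UTF8:
		return "C.UTF-8";
	case FALLBACK_GHOST:
		/* Name survives, so application-level translations
		 * keyed on the locale name keep working. */
		return env_name;
	case FALLBACK_LOCPATH_GATED:
		return locpath_set ? "C.UTF-8" : env_name;
	}
	return "C.UTF-8";
}

/* Small self-check of the cases described in the message. */
void check_examples(void)
{
	assert(!strcmp(effective_name(FALLBACK_C_UTF8, "de_DE.UTF-8", 0, 0), "C.UTF-8"));
	assert(!strcmp(effective_name(FALLBACK_GHOST, "de_DE.UTF-8", 0, 0), "de_DE.UTF-8"));
	assert(!strcmp(effective_name(FALLBACK_LOCPATH_GATED, "de_DE.UTF-8", 0, 0), "de_DE.UTF-8"));
	assert(!strcmp(effective_name(FALLBACK_LOCPATH_GATED, "de_DE.UTF-8", 0, 1), "C.UTF-8"));
}
```

The only observable difference between the options in this model is
the name reported back, which is what matters to an application that
looks up its own message translations by locale name.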