mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: Alastair Houghton <ahoughton@apple.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] setlocale() again
Date: Fri, 8 Dec 2023 18:59:21 -0500	[thread overview]
Message-ID: <20231208235920.GE4163@brightrain.aerifal.cx> (raw)
In-Reply-To: <9AF32F3B-1889-4799-9379-EF860BE3E85F@apple.com>

On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote:
> On 5 Dec 2023, at 15:19, Alastair Houghton <ahoughton@apple.com> wrote:
> > 
> >> Maybe I’ve missed a reply somewhere along the lines; here’s a tentative patch that just does the simple thing of making setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s unable to find the locale specified in the environment.
> > 
> > [snip]
> > 
> > Hah. So, testing that patch, having removed my hacks to avoid using Musl’s locale support, I find it doesn’t actually work (for two reasons; one, NULL doesn’t mean not found, it means “use ‘C’”; and two, there is some very odd code in setlocale.c that causes things to go wrong if the specified name is longer than LOCALE_NAME_MAX).
> > 
> > I’ll come back with an updated patch in a bit.
> 
> Updated patch:
> 
> ==== Cut here ====
> diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> index da61f7fc..097da1ad 100644
> --- a/src/locale/locale_map.c
> +++ b/src/locale/locale_map.c
> @@ -31,7 +31,7 @@ static const char envvars[][12] = {
>  volatile int __locale_lock[1];
>  volatile int *const __locale_lockptr = __locale_lock;
>  
> -const struct __locale_map *__get_locale(int cat, const char *val)
> +const struct __locale_map *__get_locale(int cat, const char *locale)
>  {
>  	static void *volatile loc_head;
>  	const struct __locale_map *p;
> @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int cat, const char *val)
>  	const char *path = 0, *z;
>  	char buf[256];
>  	size_t l, n;
> +	const char *val = locale;
>  
>  	if (!*val) {
>  		(val = getenv("LC_ALL")) && *val ||
> @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int cat, const char *val)
>  		}
>  	}
>  
> -	/* If no locale definition was found, make a locale map
> -	 * object anyway to store the name, which is kept for the
> -	 * sake of being able to do message translations at the
> -	 * application level. */
> -	if (!new && (new = malloc(sizeof *new))) {
> -		new->map = __c_dot_utf8.map;
> -		new->map_size = __c_dot_utf8.map_size;
> -		memcpy(new->name, val, n);
> -		new->name[n] = 0;
> -		new->next = loc_head;
> -		loc_head = new;
> -	}
> +	/* If no locale definition was found, and we specified a
> +	 * locale name of "", return the C.UTF-8 locale. */
> +	if (!new && !*locale) new = (void *)&__c_dot_utf8;
>  
>  	/* For LC_CTYPE, never return a null pointer unless the
>  	 * requested name was "C" or "POSIX". */
>  	if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
>  
> +	/* Returning NULL means "C locale"; if we get here and
> +	 * there's no locale, return failure instead. */
> +	if (!new)
> +		return LOC_MAP_FAILED;
> +
>  	return new;
>  }
> diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
> index 360c4437..9842d95d 100644
> --- a/src/locale/setlocale.c
> +++ b/src/locale/setlocale.c
> @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
>  			const char *p = name;
>  			for (i=0; i<LC_ALL; i++) {
>  				const char *z = __strchrnul(p, ';');
> -				if (z-p <= LOCALE_NAME_MAX) {
> +				if (z-p > LOCALE_NAME_MAX)
> +					lm = LOC_MAP_FAILED;
> +				else {
>  					memcpy(part, p, z-p);
>  					part[z-p] = 0;
>  					if (*z) p = z+1;
> +					lm = __get_locale(i, part);
>  				}
> -				lm = __get_locale(i, part);
>  				if (lm == LOC_MAP_FAILED) {
>  					UNLOCK(__locale_lock);
>  					return 0;
> ==== Cut here ====

Sorry to be late chiming in here. There's something I've been meaning
to ask: back when this was first proposed, I recall there being two
variants we considered: one where setlocale to "" where the env vars
don't resolve to any real locale file produces as its
implementation-defined result "C.UTF-8", and another where it produces
a ghost locale with the requested name but the behavior of "C.UTF-8".

Is there a reason you think the former is a better choice than the
latter? The latter would avoid breaking things for users with
application translations but no libc locale files. However it requires
more complex logic for consistency I think, and I'm not sure we ever
worked out if that could be done in a reasonable way.

Another option that wasn't raised before but that might be worth
considering is keeping the existing behavior if MUSL_LOCPATH is not
set (all names are valid and are aliases for "C.UTF-8" but doing as in
your patch if it's set.

Rich

  reply	other threads:[~2023-12-08 23:59 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-10 15:41 Alastair Houghton
2023-08-10 15:51 ` Rich Felker
2023-09-05 12:57   ` Alastair Houghton
2023-09-18 14:18     ` Alastair Houghton
2023-10-27 20:15       ` Pablo Correa Gomez
2023-11-28 16:27         ` Alastair Houghton
2023-11-28 23:15           ` Pablo Correa Gomez
2023-11-28 17:32       ` Alastair Houghton
2023-11-28 23:21         ` Pablo Correa Gomez
2023-12-05 15:19         ` Alastair Houghton
2023-12-08 10:46           ` Alastair Houghton
2023-12-08 23:59             ` Rich Felker [this message]
2023-12-09 18:44               ` Pablo Correa Gomez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231208235920.GE4163@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=ahoughton@apple.com \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).