mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: Alastair Houghton <ahoughton@apple.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] setlocale() again
Date: Thu, 10 Aug 2023 11:51:15 -0400
Message-ID: <20230810155115.GT4163@brightrain.aerifal.cx>
In-Reply-To: <1390B046-C845-406F-8AED-620F2DD16BC0@apple.com>

On Thu, Aug 10, 2023 at 04:41:38PM +0100, Alastair Houghton wrote:
> Hi again,
> 
> I spent some time today looking at the setlocale() problem and
> thought I’d put some notes down in an email.
> 
> 1. Musl wishes to support UTF-8 “out of the box”.
> 
> 2. At the same time, it needs to be 8-bit-safe, so the default
> locale, C, is NOT UTF-8.
> 
> 3. POSIX, and the C standard, specify that setlocale() should fail
> if the locale name isn’t a valid locale, but don’t really say what
> that means precisely. A program that wants UTF-8 support and that
> does `setlocale(LC_ALL, "")` can therefore find itself in the C
> locale if the one specified in the environment happens to be
> invalid.
> 
> 4. This seemed undesirable, so setlocale() presently accepts any
> locale name as valid; if it doesn’t have a definition file for a
> locale, it will copy the C.UTF-8 locale, giving it the name passed
> in and return that. This avoids the problem in (3), and also means
> that gettext() will work for any language without installing locale
> data for Musl. Unfortunately it also means that there is no way for
> a program (notably a test suite) to determine the presence of data
> for a locale, because setlocale() will always succeed, even if we
> don’t have the data.
> 
> 5. Back in 2017 (https://www.openwall.com/lists/musl/2017/11/08/2)
> Rich was proposing to change things so that `setlocale(cat, "")`
> always succeeds, but if the environment specifies an unknown locale,
> treats it as C.UTF-8, while `setlocale(cat, explicit_name)` will
> fail unless a valid definition file is installed for that locale
> name. This would also avoid the problem in (3), although it will
> mean that gettext() will not work unless a valid locale definition
> is installed for the C library (BTW, this is exactly the situation
> Glibc is in here; if Glibc doesn’t have locale data, it will fail
> setlocale() and then gettext() will find itself in the C locale). On
> the other hand, it does mean that programs can detect whether or not
> a given locale is present.
> 
> Why do I care? Because I’m trying to make libc++ work with Musl and
> right now it has failing tests because it expects (not entirely
> unreasonably) that if e.g. `setlocale(LC_ALL, "fr_FR")` succeeds,
> then the C library will localise things into French. While I can
> test for the unusual behaviour of Musl detailed in (4), the libc++
> maintainer understandably doesn’t like it and we would both far
> rather Musl were fixed to behave similarly to other implementations.
> 
> It seems to me that Rich’s proposal (5) was sensible. Programs that
> use gettext(), and users relying on it for localization, must
> already cope with the fact that the C library must have locale data
> for their chosen locale in order for gettext() to work; that is how
> things work on Glibc. It so happened that (4) meant that such
> programs would work with partial localization on Musl without there
> being any locale data installed for Musl, but that isn’t really
> right (e.g. you might get a mix of localized strings from gettext()
> but with numeric formatting that didn’t match - for French, for
> instance, numbers would have “.”s instead of “,”s as a decimal
> separator).
> 
> Looking at the 2017 thread, it appears it didn’t go anywhere for
> whatever reason, so I’d like to understand the status of the
> proposed change. Was it nixed for some reason? Is it likely to
> happen in the future? If it’s a matter of resource, if I were to
> raise a patch for it, would it be accepted, in principle?
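
For illustration, the kind of probe a test suite might run can be
sketched as below. This is a minimal sketch, not code from libc++ or
musl: the helper names locale_accepted() and decimal_is_comma() are
made up for this example, and only the standard setlocale() and
localeconv() calls are assumed.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if setlocale() accepts `name` for
 * LC_ALL, 0 if it fails. A NULL name would only query the current
 * locale, so it is rejected up front. */
static int locale_accepted(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_ALL, name) != NULL;
}

/* Hypothetical helper: returns 1 if the currently selected locale
 * uses "," as its decimal separator, 0 otherwise. */
static int decimal_is_comma(void)
{
    return strcmp(localeconv()->decimal_point, ",") == 0;
}
```

A test would call locale_accepted("fr_FR.UTF-8") and, on success,
expect decimal_is_comma() to return 1. With the behaviour described in
(4), the first call can succeed without any French locale data being
present, which is why the second check is not implied by the first.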

Thank you for following up on this! The main reason it didn't go
anywhere was lack of feedback/engagement from anyone who cares about
locale behavior. I want whatever steps we take to be informed by what
folks actually need, not just my guesses at that. So in that sense,
your bumping of the issue is helpful in itself!

At this point, it's been quite a while since I looked at the
mechanisms. If you'd like to help move this forward, rather than
starting with a patch, writing a high-level natural language
description of how you'd make the changes (in terms of musl's current
internal representation for locale state) would be the most helpful.
If I'm forgetting and there's already such a good description, just
digging it up and citing it might be fine.
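
As a starting point for such a description, the externally visible
behaviour of proposal (5) quoted above can be modelled roughly as
follows. This is only a sketch under stated assumptions: it says
nothing about musl's internal locale representation, the names
resolve_locale(), has_data_fn and only_builtin() are hypothetical, and
treating an unset environment the same as an unknown name is this
sketch's own assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: nonzero if locale `name` is available. */
typedef int (*has_data_fn)(const char *name);

/* Stand-in for a system with no locale files installed: only the
 * built-in names are treated as available (assumption). */
static int only_builtin(const char *name)
{
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0 ||
           strcmp(name, "C.UTF-8") == 0;
}

/* Model of proposal (5). `requested` is the name passed to
 * setlocale(); `env_name` is the name the environment resolves to
 * (NULL or "" if unset). Returns the effective locale name, or NULL
 * where setlocale() would fail. */
static const char *resolve_locale(const char *requested,
                                  const char *env_name,
                                  has_data_fn has_data)
{
    assert(requested != NULL && has_data != NULL);

    /* Explicit name: fail unless the locale is available. */
    if (requested[0] != '\0')
        return has_data(requested) ? requested : NULL;

    /* Empty name: use the environment, but never fail. */
    if (env_name != NULL && env_name[0] != '\0' && has_data(env_name))
        return env_name;

    /* Unknown (or, by assumption here, unset) environment locale. */
    return "C.UTF-8";
}
```

With only_builtin() as the availability check,
resolve_locale("fr_FR", NULL, only_builtin) models a failing explicit
request, while resolve_locale("", "fr_FR", only_builtin) models the
environment case falling back to C.UTF-8.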

Rich

Thread overview: 13+ messages
2023-08-10 15:41 Alastair Houghton
2023-08-10 15:51 ` Rich Felker [this message]
2023-09-05 12:57   ` Alastair Houghton
2023-09-18 14:18     ` Alastair Houghton
2023-10-27 20:15       ` Pablo Correa Gomez
2023-11-28 16:27         ` Alastair Houghton
2023-11-28 23:15           ` Pablo Correa Gomez
2023-11-28 17:32       ` Alastair Houghton
2023-11-28 23:21         ` Pablo Correa Gomez
2023-12-05 15:19         ` Alastair Houghton
2023-12-08 10:46           ` Alastair Houghton
2023-12-08 23:59             ` Rich Felker
2023-12-09 18:44               ` Pablo Correa Gomez
