From: Rich Felker <dalias@libc.org>
To: Gavin Smith <gavinsmith0123@gmail.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] gettext LC_MESSAGES differences from other libc
Date: Sat, 11 Jan 2025 23:51:05 -0500
Message-ID: <20250112045105.GI10433@brightrain.aerifal.cx>
In-Reply-To: <Z4K0xWcQ6tP30CZc@beigestar>
On Sat, Jan 11, 2025 at 06:13:25PM +0000, Gavin Smith wrote:
> (Please CC me in any replies as I am not subscribed to the list.)
>
> As you know, the gettext function in musl does not behave exactly like
> the function in glibc and some other libc implementations. Specifically,
> it does not obey the LANGUAGE variable which can be used to specify that
> translated strings should be in a certain language.
>
> In 2014, you discussed the rationale for not supporting LANGUAGE. There
> were issues with threads and caching:
>
> Rich Felker, Thu, 31 Jul 2014, "How should $LANGUAGE work in our gettext?"
> https://www.openwall.com/lists/musl/2014/07/31/2
>
> Recently in the Texinfo project, we found this incompatibility with musl
> for translations of strings to be placed in output files. The gettext
> API (in musl or glibc/other) is not a perfect match for Texinfo
> needs, as it assumes that the target language is that of the user, of
> the person sitting in front of the computer, whereas the appropriate
> translation language is that of the input document. For example, somebody
> could be generating documentation in Italian to be posted to a website,
> while they don't speak Italian themselves and do not have an Italian
> locale installed.
This sounds like locale is not the right tool for processing it.
> The only way we can support this with glibc is to set LC_MESSAGES and/or
> LC_ALL to a locale that is not "C" or "POSIX", and then to set the LANGUAGE
> variable for the actual target language. This is a nuisance, as sometimes
> it is a struggle to actually find such a locale. The assumption when this
> API was designed was that a user with only a "C" locale does not need
> translations, but this is false when they are generating them for somebody
> else. libc appears to offer no way just to open an arbitrary .mo file (the
> file with the translated strings in it) to get the translations, forcing
> you to go through the locale system.
If you just want to process .mo files without going through the locale
system, the necessary code in musl is about 42 source lines (329 bytes
of machine code), and it's MIT-licensed, so you're free to copy it.
This probably makes the most sense.
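
To give a rough idea of what such a lookup involves, here is a minimal
sketch written against the documented GNU .mo layout (magic number,
string count, and two tables of length/offset pairs sorted by original
string). It is not the musl source; the names mo_lookup_sketch, mo_u32
and mo_str are hypothetical, and it assumes the caller has already
read or mapped the whole file into memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word at byte offset off, byte-swapping it when the
 * file was written with the opposite endianness from the host. */
static uint32_t mo_u32(const unsigned char *p, size_t off, int swap)
{
    uint32_t x;
    memcpy(&x, p + off, 4);
    if (swap)
        x = x >> 24 | (x >> 8 & 0xff00) | (x << 8 & 0xff0000) | x << 24;
    return x;
}

/* Return entry i of the table at byte offset tab, or NULL if its
 * (length, offset) pair does not describe a NUL-terminated string
 * lying entirely inside the image. */
static const char *mo_str(const unsigned char *p, size_t size,
                          size_t tab, uint32_t i, int swap)
{
    uint32_t len = mo_u32(p, tab + 8 * (size_t)i, swap);
    uint32_t off = mo_u32(p, tab + 8 * (size_t)i + 4, swap);
    if (off >= size || len >= size - off || p[off + len] != 0)
        return NULL;
    return (const char *)p + off;
}

/* Look up msgid in an in-memory .mo image of the given size.
 * Returns the translation, or NULL if absent or the image is bad. */
const char *mo_lookup_sketch(const void *image, size_t size,
                             const char *msgid)
{
    const unsigned char *p = image;
    int swap;

    if (size < 20)
        return NULL;
    uint32_t magic = mo_u32(p, 0, 0);
    if (magic == 0x950412deu)
        swap = 0;
    else if (magic == 0xde120495u)
        swap = 1;
    else
        return NULL;

    uint32_t n = mo_u32(p, 8, swap);   /* number of strings */
    uint32_t o = mo_u32(p, 12, swap);  /* offset of original-string table */
    uint32_t t = mo_u32(p, 16, swap);  /* offset of translation table */

    /* Both tables (8 bytes per entry) must fit inside the image. */
    if (n > size / 8 || o > size - 8 * (size_t)n || t > size - 8 * (size_t)n)
        return NULL;

    /* Binary search: the original strings are stored in sorted order. */
    uint32_t lo = 0, hi = n;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *orig = mo_str(p, size, o, mid, swap);
        if (!orig)
            return NULL;
        int c = strcmp(msgid, orig);
        if (c == 0)
            return mo_str(p, size, t, mid, swap);
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return NULL;
}
```

A caller would typically read or mmap the .mo file for the document's
language and pass the pointer and size here. This sketch does no
charset conversion and no plural-form selection; those would be up to
the application.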
> musl supports setting LC_MESSAGES to an arbitrary value that is not
> a locale, so can access arbitrary translation files in a different way.
> However, we didn't think it was worth having a special case in the code
> just for musl:
> https://lists.gnu.org/archive/html/bug-texinfo/2024-12/msg00035.html
>
> You also discussed this changing how LC_MESSAGES worked in a post in
> 2017, but as far as I am aware nothing came of it:
>
> Rich Felker, Wed, 8 Nov 2017, "Re: setlocale behavior with 'missing' locales"
>
> One notable issue is that, right now, we rely on being able to set
> LC_MESSAGES to an arbitrary name even if there's no libc locale
> definition for it; this is because gettext() relies on the name of the
> current LC_MESSAGES locale to find (application-specific) translation
> files that might exist even without a libc translation. I'm not sure
> how we would best keep this working under changes similar to the
> above.
> https://www.openwall.com/lists/musl/2017/11/08/2
There's currently a proposal to partly remove this behavior, because it
prevents applications from being able to detect if there's actually a
meaningful locale installed for a specific locale name. The specifics
have not been worked out, and this is an area I'd really like input
from affected parties on.
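
To make the detection problem concrete, here is a minimal sketch of
the kind of probe an application might write. The helper name
lc_messages_name_accepted is hypothetical; it uses only the standard
setlocale() interface. With the current behavior described above, a
probe like this would be expected to succeed on musl even for names
with no installed definition, so its result says nothing about whether
real locale data exists.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if setlocale() accepts name for
 * LC_MESSAGES, 0 if it rejects it, and -1 if the current setting
 * could not be saved. The previous setting is restored before
 * returning. */
int lc_messages_name_accepted(const char *name)
{
    /* The string returned by setlocale() may be overwritten by later
     * calls, so take a private copy before changing anything. */
    const char *cur = setlocale(LC_MESSAGES, NULL);
    if (!cur)
        return -1;
    size_t len = strlen(cur) + 1;
    char *saved = malloc(len);
    if (!saved)
        return -1;
    memcpy(saved, cur, len);

    int ok = setlocale(LC_MESSAGES, name) != NULL;

    setlocale(LC_MESSAGES, saved);
    free(saved);
    return ok;
}
```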
The hard constraint from my perspective is that setlocale(x,"") can't
be allowed to fail (user stuck with no Unicode because of unsupported
locale name in environment), but both the current behavior of making a
virtual locale by the requested name, and replacing the name by
C.UTF-8 in this case, are options. It's plausible that only
LC_MESSAGES could keep the current behavior if this turns out to be
the most helpful.
Depending on how LC_MESSAGES is to be handled, it's plausible that we
could integrate support for LANGUAGE at the same time, maybe having
the synthesized locale for "" also storing/encoding the value of
LANGUAGE, or some other mechanism to achieve the same thing. But I'm
not sure it's a good idea. There are many reasons already discussed
why the LANGUAGE model is broken, and I'm not sure we can fix it in a
way that's consistent with user expectations.
I'll probably open a new thread on this specific topic soon.
But I suspect your problem is best solved by not using locale for
non-user-language data processing.
Rich