mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: setlocale behavior with 'missing' locales
Date: Wed, 28 Feb 2018 20:13:40 -0500
Message-ID: <20180301011340.GU1436@brightrain.aerifal.cx>
In-Reply-To: <20171108052715.GM1627@brightrain.aerifal.cx>

On Wed, Nov 08, 2017 at 12:27:15AM -0500, Rich Felker wrote:
> On Wed, Nov 08, 2017 at 12:03:38AM -0500, Rich Felker wrote:
> > Unfortunately this turns out to have been something of a tradeoff,
> > since there's no way for applications (and, as it turns out,
> > especially tests/test suites) to query whether a particular locale is
> > "really" available. I've been asked to change the behavior to fail on
> > unknown locale names, but of course that's not a working option in
> > light of the above.
> > 
> > I think there may be a solution that makes everyone happy, but I'm not
> > sure yet. I'm going to follow up with a description and analysis of
> > whether it's valid/conforming.
> 
> So here's the possible solution. ISO C leaves the default locale when
> setlocale(cat,"") is called implementation-defined. POSIX however
> defines it in terms of the LANG and LC_* environment variables. See
> the CX text in:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
> 
>   "Setting all of the categories of the global locale is similar to
>   successively setting each individual category of the global locale,
>   except that all error checking is done before any actions are
>   performed. To set all the categories of the global locale,
>   setlocale() can be invoked as:
> 
>   setlocale(LC_ALL, "");
> 
>   In this case, setlocale() shall first verify that the values of all
>   the environment variables it needs according to the precedence rules
>   (described in XBD Environment Variables) indicate supported locales.
>   If the value of any of these environment variable searches yields a
>   locale that is not supported (and non-null), setlocale() shall
>   return a null pointer and the global locale shall not be changed. If
>   all environment variables name supported locales, setlocale() shall
>   proceed as if it had been called for each category, using the
>   appropriate value from the associated environment variable or from
>   the implementation-defined default if there is no such value."
> 
> and the Environment Variables text in XBD 8.2:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
> 
> The former seems to tie our hands: unless the locales determined by
> the environment variables all exist, setlocale is required to fail and
> leave us in the (unacceptable) "C" locale where UTF-8 doesn't work.
> However the latter seems to offer us a way out. After describing how
> the precedence of the variables works, how locale pathnames work if
> localedef is supported (musl doesn't support it), and how
> implementation-provided/defined locale names work, it specifies:
> 
>   "If the locale value is not recognized by the implementation, the
>   behavior is unspecified."
> 
> My optimistic reading of this is that, in the event the locale name
> provided does not correspond to something we recognize, we're free to
> define how it's interpreted, and always interpret it as C.UTF-8.
> 
> What this would achieve is the following:
> 
> 1. setlocale(cat, explicit_locale_name) - succeeds if the locale
>    actually has a definition file, fails and returns a null pointer
>    otherwise.
> 
> 2. setlocale(cat, "") - always succeeds, honoring the environment
>    variable for the category if a locale definition file by that name
>    exists, but otherwise (the unspecified behavior) treating it as if
>    it were C.UTF-8.
> 
> This way, applications that probe for specific locale names can do so
> and determine if they exist, but applications that just want to use
> the default locale the user configured will still avoid catastrophic
> breakage (failure to support UTF-8) even if they encounter "bad" LC_*
> variables.
> 
> Does this approach sound acceptable? I'm fairly content with
> interpreting it as conforming to the standard; I'm mainly concerned
> about whether there might be unforeseen breakage.
> 
> One notable issue is that, right now, we rely on being able to set
> LC_MESSAGES to an arbitrary name even if there's no libc locale
> definition for it; this is because gettext() relies on the name of the
> current LC_MESSAGES locale to find (application-specific) translation
> files that might exist even without a libc translation. I'm not sure
> how we would best keep this working under changes similar to the
> above.

Any further thoughts on this? I'd like to begin addressing these
issues in this release cycle.
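
For reference, the two cases in the quoted plan can be sketched as a
small decision function. This is only an illustration, not musl code:
resolve_locale(), locale_file_exists() and the installed[] table are
hypothetical names, and treating an unset environment variable as
C.UTF-8 is an assumption made for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "a locale definition file by that name
 * exists"; a real implementation would search the locale path. */
static const char *const installed[] = {
    "C", "POSIX", "C.UTF-8", "de_DE.UTF-8"
};

static int locale_file_exists(const char *name)
{
    for (size_t i = 0; i < sizeof installed / sizeof installed[0]; i++)
        if (!strcmp(name, installed[i]))
            return 1;
    return 0;
}

/* requested: the name passed to setlocale(); "" means "use the
 *            environment".
 * env_value: result of the LC_ALL / LC_xxx / LANG precedence search,
 *            or NULL if nothing is set (only used when requested is "").
 * Returns the locale name to activate, or NULL for failure. */
static const char *resolve_locale(const char *requested,
                                  const char *env_value)
{
    /* Case 1: explicit name. Fail if there is no definition file, so
     * callers can probe for availability. */
    if (*requested)
        return locale_file_exists(requested) ? requested : NULL;

    /* Case 2: name taken from the environment. Never fail; an
     * unrecognized value (the "unspecified" case) is treated as
     * C.UTF-8. Assumption: an unset value also maps to C.UTF-8. */
    if (env_value && *env_value && locale_file_exists(env_value))
        return env_value;
    return "C.UTF-8";
}
```

With this table, resolve_locale("xx_YY.UTF-8", NULL) fails, while
resolve_locale("", "xx_YY.UTF-8") yields "C.UTF-8".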

I think the above plan works (it is conforming and doesn't break
things) except for the LC_MESSAGES issue mentioned at the end. I still
don't have any good ideas for dealing with that. In fact, since
gettext can be used with any category, not just LC_MESSAGES (although
LC_MESSAGES is the normal choice), the issue applies to all
categories. Maybe we could still use the ("nonexistent") requested
locale name in this case, or some derivative of it that clarifies that
it's synthesized...?
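
To make the dependency concrete, here is a rough sketch of how a
gettext-style lookup derives a catalog path from the current locale
name. The layout <dirname>/<locale>/<category>/<domain>.mo is the
conventional one; catalog_path() itself is a hypothetical helper, and
real implementations may also try shortened forms of the locale name.

```c
#include <stddef.h>
#include <stdio.h>

/* Build <dirname>/<locale_name>/<category_name>/<domain>.mo into buf.
 * locale_name is whatever name the category currently reports, which
 * is why the lookup depends on the requested name being preserved.
 * Returns 0 on success, -1 on error or if buf is too small. */
static int catalog_path(char *buf, size_t len,
                        const char *dirname,
                        const char *locale_name,
                        const char *category_name,
                        const char *domain)
{
    int n = snprintf(buf, len, "%s/%s/%s/%s.mo",
                     dirname, locale_name, category_name, domain);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

If the category reported "C.UTF-8" instead of the requested "de_DE",
the same call would point at a C.UTF-8 directory and an application's
de_DE catalog would never be found.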

Rich
