mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: setlocale behavior with 'missing' locales
Date: Thu, 1 Mar 2018 15:45:08 -0500
Message-ID: <20180301204508.GW1436@brightrain.aerifal.cx>
In-Reply-To: <20180301192545.GV1436@brightrain.aerifal.cx>

On Thu, Mar 01, 2018 at 02:25:45PM -0500, Rich Felker wrote:
> On Thu, Mar 01, 2018 at 01:10:47PM -0600, William Pitcock wrote:
> > >> One notable issue is that, right now, we rely on being able to set
> > >> LC_MESSAGES to an arbitrary name even if there's no libc locale
> > >> definition for it; this is because gettext() relies on the name of the
> > >> current LC_MESSAGES locale to find (application-specific) translation
> > >> files that might exist even without a libc translation. I'm not sure
> > >> how we would best keep this working under changes similar to the
> > >> above.
> > >
> > > Any further thoughts on this? I'd like to begin addressing these
> > > issues in this release cycle.
> > >
> > > I think the above plan works (is conforming, doesn't break things)
> > > except for the LC_MESSAGES issue mentioned at the end. I don't have
> > > any good ideas still for dealing with that. Really since gettext can
> > > be used with any category, not just LC_MESSAGES (although LC_MESSAGES
> > > is the normal choice), it applies to all categories. Maybe we could
> > > still use the ("nonexistent") requested locale name in this case, or
> > > some derivative of it that clarifies that it's synthesized...?
> > 
> > +1 to using this approach.
> > 
> > We could use a locale name such as "en_US@virtual.UTF-8".
> > 
> > glibc uses this style of locale name for locales such as UK english
> > with eurozone LC_CURRENCY: en_UK@euro.UTF-8.
> 
> I was actually just in the process of trying to work out something
> very similar. Here's how I think it might work:
> 
> setlocale(cat, "") -- always succeeds, produces ll_TT@virtual (or
> ll_TT@missing was my idea) if a locale file by the matching name is
> not found.
> 
> setlocale(cat, "ll_TT@virtual") (or whatever name) - always succeeds.
> 
> setlocale(cat, "ll_TT[@other]") - succeeds only if a file matching the
> name is found.
> 
> One thing I don't entirely like is repurposing the @ modifier for
> this; it conflicts with (and perhaps fails to preserve) an existing
> modifier if there is one, and affects how search for gettext
> translation files would happen (searching extra @virtual paths).
> Perhaps we should instead make it a separate component delimited in
> some other way so it can always be dropped by gettext.
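
For illustration only, the three quoted rules could be read as the
following acceptance check. This is a minimal sketch, not musl code:
the "@virtual" marker is just one of the names floated above, the
function names are hypothetical, and file_found stands in for the
result of the real locale file search. The builtin names
(C/C.UTF-8/POSIX) are not modeled here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical marker; the thread has not settled on "@virtual"
 * versus "@missing" or some other delimiter. */
#define VIRTUAL_SUFFIX "@virtual"

/* True if name has a nonempty prefix followed by the marker. */
static bool is_virtual_name(const char *name)
{
	size_t n = strlen(name), s = strlen(VIRTUAL_SUFFIX);
	return n > s && strcmp(name + n - s, VIRTUAL_SUFFIX) == 0;
}

/* Would setlocale(cat, name) succeed under the proposed rules?
 * file_found is an assumption standing in for the file lookup. */
static bool setlocale_would_succeed(const char *name, bool file_found)
{
	if (name[0] == '\0') return true;        /* "": always succeeds */
	if (is_virtual_name(name)) return true;  /* explicit virtual name */
	return file_found;                       /* ll_TT[@other]: needs a file */
}
```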

Implementation notes if we do this:

__get_locale is the internal backend that loads locale maps, and looks
like the point at which this all should be implemented.

Presently __get_locale has no means to return an error; a null return
value indicates the C locale, which is represented everywhere by the
lack of any locale map.

It seems __get_locale has all the information it needs to decide how
to act:

- If the argument is "", missing/virtual locale synthesis should
  happen. If allocation failures etc. prevent synthesis, it should
  behave as if the argument had been "C.UTF-8".

- If the argument is one of the builtin locales (C/C.UTF-8/POSIX) it
  can return one of the builtin maps. Right now it oddly replaces
  "C.UTF-8" with just plain "C" (null return value) in all categories
  except LC_CTYPE. This behavior should perhaps be revisited, but
  newlocale.c and perhaps other places encode assumptions that it's
  done this way.

- If the argument is another name that can't be found, an error should
  be returned to the caller somehow. We could perhaps use MAP_FAILED.
  The alternative seems to be reworking the contract so that null
  doesn't mean C and either using a real locale_map object for the C
  locale or translating to null in the caller, but these choices seem
  to impose worse costs/effects elsewhere.
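
As a rough sketch of the three cases above (not the actual
__get_locale code): every name below is hypothetical, the file
lookup is a stub that knows a single made-up locale "xx_YY", and a
private sentinel object takes the place of MAP_FAILED so the example
needs no <sys/mman.h>. The per-category C.UTF-8 quirk is not modeled,
and the synthesized map simply copies the requested name, since how
such names should be formed is left open.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the internal locale map type; the field is illustrative. */
struct locale_map { char name[24]; };

/* Error sentinel distinct from NULL, which keeps meaning "C locale". */
static const struct locale_map failed_map = { "" };
#define LOC_FAILED (&failed_map)

static const struct locale_map c_utf8_map = { "C.UTF-8" };
static const struct locale_map installed = { "xx_YY" }; /* made-up locale */

/* Stub for the locale file search: only "xx_YY" is "installed". */
static const struct locale_map *lookup_file(const char *name)
{
	return strcmp(name, installed.name) ? NULL : &installed;
}

/* Synthesize a map for a missing locale; NULL on allocation failure.
 * Real code would presumably cache the result rather than leak it. */
static const struct locale_map *synthesize(const char *name)
{
	struct locale_map *m = malloc(sizeof *m);
	if (!m) return NULL;
	snprintf(m->name, sizeof m->name, "%s", name);
	return m;
}

/* env_name stands in for the value resolved from the environment. */
static const struct locale_map *get_locale_sketch(const char *name,
                                                  const char *env_name)
{
	int from_env = !name[0];
	if (from_env) name = (env_name && env_name[0]) ? env_name : "C.UTF-8";

	/* Builtin locales never touch the filesystem. */
	if (!strcmp(name, "C") || !strcmp(name, "POSIX")) return NULL;
	if (!strcmp(name, "C.UTF-8")) return &c_utf8_map;

	const struct locale_map *m = lookup_file(name);
	if (m) return m;

	/* Explicit name that cannot be found: report an error. */
	if (!from_env) return LOC_FAILED;

	/* "" argument: synthesize, falling back to C.UTF-8 on failure. */
	m = synthesize(name);
	return m ? m : &c_utf8_map;
}
```

A caller would then check for LOC_FAILED before treating a null
return as the C locale.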

None of the above covers anything about _how_ the synthesis of names
for missing locales should happen, just where/when it should happen.

Rich

