mailing list of musl libc
* setlocale behavior with 'missing' locales
From: Rich Felker @ 2017-11-08  5:03 UTC (permalink / raw)
  To: musl

One of the primary concerns when the byte-based C locale was added(*)
was not to introduce regressions in the property that musl is "always
UTF-8" except when the user or application has explicitly requested a
byte-based ("C"/"POSIX") locale.

First, some background: In order for the standard libc interfaces to
honor character encoding, a portable program has always needed to call
setlocale(LC_CTYPE, "") or setlocale(LC_ALL, ""). Addition of the
byte-based C locale "disabled UTF-8" in any application which wasn't
calling setlocale, but that was deemed acceptable since such
applications were not portable and would not work on other systems
anyway.
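
As a minimal sketch of the portable pattern described above (the helper
names use_env_ctype and count_chars are hypothetical, chosen only for
illustration): a program first adopts the environment's LC_CTYPE, and
only then do the multibyte interfaces decode according to that locale's
encoding. How a non-ASCII string is counted before the setlocale call
depends on the libc's "C" locale, so no particular result is claimed
for it here.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: adopt LC_CTYPE from the environment.
 * Returns 1 on success, 0 if setlocale failed, in which case the
 * program simply stays in the locale it already had. */
int use_env_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Hypothetical helper: number of characters in s under the current
 * LC_CTYPE, or (size_t)-1 if s is not valid in that encoding.
 * Passing NULL as the destination makes mbstowcs only count (POSIX). */
size_t count_chars(const char *s)
{
    return mbstowcs(NULL, s, 0);
}
```

A caller would invoke use_env_ctype() once at startup and then use
count_chars() (or any other multibyte interface) on its input. Pure
ASCII input is counted the same way before and after the call.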

The other important case to consider was failure of setlocale. Prior
to the addition of the byte-based C locale, setlocale was essentially
a no-op, and from a practical standpoint it didn't matter if it
succeeded or failed because the preexisting "C" locale at program
entry already provided UTF-8. But afterwards, if setlocale failed for
some reason, applications that were trying to do the right thing would
suffer regression.

We ruled out spurious failure for resource exhaustion reasons by
making a statically allocated C.UTF-8 locale object. But the other
possible source of failure would have been having LC_* variables in
the environment (perhaps as a result of ssh'ing from another system or
running a musl-linked binary on a glibc-based system) with no
corresponding locale files for musl. If we treated that as an error,
UTF-8 would have suddenly broken in all sorts of real-world
situations, and one of the core original design goals/values of musl
would have been broken.

The choice I made at the time to avoid this was to declare that all
locale names are valid locales, and if there's no actual file defining
the locale, it's simply a clone of C.UTF-8. So for example if you run
with LC_ALL=fr_FR but no fr_FR translation file, you get a locale
named fr_FR (that's what setlocale reports as the active locale) but
with no translated messages/dates/etc., just UTF-8 character encoding
(so you're still able to access all characters properly and use
localized or multilingual data).
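
A rough way to observe this from a program, sketched under the
assumption of a POSIX environment (describe_env_locale is a
hypothetical helper, not part of musl): set LC_ALL, call
setlocale(LC_ALL, ""), and report the name that comes back together
with MB_CUR_MAX. Per the description above, musl would be expected to
report the requested name even with no locale file present; other
libcs may instead fail the call, so the sketch handles both outcomes.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: request `name` through LC_ALL and write a
 * one-line summary into buf. Returns 1 if setlocale succeeded and
 * 0 otherwise. Note that it changes the process environment and the
 * global locale as a side effect. */
int describe_env_locale(const char *name, char *buf, size_t len)
{
    if (setenv("LC_ALL", name, 1) != 0) {
        snprintf(buf, len, "%s: setenv failed", name);
        return 0;
    }
    const char *active = setlocale(LC_ALL, "");
    if (!active) {
        snprintf(buf, len, "%s: setlocale failed", name);
        return 0;
    }
    /* MB_CUR_MAX above 1 indicates a multibyte encoding is active. */
    snprintf(buf, len, "%s: active=%s MB_CUR_MAX=%zu",
             name, active, (size_t)MB_CUR_MAX);
    return 1;
}
```

Calling describe_env_locale("fr_FR", buf, sizeof buf) on a system
without an fr_FR definition and printing buf shows which of the two
behaviors the libc in use implements.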

Unfortunately this turns out to have been something of a tradeoff,
since there's no way for applications (and, as it turns out,
especially tests/test suites) to query whether a particular locale is
"really" available. I've been asked to change the behavior to fail on
unknown locale names, but of course that's not a working option in
light of the above.
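
To make the problem concrete, here is a sketch of the kind of probe a
test suite might use (setlocale_accepts is a hypothetical name). It
tries a locale name, records whether setlocale accepted it, and
restores the previous locale. With the behavior described above, every
name is accepted, so a probe like this cannot tell a "real" locale from
a clone of C.UTF-8.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe: returns 1 if setlocale(LC_ALL, name) succeeds,
 * 0 otherwise. The previously active locale is restored afterwards. */
int setlocale_accepts(const char *name)
{
    /* The string returned by setlocale may be overwritten by a later
     * call, so keep a private copy before switching. */
    const char *cur = setlocale(LC_ALL, NULL);
    char *saved = cur ? strdup(cur) : NULL;

    int ok = setlocale(LC_ALL, name) != NULL;

    if (saved) {
        setlocale(LC_ALL, saved);
        free(saved);
    }
    return ok;
}
```

A test that skips itself when setlocale_accepts("fr_FR") returns 0
works as intended only on systems where unknown names are rejected.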

I think there may be a solution that makes everyone happy, but I'm not
sure yet. I'm going to follow up with a description and analysis of
whether it's valid/conforming.

Rich






(*) References on byte-based C locale:

Subject: [musl] Possible bytelocale patch
Message-ID: <20140703071318.GA10117@brightrain.aerifal.cx>

Subject: [musl] Revisiting byte-based C locale
Message-ID: <20150522022203.GA26651@brightrain.aerifal.cx>

Subject: [musl] [PATCH] Byte-based C locale, draft 1
Message-ID: <20150606214007.GA17398@brightrain.aerifal.cx>

commit 1507ebf837334e9e07cfab1ca1c2e88449069a80
byte-based C locale, phase 1: multibyte character handling functions

commit 16f18d036d9a7bf590ee6eb86785c0a9658220b6
byte-based C locale, phase 2: stdio and iconv (multibyte callers)




