From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12574
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: setlocale behavior with 'missing' locales
Date: Thu, 1 Mar 2018 20:43:49 -0500
Message-ID: <20180302014349.GZ1436@brightrain.aerifal.cx>
References: <20171108050338.GL1627@brightrain.aerifal.cx>
 <20171108052715.GM1627@brightrain.aerifal.cx>
 <20180301011340.GU1436@brightrain.aerifal.cx>
 <20180301192545.GV1436@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1519954923 5480 195.159.176.226 (2 Mar 2018 01:42:03 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Fri, 2 Mar 2018 01:42:03 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-12590-gllmg-musl=m.gmane.org@lists.openwall.com Fri Mar 02 02:41:59 2018
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
 by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from )
 id 1erZhu-0000sW-IY for gllmg-musl@m.gmane.org;
 Fri, 02 Mar 2018 02:41:58 +0100
Original-Received: (qmail 18133 invoked by uid 550); 2 Mar 2018 01:44:02 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
Original-Received: (qmail 18108 invoked from network); 2 Mar 2018 01:44:01 -0000
Content-Disposition: inline
In-Reply-To: <20180301192545.GV1436@brightrain.aerifal.cx>
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:12574

On Thu, Mar 01, 2018 at 02:25:45PM -0500, Rich Felker wrote:
> On Thu, Mar 01, 2018 at 01:10:47PM -0600, William Pitcock wrote:
> > >> One notable issue is that, right now, we rely on being able to set
> > >> LC_MESSAGES to an arbitrary name even if there's no libc locale
> > >> definition for it; this is because gettext() relies on the name of the
> > >> current LC_MESSAGES locale to find (application-specific) translation
> > >> files that might exist even without a libc translation. I'm not sure
> > >> how we would best keep this working under changes similar to the
> > >> above.
> > >
> > > Any further thoughts on this? I'd like to begin addressing these
> > > issues in this release cycle.
> > >
> > > I think the above plan works (is conforming, doesn't break things)
> > > except for the LC_MESSAGES issue mentioned at the end. I don't have
> > > any good ideas still for dealing with that. Really since gettext can
> > > be used with any category, not just LC_MESSAGES (although LC_MESSAGES
> > > is the normal choice), it applies to all categories. Maybe we could
> > > still use the ("nonexistant") requested locale name in this case, or
> > > some derivative of it that clarifies that it's synthesized...?
> >
> > +1 to using this approach.
> >
> > We could use a locale name such as "en_US@virtual.UTF-8".
> >
> > glibc uses this style of locale name for locales such as UK english
> > with eurozone LC_CURRENCY: en_UK@euro.UTF-8.
>
> I was actually just in the process of trying to work out something
> very similar. Here's how I think it might work:
>
> setlocale(cat, "") -- always succeeds, produces ll_TT@virtual (or
> ll_TT@missing was my idea) if a locale file by the matching name is
> not found.
>
> setlocale(cat, "ll_TT@virtual") (or whatever name) - always succeeds.
>
> setlocale(cat, "ll_TT[@other]") - succeeds only if a file matching the
> name is found.
>
> One thing I don't entirely like is repurposing the @ modifier for
> this; it conflicts with (and perhaps fails to preserve) an existing
> modifier if there is one, and affects how search for gettext
> translation files would happen (searching extra @virtual paths).
> Perhaps we should instead make it a separate component delimited in
> some other way so it can always be dropped by gettext.

On this topic, I did some research on GNU gettext, and just like
musl's, it ignores the codeset part of the locale name
ll[_TT][.codeset][@modifier] while trying combinations of including or
omitting _TT and @modifier. So it looks like the only way to make a
synthesized locale name that can match all the same translation files
as the original name, under either musl or GNU gettext, is by
misappropriating the codeset field as the indicator that it's a
synthesized locale. That doesn't sound particularly good.

If we're only concerned about musl gettext and not GNU gettext or
other third-party software trying to parse the resulting synthesized
locale names, we can simply adopt any notation we like and have musl's
gettext ignore it.

Also, in the case where the original requested locale had no @modifier
component, adding a special @synth/@missing/whatever modifier would
not disturb the search for translations with either musl or GNU
gettext. At worst GNU gettext would search a few extra nonexistent
pathnames.

One other thing to note is that synthesizing locales without adjusting
the name to indicate that they're synthesized does not seem consistent
if setlocale is going to reject unknown explicit names. The name that
the program reads back from setlocale(cat,0) or NL_LOCALE_NAME would
then fail to be valid for subsequent use as an explicit name.

One possible alternative to synthesizing names would be just reading
back the name of the locale that was actually set ("C.UTF-8" or some
fallback like "en" when "en_US" was requested but only "en" was
available). In this case GNU gettext or any third-party code would be
unable to honor the requested locale. musl's internal gettext could,
but I'm not sure this kind of hidden state would be desirable or
consistent, so I'd be a bit hesitant to do it.

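To make the earlier point about translation lookup concrete, here is a
rough sketch of the candidate-name generation as described above:
the codeset is dropped, and _TT and @modifier are each tried both
present and absent. This is an illustration only, not code from musl
or GNU gettext. The name gen_candidates, the two size macros, and the
most-specific-first ordering are all assumptions made for the example,
and it assumes the codeset comes before the modifier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CAND 4   /* at most: ll_TT@mod, ll_TT, ll@mod, ll */
#define CAND_LEN 64  /* arbitrary buffer size for this sketch */

/* Hypothetical helper, not part of musl or GNU gettext.
 * Parses "ll[_TT][.codeset][@modifier]" (codeset assumed to precede
 * the modifier), discards the codeset, and writes each combination of
 * keeping or omitting _TT and @modifier into out[], most specific
 * first. Returns the number of candidates written. */
int gen_candidates(const char *name, char out[MAX_CAND][CAND_LEN])
{
    assert(name && out);

    size_t len = strlen(name);

    /* "@modifier", if any, is the last component. */
    const char *at = strchr(name, '@');
    size_t base_len = at ? (size_t)(at - name) : len; /* ll[_TT][.codeset] */

    /* Cut off ".codeset", if any. */
    const char *dot = memchr(name, '.', base_len);
    size_t lt_len = dot ? (size_t)(dot - name) : base_len; /* ll[_TT] */

    /* Locate "_TT", if any. */
    const char *us = memchr(name, '_', lt_len);
    size_t ll_len = us ? (size_t)(us - name) : lt_len; /* ll */

    int n = 0;
    for (int keep_tt = 1; keep_tt >= 0; keep_tt--) {
        if (keep_tt && !us)
            continue; /* no territory to keep */
        size_t l = keep_tt ? lt_len : ll_len;
        for (int keep_mod = 1; keep_mod >= 0; keep_mod--) {
            if (keep_mod && !at)
                continue; /* no modifier to keep */
            snprintf(out[n], CAND_LEN, "%.*s%s",
                     (int)l, name, keep_mod ? at : "");
            n++;
        }
    }
    return n;
}
```

Under these assumptions, a name like en_US@missing expands to
en_US@missing, en_US, en@missing, en. The plain en_US and en
candidates are still tried, which is why a marker modifier added to a
name that had none only costs a few extra failed lookups.
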

An alternative would be just giving up on the ability to get message
translations in a language for which you don't have a locale
installed. This would sound a lot more acceptable if we actually had
locale definition files, I think....

Rich