From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12567
From: William Pitcock
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: setlocale behavior with 'missing' locales
Date: Thu, 1 Mar 2018 13:10:47 -0600
References: <20171108050338.GL1627@brightrain.aerifal.cx> <20171108052715.GM1627@brightrain.aerifal.cx> <20180301011340.GU1436@brightrain.aerifal.cx>
In-Reply-To: <20180301011340.GU1436@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Xref: news.gmane.org gmane.linux.lib.musl.general:12567

Hi,

On Wed, Feb 28, 2018 at 7:13 PM, Rich Felker wrote:
> On Wed, Nov 08, 2017 at 12:27:15AM -0500, Rich Felker wrote:
>> On Wed, Nov 08, 2017 at 12:03:38AM -0500, Rich Felker wrote:
>> > Unfortunately this turns out to have been something of a tradeoff,
>> > since there's no way for applications (and, as it turns out,
>> > especially tests/test suites) to query whether a particular locale is
>> > "really" available. I've been asked to change the behavior to fail on
>> > unknown locale names, but of course that's not a working option in
>> > light of the above.
>> >
>> > I think there may be a solution that makes everyone happy, but I'm not
>> > sure yet. I'm going to follow up with a description and analysis of
>> > whether it's valid/conforming.
>>
>> So here's the possible solution.
>> ISO C leaves the default locale when
>> setlocale(cat,"") is called implementation-defined. POSIX however
>> defines it in terms of the LANG and LC_* environment variables. See
>> the CX text in:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
>>
>> "Setting all of the categories of the global locale is similar to
>> successively setting each individual category of the global locale,
>> except that all error checking is done before any actions are
>> performed. To set all the categories of the global locale,
>> setlocale() can be invoked as:
>>
>> setlocale(LC_ALL, "");
>>
>> In this case, setlocale() shall first verify that the values of all
>> the environment variables it needs according to the precedence rules
>> (described in XBD Environment Variables) indicate supported locales.
>> If the value of any of these environment variable searches yields a
>> locale that is not supported (and non-null), setlocale() shall
>> return a null pointer and the global locale shall not be changed. If
>> all environment variables name supported locales, setlocale() shall
>> proceed as if it had been called for each category, using the
>> appropriate value from the associated environment variable or from
>> the implementation-defined default if there is no such value."
>>
>> and the Environment Variables text in XBD 8.2:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
>>
>> The former seems to tie our hands: unless the locales determined by
>> the environment variables all exist, setlocale is required to fail and
>> leave us in the (unacceptable) "C" locale where UTF-8 doesn't work.
>> However the latter seems to offer us a way out.
>> After describing how
>> the precedence of the variables works, how locale pathnames work if
>> localedef is supported (musl doesn't support it), and how
>> implementation-provided/defined locale names work, it specifies:
>>
>> "If the locale value is not recognized by the implementation, the
>> behavior is unspecified."
>>
>> My optimistic reading of this is that, in the event the locale name
>> provided does not correspond to something we recognize, we're free to
>> define how it's interpreted, and always interpret it as C.UTF-8.
>>
>> What this would achieve is the following:
>>
>> 1. setlocale(cat, explicit_locale_name) - succeeds if the locale
>>    actually has a definition file, fails and returns a null pointer
>>    otherwise.
>>
>> 2. setlocale(cat, "") - always succeeds, honoring the environment
>>    variable for the category if a locale definition file by that name
>>    exists, but otherwise (the unspecified behavior) treating it as if
>>    it were C.UTF-8.
>>
>> This way, applications that probe for specific locale names can do so
>> and determine if they exist, but applications that just want to use
>> the default locale the user configured will still avoid catastrophic
>> breakage (failure to support UTF-8) even if they encounter "bad" LC_*
>> variables.
>>
>> Does this approach sound acceptable? I'm fairly content with
>> interpreting it as conforming to the standard; I'm mainly concerned
>> about whether there might be unforeseen breakage.
>>
>> One notable issue is that, right now, we rely on being able to set
>> LC_MESSAGES to an arbitrary name even if there's no libc locale
>> definition for it; this is because gettext() relies on the name of the
>> current LC_MESSAGES locale to find (application-specific) translation
>> files that might exist even without a libc translation. I'm not sure
>> how we would best keep this working under changes similar to the
>> above.
>
> Any further thoughts on this?
> I'd like to begin addressing these
> issues in this release cycle.
>
> I think the above plan works (is conforming, doesn't break things)
> except for the LC_MESSAGES issue mentioned at the end. I don't have
> any good ideas still for dealing with that. Really since gettext can
> be used with any category, not just LC_MESSAGES (although LC_MESSAGES
> is the normal choice), it applies to all categories. Maybe we could
> still use the ("nonexistent") requested locale name in this case, or
> some derivative of it that clarifies that it's synthesized...?

+1 to using this approach. We could use a locale name such as
"en_US@virtual.UTF-8". glibc uses this style of locale name for
locales such as UK English with eurozone LC_CURRENCY:
en_UK@euro.UTF-8.

William
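
For illustration only, here is a rough sketch of how an application might
tell the two cases in the proposal apart: probing an explicit locale name
versus simply adopting the user's environment. Nothing here is taken from
musl; the helper names locale_is_available() and use_environment_locale()
are made up for this sketch, and it uses only the standard setlocale()
interface.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: probe whether an explicitly named locale can be
 * selected for LC_CTYPE. Returns 1 if setlocale() accepted the name,
 * 0 if it returned a null pointer. The previous LC_CTYPE setting is
 * restored before returning. */
int locale_is_available(const char *name)
{
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL); /* query only, no change */

    if (!cur || strlen(cur) >= sizeof saved)
        return 0;
    strcpy(saved, cur); /* copy: later setlocale() calls may overwrite it */

    int ok = setlocale(LC_CTYPE, name) != NULL;
    setlocale(LC_CTYPE, saved); /* harmless if the probe changed nothing */
    return ok;
}

/* Hypothetical helper: adopt whatever the environment asks for. If
 * setlocale(LC_ALL, "") fails, the global locale is left unchanged, so
 * the name of the locale currently in effect is returned instead. */
const char *use_environment_locale(void)
{
    const char *res = setlocale(LC_ALL, "");
    return res ? res : setlocale(LC_ALL, NULL);
}
```

If the proposal were adopted, locale_is_available() would be expected to
return 0 for a name with no definition file, while
use_environment_locale() would still succeed with C.UTF-8 behavior even
when LANG or LC_* name something unknown. Other libcs, and musl as it
behaved at the time of this thread, may give different results.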