From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/5583
Path: news.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Locale bikeshed time
Date: Thu, 24 Jul 2014 12:01:50 -0400
Message-ID: <20140724160150.GA4038@brightrain.aerifal.cx>
References: <20140722184932.GA4914@brightrain.aerifal.cx>
 <20140722201008.GC16795@example.net>
 <20140722203540.GA11570@brightrain.aerifal.cx>
 <20140723095031.GE16795@example.net>
 <20140723163907.GC11570@brightrain.aerifal.cx>
 <20140723192503.GG16795@example.net>
 <20140723210120.GD11570@brightrain.aerifal.cx>
 <20140724153526.GH16795@example.net>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1406217733 24148 80.91.229.3 (24 Jul 2014 16:02:13 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 24 Jul 2014 16:02:13 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-5588-gllmg-musl=m.gmane.org@lists.openwall.com Thu Jul 24 18:02:09 2014
Return-path:
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
 by plane.gmane.org with smtp (Exim 4.69)
 (envelope-from )
 id 1XALT2-00014n-8z
 for gllmg-musl@plane.gmane.org; Thu, 24 Jul 2014 18:02:04 +0200
Original-Received: (qmail 24426 invoked by uid 550); 24 Jul 2014 16:02:03 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 24418 invoked from network); 24 Jul 2014 16:02:03 -0000
Content-Disposition: inline
In-Reply-To: <20140724153526.GH16795@example.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:5583
Archived-At:

On Thu, Jul 24, 2014 at 05:35:26PM +0200, u-igbb@aetey.se wrote:
> On Wed, Jul 23, 2014 at 05:01:20PM -0400, Rich Felker
> wrote:
> > > This feels appropriate - if the definitions indeed fall into distinctive
> > > classes like "full" / "single-category" and also if the naming reflects
> > > the distinction
> >
> > IMO language-based locales should be ll, lll, ll_TT, or lll_TT form
> > where ll or lll is lowercase ISO language code and TT is uppercase
> > territory code. Non-language-based locale files should avoid these
> > patterns.
>
> Just for certainty:
>
> I assume you mean "l" above being lower case and non-language-based
> definitions to begin/consist of uppercase letters? Totally avoiding two-
> and three-letter combinations would be hardly followed by less scrupulous
> parties :) but you certainly did not mean this.

I just meant that language-based locales should match the pattern:

^[[:lower:]]{2,3}(_[[:upper:]]{2})?([[:punct:]].*)?$

assuming I didn't make any stupid mistakes in writing that regex. And
non-language-based locales should not match this pattern.

BTW POSIX actually describes this pattern (or similar) for locale
names under the XSI option.

> Btw do we have to also use lll (the three-letter codes) or would be
> the two-letter ones sufficient?

I believe there are some languages for which there is no two-letter
code. (Note that even the whole 26x26 space is probably insufficient
to represent all of the world's languages, and for practical purposes,
the letters should have some correspondence with the name of the
language.)

> I understand that this is not an implementation question but rather a
> discipline/policy one but in the long run it helps enormously to have
> a clean deployment idea from the beginning.

Agreed.

> An example of a spectacular failure to do so were the xkb keyboard maps.
> [
> Two incompatible representations were in use, for many years (!) One was
> reasonable, structured by country i.e. reflecting different countries'
> actual standards.
> The other one was broken by design, using "language"
> as the main key without any actual definition of its semantics. This
> led to many of the available definitions being a hardly useful hacks
> (and of course to a lot of confusion for everyone as this thing was
> impossible to document). Remarkably even the maintainers of the maps
> at x.org/freedesktop.org at the time did not realize the origin of the
> problem. I happen to have been involved into clarifying the issue,
> now the structure of xkb/symbols is reasonable.
> ]

This text is utterly backwards, and I've complained about the policy
before, but gotten nowhere with it. Yes many languages have keyboard
variants connected to a particular geographic territory (this is
mainly true for European languages, not so much for the rest of the
world), but it does not make keyboard layout a property of country.
You also have:

- Users who speak and use languages that have no relation to the
  country where they're living.

- Languages which have no territory.

- Languages used in territories where the country it belongs to is
  disputed.

- Etc.

All of these issues make country-based keyboard selection at best
inconvenient, and at worst culturally and politically offensive, to
users. And offending users is utterly bad policy.

The same issue exists in glibc -- for a long time, their policy
mandated that all locales have a territory associated with them, and
this (along with other stupid policy) was preventing the addition of
the Esperanto locale. See:

https://sourceware.org/bugzilla/show_bug.cgi?id=16190

I believe the policy has been fixed now, but the discussion happened
on a different bug tracker issue and/or mailing list thread, and I
don't have the link.

> I am afraid that not stating a clean usage model may harm musl deployments
> too (say by mixing two- and three-letter locale codes so that one can not
> sanely know which kind to use).

The reasonable approach to this is probably just using the
three-letter codes for languages that do not have a two-letter code.
In practice I haven't seen such translations/locales on other systems,
but we certainly don't want to preclude them.

Rich
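
For anyone who wants to try the locale-name pattern from the message
against real names, here is a minimal sketch in Python. It rests on a
few assumptions: names are ASCII only, and since Python's re module
has no POSIX bracket classes, [[:lower:]], [[:upper:]] and [[:punct:]]
are spelled out by hand as they would behave in the C locale. The
function name is_language_based is hypothetical and is not anything
from musl.

```python
import re
import string

# ASCII translation of the POSIX pattern quoted in the message:
#   ^[[:lower:]]{2,3}(_[[:upper:]]{2})?([[:punct:]].*)?$
# string.punctuation is the ASCII punctuation set; re.escape makes it
# safe to place inside a character class.
_PUNCT = re.escape(string.punctuation)
LANGUAGE_LOCALE = re.compile(
    r"[a-z]{2,3}(_[A-Z]{2})?([" + _PUNCT + r"].*)?"
)


def is_language_based(name: str) -> bool:
    """Return True if name has the ll / lll / ll_TT / lll_TT shape,
    optionally followed by a punctuation-introduced suffix."""
    # fullmatch() plays the role of the ^ and $ anchors.
    return LANGUAGE_LOCALE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("eo", "ast", "en_US", "en_US.UTF-8", "C", "POSIX"):
        print(candidate, is_language_based(candidate))
```

One thing the sketch makes visible: "_" is itself a punctuation
character, so a name such as "en_us" still matches, through the
trailing punctuation group rather than the territory group. The
pattern therefore separates language-based names from names like "C"
or "POSIX", but it does not by itself enforce an uppercase territory.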