From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/5610
Path: news.gmane.org!not-for-mail
From: u-igbb@aetey.se
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Locale bikeshed time
Date: Sat, 26 Jul 2014 09:25:03 +0200
Message-ID: <20140726072502.GR16795@example.net>
References: <20140723163907.GC11570@brightrain.aerifal.cx>
 <20140723192503.GG16795@example.net>
 <20140723210120.GD11570@brightrain.aerifal.cx>
 <20140724153526.GH16795@example.net>
 <20140724160150.GA4038@brightrain.aerifal.cx>
 <20140724201548.GM16795@example.net>
 <20140724220228.GB4038@brightrain.aerifal.cx>
 <20140725090649.GN16795@example.net>
 <20140725201551.GQ16795@example.net>
 <20140725223239.GG4038@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1406359535 18533 80.91.229.3 (26 Jul 2014 07:25:35 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 26 Jul 2014 07:25:35 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-5615-gllmg-musl=m.gmane.org@lists.openwall.com Sat Jul 26 09:25:29 2014
Return-path:
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1XAwMC-0005dq-0f for gllmg-musl@plane.gmane.org; Sat, 26 Jul 2014 09:25:28 +0200
Original-Received: (qmail 15807 invoked by uid 550); 26 Jul 2014 07:25:27 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 15799 invoked from network); 26 Jul 2014 07:25:27 -0000
X-T2-Spam-Status: No, hits=0.8 required=5.0 tests=BAYES_50
Received-SPF: none receiver=mailfe05.swip.net; client-ip=178.63.97.34; envelope-from=u-igbb@aetey.se
Content-Disposition: inline
In-Reply-To: <20140725223239.GG4038@brightrain.aerifal.cx>
User-Agent:
Mutt/1.5.23 (2014-03-12)
Xref: news.gmane.org gmane.linux.lib.musl.general:5610
Archived-At:

On Fri, Jul 25, 2014 at 06:32:39PM -0400, Rich Felker wrote:
> > Somewhat cleaner might be: ("zxx" and "ZZ" below are literals)
> >
> > no localization         C
> > language[+territory]    ll[l][_TT]
> > purely territorial      zxx_TT ("no language" code)
>
> While clean and well-defined, I wonder whether zxx_TT is
> counter-intuitive to most users...

Sure, this contradicts the all-too-convenient inclination to use short
names when possible. Nevertheless I would even argue against myself
(again :) and say that we had better disallow the short variants
altogether (neither TT nor ll alone).

> > I think that a language code alone should mean "no territory-specific
> > stuff included" and nothing else.
>
> I think that's reasonable.

Given that we would need both this extra rule and a hope that future
users/maintainers keep it in mind too,

> > Then "ll" would be a synonym for "ll_ZZ" and hence "ll_ZZ" will not have
> > to exist at all.

it would in fact be more robust to do the contrary and simply always
assume the full ll[l]_TT syntax, with zxx and ZZ already being defined
by the corresponding standards to denote the needed special cases.
Then this would be fully standard-compliant and consistent.

I understand this may feel a bit strange and "too long", even though the
extra characters are hardly a burden in practice.

Let me compare this to DNS search domains: short names seem convenient,
but they are not reliable, nor do they scale. Short locale names are
likewise prone to be misunderstood, and there will be contributions with
different semantics and long bikeshed discussions on different forums
about which one is right :)

In other words, I feel that it is clearer to _not_ include
Sweden-specific bits in "sv_ZZ" (which indicates "not _any_ country"
and hence "not Sweden") than in "sv".

> > LANG=sv_SE (decimal comma, "kr")
> > LANG=sv LC_MONETARY=zxx_SE (decimal point from "C", iso4217 "SEK")
>
> Changing the numeric radix point is explicitly not supported. :)
> LC_NUMERIC is just always C because, well, numbers are numbers, not
> something to vary by culture, and changing the radix point just breaks
> parsing and storing data for interchange. LC_MONETARY on the other

I am fully with you on the point of formatting numerical data for
interchange.

The purpose of a locale, though, is the exact _opposite_: to represent
data in a format chosen especially for a specific occasion and a
specific user, _differently_ from what would be suitable for the rest
of the world. Isn't it?

So I would say it is indeed stupid to localize data meant for
interchange. Nevertheless it may still be meaningful to format numbers
to the user's taste when the presentation is only meant for some kind
of "local" context.

Related to the decimal point issue: I think we (or at least I) would
need a clarification of the role of the "C" locale. It is to mean
"no localization", which does not say that it is expected to provide
a representation usable globally (I think that, on the contrary, it is
by its origin heavily English/US biased).

I assume that you are aiming to reduce this bias as much as possible
so that "C" could be neutral and suitable for as many users/uses as
possible. Unfortunately this raises more questions, like the following:

According to https://en.wikipedia.org/wiki/Decimal_mark
"
Countries where a dot "." is used to mark the radix point comprise
roughly 60% of the world's population.[citation needed]
"
which indicates that this information is unreliable.

Notably, according to the same article (and verifiably :) the living
auxiliary languages meant for international communication all made
a different choice (apparently for reasons based on some research):
"
The three most spoken international auxiliary languages, Ido, Esperanto,
and Interlingua all use the comma as the official radix point
"

Is there anything that postulates that the C locale uses "." as the
radix point? Is there any evidence that "." is more widely used than ","?

Do not misunderstand my questions as a cultural bias. I am _much_ more
used to the decimal dot than the comma, because of my involvement with
programming languages using ".". Nevertheless locale is not about
representing data for computers, but for humans, and I would love to
have the best possible internationally useful locale as the default.

Otherwise let us say that the "C" locale is for interacting with
programs, not with humans, period (those wishing for a human-friendly,
internationally sound environment are to use e.g. LANG=eo_ZZ).
This is possibly the only reliable/efficient/robust approach?

Yet it would be a pity not to have a common representation for both
humans and computers, without a cultural bias.

Rune