From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Bikeshed invitation for nl_langinfo ambiguities
Date: Mon, 5 Mar 2018 12:10:27 -0500
Message-ID: <20180305171027.GE1436@brightrain.aerifal.cx>
References: <20171111020612.GV1627@brightrain.aerifal.cx> <5A1B4BEB.5030304@adelielinux.org> <20180303050854.GD1436@brightrain.aerifal.cx>
In-Reply-To: <20180303050854.GD1436@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Sat, Mar 03, 2018 at 12:08:54AM -0500, Rich Felker wrote:
> On Sun, Nov 26, 2017 at 05:19:07PM -0600, A. Wilcox wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > On 10/11/17 20:06, Rich Felker wrote:
> > > I've found 2 ambiguous-string-to-translate bugs in musl's locale
> > > support in nl_langinfo: The pairs ABMON_5 and MON_5 ("May"), and
> > > T_FMT and ERA_T_FMT ("%H:%M:%S"), have the same values in the C
> > > locale, and thus can't be translated to distinct values like they
> > > need to be in other locales.
> > >
> > > Any opinions on the cleanest way to handle this? There are various
> > > hacks I could do at the implementation level, like adding a prefix
> > > character to one or the other then applying +1 to the output
> > > string, But whatever solution we choose becomes a public interface
> > > for translators, so it should be something that's not horribly
> > > ugly.
> >
> > I would personally recommend actually using the enum values as the
> > strings to translate. _("MON_5"), _("ABMON_5"), etc; this is
> > non-ambiguous, easily understandable and describable for translators,
> > and does not require weird hacks at the implementation or ABI level.
>
> I think this may be the nicest approach, despite being an incompatible
> change from the existing system, which apparently doesn't matter and
> isn't being used or people would have noticed that "May" can't be
> translated right.

One really ugly thing here is that the POSIX key for weekdays is
"highly unconventional" - ABDAY_1/DAY_1 is Sunday and ABDAY_7/DAY_7 is
Saturday. Even the Unicode CLDR noticed this nonsense and used
"sun"..."sat" as the keys rather than using numbers so as to be
unambiguous.

> > Of course, then a "C" / "POSIX" strings file must be present. But
> > this is, in my opinion, a very small sacrifice to ensure full purity
> > and ease of translation.
>
> As noted before, obviously this isn't acceptable. We could drop a .mo
> file blob in the musl langinfo.c, but I think it might make more sense
> to just use different code paths for translated vs nontranslated case.

I did some simple estimates with a toy .po/.mo file, and it looks like
either of those approaches is going to more-than-double the size of
langinfo.o, and make it a lot more complex. Given that "Sun".."Sat"
are nicer keys for days anyway, I'm leaning back towards sticking with
what we have and just adding a special case for "May".

The other ambiguity is one of the ERA_* formats, which we're not even
doing right now anyway; they're "not available in the POSIX locale"
according to XBD 7.3.5 LC_TIME, so as I read it they should return ""
(not the corresponding non-era string) in the C/POSIX locale, and only
return something else if they're defined for the locale. Eventually,
we should probably look them up with mo keys like "era_d_fmt", etc.
but unless/until we properly support them, the lookups for them should
just be removed.

> Then we could just synthesize the keys (ABMON_*, MON_*, ABDAY_*,
> DAY_*) to pass into LCTRANS() rather than having a table of them all
> expanded out. I might change my mind when actually working out how the
> code would look, though.

I started working on a nice means of doing this synthesis - having a
table like the existing c_time etc. but contents like:

	"ABDAY_1\0\0\0\0\0\0\0"
	"DAY_1\0\0\0\0\0\0\0"
	"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	"MON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	...

where, when a zero-length entry is hit, the last non-zero-length one
seen gets used as a basis for synthesis. But it still didn't seem
possible to avoid significant increase in code size and complexity.

Rich
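
For illustration only, the table-driven key synthesis described above
might look roughly like the sketch below. This is not code from musl:
the names key_tab, NKEYS and synth_key are hypothetical, the table
covers only the four day/month groups shown in the message, and the
index is simply a position in that table rather than a real nl_item
value. Each group stores one explicit key ending in "_1", and every
zero-length entry after it stands for the next number in the group.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical compact table: one explicit key per group, followed by
 * empty strings for the remaining members (7 days, 12 months). */
static const char key_tab[] =
	"ABDAY_1\0\0\0\0\0\0\0"
	"DAY_1\0\0\0\0\0\0\0"
	"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	"MON_1\0\0\0\0\0\0\0\0\0\0\0\0";

#define NKEYS (7 + 7 + 12 + 12)

/* Write the key for table position idx into buf.
 * Returns 0 on success, -1 on a bad index or too-small buffer. */
static int synth_key(int idx, char *buf, size_t len)
{
	const char *p = key_tab, *base = 0;
	int off = 0;

	if (idx < 0 || idx >= NKEYS) return -1;

	for (int i = 0; ; i++) {
		if (*p) { base = p; off = 0; } /* explicit entry: new basis */
		else off++;                    /* empty entry: next in group */
		if (i == idx) break;
		p += strlen(p) + 1;
	}

	/* The basis ends in "1"; keep everything up to and including '_'
	 * and append the synthesized number. */
	size_t stem = strlen(base) - 1;
	int n = snprintf(buf, len, "%.*s%d", (int)stem, base, 1 + off);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this layout, position 18 is the fifth entry of the ABMON group and
position 30 the fifth entry of the MON group, so the two "May" items
get distinct keys. The linear walk and the snprintf call are there for
clarity; they also hint at why the message above expects this kind of
scheme to add code size and complexity compared with a plain table.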