mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Bikeshed invitation for nl_langinfo ambiguities
Date: Mon, 5 Mar 2018 12:10:27 -0500
Message-ID: <20180305171027.GE1436@brightrain.aerifal.cx>
In-Reply-To: <20180303050854.GD1436@brightrain.aerifal.cx>

On Sat, Mar 03, 2018 at 12:08:54AM -0500, Rich Felker wrote:
> On Sun, Nov 26, 2017 at 05:19:07PM -0600, A. Wilcox wrote:
> > On 10/11/17 20:06, Rich Felker wrote:
> > > I've found 2 ambiguous-string-to-translate bugs in musl's locale 
> > > support in nl_langinfo: The pairs ABMON_5 and MON_5 ("May"), and
> > > T_FMT and ERA_T_FMT ("%H:%M:%S"), have the same values in the C
> > > locale, and thus can't be translated to distinct values like they
> > > need to be in other locales.
> > > 
> > > Any opinions on the cleanest way to handle this? There are various 
> > > hacks I could do at the implementation level, like adding a prefix 
> > > character to one or the other then applying +1 to the output
> > > string, but whatever solution we choose becomes a public interface
> > > for translators, so it should be something that's not horribly
> > > ugly.
> > 
> > I would personally recommend actually using the enum values as the
> > strings to translate.  _("MON_5"), _("ABMON_5"), etc; this is
> > non-ambiguous, easily understandable and describable for translators,
> > and does not require weird hacks at the implementation or ABI level.
> 
> I think this may be the nicest approach, despite being an incompatible
> change from the existing system, which apparently doesn't matter and
> isn't being used or people would have noticed that "May" can't be
> translated right.

One really ugly thing here is that the POSIX key for weekdays is
"highly unconventional" - ABDAY_1/DAY_1 is Sunday and ABDAY_7/DAY_7 is
Saturday. Even the Unicode CLDR noticed this nonsense and used
"sun"..."sat" as the keys rather than using numbers so as to be
unambiguous.
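
To make the numbering concrete, here is a minimal sketch of mapping the
POSIX day items onto CLDR-style keys. The names cldr_day_keys, day_key
and wday_to_item are hypothetical helpers made up for illustration, not
anything in musl; the only facts relied on are that DAY_1/ABDAY_1 mean
Sunday and that tm_wday in struct tm also counts from Sunday = 0.

```c
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>

/* CLDR-style keys, in POSIX order: DAY_1 is Sunday, DAY_7 is Saturday. */
static const char *const cldr_day_keys[7] = {
	"sun", "mon", "tue", "wed", "thu", "fri", "sat"
};

/* Explicit tables, so nothing depends on the items being consecutive. */
static const nl_item day_items[7] = {
	DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7
};
static const nl_item abday_items[7] = {
	ABDAY_1, ABDAY_2, ABDAY_3, ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7
};

/* Hypothetical helper: key for a (AB)DAY_* item, or NULL for other items. */
static const char *day_key(nl_item item)
{
	for (size_t i = 0; i < 7; i++)
		if (item == day_items[i] || item == abday_items[i])
			return cldr_day_keys[i];
	return NULL;
}

/* Hypothetical helper: tm_wday (0 = Sunday) to the full-name item. */
static nl_item wday_to_item(int wday)
{
	assert(wday >= 0 && wday < 7);
	return day_items[wday];
}
```

With these tables, day_key(DAY_1) and day_key(ABDAY_1) both give "sun",
which is the kind of name a translator can read without knowing which
weekday POSIX counts as number 1.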

> > Of course, then a "C" / "POSIX" strings file must be present.  But
> > this is, in my opinion, a very small sacrifice to ensure full purity
> > and ease of translation.
> 
> As noted before, obviously this isn't acceptable. We could drop a .mo
> file blob in the musl langinfo.c, but I think it might make more sense
> to just use different code paths for translated vs nontranslated case.

I did some simple estimates with a toy .po/.mo file, and it looks like
either of those approaches is going to more-than-double the size of
langinfo.o, and make it a lot more complex. Given that "Sun".."Sat"
are nicer keys for days anyway, I'm leaning back towards sticking with
what we have and just adding a special case for "May". The other
ambiguity is one of the ERA_* formats, which we're not even handling
properly right now anyway; they're "not available in the POSIX locale"
according to XBD 7.3.5 LC_TIME, so as I read it they should return ""
(not the corresponding non-era string) in the C/POSIX locale, and only
return something else if they're defined for the locale. Eventually,
we should probably look them up with mo keys like "era_d_fmt", etc.,
but unless/until we properly support them, the lookups for them should
just be removed.
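
A minimal sketch of what that could look like, under loud assumptions:
the thread does not say what form the "May" special case takes, so the
distinct key "MON_5" used here is purely hypothetical, as are
toy_catalog, toy_lookup, c_locale_str and toy_langinfo. The catalog is
a stand-in for a loaded .mo file, and its Finnish strings are only
illustrative values. ERA_T_FMT returns "" and is never looked up.

```c
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a translation catalog (msgid -> msgstr); illustrative only. */
struct toy_pair { const char *msgid, *msgstr; };

static const struct toy_pair toy_catalog[] = {
	{ "May",   "touko" },    /* abbreviated month, keyed by the C string */
	{ "MON_5", "toukokuu" }, /* full month, keyed by a hypothetical special key */
};

static const char *toy_lookup(const char *msgid)
{
	assert(msgid != NULL);
	for (size_t i = 0; i < sizeof toy_catalog / sizeof *toy_catalog; i++)
		if (!strcmp(toy_catalog[i].msgid, msgid))
			return toy_catalog[i].msgstr;
	return NULL;
}

/* C-locale values for the handful of items this sketch covers. */
static const char *c_locale_str(nl_item item)
{
	switch (item) {
	case ABMON_5:
	case MON_5:     return "May";
	case T_FMT:     return "%H:%M:%S";
	case ERA_T_FMT: return ""; /* "not available in the POSIX locale" */
	default:        return "";
	}
}

static const char *toy_langinfo(nl_item item)
{
	const char *s = c_locale_str(item);
	/* The one special case: give the full "May" its own lookup key. */
	const char *key = (item == MON_5) ? "MON_5" : s;
	if (!*key)
		return s; /* empty C-locale value: nothing to translate */
	const char *t = toy_lookup(key);
	return t ? t : s; /* fall back to the C-locale string */
}
```

Every other item keeps using its C-locale string as the key, so only
the one colliding pair needs extra code.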

> Then we could just synthesize the keys (ABMON_*, MON_*, ABDAY_*,
> DAY_*) to pass into LCTRANS() rather than having a table of them all
> expanded out. I might change my mind when actually working out how the
> code would look, though.

I started working on a nice means of doing this synthesis: having a
table like the existing c_time etc., but with contents like:

	"ABDAY_1\0\0\0\0\0\0\0"
	"DAY_1\0\0\0\0\0\0\0"
	"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	"MON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	...

where, when a zero-length entry is hit, the last non-zero-length one
seen gets used as a basis for synthesis. But it still didn't seem
possible to avoid a significant increase in code size and complexity.
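
For readers trying to picture the synthesis, here is a minimal sketch
of one way it could work. It is not the code that was actually drafted:
key_tab holds only the four groups shown above, and synth_key, its
flat 0-based index and its return convention are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* First key of each group spelled out; the rest are zero-length entries. */
static const char key_tab[] =
	"ABDAY_1\0\0\0\0\0\0\0"
	"DAY_1\0\0\0\0\0\0\0"
	"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
	"MON_1\0\0\0\0\0\0\0\0\0\0\0\0";

/*
 * Hypothetical helper: write the key for table entry idx (0-based) into
 * buf. Returns 0 on success, -1 if idx is out of range or buf is too small.
 */
static int synth_key(int idx, char *buf, size_t len)
{
	const char *p = key_tab;
	const char *end = key_tab + sizeof key_tab - 1; /* skip implicit NUL */
	const char *base = NULL;
	int base_idx = 0;

	for (int i = 0; p < end; i++, p += strlen(p) + 1) {
		if (*p) {
			/* Non-empty entry: remember it as the synthesis basis. */
			base = p;
			base_idx = i;
		}
		if (i != idx)
			continue;
		/* Keep the basis up to and including '_', then renumber. */
		const char *us = strrchr(base, '_');
		assert(us != NULL);
		int n = snprintf(buf, len, "%.*s%d",
		                 (int)(us - base + 1), base, 1 + idx - base_idx);
		return (n < 0 || (size_t)n >= len) ? -1 : 0;
	}
	return -1;
}
```

In this sketch entry 18 is the fifth slot of the ABMON group, so it
comes out as "ABMON_5", and entry 30 comes out as "MON_5". Even in this
reduced form the walk, the renumbering and the buffer handling add up
to noticeably more code than a plain table lookup.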

Rich



Thread overview: 7+ messages
2017-11-11  2:06 Rich Felker
2017-11-26 23:19 ` A. Wilcox
2017-11-27  1:07   ` Rich Felker
2017-11-27  2:57     ` A. Wilcox
2017-11-27  5:09       ` Rich Felker
2018-03-03  5:08   ` Rich Felker
2018-03-05 17:10     ` Rich Felker [this message]
