From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Bikeshed invitation for nl_langinfo ambiguities
Date: Mon, 5 Mar 2018 12:10:27 -0500
Message-ID: <20180305171027.GE1436@brightrain.aerifal.cx>
In-Reply-To: <20180303050854.GD1436@brightrain.aerifal.cx>

On Sat, Mar 03, 2018 at 12:08:54AM -0500, Rich Felker wrote:
> On Sun, Nov 26, 2017 at 05:19:07PM -0600, A. Wilcox wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > On 10/11/17 20:06, Rich Felker wrote:
> > > I've found 2 ambiguous-string-to-translate bugs in musl's locale
> > > support in nl_langinfo: The pairs ABMON_5 and MON_5 ("May"), and
> > > T_FMT and ERA_T_FMT ("%H:%M:%S"), have the same values in the C
> > > locale, and thus can't be translated to distinct values like they
> > > need to be in other locales.
> > >
> > > Any opinions on the cleanest way to handle this? There are various
> > > hacks I could do at the implementation level, like adding a prefix
> > > character to one or the other then applying +1 to the output
> > > string, but whatever solution we choose becomes a public interface
> > > for translators, so it should be something that's not horribly
> > > ugly.
> >
> > I would personally recommend actually using the enum values as the
> > strings to translate. _("MON_5"), _("ABMON_5"), etc; this is
> > non-ambiguous, easily understandable and describable for translators,
> > and does not require weird hacks at the implementation or ABI level.
>
> I think this may be the nicest approach, despite being an incompatible
> change from the existing system, which apparently doesn't matter and
> isn't being used or people would have noticed that "May" can't be
> translated right.

One really ugly thing here is that the POSIX key for weekdays is
"highly unconventional" - ABDAY_1/DAY_1 is Sunday and ABDAY_7/DAY_7 is
Saturday. Even the Unicode CLDR noticed this nonsense and used
"sun"..."sat" as the keys rather than using numbers so as to be
unambiguous.
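
To make the numbering concrete, here is a minimal sketch (not musl code; weekday_name is a made-up helper for illustration). It maps a tm_wday-style index, where 0 is Sunday, onto the POSIX items, so index 0 lands on DAY_1. It uses an explicit table rather than assuming the nl_item constants are consecutive integers.

```c
#include <assert.h>
#include <langinfo.h>

/* Hypothetical helper: map a tm_wday-style index (0 = Sunday ... 6 = Saturday)
 * to the full weekday name through the POSIX DAY_1..DAY_7 items.
 * DAY_1 is Sunday, so index 0 maps to DAY_1, not to "Monday". */
const char *weekday_name(int wday)
{
    static const nl_item items[7] = {
        DAY_1, DAY_2, DAY_3, DAY_4, DAY_5, DAY_6, DAY_7
    };
    assert(wday >= 0 && wday < 7);
    return nl_langinfo(items[wday]);
}
```

In the C locale, weekday_name(0) yields "Sunday" and weekday_name(6) yields "Saturday".
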
> > Of course, then a "C" / "POSIX" strings file must be present. But
> > this is, in my opinion, a very small sacrifice to ensure full purity
> > and ease of translation.
>
> As noted before, obviously this isn't acceptable. We could drop a .mo
> file blob in the musl langinfo.c, but I think it might make more sense
> to just use different code paths for translated vs nontranslated case.

I did some simple estimates with a toy .po/.mo file, and it looks like
either of those approaches is going to more-than-double the size of
langinfo.o, and make it a lot more complex. Given that "Sun".."Sat"
are nicer keys for days anyway, I'm leaning back towards sticking with
what we have and just adding a special case for "May". The other
ambiguity is one of the ERA_* formats, which we're not even doing
right now anyway; they're "not available in the POSIX locale"
according to XBD 7.3.5 LC_TIME, so as I read it they should return ""
(not the corresponding non-era string) in the C/POSIX locale, and only
return something else if they're defined for the locale. Eventually,
we should probably look them up with mo keys like "era_d_fmt", etc.
but unless/until we properly support them, the lookups for them should
just be removed.
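
The "May" collision is easy to observe from outside the library. The following is a minimal sketch (same_string is a made-up helper, not part of musl) that checks whether two langinfo items produce identical strings in the current locale. When the C-locale string doubles as the translation key, two items for which this returns nonzero cannot receive distinct translations.

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper: return nonzero if items a and b yield the same
 * string in the current locale. POSIX allows a later nl_langinfo() call
 * to overwrite the buffer returned by an earlier one, so the first
 * result is copied before the second call is made. */
int same_string(nl_item a, nl_item b)
{
    char first[64];
    strncpy(first, nl_langinfo(a), sizeof first - 1);
    first[sizeof first - 1] = 0;
    return strcmp(first, nl_langinfo(b)) == 0;
}
```

In the C locale, same_string(ABMON_5, MON_5) is nonzero because both are "May", while same_string(ABMON_1, MON_1) is zero ("Jan" vs "January").
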
> Then we could just synthesize the keys (ABMON_*, MON_*, ABDAY_*,
> DAY_*) to pass into LCTRANS() rather than having a table of them all
> expanded out. I might change my mind when actually working out how the
> code would look, though.

I started working on a nice means of doing this synthesis - having a
table like the existing c_time etc. but contents like:
"ABDAY_1\0\0\0\0\0\0\0"
"DAY_1\0\0\0\0\0\0\0"
"ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
"MON_1\0\0\0\0\0\0\0\0\0\0\0\0"
...
where, when a zero-length entry is hit, the last non-zero-length one
seen gets used as a basis for synthesis. But it still didn't seem
possible to avoid significant increase in code size and complexity.
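
For reference, the synthesis idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the code that was actually drafted: the names keys and synth_key are made up, the table covers only the four day/month groups, and a real version would pass the resulting key on to the translation lookup (LCTRANS() in the quoted text) instead of returning it to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical key table: each group spells out only its first key; the
 * empty entries that follow stand for the same key with the next number. */
static const char keys[] =
    "ABDAY_1\0\0\0\0\0\0\0"
    "DAY_1\0\0\0\0\0\0\0"
    "ABMON_1\0\0\0\0\0\0\0\0\0\0\0\0"
    "MON_1\0\0\0\0\0\0\0\0\0\0\0\0";

/* Write the key for table entry idx (0-based) into out.
 * Returns 0 on success, -1 if idx is out of range or out is too small. */
int synth_key(int idx, char *out, size_t outsz)
{
    const char *p = keys;
    const char *end = keys + sizeof keys - 1; /* skip the implicit final NUL */
    const char *base = 0;  /* last non-empty entry seen */
    int offset = 0;        /* empty entries seen since base */

    for (int i = 0; p < end; i++, p += strlen(p) + 1) {
        if (*p) {
            base = p;
            offset = 0;
        } else {
            offset++;
        }
        if (i == idx) {
            const char *us = base ? strrchr(base, '_') : 0;
            if (!us) return -1;
            /* Keep the prefix up to and including '_', then append the number. */
            int n = snprintf(out, outsz, "%.*s%d",
                             (int)(us - base + 1), base, 1 + offset);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
        }
    }
    return -1;
}
```

With this table, entry 6 comes out as "ABDAY_7" and entry 30 as "MON_5". Even in this reduced form the walk-and-format logic is noticeably more code than a flat table lookup.
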
Rich