mailing list of musl libc
From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Subject: Re: Patches: Timezone in %c and POT file
Date: Thu, 31 Dec 2015 10:37:58 +0100
Message-ID: <20151231093758.GD4425@debian>
In-Reply-To: <20151230155848.GK238@brightrain.aerifal.cx>

On Wed, Dec 30, 2015 at 10:58:48AM -0500, Rich Felker wrote:
> On Wed, Dec 30, 2015 at 11:56:33AM +0100, Markus Wichmann wrote:
> > Hi all,
> > 
> > Now I have subscribed, so CC'ing me is no longer necessary.
> > 
> > Today I worked on two things: Firstly, I put the timezone into
> > strftime's %c output. The reason is that glibc's strftime() does the
> > same. That means, that an application dev currently can't depend on
> > either behavior (so strftime("%c %Z") will give me the timezone twice on
> > glibc, but only once on musl, and my app won't be able to tell without
> > inspecting the resulting string).
> > 
> > No biggie, changing that one is easy. Of course, a heated argument can
> > be had over whether or not we want it one way or the other. And it'll
> > come down to personal taste, because as far as I'm aware, POSIX isn't
> > mandating anything about this.
> 
> The format for %c in the C locale is strictly specified by ISO C as
> "%a %b %e %T %Y"; see 7.27.3.5 ¶ 7. If glibc does not match this it's
> a bug in glibc. POSIX is of course aligned with ISO C and says the
> same thing. In other locales it's permitted to differ.
> 

Ah, sorry, I didn't check the C locale. And I didn't look up strftime()
in the C standard. I did look up D_T_FMT in POSIX, though. Ah well, it
happens.
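
For what it's worth, the C-locale requirement is easy to check from a
program. A minimal sketch, assuming the caller has selected the "C"
locale; the function name is made up for illustration:

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* Returns 1 if strftime's "%c" output equals the explicit format
 * "%a %b %e %T %Y" for the given broken-down time, 0 if it differs,
 * and -1 if either call could not format into its buffer.
 * Assumes the caller has already selected the "C" locale. */
int c_format_matches_iso_c(const struct tm *tm)
{
	char via_c[128], via_explicit[128];

	if (!strftime(via_c, sizeof via_c, "%c", tm))
		return -1;
	if (!strftime(via_explicit, sizeof via_explicit, "%a %b %e %T %Y", tm))
		return -1;
	return strcmp(via_c, via_explicit) == 0;
}
```

Calling setlocale(LC_ALL, "C") first and passing any valid struct tm
should give 1 on a conforming libc. In other locales the result may
legitimately be 0.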

Attached are two new patches, the first reverting this one, the second
updating the POT and the PO to reflect that change.

> > Then I noticed that, for quite some time now, musl has been supporting
> > .mo files, but no infrastructure is in place for them (i.e. no POT file
> > nor any PO file is shipped). I tried searching around for a POT or PO
> > file, but I couldn't find any. So I added a handwritten POT file and a
> > German PO file (I'm not proficient enough in any languages besides
> > English and German to want to create that file for any other languages.
> > And an English PO file would be kind of redundant.)
> > 
> > I filled the POT file with all the strings I could find, that would ever
> > be plugged into __lctrans(). That gives me strerror(), strsignal(),
> > gai_strerror(), hstrerror(), and __getopt_msg() strings.
> 
> Have you read the thread "Call for locales maintainer & contributors"
> from when locale support launched? Here's a link to the start of it:
> 
> http://www.openwall.com/lists/musl/2014/07/24/14
> 

No, I haven't. I'm looking into it now. I did search for a musl locale
repository and couldn't find any, so that's why I sent these patches.

> It might have some useful ideas. The main one I'd like to point out is
> the idea to develop and maintain locales as a separate repo outside of
> the source repo. Unlike glibc, we don't have a lot of messages that
> should be expected to change frequently, so I think the issues with
> keeping sync are minimal, while there are several advantages:
> 
> - Not having translation progress stalled-by/tied-to code release
>   cycles.
> 
> - Saving users who don't want locales from having to download them.
> 
> - No need to have locale patches go through me.
> 

Since we're going with gettext's MO files, the separate repo might work,
but I think you should at least keep an up-to-date POT file in the musl
repo. That way the locale repo just has to check whether the POT file is
still up to date, and if not, what changed, to be able to update all the
language POs.

Also, keeping at least a POT file around allows people interested in
translation work to get into it way more quickly, even without the need
for a repo. After the POT file was done, the German PO was a matter of
half an hour at most, and incidentally, German is the only translation I
was interested in. Since locale is still very much a DIY thing, that
wouldn't be so bad.

But then we probably should annotate the weirder entries (the
nl_langinfo() stuff).

> > Unfortunately this design is running into some problems: At the moment
> > several strings are empty in the C locale (which is fine), but they
> > could translate to something else in some other locale (nl_langinfo()'s
> > ERA* and THOUSEP come to mind).
> 
> Yes. Those are unsupported right now, along with a lot of related
> functionality. There's also no way to set the fields of localeconv(),
> which come mostly (entirely, I think) from LC_MONETARY and LC_NUMERIC.
> Depending on how we end up representing that data in the locale file,
> it might make sense to use some sort of preprocessing script to
> generate this part of the .po file, but I'd like to have the format
> just be simple and natural to do in .po if that doesn't impose heavy
> code or runtime overhead.
> 
> > Some strings in the C locale are the
> > exact same and might translate to something else in some language (the
> > long and short forms of "May" for instance). I think glibc solves that
> > problem with another file format for libc's locales, which is a headache
> > I don't want to think about this year anymore.
> 
> "May" is a good example. Yes, I've never much liked the gettext model
> of keying by untranslated/English string, but for translation it's the
> only one that's translator-friendly, and for musl it was the only
> choice that saved us from having to develop a new file format and code
> to handle it.
> 
> The easiest solution I've come up with is prefixing and doing
> something like __lctrans("<prefix>string")+prefixlen, ideally with a
> prefixlen of 1, e.g. __lctrans("\5May")+1. This would just add 1 byte
> to each string in the built-in C locale data and one inc/add
> instruction, not a significant cost. Do you have any other good ideas?
> 

Well, the prefix idea is good, but it requires changes to all the
interfaces calling __lctrans() and to the POT and PO files. And all of
that for a potential problem we don't even have right now. (BTW: Despite
all my searching, I couldn't find out what POSIX means by "era".)
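
To make the prefix idea concrete, here is a rough sketch. demo_lctrans()
is a hypothetical stand-in for __lctrans(), the catalog contents are
placeholders rather than real translations, and the choice of \1 and \2
as prefix bytes is my assumption. Note that the translated strings have
to carry a one-byte prefix as well, or the +1 would eat their first
character:

```c
#include <stddef.h>
#include <string.h>

struct entry { const char *msgid, *msgstr; };

/* Made-up catalog: prefix byte \1 marks abbreviated month names,
 * \2 marks full month names. The translations are placeholders. */
static const struct entry demo_catalog[] = {
	{ "\1May", "\1short-May" },
	{ "\2May", "\2long-May" },
};

/* Hypothetical stand-in for __lctrans(): returns the translation of
 * msgid, or msgid itself when the catalog has no entry for it. */
static const char *demo_lctrans(const char *msgid)
{
	size_t i;

	for (i = 0; i < sizeof demo_catalog / sizeof *demo_catalog; i++)
		if (!strcmp(demo_catalog[i].msgid, msgid))
			return demo_catalog[i].msgstr;
	return msgid;
}

/* Callers pass the prefixed key and skip the prefix byte in the result. */
const char *short_may(void)    { return demo_lctrans("\1May") + 1; }
const char *long_may(void)     { return demo_lctrans("\2May") + 1; }
const char *untranslated(void) { return demo_lctrans("\1Jun") + 1; }
```

The untranslated case still works because the fallback returns the
prefixed msgid, so skipping one byte yields the plain C-locale string.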

Alternatively we could go the other route and map those string lists
into memory from locale files. I'm thinking of a file that contains a
string like c_time or c_messages, or better yet: All of them. Then we
prefix that file with offsets to where the strings start... Loading
would just be mmap() and setting a couple pointers, and using it would
be just loading str from a different source in nl_langinfo()...

Oh no, that would be the "custom file format" route. Viable, but also a
lot of work. To create those files we'd need tools and before long we'd
be reinventing MO files.
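
Just to show how little the lookup side would need, here is a sketch of
such an offset-prefixed file. The layout (a uint32_t count, then that
many uint32_t offsets in host byte order, then the strings) is purely my
assumption, and blob_string() is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a uint32_t count, then count uint32_t offsets
 * (measured from the start of the blob, host byte order), then the
 * NUL-terminated strings themselves.
 * Returns string number idx, or a null pointer if the blob is too
 * short, idx is out of range, or the string is not terminated. */
const char *blob_string(const void *blob, size_t len, uint32_t idx)
{
	const unsigned char *p = blob;
	uint32_t count, off;

	if (len < sizeof count)
		return 0;
	memcpy(&count, p, sizeof count);
	if (idx >= count)
		return 0;
	/* The offset table entry for idx must lie inside the blob. */
	if ((len - sizeof count) / sizeof off <= idx)
		return 0;
	memcpy(&off, p + sizeof count + idx * sizeof off, sizeof off);
	if (off >= len)
		return 0;
	/* Require a terminating NUL before the end of the blob. */
	if (!memchr(p + off, 0, len - off))
		return 0;
	return (const char *)p + off;
}
```

With a file mapped read-only via mmap(), blob would be the mapping and
len its size. The tooling to generate such files is the expensive part,
not this function.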


No, as it stands, I'd go with "Let's cross that bridge when we come to
it". musl doesn't support a lot of things that would be necessary for
full locale support (and I don't particularly want it to. Sure, it's
annoying to have to cut long numbers up into groups of three digits
manually, but the code to support it in the POSIX way would just be
insane).
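
For comparison, doing the grouping by hand in an application is only a
few lines. A sketch with a made-up function name, where the separator is
simply passed in by the caller instead of coming from the locale:

```c
#include <stdio.h>

/* Writes n in decimal into buf, with sep between groups of three
 * digits. Returns the number of characters written (not counting the
 * terminating NUL), or -1 if buf is too small. */
int group_thousands(char *buf, size_t size, unsigned long long n, char sep)
{
	char digits[32];
	int len = snprintf(digits, sizeof digits, "%llu", n);
	int out, i, j;

	if (len < 0)
		return -1;
	out = len + (len - 1) / 3;   /* digits plus separators */
	if ((size_t)out >= size)
		return -1;
	for (i = 0, j = 0; i < len; i++) {
		/* Insert a separator whenever a multiple of three digits remains. */
		if (i && (len - i) % 3 == 0)
			buf[j++] = sep;
		buf[j++] = digits[i];
	}
	buf[j] = 0;
	return j;
}
```

This ignores locales whose groups are not three digits wide, which is
exactly the kind of generality that makes the POSIX version heavy.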

> > [...]
> > +
> > +msgid "."
> > +msgstr ","
> 
> musl explicitly does not support changing the radix point; there's an
> old thread on this topic I can dig up if you'd like to read it. It
> looks to me like nl_langinfo(RADIXCHAR) will return a replacement if
> the locale file defines one, but then you get inconsistent results
> since it won't be used (e.g. by printf or strtod). Probably
> nl_langinfo should avoid passing the "." to __lctrans at all so that
> this inconsistency can't arise. This would also allow us to support
> "mon_decimal_point" (which would otherwise be a duplicate untranslated
> string) if desired, I think.
> 

Oh yes, I remember reading that.

Well, we could remove that string from the PO file and the POT file. I
haven't done that in the attached patches yet, but you can do it after
applying them (then I don't get a merge conflict). And yeah, printing
numbers in the local way is something even glibc tries to avoid. I also
doubt it is necessary.

Since few libcs offer local number support, programs that try to offer
local number I/O have to work around that, anyway (changing commas to
dots before passing it to strtod(), filtering out thousands separators,
etc.) so few if any applications need that. I only put it there for
completeness' sake.
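
The kind of workaround I mean looks roughly like this. It is only a
sketch: the function name and the fixed 64-byte buffer are arbitrary,
and it assumes strtod() itself expects "." as the radix character (true
on musl, and elsewhere as long as LC_NUMERIC is left at "C"):

```c
#include <stdlib.h>
#include <string.h>

/* Parses text such as "1.234.567,89" by dropping every thousands
 * separator and turning the local radix character into '.', then
 * handing the result to strtod().
 * Returns 0 on success and stores the value in *out; returns -1 if
 * the input is empty, too long, or not entirely a number. */
int parse_local_number(const char *s, char radix, char thousands, double *out)
{
	char tmp[64], *end;
	size_t j = 0;

	for (; *s; s++) {
		if (*s == thousands)
			continue;
		if (j + 1 >= sizeof tmp)
			return -1;
		tmp[j++] = (*s == radix) ? '.' : *s;
	}
	tmp[j] = 0;
	if (!j)
		return -1;
	*out = strtod(tmp, &end);
	return *end ? -1 : 0;
}
```

For German-style input the call would be
parse_local_number("1.234.567,89", ',', '.', &value).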

Ciao,
Markus

[-- Attachment #2: 0001-Revert-Add-timezone-to-strftime-s-c.patch --]
[-- Type: text/x-diff, Size: 776 bytes --]

From effa6db773e84f6a2369843e79a8b0e645930a41 Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Thu, 31 Dec 2015 09:33:12 +0100
Subject: [PATCH 1/2] Revert "Add timezone to strftime's %c."

This reverts commit 2371b3107a3cb39fdd53222021a0853bd05341cf.
---
 src/locale/langinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/locale/langinfo.c b/src/locale/langinfo.c
index 7bfc9da..b2c8569 100644
--- a/src/locale/langinfo.c
+++ b/src/locale/langinfo.c
@@ -13,7 +13,7 @@ static const char c_time[] =
 	"May\0"       "June\0"     "July\0"     "August\0"
 	"September\0" "October\0"  "November\0" "December\0"
 	"AM\0" "PM\0"
-	"%a %b %e %T %Y %Z\0"
+	"%a %b %e %T %Y\0"
 	"%m/%d/%y\0"
 	"%H:%M:%S\0"
 	"%I:%M:%S %p\0"
-- 
2.1.4


[-- Attachment #3: 0002-Necessary-changes-for-correct-C-locale.patch --]
[-- Type: text/x-diff, Size: 1186 bytes --]

From 3b048aaa38cc0c753794a7ad8aba4a1412a53d6f Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Thu, 31 Dec 2015 09:39:53 +0100
Subject: [PATCH 2/2] Necessary changes for correct C locale.

Turns out the C locale D_T_FMT string was mandated by C standard. So,
for glibc compatibility, not only did I revert my first commit, but also
changed the POT and PO files to reflect that.
---
 po/de.po    | 5 +----
 po/musl.pot | 3 ---
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/po/de.po b/po/de.po
index 29cf025..7dfb819 100644
--- a/po/de.po
+++ b/po/de.po
@@ -388,9 +388,6 @@ msgstr ""
 msgid "PM"
 msgstr ""
 
-msgid "%a %b %e %T %Y %Z"
-msgstr "%a %e %b %Y %T %Z"
-
 msgid "%m/%d/%y"
 msgstr "%d.%m.%y"
 
@@ -404,7 +401,7 @@ msgid "0123456789"
 msgstr "0123456789"
 
 msgid "%a %b %e %T %Y"
-msgstr "%a %e %b %Y %T"
+msgstr "%a %e %b %Y %T %Z"
 
 msgid "^[yY]"
 msgstr "^[jJ]"
diff --git a/po/musl.pot b/po/musl.pot
index 699e8f5..3b7138f 100644
--- a/po/musl.pot
+++ b/po/musl.pot
@@ -377,9 +377,6 @@ msgstr "AM"
 msgid "PM"
 msgstr "PM"
 
-msgid "%a %b %e %T %Y %Z"
-msgstr "%a %b %e %T %Y %Z"
-
 msgid "%m/%d/%y"
 msgstr "%m/%d/%y"
 
-- 
2.1.4

