Hi

With the exception of the musl translation itself, I think most
parts are doable. My problem at the moment is that I am not a C hero
like you guys and don't know exactly what these locale files should
look like (file format, content). As a consequence, fully answering
your questions is non-trivial at the moment. I can do some research
and some of the dirty work, but first I would need a sample locale
file in the musl format, or some documentation to get started. I have
worked with, and even created, locale files for glibc and CLDR in the
past, so I am at least not a complete newbie on the topic.

Unfortunately I don't have enough time to act as a maintainer, but I
could periodically help out if someone steps up and takes the lead.

For the translation of musl itself: Do you plan to add a *.pot file to
the musl repository?

Regards
Kevin

On Sun, Jul 27, 2014 at 5:27 AM, Rich Felker <dalias@libc.org> wrote:
> On Sat, Jul 26, 2014 at 11:27:38PM +0200, Wermut wrote:
>> Hi
>>
>> I don't like the idea of an entirely new tree of locale data written
>> from scratch. Glibc has one (with a lot of unmaintained data) and then
>> there is also the CLDR repository which aims to be the central source
>> for such data, maintained by unicode. The CLDR data is also used as a
>> basis for the Microsoft and Apple locale files and is often maintained
>> by national language experts. What I could offer is an effort to write
>> some magic code that imports the actual CLDR data and converts the
>> relevant information into the musl format. The CLDR data is
>> freely available from: http://cldr.unicode.org/index/downloads
>
> I have no objection to using data from CLDR if there's no restrictive
> license, but at first glance it looks like most of the data is outside
> the scope of the C/POSIX locale system. What we need is:

CLDR license (bottom of the page): http://unicode.org/copyright.html
In my eyes this is a BSD-like license. If somebody thinks the license
is not OK, please say so. A copy is attached to this mail.

>
> 1. Weekday and month names (full and abbreviated) - these should
>    almost certainly be available from CLDR or other public sources.
>
> 2. Time format strings for strftime - unless CLDR has C-oriented data
>    like that, these might not be available in a form that's easy to
>    automatically adapt. Research on this topic is welcome.
>
> 3. Regexes for yes and no responses - seems unlikely to be in CLDR,
>    but again I'd be happy for someone to prove me wrong.
>
> 4. Translations of the message strings in libc. Note that musl's
>    strings already deviate some from the legacy strings used on glibc
>    and other systems. For example the strerror strings are adjusted to
>    align more closely with the POSIX description and the actual
>    situations they arise in than the legacy strings (like "Not a
>    typewriter"). I'd like to aim to have our translated strings
>    equally modernized. And before really spending a lot of work on
>    these we should review the English strings again for possible
>    improvements and missing messages (I think some newer error codes
>    may be missing).
>
> 5. Collation rules - these almost certainly can come from Unicode/CLDR
>    but musl does not even support collation yet.
>
> 6. Monetary formatting and currency names - these almost certainly can
>    come from CLDR or other public sources, but again the code to use
>    the data isn't there yet.
>
>> Contribution is not completely open, but interested people normally
>> get access if they want to. I got mine within a week.
>>
>> This is only a suggestion open to discussion. What do you guys think about it?
>
> Overall I like it. But I think we still need a maintainer to manage
> pulling the data, maintaining string translations for messages, etc.
> Any comments on my items 1-6 above?
>
> Rich