mailing list of musl libc
* Call for locales maintainer & contributors
@ 2014-07-24 19:32 Rich Felker
  2014-07-26 13:27 ` Olivier Goudron
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-07-24 19:32 UTC (permalink / raw)
  To: musl

With the upcoming locale support in musl 1.1.4, there's going to be a
need to create and maintain locale files for use with musl. This is
not a task I want to take on myself; I feel like it will take away too
much time and concentration from actual further development of musl
and future projects. So I'm looking for someone interested in
maintaining this aspect of musl.

I believe it can (and probably should) be an entirely out-of-tree
project. The intent is for locale files not to be version-specific,
although they may need some updates for future versions of musl when
new messages are added or some of the English messages change.
Otherwise the main task involved is working with users who want to add
full translations or just localized time, monetary, collation, etc.
data.

Since I expect locale support to be somewhat "experimental" in 1.1.4
(note: this should not interfere with existing deployments since the
LC_* vars are basically ignored when MUSL_LOCPATH is not set), I don't
think it's urgent that we get locales created and published right
away. It can be more of an ongoing process. Over the next day or two I
should be committing the rest of the source-level support needed for
LC_TIME and LC_MESSAGES to work. LC_COLLATE and LC_MONETARY might or
might not make it into this release; I'd rather omit them for now and
do them right later than have something hackish in a release that we
feel obligated to continue to support. So early efforts should
probably just be focused on translating day/month names and related
formats and the libc messages.
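
As a rough illustration of the note above about the LC_* variables being ignored when MUSL_LOCPATH is unset, here is a small Python sketch of how such a lookup could behave. Only the "ignored when MUSL_LOCPATH is not set" rule comes from this message. The colon-separated search path, the one-file-per-locale naming, and the function names `locale_name` and `find_locale_file` are assumptions made for illustration, not musl's actual code.

```python
import os

def locale_name(category, env):
    """Pick a locale name using the usual POSIX precedence:
    LC_ALL, then the category variable (e.g. LC_TIME), then LANG."""
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"

def find_locale_file(category, env):
    """Return the path of a locale file, or None to mean
    'fall back to the built-in C locale behavior'."""
    locpath = env.get("MUSL_LOCPATH")
    name = locale_name(category, env)
    # Without MUSL_LOCPATH the LC_* variables have no effect (per the message).
    if not locpath or name in ("C", "POSIX"):
        return None
    # Assumption: MUSL_LOCPATH is a colon-separated list of directories,
    # each holding one file per locale name.
    for directory in locpath.split(":"):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Under these assumptions, an existing deployment that never sets MUSL_LOCPATH always gets `None` back, whatever LANG or LC_TIME contain.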

Rich



* Re: Call for locales maintainer & contributors
  2014-07-24 19:32 Call for locales maintainer & contributors Rich Felker
@ 2014-07-26 13:27 ` Olivier Goudron
  2014-07-26 21:27   ` Wermut
  0 siblings, 1 reply; 8+ messages in thread
From: Olivier Goudron @ 2014-07-26 13:27 UTC (permalink / raw)
  To: musl


Hello,
I don't know how locale files work, but I can help with the French
translation if needed.

Olivier.




* Re: Call for locales maintainer & contributors
  2014-07-26 13:27 ` Olivier Goudron
@ 2014-07-26 21:27   ` Wermut
  2014-07-27  3:27     ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Wermut @ 2014-07-26 21:27 UTC (permalink / raw)
  To: musl

Hi

I don't like the idea of an entirely new tree of locale data written
from scratch. Glibc has one (with a lot of unmaintained data) and then
there is also the CLDR repository which aims to be the central source
for such data, maintained by Unicode. The CLDR data is also used as a
basis for the Microsoft and Apple locale files and is often maintained
by national language experts. What I could offer is an effort to write
some code that imports the current CLDR data and converts the
relevant information into musl's format. The CLDR data is
freely available from: http://cldr.unicode.org/index/downloads

Contribution is not completely open, but interested people normally
get access if they ask. I got mine within a week.

This is only a suggestion open to discussion. What do you guys think about it?

Regards

Kevin


* Re: Call for locales maintainer & contributors
  2014-07-26 21:27   ` Wermut
@ 2014-07-27  3:27     ` Rich Felker
  2014-07-27 19:43       ` Wermut
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-07-27  3:27 UTC (permalink / raw)
  To: musl

On Sat, Jul 26, 2014 at 11:27:38PM +0200, Wermut wrote:
> Hi
> 
> I don't like the idea of an entirely new tree of locale data written
> from scratch. Glibc has one (with a lot of unmaintained data) and then
> there is also the CLDR repository which aims to be the central source
> for such data, maintained by Unicode. The CLDR data is also used as a
> basis for the Microsoft and Apple locale files and is often maintained
> by national language experts. What I could offer is an effort to write
> some code that imports the current CLDR data and converts the
> relevant information into musl's format. The CLDR data is
> freely available from: http://cldr.unicode.org/index/downloads

I have no objection to using data from CLDR if there's no restrictive
license, but at first glance it looks like most of the data is outside
the scope of the C/POSIX locale system. What we need is:

1. Weekday and month names (full and abbreviated) - these should
   almost certainly be available from CLDR or other public sources.

2. Time format strings for strftime - unless CLDR has C-oriented data
   like that, these might not be available in a form that's easy to
   automatically adapt. Research on this topic is welcome.

3. Regexes for yes and no responses - seems unlikely to be in CLDR,
   but again I'd be happy for someone to prove me wrong.

4. Translations of the message strings in libc. Note that musl's
   strings already deviate some from the legacy strings used on glibc
   and other systems. For example the strerror strings are adjusted to
   align more closely with the POSIX description and the actual
   situations they arise in than the legacy strings (like "Not a
   typewriter"). I'd like to aim to have our translated strings
   equally modernized. And before really spending a lot of work on
   these we should review the English strings again for possible
   improvements and missing messages (I think some newer error codes
   may be missing).

5. Collation rules - these almost certainly can come from Unicode/CLDR
   but musl does not even support collation yet.

6. Monetary formatting and currency names - these almost certainly can
   come from CLDR or other public sources, but again the code to use
   the data isn't there yet.
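
On item 2, one possible approach is a mechanical translation from CLDR date patterns to strftime format strings. The following is a minimal Python sketch, not a complete converter: the mapping table is deliberately partial, the name `cldr_to_strftime` is hypothetical, and any CLDR field without an obvious strftime counterpart is rejected rather than guessed.

```python
import re

# Partial, illustrative mapping from CLDR date-pattern fields to strftime
# conversions. "d" maps to %e (space-padded day), the closest POSIX match
# to CLDR's unpadded day of month.
CLDR_TO_STRFTIME = {
    "EEEE": "%A", "EEE": "%a", "E": "%a",
    "MMMM": "%B", "MMM": "%b", "MM": "%m",
    "dd": "%d", "d": "%e",
    "yyyy": "%Y", "y": "%Y", "yy": "%y",
    "HH": "%H", "hh": "%I", "mm": "%M", "ss": "%S", "a": "%p",
}

# A token is a quoted literal, a run of one repeated letter, or other text.
TOKEN = re.compile(r"'((?:[^']|'')*)'|([A-Za-z])\2*|([^A-Za-z']+)")

def cldr_to_strftime(pattern):
    out = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        if m.start() != pos:
            raise ValueError("unbalanced quote in %r" % pattern)
        pos = m.end()
        if m.group(1) is not None:
            # Quoted literal; '' stands for a single apostrophe.
            literal = m.group(1).replace("''", "'") or "'"
            out.append(literal.replace("%", "%%"))
        elif m.group(2) is not None:
            field = m.group(0)
            if field not in CLDR_TO_STRFTIME:
                raise ValueError("unsupported CLDR field %r" % field)
            out.append(CLDR_TO_STRFTIME[field])
        else:
            out.append(m.group(3).replace("%", "%%"))
    if pos != len(pattern):
        raise ValueError("unbalanced quote in %r" % pattern)
    return "".join(out)

print(cldr_to_strftime("EEEE, d MMMM y"))   # %A, %e %B %Y
print(cldr_to_strftime("d 'de' MMMM"))      # %e de %B
```

Failing loudly on unknown fields is intentional: a converter that silently drops, say, a time zone or era field would produce locale data that looks valid but formats dates incorrectly.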

> Contribution is not completely open, but interested people normally
> get access if they ask. I got mine within a week.
> 
> This is only a suggestion open to discussion. What do you guys think about it?

Overall I like it. But I think we still need a maintainer to manage
pulling the data, maintaining string translations for messages, etc.
Any comments on my items 1-6 above?

Rich



* Re: Call for locales maintainer & contributors
  2014-07-27  3:27     ` Rich Felker
@ 2014-07-27 19:43       ` Wermut
  2014-07-27 20:41         ` Rich Felker
  2014-07-28  3:12         ` Isaac Dunham
  0 siblings, 2 replies; 8+ messages in thread
From: Wermut @ 2014-07-27 19:43 UTC (permalink / raw)
  To: musl


Hi

With the exception of the musl translation itself, I think most
parts are doable. My problem at the moment is that I am not a C hero
like you guys and don't know exactly what these locale files should
look like (file format, content). As a consequence, fully answering
your questions is non-trivial at the moment. I can do some research
and do some dirty work, but first I would need a sample locale file in
the musl format, or some documentation, to get started. I have
worked with, and even created, locale files for glibc and CLDR in the
past, so I am at least not a complete newbie on the topic.

Unfortunately I don't have enough time to act as a maintainer, but I
could periodically help out if someone steps up and takes the lead.

For the translation of musl itself: Do you plan to add a *.pot file to
the musl repository?

Regards
Kevin

On Sun, Jul 27, 2014 at 5:27 AM, Rich Felker <dalias@libc.org> wrote:
> On Sat, Jul 26, 2014 at 11:27:38PM +0200, Wermut wrote:
>> Hi
>>
>> I don't like the idea of an entirely new tree of locale data written
>> from scratch. Glibc has one (with a lot of unmaintained data) and then
>> there is also the CLDR repository which aims to be the central source
>> for such data, maintained by unicode. The CLDR data is also used as a
>> basis for the Microsoft and Apple locale files and is often maintained
>> by national language experts. What I could offer is an effort to write
>> some magic code that imports the actual CLDR data and converts the
>> relevant information to the musl formatted ones. The CLDR data is
>> freely available from: http://cldr.unicode.org/index/downloads
>
> I have no objection to using data from CLDR if there's no restrictive
> license, but at first glance it looks like most of the data is outside
> the scope of the C/POSIX locale system. What we need is:

CLDR license (bottom of the page): http://unicode.org/copyright.html In
my eyes this is a BSD-like license. If somebody thinks the license is
not OK, please say so. A copy is attached to this mail.


[-- Attachment #2: unicode-license.txt --]
[-- Type: text/plain, Size: 2933 bytes --]

UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE

    Unicode Data Files include all data files under the directories
http://www.unicode.org/Public/, http://www.unicode.org/reports/, and
http://www.unicode.org/cldr/data/. Unicode Data Files do not include PDF
online code charts under the directory http://www.unicode.org/Public/.
Software includes any source code published in the Unicode Standard or under
the directories http://www.unicode.org/Public/,
http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/.

    NOTICE TO USER: Carefully read the following legal agreement. BY
DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES
("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"), YOU UNEQUIVOCALLY ACCEPT, AND
AGREE TO BE BOUND BY, ALL OF THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF
YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA
FILES OR SOFTWARE.

    COPYRIGHT AND PERMISSION NOTICE

    Copyright © 1991-2014 Unicode, Inc. All rights reserved. Distributed under
the Terms of Use in http://www.unicode.org/copyright.html.

    Permission is hereby granted, free of charge, to any person obtaining a
copy of the Unicode data files and any associated documentation (the "Data
Files") or Unicode software and any associated documentation (the "Software")
to deal in the Data Files or Software without restriction, including without
limitation the rights to use, copy, modify, merge, publish, distribute, and/or
sell copies of the Data Files or Software, and to permit persons to whom the
Data Files or Software are furnished to do so, provided that (a) the above
copyright notice(s) and this permission notice appear with all copies of the
Data Files or Software, (b) both the above copyright notice(s) and this
permission notice appear in associated documentation, and (c) there is clear
notice in each modified Data File or in the Software as well as in the
documentation associated with the Data File(s) or Software that the data or
software has been modified.

    THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD
PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN
THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE
DATA FILES OR SOFTWARE.

    Except as contained in this notice, the name of a copyright holder shall
not be used in advertising or otherwise to promote the sale, use or other
dealings in these Data Files or Software without prior written authorization
of the copyright holder.


* Re: Call for locales maintainer & contributors
  2014-07-27 19:43       ` Wermut
@ 2014-07-27 20:41         ` Rich Felker
  2014-07-28  3:12         ` Isaac Dunham
  1 sibling, 0 replies; 8+ messages in thread
From: Rich Felker @ 2014-07-27 20:41 UTC (permalink / raw)
  To: musl

On Sun, Jul 27, 2014 at 09:43:34PM +0200, Wermut wrote:
> Hi
> 
> With the exception of the musl translation itself, I think most
> parts are doable. My problem at the moment is that I am not a C hero
> like you guys and don't know exactly what these locale files should
> look like (file format, content). As a consequence, fully answering
> your questions is non-trivial at the moment. I can do some research
> and do some dirty work, but first I would need a sample locale file in
> the musl format, or some documentation, to get started. I have
> worked with, and even created, locale files for glibc and CLDR in the
> past, so I am at least not a complete newbie on the topic.
> 
> Unfortunately I don't have enough time to act as a maintainer, but I
> could periodically help out if someone steps up and takes the lead.
> 
> For the translation of musl itself: Do you plan to add a *.pot file to
> the musl repository?

Is a .pot file a "template" .po file for translation purposes? If so,
I think this would be a very reasonable thing to add, but somebody
needs to make it. Making that might actually be the best place to
start for someone interested in getting translations going.
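
For anyone picking this up, a template can be produced with very little tooling. Below is a minimal Python sketch that turns a list of message strings into .pot text. The helper names `po_quote` and `make_pot` are hypothetical, the header is reduced to a single Content-Type field, and the two sample messages are illustrative rather than copied from musl's source.

```python
def po_quote(text):
    """Quote a string using the escapes PO files expect."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'

def make_pot(msgids):
    """Build .pot text: a header entry, then one empty msgstr per msgid."""
    lines = [
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    # De-duplicate and sort so the template is stable between runs.
    for msgid in sorted(set(msgids)):
        lines += ["msgid " + po_quote(msgid), 'msgstr ""', ""]
    return "\n".join(lines)

print(make_pot(["Permission denied", "No such file or directory"]))
```

Extracting the real message list from the libc source is the part that still needs a person; the sketch only covers the file format.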

Rich



* Re: Call for locales maintainer & contributors
  2014-07-27 19:43       ` Wermut
  2014-07-27 20:41         ` Rich Felker
@ 2014-07-28  3:12         ` Isaac Dunham
  2014-07-28  3:28           ` Rich Felker
  1 sibling, 1 reply; 8+ messages in thread
From: Isaac Dunham @ 2014-07-28  3:12 UTC (permalink / raw)
  To: musl

On Sun, Jul 27, 2014 at 09:43:34PM +0200, Wermut wrote:
> CLDR license (bottom of the page): http://unicode.org/copyright.html In
> my eyes this is a BSD-like license. If somebody thinks the license is
> not OK, please say so. A copy is attached to this mail.

Comments below.

> UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
> 
>     Unicode Data Files include all data files under the directories
> http://www.unicode.org/Public/, http://www.unicode.org/reports/, and
> http://www.unicode.org/cldr/data/. Unicode Data Files do not include PDF
> online code charts under the directory http://www.unicode.org/Public/.
> Software includes any source code published in the Unicode Standard or under
> the directories http://www.unicode.org/Public/,
> http://www.unicode.org/reports/, and http://www.unicode.org/cldr/data/.
> 
>     NOTICE TO USER: Carefully read the following legal agreement. BY
> DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S DATA FILES
> ("DATA FILES"), AND/OR SOFTWARE ("SOFTWARE"), YOU UNEQUIVOCALLY ACCEPT, AND
> AGREE TO BE BOUND BY, ALL OF THE TERMS AND CONDITIONS OF THIS AGREEMENT. IF
> YOU DO NOT AGREE, DO NOT DOWNLOAD, INSTALL, COPY, DISTRIBUTE OR USE THE DATA
> FILES OR SOFTWARE.
> 
>     COPYRIGHT AND PERMISSION NOTICE
> 
>     Copyright © 1991-2014 Unicode, Inc. All rights reserved. Distributed under
> the Terms of Use in http://www.unicode.org/copyright.html.
> 
>     Permission is hereby granted, free of charge, to any person obtaining a
> copy of the Unicode data files and any associated documentation (the "Data
> Files") or Unicode software and any associated documentation (the "Software")
> to deal in the Data Files or Software without restriction, including without
> limitation the rights to use, copy, modify, merge, publish, distribute, and/or
> sell copies of the Data Files or Software, and to permit persons to whom the
> Data Files or Software are furnished to do so, provided that (a) the above
> copyright notice(s) and this permission notice appear with all copies of the
> Data Files or Software, (b) both the above copyright notice(s) and this
> permission notice appear in associated documentation, and (c) there is clear
> notice in each modified Data File or in the Software as well as in the
> documentation associated with the Data File(s) or Software that the data or
> software has been modified.

(a): this makes it in between Sleepycat and BSD (explicit transfer of right
to modify, without mandating transfer of source).
I *think* this shouldn't be a problem, especially considering that Apple and MS
would seem to be fine with it.
(b): could possibly end up being tedious for those who distribute .mo alongside
a static binary, but probably not very bad.

But most importantly:
(c): If I understand correctly, this _MANDATES_ a comment field in the .mo
format (to note copyright, modifications, etc.)

> 
>     THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD
> PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN
> THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL
> DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
> PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
> ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE
> DATA FILES OR SOFTWARE.
> 
>     Except as contained in this notice, the name of a copyright holder shall
> not be used in advertising or otherwise to promote the sale, use or other
> dealings in these Data Files or Software without prior written authorization
> of the copyright holder.

Overall:
I have no objection to this, but thought I'd point out the technical
implications that I noticed.

I note the download/copy rules could make mirrors/redistributors
vulnerable to "secondary infringement" (if they copy someone
else's infringing use, they would seem to be liable under the license).
But I am not a lawyer; I would suggest asking one before
making use of CLDR data in musl (I'm wondering whether this is likely to
prove problematic for those using musl to provide static binaries, whether
redistributors of such binaries are likely to be liable, et cetera).

The comment field aspect might be important to vendors even if musl does
not use CLDR data, since a vendor might use CLDR or similarly licensed data
as a source for their own locales.
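
For context on the comment-field point: in the GNU gettext .mo layout, the entry with an empty msgid conventionally carries header metadata as "Key: value" lines, which is one place such a notice could live. The Python sketch below writes a tiny .mo image and reads it back with the standard `gettext` module. The `build_mo` helper and the `X-Source-Notice` field are hypothetical, and whether musl's locale files would carry notices this way is an assumption, not something settled in this thread.

```python
import gettext
import io
import struct

def build_mo(catalog):
    """Serialize {msgid: msgstr} into a little-endian GNU .mo image."""
    keys = sorted(catalog, key=lambda s: s.encode("utf-8"))
    ids = b""
    strs = b""
    spans = []
    for key in keys:
        k = key.encode("utf-8")
        v = catalog[key].encode("utf-8")
        spans.append((len(k), len(ids), len(v), len(strs)))
        ids += k + b"\0"
        strs += v + b"\0"
    n = len(keys)
    key_table = 7 * 4                 # index tables follow the 7-word header
    val_table = key_table + 8 * n
    ids_start = val_table + 8 * n
    strs_start = ids_start + len(ids)
    # magic, revision, count, msgid table, msgstr table, hash size, hash offset
    out = struct.pack("<7I", 0x950412DE, 0, n, key_table, val_table, 0, 0)
    for klen, koff, _, _ in spans:
        out += struct.pack("<2I", klen, ids_start + koff)
    for _, _, vlen, voff in spans:
        out += struct.pack("<2I", vlen, strs_start + voff)
    return out + ids + strs

header = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "X-Source-Notice: hypothetical field; names taken from CLDR, modified\n"
)
mo_image = build_mo({"": header, "Monday": "lundi"})
catalog = gettext.GNUTranslations(io.BytesIO(mo_image))
print(catalog.info()["x-source-notice"])
print(catalog.gettext("Monday"))
```

A notice stored this way travels inside the binary file itself, which speaks to the concern about distributing .mo files alongside a static binary, though whether it satisfies clause (c) is a legal question rather than a technical one.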

Hope this helps,
Isaac Dunham



* Re: Call for locales maintainer & contributors
  2014-07-28  3:12         ` Isaac Dunham
@ 2014-07-28  3:28           ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2014-07-28  3:28 UTC (permalink / raw)
  To: musl

On Sun, Jul 27, 2014 at 08:12:17PM -0700, Isaac Dunham wrote:
> But most importantly:
> (c): If I understand correctly, this _MANDATES_ a comment field in the .mo
> format (to note copyright, modifications, etc.)

I think that would probably depend on the degree and nature of data
used. Basic facts (like the names of countries, currencies, day and
month names, etc.) are not subject to copyright; at most, the CLDR
compilation and presentation of such data would be, but I see no way
this copyright could apply when the identical data could be obtained
from other sources. Obviously other things like non-trivial string
translations could be subject to copyright, though.

Rich


