mailing list of musl libc
* locale fallback option
@ 2014-07-26 21:46 Wermut
  2014-07-26 21:51 ` writeonce
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Wermut @ 2014-07-26 21:46 UTC
  To: musl

Hi

I just read that you committed the basic locale code, and about the
musl firsts, and thought of one thing that I would really like to see
in a modern implementation.

Problem: User A speaks a language "xyz" and lives in country "AB", so
he sets the relevant locale environment variables to "xyz_AB". The
problem is that "xyz" is spoken only by a minority of people, and the
translation of software into his language is often incomplete or
nonexistent. As a result, user A has to read most strings in plain
English, because English is the standard fallback. Being a member of
a minority, user A also knows the language "ts", which is also spoken
in "AB", but he does not know any English.

Status quo: Because the "xyz_AB" translation is not really complete,
user A gets frustrated, gives up, and sets his locale to "ts_AB".

What really should be possible: User A sets the locale "xyz_AB" and
sets "ts_AB" as a fallback for definitions and strings not available
in "xyz_AB". Only if a string is defined in neither "xyz_AB" nor
"ts_AB" is the hardcoded English string shown to him.

This would require the locale definition to accept something
like LANG=xyz_AB:ts_AB

I have worked on some of these translation problems in the past, with
people from many minorities who all have the same problem: the locale
subsystem is just not flexible enough. I know that the implementation
is potentially expensive, because you could end up looking into a lot
of physical files on your hard drive, but it would definitely be a big
improvement and would help nearly extinct languages be used more
often in computer translations.

Thanks for reading.

Regards

Kevin



* Re: locale fallback option
  2014-07-26 21:46 locale fallback option Wermut
@ 2014-07-26 21:51 ` writeonce
  2014-07-27  2:08 ` Rich Felker
  2014-07-27  8:08 ` u-igbb
  2 siblings, 0 replies; 8+ messages in thread
From: writeonce @ 2014-07-26 21:51 UTC
  To: musl

On 07/26/2014 05:46 PM, Wermut wrote:
> Hi
>
> I just read, that you committed the basic locale code and about the
> musl firsts and thought of one thing that I would really like to see
> in a modern implementation.
>
> Problem: User A speaks a language "xyz" and lives in country "AB". So
> he will set the relevant locale environment vars to "xyz_AB". The
> problem is, that the language "xyz" is only spoken by a minority of
> people and the translation of the software in his language is often
> not complete or non existend. The result is, that user A will have to
> read the most strings in plain english, because this is the standard
> fallback. Because our user A is a member of a minority, he knows also
> the language "ts" which is also spoken in "AB", but he does not know
> any english.
>
> Status quo: Because the translation "xyz_AB" is not really complete,
> the user A gives up, is frustrated and sets his locale to "ts_AB".
>
> What really should be possible: User A sets the locale "xyz_AB" and
> sets "ts_AB" as a fallback for definitions and strings not available
> in "xyz_AB". Only if a string is not defined in either "xyz_AB" or
> "ts_AB", the hardcoded english string is shown to him.
>
> This would require, that the locale definition would accept something
> like LANG=xyz_AB:ts_AB
>
> I have worked in the past with some of these translation problems and
> worked with people from a lot of minorities that have all the same
> problem: The locale subsystem is just no flexible enough. I know that
> the implementation is potentially expensive, because you could end up
> in looking into a lot of physical files on your hard drive, but it
> would definitive be a big improvement and would help that almost
> distinguished language would be used more often in computer
> translations.
+1!
zg
>
> Thanks for reading.
>
> Regards
>
> Kevin
>
>




* Re: locale fallback option
  2014-07-26 21:46 locale fallback option Wermut
  2014-07-26 21:51 ` writeonce
@ 2014-07-27  2:08 ` Rich Felker
  2014-07-27  5:26   ` Wermut
  2014-07-27  8:08 ` u-igbb
  2 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-07-27  2:08 UTC
  To: musl

On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
> Hi
> 
> I just read, that you committed the basic locale code and about the
> musl firsts and thought of one thing that I would really like to see
> in a modern implementation.
> 
> Problem: User A speaks a language "xyz" and lives in country "AB". So
> he will set the relevant locale environment vars to "xyz_AB". The
> problem is, that the language "xyz" is only spoken by a minority of
> people and the translation of the software in his language is often
> not complete or non existend. The result is, that user A will have to
> read the most strings in plain english, because this is the standard
> fallback. Because our user A is a member of a minority, he knows also
> the language "ts" which is also spoken in "AB", but he does not know
> any english.
> 
> Status quo: Because the translation "xyz_AB" is not really complete,
> the user A gives up, is frustrated and sets his locale to "ts_AB".
> 
> What really should be possible: User A sets the locale "xyz_AB" and
> sets "ts_AB" as a fallback for definitions and strings not available
> in "xyz_AB". Only if a string is not defined in either "xyz_AB" or
> "ts_AB", the hardcoded english string is shown to him.
> 
> This would require, that the locale definition would accept something
> like LANG=xyz_AB:ts_AB

What you're asking for is roughly possible with GNU gettext and the
LANGUAGE environment variable. See:

https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

This does not facilitate partial translations with fallback to a
different language, but does facilitate the situation where only some
apps have the user's preferred language and others only have a more
widely-used language.

I think we should support the exact same thing in musl's internal
gettext. Whether we should support fallbacks in the LC_* variables for
the locale too is an open question, but I don't think there's any
reason at all to consider "partial translations" with fallback to a
different language for locales. The number of messages is just so
small (and not going to significantly increase) that it really doesn't
make sense to have partial translations. Fallbacks kind of make sense,
but you can always choose a language that libc actually has for the
_locale_, then put the list of application languages in the LANGUAGE
variable, so I'm not clear on how fallbacks would let you do anything
you couldn't otherwise do or make it significantly easier. Does this
make sense?
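
A minimal sketch of the LANGUAGE-style selection described above. It is
not code from GNU gettext or musl. It walks a colon-separated priority
list and picks the first language for which an application has a message
catalog. The names pick_language, has_catalog_fn and only_ts are
hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: does the application ship a catalog for lang? */
typedef int (*has_catalog_fn)(const char *lang);

/* Stand-in predicate for the example: only a "ts" catalog is installed. */
int only_ts(const char *lang)
{
    return strcmp(lang, "ts") == 0;
}

/* Walk a colon-separated priority list such as "xyz:ts" and copy the
 * first entry accepted by has_catalog into out. Returns 1 on success.
 * Returns 0 and leaves out empty if no entry matches, in which case the
 * caller would show the untranslated strings. */
int pick_language(const char *list, has_catalog_fn has_catalog,
                  char *out, size_t outsize)
{
    while (list && *list) {
        size_t len = strcspn(list, ":");   /* length of current entry */
        if (len > 0 && len < outsize) {
            memcpy(out, list, len);
            out[len] = '\0';
            if (has_catalog(out))
                return 1;
        }
        list += len;
        if (*list == ':')                  /* skip the separator */
            list++;
    }
    if (outsize > 0)
        out[0] = '\0';
    return 0;
}
```

With the list "xyz:ts" and an application that only ships a "ts"
catalog, pick_language() selects "ts". The choice is made once per
application (or library), which is why this scheme does not cover
partially translated catalogs.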

Rich



* Re: locale fallback option
  2014-07-27  2:08 ` Rich Felker
@ 2014-07-27  5:26   ` Wermut
  2014-07-27  8:12     ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Wermut @ 2014-07-27  5:26 UTC
  To: musl

Hi

I think your statements make sense :) BTW, by adding locale to musl
you probably opened Pandora's box :)

You are right about the LC_* fallback. If you set up a new translation
team, these files are probably the first that get implemented.

The GNU LANGUAGE variable does work for some use cases, but it still
has/had some limitations when I tried it some time ago. The problem
was that LC_TIME etc. were not properly overridden by gettext
according to LANGUAGE. That means that if I set "LC_ALL=xyz_AB" and
added "LANGUAGE=xyz:ts", programs with no translation into "xyz"
still had dates etc. formatted according to "xyz_AB". For Western
European languages this is no big deal, but it gets ugly once the two
languages are written in different scripts, like mixing Arabic and
English.

If at least gettext is done properly and allows really proper
LANGUAGE functionality, then I would probably be happy.

I will test next week whether glibc still handles it as described.

Regards

 Kevin

On Sun, Jul 27, 2014 at 4:08 AM, Rich Felker <dalias@libc.org> wrote:
> On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
>> Hi
>>
>> I just read, that you committed the basic locale code and about the
>> musl firsts and thought of one thing that I would really like to see
>> in a modern implementation.
>>
>> Problem: User A speaks a language "xyz" and lives in country "AB". So
>> he will set the relevant locale environment vars to "xyz_AB". The
>> problem is, that the language "xyz" is only spoken by a minority of
>> people and the translation of the software in his language is often
>> not complete or non existend. The result is, that user A will have to
>> read the most strings in plain english, because this is the standard
>> fallback. Because our user A is a member of a minority, he knows also
>> the language "ts" which is also spoken in "AB", but he does not know
>> any english.
>>
>> Status quo: Because the translation "xyz_AB" is not really complete,
>> the user A gives up, is frustrated and sets his locale to "ts_AB".
>>
>> What really should be possible: User A sets the locale "xyz_AB" and
>> sets "ts_AB" as a fallback for definitions and strings not available
>> in "xyz_AB". Only if a string is not defined in either "xyz_AB" or
>> "ts_AB", the hardcoded english string is shown to him.
>>
>> This would require, that the locale definition would accept something
>> like LANG=xyz_AB:ts_AB
>
> What you're asking for is roughly possible with GNU gettext and the
> LANGUAGE environment variable. See:
>
> https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html
>
> This does not facilitate partial translations with fallback to a
> different language, but does facilitate the situation where only some
> apps have the user's preferred language and others only have a more
> widely-used language.
>
> I think we should support the exact same thing in musl's internal
> gettext. Whether we should support fallbacks in the LC_* variables for
> the locale too is an open question, but I don't think there's any
> reason at all to consider "partial translations" with fallback to a
> different language for locales. The number of messages is just so
> small (and not going to significantly increase) that it really doesn't
> make sense to have partial translations. Fallbacks kind of make sense,
> but you can always choose a language that libc actually has for the
> _locale_, then put the list of application languages in the LANGUAGE
> variable, so I'm not clear on how fallbacks would let you do anything
> you couldn't otherwise do or make it significantly easier. Does this
> make sense?
>
> Rich



* Re: locale fallback option
  2014-07-26 21:46 locale fallback option Wermut
  2014-07-26 21:51 ` writeonce
  2014-07-27  2:08 ` Rich Felker
@ 2014-07-27  8:08 ` u-igbb
  2014-07-27  8:18   ` Rich Felker
  2 siblings, 1 reply; 8+ messages in thread
From: u-igbb @ 2014-07-27  8:08 UTC
  To: musl

On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
> Problem: User A speaks a language "xyz" and lives in country "AB". So
> he will set the relevant locale environment vars to "xyz_AB". The
> problem is, that the language "xyz" is only spoken by a minority of
> people and the translation of the software in his language is often
> not complete or non existend. The result is, that user A will have to
> read the most strings in plain english, because this is the standard
> fallback. Because our user A is a member of a minority, he knows also
 ...
> This would require, that the locale definition would accept something
> like LANG=xyz_AB:ts_AB

I guess a similar effect could be achieved by

LANG=ts_AB LC_MESSAGES=xyz_AB    (or LC_MESSAGES=xyz_ZZ)

if fallback to LANG happens per-item in contrast to per-category.

This gives of course only two levels to combine but fits more or less
into existing conventions.

Taking the necessary cost into consideration, what percentage of the
human population would be made happier by this?
:)

Frankly, I think this amounts to a redesign of the locale system and
hardly belongs among musl's goals.

Rune




* Re: locale fallback option
  2014-07-27  5:26   ` Wermut
@ 2014-07-27  8:12     ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2014-07-27  8:12 UTC
  To: musl

On Sun, Jul 27, 2014 at 07:26:27AM +0200, Wermut wrote:
> Hi
> 
> I think your statements make sense :) BTW, by adding locale to musl
> you probably opened the pandora :)
> 
> You are right with the LC_* fallback. If you setup a new translation
> team, then probably these files are the first that get implemented.
> 
> The GNU LANGUAGE is indeed working for some use cases, but still
> has/had some limitations, when I tried it some time ago. The problem
> was, that LC_TIME etc. where not properly overwritten by gettext
> according to LANGUAGE. That means if I set "LC_ALL=xyz_AB" and added
> "LANGUAGE=xyz:ts", the dates with programs no translation in  to "xyz"
> had still the Dates etc. formatted according to it. In west european
> languages this is a no brainer, but it got ugly once you use different
> scripts to write both langs. Like mixing arabic and english.

I think I understand what you're saying, but I don't see any
alternative. The standard libc interfaces are required to honor the
LANG/LC_* variables, not other settings (i.e. LANGUAGE), and thus if
LC_TIME is xyz_AB, time/date strings are going to be in language xyz,
regardless of the language of message strings.

This is probably a bit disconcerting, but won't this kind of thing
happen anyway with mixed LANGUAGE fallback? For instance if app FOO
uses library BAR and BAZ, and only BAR has a translation in language
xyz, you'll see strings from BAR in language xyz mixed (possibly even
in the same text areas) with strings from FOO and BAZ in language ts.

Do you have any suggestions for how this situation could be improved?

> If at least gettext is made properly and would allow a really proper
> LANGUAGE functionality, then probably I would be happy.

I've got gettext working and ready to commit, but so far it's without
the LANGUAGE fallback system. I don't think it will be too hard to
add, and I think I could even make it fall back for individual strings
if desired at little or no extra cost. I'm going to commit the basic
working code first, though, then look into adding more features.
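
Fallback for individual strings, as mentioned above, could look roughly
like the following. This is an illustrative sketch only and not musl's
gettext: struct entry, struct catalog, lookup_one and
lookup_with_fallback are hypothetical names, and the catalogs are
assumed to be simple in-memory arrays rather than the .mo files that
gettext actually reads.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory message catalog for one language. */
struct entry   { const char *msgid; const char *msgstr; };
struct catalog { const struct entry *entries; size_t count; };

/* Linear search in one catalog; returns NULL if msgid is untranslated. */
const char *lookup_one(const struct catalog *cat, const char *msgid)
{
    for (size_t i = 0; i < cat->count; i++)
        if (strcmp(cat->entries[i].msgid, msgid) == 0)
            return cat->entries[i].msgstr;
    return NULL;
}

/* Try each catalog in priority order (e.g. xyz, then ts). If none has
 * the string, return msgid itself, i.e. the hardcoded original. */
const char *lookup_with_fallback(const struct catalog *const chain[],
                                 size_t n, const char *msgid)
{
    for (size_t i = 0; i < n; i++) {
        const char *s = lookup_one(chain[i], msgid);
        if (s)
            return s;
    }
    return msgid;
}
```

With a partial "xyz" catalog ahead of a fuller "ts" catalog in the
chain, strings present in "xyz" come back in "xyz", the rest come back
in "ts", and anything missing from both comes back unchanged.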

Rich



* Re: locale fallback option
  2014-07-27  8:08 ` u-igbb
@ 2014-07-27  8:18   ` Rich Felker
  2014-07-27  8:45     ` u-igbb
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2014-07-27  8:18 UTC
  To: musl

On Sun, Jul 27, 2014 at 10:08:56AM +0200, u-igbb@aetey.se wrote:
> On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
> > Problem: User A speaks a language "xyz" and lives in country "AB". So
> > he will set the relevant locale environment vars to "xyz_AB". The
> > problem is, that the language "xyz" is only spoken by a minority of
> > people and the translation of the software in his language is often
> > not complete or non existend. The result is, that user A will have to
> > read the most strings in plain english, because this is the standard
> > fallback. Because our user A is a member of a minority, he knows also
>  ...
> > This would require, that the locale definition would accept something
> > like LANG=xyz_AB:ts_AB
> 
> I guess a similar effect could be achieved by
> 
> LANG=ts_AB LC_MESSAGES=xyz_AB    (or LC_MESSAGES=xyz_ZZ)
> 
> if fallback to LANG happens per-item in contrast to per-category.

It doesn't. The LC_ALL->LC_*->LANG->system_default fallback system is
simply per category and based on whether the variables are set (and
nonempty), not whether they resolve to a working locale. This is
probably less than ideal. I suppose I could define "system_default" as
doing a fallback with the remaining vars after omitting the ones that
don't work, but this would still be per-category. Per-item is rather
complex and requires having locale objects that are "hybrids" and
having a way to name and identify them (since setlocale has to be able
to return a name for the current setting back to the caller).
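
The per-category precedence described above can be sketched as follows.
This is a simplified illustration, not musl's implementation: the
function name category_locale_name is hypothetical, and "C" is assumed
as the system default. Note that it only checks whether a variable is
set and nonempty, never whether the name resolves to a working locale.

```c
#include <stdlib.h>

/* Return the locale name that applies to one category, following the
 * precedence LC_ALL -> LC_<category> -> LANG -> system default.
 * category_var is the name of the category's own variable, for example
 * "LC_MESSAGES" or "LC_TIME". */
const char *category_locale_name(const char *category_var)
{
    const char *candidates[3];
    candidates[0] = getenv("LC_ALL");
    candidates[1] = getenv(category_var);
    candidates[2] = getenv("LANG");

    for (int i = 0; i < 3; i++)
        if (candidates[i] && candidates[i][0] != '\0')
            return candidates[i];   /* first set, nonempty variable wins */

    return "C";                     /* assumed default for this sketch */
}
```

With LANG=ts_AB and LC_MESSAGES=xyz_AB, the result for "LC_MESSAGES" is
"xyz_AB" and for "LC_TIME" it is "ts_AB". LANG is never consulted for
the messages category, even if no "xyz_AB" locale is installed.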

> Frankly, I think this is about a redesign of the locale system and
> hardly belongs to musl goals.

Yes. The system is largely broken -- it does too little to actually be
useful for serious adaptation to linguistic and cultural conventions
and to support multilingual text data, and it does far too much in the
sense of breaking use of the standard library functions for
information interchange purposes. For a long time I've wanted to
design and write a very light but powerful library for handling these
things correctly (completely independent of the libc locale system),
but it will probably be a very long time before I get around to doing
a project like that, if ever...

Rich



* Re: locale fallback option
  2014-07-27  8:18   ` Rich Felker
@ 2014-07-27  8:45     ` u-igbb
  0 siblings, 0 replies; 8+ messages in thread
From: u-igbb @ 2014-07-27  8:45 UTC
  To: musl

On Sun, Jul 27, 2014 at 04:18:52AM -0400, Rich Felker wrote:
> > LANG=ts_AB LC_MESSAGES=xyz_AB    (or LC_MESSAGES=xyz_ZZ)
> > 
> > if fallback to LANG happens per-item in contrast to per-category.

> Per-item is rather
> complex and requires having locale objects that are "hybrids" and
> having a way to name and identify them (since setlocale has to be able
> to return a name for the current setting back to the caller).

This looks way too complicated to be viable.

> > Frankly, I think this is about a redesign of the locale system

> Yes. The system is largely broken -- it does too little to actually be
> useful for serious adaptation to linguistic and cultural conventions
> and to support multilingual text data, and it does far too much in the
> sense of breaking use of the standard library functions for
> information interchange purposes.

+1

> For a long time I've wanted to
> design and write a very light but powerful library for handling these
> things correctly (completely independent of the libc locale system),
> but it will probably be a very long time before I get around to doing
> a project like that, if ever...

You have my sympathy (and empathy).

Rune


