* locale fallback option
From: Wermut @ 2014-07-26 21:46 UTC
To: musl

Hi

I just read that you committed the basic locale code, and about the
musl firsts, and thought of one thing that I would really like to see
in a modern implementation.

Problem: User A speaks a language "xyz" and lives in country "AB". So
he will set the relevant locale environment vars to "xyz_AB". The
problem is that the language "xyz" is only spoken by a minority of
people, and the translation of the software into his language is often
incomplete or nonexistent. The result is that user A will have to
read most strings in plain English, because this is the standard
fallback. Because our user A is a member of a minority, he also knows
the language "ts", which is also spoken in "AB", but he does not know
any English.

Status quo: Because the translation "xyz_AB" is not really complete,
user A gives up, is frustrated and sets his locale to "ts_AB".

What really should be possible: User A sets the locale "xyz_AB" and
sets "ts_AB" as a fallback for definitions and strings not available
in "xyz_AB". Only if a string is not defined in either "xyz_AB" or
"ts_AB" is the hardcoded English string shown to him.

This would require that the locale definition accept something
like LANG=xyz_AB:ts_AB

I have worked in the past with some of these translation problems and
worked with people from a lot of minorities that all have the same
problem: the locale subsystem is just not flexible enough. I know that
the implementation is potentially expensive, because you could end up
looking into a lot of physical files on your hard drive, but it
would definitely be a big improvement and would help nearly extinct
languages be used more often in computer translations.

Thanks for reading.

Regards

Kevin
* Re: locale fallback option
From: writeonce @ 2014-07-26 21:51 UTC
To: musl

On 07/26/2014 05:46 PM, Wermut wrote:
> ...
> I have worked in the past with some of these translation problems and
> worked with people from a lot of minorities that all have the same
> problem: the locale subsystem is just not flexible enough. I know that
> the implementation is potentially expensive, because you could end up
> looking into a lot of physical files on your hard drive, but it
> would definitely be a big improvement and would help nearly extinct
> languages be used more often in computer translations.

+1!

zg
* Re: locale fallback option
From: Rich Felker @ 2014-07-27 2:08 UTC
To: musl

On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
> ...
> What really should be possible: User A sets the locale "xyz_AB" and
> sets "ts_AB" as a fallback for definitions and strings not available
> in "xyz_AB". Only if a string is not defined in either "xyz_AB" or
> "ts_AB", the hardcoded english string is shown to him.
>
> This would require, that the locale definition would accept something
> like LANG=xyz_AB:ts_AB

What you're asking for is roughly possible with GNU gettext and the
LANGUAGE environment variable. See:

https://www.gnu.org/software/gettext/manual/html_node/The-LANGUAGE-variable.html

This does not facilitate partial translations with fallback to a
different language, but does facilitate the situation where only some
apps have the user's preferred language and others only have a more
widely-used language.

I think we should support the exact same thing in musl's internal
gettext. Whether we should support fallbacks in the LC_* variables for
the locale too is an open question, but I don't think there's any
reason at all to consider "partial translations" with fallback to a
different language for locales. The number of messages is just so
small (and not going to significantly increase) that it really doesn't
make sense to have partial translations. Fallbacks kind of make sense,
but you can always choose a language that libc actually has for the
_locale_, then put the list of application languages in the LANGUAGE
variable, so I'm not clear on how fallbacks would let you do anything
you couldn't otherwise do or make it significantly easier. Does this
make sense?

Rich
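As a rough illustration of the LANGUAGE behavior described in the message above: the variable holds a colon-separated priority list, and each application ends up with the first language for which it has a catalog. The sketch below is only a model of that selection, not gettext's code. The function name pick_catalog and the application names are hypothetical, and it assumes exact name matches, whereas real gettext also tries shortened forms of each entry.

```python
def pick_catalog(language_var, available):
    """Return the first language from a colon-separated priority list
    that has a catalog, or None if none of them does."""
    for lang in language_var.split(":"):
        if lang and lang in available:
            return lang
    return None  # the caller would show the untranslated strings


# Hypothetical set of installed catalogs per application.
catalogs = {
    "editor": {"xyz", "ts"},
    "mailer": {"ts"},
    "shell": set(),
}

# With LANGUAGE=xyz:ts each application picks one whole catalog.
chosen = {app: pick_catalog("xyz:ts", langs) for app, langs in catalogs.items()}
```

Note that the choice is made per application, which is why this covers "some apps are translated, others are not" but not partially translated catalogs.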
* Re: locale fallback option
From: Wermut @ 2014-07-27 5:26 UTC
To: musl

Hi

I think your statements make sense :) BTW, by adding locale to musl
you probably opened Pandora's box :)

You are right about the LC_* fallback. If you set up a new translation
team, then these files are probably the first that get implemented.

The GNU LANGUAGE variable does indeed work for some use cases, but it
still had some limitations when I tried it some time ago. The problem
was that LC_TIME etc. were not properly overridden by gettext
according to LANGUAGE. That means if I set "LC_ALL=xyz_AB" and added
"LANGUAGE=xyz:ts", programs with no translation into "xyz" still had
their dates etc. formatted according to "xyz_AB". In West European
languages this hardly matters, but it got ugly once the two languages
are written in different scripts, like mixing Arabic and English.

If at least gettext is done properly and allows really proper
LANGUAGE functionality, then I would probably be happy. I will test
next week whether glibc still handles it as described.

Regards

Kevin

On Sun, Jul 27, 2014 at 4:08 AM, Rich Felker <dalias@libc.org> wrote:
> What you're asking for is roughly possible with GNU gettext and the
> LANGUAGE environment variable.
> ...
* Re: locale fallback option
From: Rich Felker @ 2014-07-27 8:12 UTC
To: musl

On Sun, Jul 27, 2014 at 07:26:27AM +0200, Wermut wrote:
> Hi
>
> I think your statements make sense :) BTW, by adding locale to musl
> you probably opened Pandora's box :)
>
> You are right about the LC_* fallback. If you set up a new translation
> team, then these files are probably the first that get implemented.
>
> The GNU LANGUAGE variable does indeed work for some use cases, but it
> still had some limitations when I tried it some time ago. The problem
> was that LC_TIME etc. were not properly overridden by gettext
> according to LANGUAGE. That means if I set "LC_ALL=xyz_AB" and added
> "LANGUAGE=xyz:ts", programs with no translation into "xyz" still had
> their dates etc. formatted according to "xyz_AB". In West European
> languages this hardly matters, but it got ugly once the two languages
> are written in different scripts, like mixing Arabic and English.

I think I understand what you're saying, but I don't see any
alternative. The standard libc interfaces are required to honor the
LANG/LC_* variables, not other settings (i.e. LANGUAGE), and thus if
LC_TIME is xyz_AB, time/date strings are going to be in language xyz,
regardless of the language of message strings.

This is probably a bit disconcerting, but won't this kind of thing
happen anyway with mixed LANGUAGE fallback? For instance if app FOO
uses libraries BAR and BAZ, and only BAR has a translation in language
xyz, you'll see strings from BAR in language xyz mixed (possibly even
in the same text areas) with strings from FOO and BAZ in language ts.

Do you have any suggestions for how this situation could be improved?

> If at least gettext is done properly and allows really proper
> LANGUAGE functionality, then I would probably be happy.

I've got gettext working and ready to commit, but so far it's without
the LANGUAGE fallback system. I don't think it will be too hard to
add, and I think I could even make it fall back for individual strings
if desired at little or no extra cost. I'm going to commit the basic
working code first though, then look into adding more features.

Rich
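The per-string fallback mentioned at the end of the message above could look roughly like the following. This is a minimal sketch under stated assumptions, not musl's implementation: catalogs are modeled as plain dictionaries, and the function name translate and the sample strings are hypothetical.

```python
def translate(msgid, language_var, catalogs):
    """Look up msgid in each language of a colon-separated priority list
    and return the first translation found, else the original string."""
    for lang in language_var.split(":"):
        text = catalogs.get(lang, {}).get(msgid)
        if text:
            return text
    return msgid  # hardcoded (English) string as the last resort


# Hypothetical catalogs: "xyz" is only partially translated.
catalogs = {
    "xyz": {"Open": "OPEN-xyz"},
    "ts": {"Open": "OPEN-ts", "Save": "SAVE-ts"},
}

menu = [translate(m, "xyz:ts", catalogs) for m in ("Open", "Save", "Quit")]
```

The resulting menu mixes all three sources, which is the same kind of mixing within one program that the message describes for the FOO/BAR/BAZ case.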
* Re: locale fallback option
From: u-igbb @ 2014-07-27 8:08 UTC
To: musl

On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
> Problem: User A speaks a language "xyz" and lives in country "AB". So
> he will set the relevant locale environment vars to "xyz_AB". The
> problem is, that the language "xyz" is only spoken by a minority of
> people and the translation of the software in his language is often
> not complete or nonexistent. The result is, that user A will have to
> read the most strings in plain english, because this is the standard
> fallback. Because our user A is a member of a minority, he knows also
...
> This would require, that the locale definition would accept something
> like LANG=xyz_AB:ts_AB

I guess a similar effect could be achieved by

 LANG=ts_AB LC_MESSAGES=xyz_AB (or LC_MESSAGES=xyz_ZZ)

if fallback to LANG happens per-item in contrast to per-category.

This gives of course only two levels to combine but fits more or less
into existing conventions.

Taking some necessary cost into consideration, which percentage of
human population would be made happier by this? :)

Frankly, I think this is about a redesign of the locale system and
hardly belongs to musl goals.

Rune
* Re: locale fallback option
From: Rich Felker @ 2014-07-27 8:18 UTC
To: musl

On Sun, Jul 27, 2014 at 10:08:56AM +0200, u-igbb@aetey.se wrote:
> On Sat, Jul 26, 2014 at 11:46:31PM +0200, Wermut wrote:
> > Problem: User A speaks a language "xyz" and lives in country "AB". So
> > he will set the relevant locale environment vars to "xyz_AB". The
> > problem is, that the language "xyz" is only spoken by a minority of
> > people and the translation of the software in his language is often
> > not complete or nonexistent. The result is, that user A will have to
> > read the most strings in plain english, because this is the standard
> > fallback. Because our user A is a member of a minority, he knows also
> ...
> > This would require, that the locale definition would accept something
> > like LANG=xyz_AB:ts_AB
>
> I guess a similar effect could be achieved by
>
>  LANG=ts_AB LC_MESSAGES=xyz_AB (or LC_MESSAGES=xyz_ZZ)
>
> if fallback to LANG happens per-item in contrast to per-category.

It doesn't. The LC_ALL->LC_*->LANG->system_default fallback system is
simply per category and based on whether the variables are set (and
nonempty), not whether they resolve to a working locale. This is
probably less than ideal. I suppose I could define "system_default" as
doing a fallback with the remaining vars after omitting the ones that
don't work, but this would still be per-category. Per-item is rather
complex and requires having locale objects that are "hybrids" and
having a way to name and identify them (since setlocale has to be able
to return a name for the current setting back to the caller).

> Frankly, I think this is about a redesign of the locale system and
> hardly belongs to musl goals.

Yes. The system is largely broken -- it does too little to actually be
useful for serious adaptation to linguistic and cultural conventions
and to support multilingual text data, and it does far too much in the
sense of breaking use of the standard library functions for
information interchange purposes. For a long time I've wanted to
design and write a very light but powerful library for handling these
things correctly (completely independent of the libc locale system),
but it will probably be a very long time before I get around to doing
a project like that, if ever...

Rich
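The per-category lookup order named in the message above (LC_ALL, then the category's own variable, then LANG, then a system default) can be modeled as below. This is a sketch only: the function name locale_name_for is hypothetical, and the "C" default is an assumption standing in for whatever "system_default" is on a given system.

```python
def locale_name_for(category, env, system_default="C"):
    """Pick the locale name for one category, e.g. "LC_TIME".

    Only whether a variable is set and nonempty matters; the name is
    not checked against the locales that are actually installed.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    return system_default  # assumed default, see lead-in


# The combination suggested earlier in the thread.
env = {"LANG": "ts_AB", "LC_MESSAGES": "xyz_AB"}
messages = locale_name_for("LC_MESSAGES", env)
time_fmt = locale_name_for("LC_TIME", env)
```

Because the decision is made once per category from the variable names alone, a missing string inside the "xyz_AB" messages never causes a second lookup in "ts_AB", which is the per-item behavior the message rules out.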
* Re: locale fallback option
From: u-igbb @ 2014-07-27 8:45 UTC
To: musl

On Sun, Jul 27, 2014 at 04:18:52AM -0400, Rich Felker wrote:
> > LANG=ts_AB LC_MESSAGES=xyz_AB (or LC_MESSAGES=xyz_ZZ)
> >
> > if fallback to LANG happens per-item in contrast to per-category.

> Per-item is rather
> complex and requires having locale objects that are "hybrids" and
> having a way to name and identify them (since setlocale has to be able
> to return a name for the current setting back to the caller).

This looks way too complicated to be viable.

> > Frankly, I think this is about a redesign of the locale system

> Yes. The system is largely broken -- it does too little to actually be
> useful for serious adaptation to linguistic and cultural conventions
> and to support multilingual text data, and it does far too much in the
> sense of breaking use of the standard library functions for
> information interchange purposes.

+1

> For a long time I've wanted to
> design and write a very light but powerful library for handling these
> things correctly (completely independent of the libc locale system),
> but it will probably be a very long time before I get around to doing
> a project like that, if ever...

You have my sympathy (and empathy).

Rune
end of thread (last message 2014-07-27 8:45 UTC)

Thread overview: 8+ messages
2014-07-26 21:46 locale fallback option Wermut
2014-07-26 21:51 ` writeonce
2014-07-27  2:08 ` Rich Felker
2014-07-27  5:26   ` Wermut
2014-07-27  8:12     ` Rich Felker
2014-07-27  8:08 ` u-igbb
2014-07-27  8:18   ` Rich Felker
2014-07-27  8:45     ` u-igbb