[Caml-list] error messages in multiple languages ?

caml-list - the Caml user's mailing list

* [Caml-list] error messages in multiple languages ?
@ 2017-04-08 14:22 Tao Stein
  2017-04-08 14:43 ` Gabriel Scherer
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Tao Stein @ 2017-04-08 14:22 UTC (permalink / raw)
  To: OCaml Mailing List

I've been teaching OCaml to university students in Beijing. I believe
they'd feel more comfortable if the error messages were in Chinese. Has
anyone thought of implementing multi-language strings in the compiler? So,
say with the setting of an environment variable, the compiler user could
receive errors and warnings in their preferred language. I know it would
require a lot of translation work (crowdsourced?), but an internal
language-abstraction mechanism would need to be in place too.

Tao Stein / 石涛 / تاو شتاين
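A minimal sketch of the environment-variable idea, in OCaml: the hypothetical helper `message_locale` picks a message language using the POSIX precedence LC_ALL, then LC_MESSAGES, then LANG, and falls back to English (an assumption) when nothing is set or the locale is "C" or "POSIX". GNU gettext also honours a LANGUAGE variable, which this sketch ignores. The environment lookup is passed in as a parameter so the function can be tested without touching the real environment; in a program one would pass `Sys.getenv_opt`. It assumes OCaml 4.10 or later for `List.find_map`.

```ocaml
(* Hypothetical helper: choose the compiler's message language from the
   environment, following the POSIX precedence LC_ALL > LC_MESSAGES > LANG.
   GNU gettext's LANGUAGE variable is deliberately not handled here. *)
let message_locale (getenv : string -> string option) : string =
  let first_set =
    List.find_map
      (fun var ->
        match getenv var with
        | Some v when v <> "" -> Some v
        | _ -> None)
      [ "LC_ALL"; "LC_MESSAGES"; "LANG" ]
  in
  (* Keep only the part before [c], e.g. cut '.' "zh_CN.UTF-8" = "zh_CN". *)
  let cut c s =
    match String.index_opt s c with
    | Some i -> String.sub s 0 i
    | None -> s
  in
  match first_set with
  | None -> "en"
  | Some v ->
      (* Drop the ".codeset" and "@modifier" parts of the locale name. *)
      let lang = cut '@' (cut '.' v) in
      (* The "C" and "POSIX" locales mean "no translation". *)
      if lang = "C" || lang = "POSIX" then "en" else lang
```

Keeping the territory part (zh_CN vs zh_TW) is deliberate in this sketch, since Simplified and Traditional Chinese would need separate sets of messages.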

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 14:22 [Caml-list] error messages in multiple languages ? Tao Stein
@ 2017-04-08 14:43 ` Gabriel Scherer
  2017-04-08 15:03   ` Sébastien Hinderer
  2017-04-08 16:38 ` Xavier Leroy
  2017-04-11 14:05 ` Richard W.M. Jones
  2 siblings, 1 reply; 24+ messages in thread
From: Gabriel Scherer @ 2017-04-08 14:43 UTC (permalink / raw)
  To: Tao Stein; +Cc: OCaml Mailing List, Alexis Irlande

Hi,

This is an interesting question and the issue was discussed in 2012 on the
list:

  https://sympa.inria.fr/sympa/arc/caml-list/2012-11/msg00100.html

Currently there is no mechanism in the compiler codebase for
multi-language strings. Alexis Irlande proposed a patch in the above
thread (https://sympa.inria.fr/sympa/arc/caml-list/2012-11/msg00152.html)
that certainly enabled some kind of parametrization, but unfortunately the
patch files (which were hosted on a personal Dropbox) seem to be lost
today. (What I have at hand is a French translation of the compiler
messages made by Jacques-Henri Jourdan, but without a parametrization
mechanism.)

I think that this is an interesting idea and I would personally be willing
to support a well-engineered patch providing the feature (the question of
whether the internationalized messages should be hardcoded in a source file
or use a sort of gettext-like mechanism is delicate), but I don't know the
opinion of the compiler maintainers.

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 14:43 ` Gabriel Scherer
@ 2017-04-08 15:03   ` Sébastien Hinderer
  0 siblings, 0 replies; 24+ messages in thread
From: Sébastien Hinderer @ 2017-04-08 15:03 UTC (permalink / raw)
  To: caml-list

Hi,

Gabriel Scherer (2017/04/08 10:43 -0400):
> Hi,
> 
> This is an interesting question and the issue was discussed in 2012 on the
> list:
> 
>   https://sympa.inria.fr/sympa/arc/caml-list/2012-11/msg00100.html
> 
> Currently there is no mechanism in the compiler codebase to have
> multi-language strings in the compiler. Alexis Irland proposed a patch in
> the above thread ( https://sympa.inria.fr/sympa/arc/caml-list/2012-11/
> msg00152.html ) that certainly enabled some kind of parametrization, but
> unfortunately the patch files (which were hosted on a personal dropbox)
> seem lost today. (What I have at hand is a French translation of the
> compiler messages made by Jacques-Henri Jourdan, but without a
> parametrization mechanism.)
> 
> I think that this is an interesting idea and I would personally be willing
> to support a well-engineered patch providing the feature

I like the idea, too.

> (the question of
> whether the internationalized messages should be hardcoded in a source file
> or use a sort of gettext-like mechanism is delicate),

Why is it delicate? Why not use gettext itself?

Sébastien.

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 14:22 [Caml-list] error messages in multiple languages ? Tao Stein
  2017-04-08 14:43 ` Gabriel Scherer
@ 2017-04-08 16:38 ` Xavier Leroy
  2017-04-08 16:51   ` Sébastien Hinderer
  2017-04-11 14:05 ` Richard W.M. Jones
  2 siblings, 1 reply; 24+ messages in thread
From: Xavier Leroy @ 2017-04-08 16:38 UTC (permalink / raw)
  To: caml-list

On 04/08/2017 04:22 PM, Tao Stein wrote:

> I've been teaching OCaml to university students in Beijing. I believe they'd
> feel more comfortable if the error messages were in Chinese. Has anyone
> thought of implementing multi-language strings in the compiler? So, say with
> the setting of an environment variable, the compiler user could receive
> errors and warnings in their preferred language.

Caml Light, the ancestor of OCaml, was internationalized in this manner.  It
had messages in English, French, German, Spanish and Italian.  Curious or
nostalgic minds can have a look at the text file containing the translations:

https://github.com/camllight/camllight/blob/master/sources/src/camlmsgs.txt

and at the i18n engine itself, which was just a wrapper around "printf" that
used the English format message as an index into the translations:

https://github.com/camllight/camllight/blob/master/sources/src/compiler/interntl.ml

This implementation was pretty short and sweet, if I may say so myself, and
possibly easier to use than gettext because by construction the English
message was always available, even if translations were missing by mistake.
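To illustrate the approach described above, here is a minimal OCaml sketch (not the actual Caml Light interntl.ml code) of a printf wrapper that uses the English format string as the lookup key and falls back to it when a translation is missing or ill-typed. The table, `add_translation` and `sprintf_i18n` are hypothetical names. `string_of_format` and `Scanf.format_from_string` are standard-library functions; the latter re-types the translated string against the English format, so a translation whose conversions (%s, %d, ...) do not match is rejected.

```ocaml
(* Hypothetical table: (language, English format text) -> translated text. *)
let translations : (string * string, string) Hashtbl.t = Hashtbl.create 64

(* Currently selected language; "en" means "print the English text as is". *)
let current_lang = ref "en"

let add_translation ~lang english translated =
  Hashtbl.replace translations (lang, english) translated

(* Use the English format text as the key. If a translation exists, re-type
   it with Scanf.format_from_string so its conversions must match the
   original; a missing or ill-typed translation falls back to English. *)
let translate (fmt : ('a, 'b, 'c, 'd, 'e, 'f) format6) :
    ('a, 'b, 'c, 'd, 'e, 'f) format6 =
  match Hashtbl.find_opt translations (!current_lang, string_of_format fmt) with
  | None -> fmt
  | Some s -> (
      try Scanf.format_from_string s fmt
      with Scanf.Scan_failure _ | Failure _ | End_of_file -> fmt)

(* Drop-in replacement for Printf.sprintf that goes through the table. *)
let sprintf_i18n fmt = Printf.sprintf (translate fmt)
```

Because the English text is the key, an untranslated message simply prints in English, which is the property highlighted above.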

Yet it was a lot of work, and quite painful, to keep the message file and
the translations up to date.  Keep in mind that Caml Light had perhaps
1/10th as many messages as OCaml does.  So, the chances of getting i18n to
work for OCaml look thin.  One advantage for me, though, is that it would
make it harder to add new warnings :-)

Your Caml historian,

- Xavier Leroy

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 16:38 ` Xavier Leroy
@ 2017-04-08 16:51   ` Sébastien Hinderer
  2017-04-08 16:56     ` Xavier Leroy
  0 siblings, 1 reply; 24+ messages in thread
From: Sébastien Hinderer @ 2017-04-08 16:51 UTC (permalink / raw)
  To: caml-list

Hi Xavier,

Many thanks for providing the context, it's very interesting!

Xavier Leroy (2017/04/08 18:38 +0200):
> This implementation was pretty short and sweet, if I may say so myself, and
> possibly easier to use than gettext because by construction the english
> message was always available, even if translations were missing by
> mistake.

I am not following you here. Isn't that exactly the behaviour gettext
provides?

Sébastien.

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 16:51   ` Sébastien Hinderer
@ 2017-04-08 16:56     ` Xavier Leroy
  2017-04-09 19:50       ` Adrien Nader
  0 siblings, 1 reply; 24+ messages in thread
From: Xavier Leroy @ 2017-04-08 16:56 UTC (permalink / raw)
  To: caml-list

On 04/08/2017 06:51 PM, Sébastien Hinderer wrote:

>> This implementation was pretty short and sweet, if I may say so myself, and
>> possibly easier to use than gettext because by construction the english
>> message was always available, even if translations were missing by
>> mistake.
> 
> I am not following you here. Isn't that exactly the behaviour gettext
> provides?

Oops, yes, you're probably right.  I was confusing gettext with the (early?)
Java i18n library where every message was to be given a unique, short
identifier, then be looked up in a resource file.

- Xavier Leroy



* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 16:56     ` Xavier Leroy
@ 2017-04-09 19:50       ` Adrien Nader
  2017-04-10  6:14         ` Ian Zimmerman
  0 siblings, 1 reply; 24+ messages in thread
From: Adrien Nader @ 2017-04-09 19:50 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

On Sat, Apr 08, 2017, Xavier Leroy wrote:
> On 04/08/2017 06:51 PM, Sébastien Hinderer wrote:
> 
> >> This implementation was pretty short and sweet, if I may say so myself, and
> >> possibly easier to use than gettext because by construction the english
> >> message was always available, even if translations were missing by
> >> mistake.
> > 
> > I am not following you here. Isn't that exactly the behaviour gettext
> > provides?
> 
> Oops, yes, you're probably right.  I was confusing gettext with the (early?)
> Java i18n library where every message was to be given a unique, short
> identifier, then be looked up in a resource file.

As far as I know, this is also similar to catgets(3), which is the
i18n facility standardized in POSIX. The gettext documentation gives an
overview of it:
  http://gnu.huihoo.org/gettext-0.10.35/html_chapter/gettext_8.html

Unsurprisingly, pretty much everyone uses gettext rather than catgets.

Personally, I've enjoyed using gettext, and I've found that it provides
the features needed for proper translation in a pretty good way.

I know that several large projects do translation updates during feature
freezes and bug-fix periods (i.e. during release candidates). There are
also some (web) platforms that open the translation process to more people
(LibreOffice, Xfce and many others use Transifex).

-- 
Adrien Nader

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-09 19:50       ` Adrien Nader
@ 2017-04-10  6:14         ` Ian Zimmerman
  2017-04-10 13:20           ` Tao Stein
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Zimmerman @ 2017-04-10  6:14 UTC (permalink / raw)
  To: caml-list

On 2017-04-09 21:50, Adrien Nader wrote:

> Unsurprisingly, pretty much everyone uses gettext rather than catgets.
> 
> Personally I've enjoyed using gettext and I've found that it provided
> the features needed for a proper translation in a pretty good way.

The one problem with gettext (which catgets lacks) is that it relies on
a piece of global data (the "text domain binding").  This makes any way
to handle translations in a shared library somewhat distasteful.

Admittedly one can wave the problem away by relying on the default
binding established when glibc or libintl is installed, and never
calling bindtextdomain().

-- 
Please *no* private Cc: on mailing lists and newsgroups
Personal signed mail: please _encrypt_ and sign
Don't clear-text sign: http://cr.yp.to/smtp/8bitmime.html
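Ian's point about global state can be contrasted with the catgets(3) style, where each lookup names an explicit catalog descriptor and a (set, message) number pair, and supplies a default string. The sketch below only mimics that shape in OCaml with a hypothetical in-memory catalog type; it does not bind to the real catopen/catgets C functions. Because the catalog is an ordinary value, two libraries can each hold their own without sharing a process-wide binding.

```ocaml
(* Hypothetical in-memory catalog, keyed like catgets(3) by (set, message). *)
type catalog = { messages : (int * int, string) Hashtbl.t }

let make_catalog (entries : ((int * int) * string) list) : catalog =
  let messages = Hashtbl.create 16 in
  List.iter (fun (key, text) -> Hashtbl.replace messages key text) entries;
  { messages }

(* As with catgets(3), the caller passes the catalog explicitly together with
   a default string, which is returned when the message is absent. *)
let get_message (cat : catalog) ~set ~msg (default : string) : string =
  match Hashtbl.find_opt cat.messages (set, msg) with
  | Some text -> text
  | None -> default
```

The price of this design is the one mentioned earlier in the thread: every message needs a numeric identifier that has to be kept stable by hand.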

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10  6:14         ` Ian Zimmerman
@ 2017-04-10 13:20           ` Tao Stein
  2017-04-10 13:45             ` Evgeny Roubinchtein
  0 siblings, 1 reply; 24+ messages in thread
From: Tao Stein @ 2017-04-10 13:20 UTC (permalink / raw)
  To: OCaml Mailing List

Would people have concerns about creating a compiler build dependency on
libgettext?

Another concern is that xgettext seems to lack an OCaml back-end.

Also, there may be some advantages to having all the language strings
together in one file, as in the 1997 Caml Light implementation Xavier
shared, as opposed to the many .po files of a typical gettext workflow.
With one file it's easy to see all the translations for a string at once,
to ensure consistency. The gettext workflow, while somewhat complex, may
be more scalable, but it's not clear to me that the scalability is worth
the additional complexity in this case. I'm undecided.

In terms of gettext versus catgets, some more knowledgeable people may have
better opinions. Searching around a bit, it seems that gettext is used more
often in open-source projects.

Tao Stein / 石涛 / تاو شتاين

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10 13:20           ` Tao Stein
@ 2017-04-10 13:45             ` Evgeny Roubinchtein
  2017-04-10 14:04               ` Tao Stein
  0 siblings, 1 reply; 24+ messages in thread
From: Evgeny Roubinchtein @ 2017-04-10 13:45 UTC (permalink / raw)
  To: Tao Stein; +Cc: OCaml Mailing List

> With one file it's easy to see all the translations for a string at once,
> to ensure consistency.

Aren't you making an implicit assumption that a single person is able to
read all the languages?  If so, is that a good assumption?  I would argue
that it isn't.  Besides, bringing together several translations could
conceivably be done with tooling built on top of gettext.  One more
observation is that, if all translations are in one file, then there needs
to be a single text encoding that encodes all those languages.  Yes, I do
know about Unicode, but I also know that people may still wish to use other
encodings as a matter of habit, local convention, or even perceived
problems with Unicode (Han unification comes to mind, but there may be
other issues, for example different languages being best served by
different normalization forms).

-- 
Best,
Zhenya
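The suggestion that a side-by-side view could be built as tooling on top of separate per-language catalogs is easy to sketch. Below, each catalog is a hypothetical list of (msgid, translation) pairs standing in for a parsed .po file; this representation is an assumption, not a real .po parser. The function regroups the catalogs so that all translations of one English message appear together, as they would in a single combined file.

```ocaml
(* A per-language catalog, e.g. the parsed contents of po/zh_CN.po:
   (msgid, msgstr) pairs. This representation is an assumption. *)
type catalog = { lang : string; entries : (string * string) list }

(* Regroup catalogs by msgid: for each English message (in order of first
   appearance), list the (language, translation) pairs in catalog order. *)
let side_by_side (catalogs : catalog list) :
    (string * (string * string) list) list =
  let table : (string, (string * string) list) Hashtbl.t = Hashtbl.create 64 in
  let order = ref [] in
  List.iter
    (fun { lang; entries } ->
      List.iter
        (fun (msgid, msgstr) ->
          match Hashtbl.find_opt table msgid with
          | None ->
              order := msgid :: !order;
              Hashtbl.replace table msgid [ (lang, msgstr) ]
          | Some l -> Hashtbl.replace table msgid ((lang, msgstr) :: l))
        entries)
    catalogs;
  (* Both lists were built by prepending, so reverse them back. *)
  List.rev_map (fun msgid -> (msgid, List.rev (Hashtbl.find table msgid))) !order
```

With this kind of view generated on demand, translators can keep editing one file per language while a reviewer still sees, say, the Simplified and Traditional Chinese versions of a message next to each other.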

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10 13:45             ` Evgeny Roubinchtein
@ 2017-04-10 14:04               ` Tao Stein
  2017-04-10 18:07                 ` Adrien Nader
  0 siblings, 1 reply; 24+ messages in thread
From: Tao Stein @ 2017-04-10 14:04 UTC (permalink / raw)
  To: Evgeny Roubinchtein; +Cc: OCaml Mailing List

I'm not sure one would have to be able to read all the languages for there
to be some win. There may be some similarity between subsets of the
languages -- within the Romance languages, or between Traditional and
Simplified Chinese, for example. If I were adding Traditional Chinese
messages, I'd want them to be consistent with the Simplified Chinese
messages, and I wouldn't need to be able to read Spanish. Anyway, I'm not
convinced this is a big win; maybe it's just something to think about. Any
thoughts on using UTF-8 for the file(s)?

Any thoughts on the other points?

Tao Stein / 石涛 / تاو شتاين
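On the UTF-8 question: if all translations share one UTF-8 file, a build step could at least reject lines that are not well-formed UTF-8. A minimal sketch, assuming OCaml 4.14 or later for `String.is_valid_utf_8`; the function name is hypothetical.

```ocaml
(* Return the 1-based numbers of the lines that are not valid UTF-8, such as
   text saved in a legacy encoding (GBK or Latin-1 text usually fails this
   check). Requires OCaml >= 4.14. *)
let invalid_utf8_lines (lines : string list) : int list =
  lines
  |> List.mapi (fun i line -> (i + 1, line))
  |> List.filter_map (fun (n, line) ->
         if String.is_valid_utf_8 line then None else Some n)
```

This only checks well-formedness; it does nothing about the Han unification or normalization concerns raised above.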

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10 14:04               ` Tao Stein
@ 2017-04-10 18:07                 ` Adrien Nader
  2017-04-10 19:45                   ` Hendrik Boom
  0 siblings, 1 reply; 24+ messages in thread
From: Adrien Nader @ 2017-04-10 18:07 UTC (permalink / raw)
  To: Tao Stein; +Cc: Evgeny Roubinchtein, OCaml Mailing List

On Mon, Apr 10, 2017, Tao Stein wrote:
> I'm not sure one would have to be able to read all the languages for there
> to be some win. There may be some similarity between subsets of the
> languages -- within the Latin languages, Traditional and Simplified
> Chinese, for example. If I were putting in Traditional Chinese messages,
> I'd want them to be consistent with the Simplified Chinese messages, and I
> need not be able to read Spanish. Anyways, I'm not convinced this is a big
> win. Maybe just something to think about. Any thoughts on using UTF-8 for
> the file(s)?
> 
> And thoughts on the other points?

Just think about the fact that some languages such as Polish have
several plurals (*) yet can be understood a bit by French speakers.
There are reasons translations need to be done by people knowledgeable
about both languages involved in a translation.

It is also worth pointing out that gettext's update-po task gives
statistics about the translations: number of translated strings, number
of untranslated strings, number of "fuzzy" translations (and maybe
others). It also seems to be able to perform some simple updates
automatically (I was quite astonished to find out about this). The
workflow itself is really simple and usually amounts to "make
update-po", edit po/$LANG.po, update the translations which are marked
"fuzzy" and remove that keyword when you're done.

Without saying anything about the likelihood of such a change being
integrated: linking against libgettext would have to be optional, and
note that there's also ocaml-gettext to look at. There are also many,
many places that would need changes, so any evolution would need a plan
and would probably need to be done in steps.

(*) hopefully I didn't get that one wrong :P 
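The statistics mentioned above (translated, fuzzy and untranslated counts, similar to what `msgfmt --statistics` reports) are simple to compute once the entries are parsed. The record type below is a simplified, hypothetical stand-in for a .po entry, not a real .po parser.

```ocaml
(* Simplified, hypothetical view of a .po entry: the English msgid, the
   translated msgstr ("" when untranslated) and whether it is flagged fuzzy. *)
type entry = { msgid : string; msgstr : string; fuzzy : bool }

type stats = { translated : int; fuzzy_count : int; untranslated : int }

(* Classify each entry: an empty msgstr is untranslated; a fuzzy entry (one
   that needs review, e.g. after the English text changed) is counted apart. *)
let statistics (entries : entry list) : stats =
  List.fold_left
    (fun s e ->
      if e.msgstr = "" then { s with untranslated = s.untranslated + 1 }
      else if e.fuzzy then { s with fuzzy_count = s.fuzzy_count + 1 }
      else { s with translated = s.translated + 1 })
    { translated = 0; fuzzy_count = 0; untranslated = 0 }
    entries
```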

-- 
Adrien Nader

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10 18:07                 ` Adrien Nader
@ 2017-04-10 19:45                   ` Hendrik Boom
  2017-04-10 19:49                     ` Dušan Kolář
  0 siblings, 1 reply; 24+ messages in thread
From: Hendrik Boom @ 2017-04-10 19:45 UTC (permalink / raw)
  To: caml-list

On Mon, Apr 10, 2017 at 08:07:14PM +0200, Adrien Nader wrote:
> 
> Just think about the fact that some languages such as Polish have
> several plurals (*) yet can be understood a bit by French speakers.
...
...
> (*) hopefully I didn't get that one wrong :P 

Don't know about Polish, but there are languages that distinguish 
singular (one thing), dual (two things) and plural (many things).

-- hendrik

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10 19:45                   ` Hendrik Boom
@ 2017-04-10 19:49                     ` Dušan Kolář
  2017-04-11  0:38                       ` Tao Stein
  0 siblings, 1 reply; 24+ messages in thread
From: Dušan Kolář @ 2017-04-10 19:49 UTC (permalink / raw)
  To: caml-list



On 10.4.2017 21:45, Hendrik Boom wrote:
> On Mon, Apr 10, 2017 at 08:07:14PM +0200, Adrien Nader wrote:
>> Just think about the fact that some languages such as Polish have
>> several plurals (*) yet can be understood a bit by French speakers.
> ...
> ...
>> (*) hopefully I didn't get that one wrong :P
> Don't know about Polish, but there are languages that disttinguish
> singular (one thing), dual (two things) and plural (many things).
>
> -- hendrik
>
And even worse :-)
In Czech, we distinguish one thing (singular), 2-4 things, and 5 and 
more things...

Dušan
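These plural rules are what gettext's `Plural-Forms` header encodes: each catalog declares how many forms it has and a C-like expression mapping a count n to a form index. The functions below transcribe the expressions commonly used for English, Czech and Polish in GNU gettext catalogs into OCaml; the function names, including `pluralize`, are illustrative, and counts are assumed non-negative.

```ocaml
(* English: nplurals=2; plural=(n != 1); *)
let english_plural n = if n <> 1 then 1 else 0

(* Czech: nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2; *)
let czech_plural n = if n = 1 then 0 else if n >= 2 && n <= 4 then 1 else 2

(* Polish: nplurals=3;
   plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2); *)
let polish_plural n =
  if n = 1 then 0
  else if n mod 10 >= 2 && n mod 10 <= 4 && (n mod 100 < 10 || n mod 100 >= 20)
  then 1
  else 2

(* Pick the right form from a translator-supplied array, one entry per form. *)
let pluralize (plural : int -> int) (forms : string array) (n : int) : string =
  Printf.sprintf "%d %s" n forms.(plural n)
```

For example, with the Czech forms [| "chyba"; "chyby"; "chyb" |] ("error"), `pluralize czech_plural` yields "1 chyba", "3 chyby" and "5 chyb".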


* Re: [Caml-list] error messages in multiple languages ?
  2017-04-10 19:49                     ` Dušan Kolář
@ 2017-04-11  0:38                       ` Tao Stein
  0 siblings, 0 replies; 24+ messages in thread
From: Tao Stein @ 2017-04-11  0:38 UTC (permalink / raw)
  To: Hendrik Boom; +Cc: OCaml Mailing List

On Mon, Apr 10, 2017, Adrien Nader wrote:

> It is also worth pointing out that gettext's update-po task gives
> statistics about the translations: number of translated strings, number
> of untranslated strings, number of "fuzzy" translations


Does GNU gettext have an xgettext OCaml back-end to do this extraction and
automation? I poked around in the source and the man page and didn't find
one. (This matters because most of the OCaml compiler is written in OCaml
rather than C.)

Tao Stein / 石涛 / تاو شتاين

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-08 14:22 [Caml-list] error messages in multiple languages ? Tao Stein
  2017-04-08 14:43 ` Gabriel Scherer
  2017-04-08 16:38 ` Xavier Leroy
@ 2017-04-11 14:05 ` Richard W.M. Jones
  2017-04-11 14:18   ` Gabriel Scherer
  2 siblings, 1 reply; 24+ messages in thread
From: Richard W.M. Jones @ 2017-04-11 14:05 UTC (permalink / raw)
  To: Tao Stein; +Cc: OCaml Mailing List

It looks like people have already mentioned gettext.

I want to add that OCaml already has an excellent gettext
implementation.  No need to reinvent any wheels.

  https://forge.ocamlcore.org/projects/ocaml-gettext/

We use it every day in libguestfs, an example picked at random
(there are thousands more):

  https://github.com/libguestfs/libguestfs/blob/master/v2v/input_libvirt.ml#L39

Therefore you might think I'd be very excited about having the OCaml
compiler messages localized. Not so much. I find it considerably
easier to search for error messages, and also to help people, if they
are in a single language. It's for this reason that we don't translate
debugging and other internal messages in our tools. (But being a native
English speaker, it's a lot easier for me, so take this with a pinch of
salt.)

Rich.

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-11 14:05 ` Richard W.M. Jones
@ 2017-04-11 14:18   ` Gabriel Scherer
  2017-04-11 14:59     ` Tao Stein
  0 siblings, 1 reply; 24+ messages in thread
From: Gabriel Scherer @ 2017-04-11 14:18 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: Tao Stein, OCaml Mailing List

> I find that it makes it considerably easier to search for error messages

On this specific topic, I would be interested in having OCaml compiler
error messages numbered, just as warnings already are, precisely because
that makes them much easier to reference (numbers are robust to changes
in wording) and, for example, to look up a specific error in the manual
for further explanations -- we recently started doing this for warnings,
see http://caml.inria.fr/pub/docs/manual-ocaml/comp.html#sec270 .
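The numbering idea can be sketched as a variant of errors carrying a stable number, as warnings already have: the wording can change or be translated while the number stays fixed for searching and for manual lookups. The constructors, numbers and "Error N:" output format below are invented for illustration and are not actual OCaml compiler error codes.

```ocaml
(* Hypothetical compiler errors; the numbers are illustrative only. *)
type error =
  | Unbound_value of string
  | Type_mismatch of string * string

(* Stable identifier: it never changes, even if the wording or language does. *)
let error_number = function
  | Unbound_value _ -> 1
  | Type_mismatch _ -> 2

(* Language-specific wording, English only in this sketch. *)
let message = function
  | Unbound_value name -> Printf.sprintf "Unbound value %s" name
  | Type_mismatch (actual, expected) ->
      Printf.sprintf
        "This expression has type %s but an expression was expected of type %s"
        actual expected

(* Prefix the number so that it can be searched for regardless of language. *)
let report e = Printf.sprintf "Error %d: %s" (error_number e) (message e)
```

Under this design, a translated `message` function could be swapped in per language without affecting `error_number`.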



* Re: [Caml-list] error messages in multiple languages ?
  2017-04-11 14:18   ` Gabriel Scherer
@ 2017-04-11 14:59     ` Tao Stein
  2017-04-11 17:17       ` Allan Wegan
  0 siblings, 1 reply; 24+ messages in thread
From: Tao Stein @ 2017-04-11 14:59 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Richard W.M. Jones, OCaml Mailing List

Numbering the messages is a great idea, even if it uses Latin script. I
don't know about education in Arabic countries, but in China learners
generally learn math and arithmetic using Latin numbers as opposed to the
Chinese script numbers (一, 二, 三, 四, etc.), which are used in written text.
So everyone is familiar with Latin script for numbers. Latin for error
numbers would probably be fine.
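A minimal OCaml sketch of the numbering idea, assuming a hypothetical registry in which each error has a stable number plus per-language texts, with English as the fallback. The `lang` type, the `messages` table, and the `[E0001]` format are illustrative assumptions; the compiler has no such table. Because the number is always printed with ASCII digits, searching for `E0001` finds the same error whatever language the text is in.

```ocaml
(* Hypothetical registry: stable error numbers mapped to per-language text. *)
type lang = En | Zh

let messages : (int * lang, string) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace messages (1, En) "Syntax error";
  Hashtbl.replace messages (1, Zh) "语法错误";
  (* Error 2 has no Chinese translation yet. *)
  Hashtbl.replace messages (2, En) "Unbound value"

(* Format an error as "[Ennnn] text", falling back to English when the
   requested language has no translation for this number. *)
let message ~lang code =
  let text =
    match Hashtbl.find_opt messages (code, lang) with
    | Some s -> s
    | None -> (
        match Hashtbl.find_opt messages (code, En) with
        | Some s -> s
        | None -> "Unknown error")
  in
  Printf.sprintf "[E%04d] %s" code text
```

Here `message ~lang:Zh 2` returns the English text, since error 2 has no Chinese entry yet.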

Richard W.M. Jones <rich@annexia.org> wrote:

> I want to add that OCaml already has an excellent gettext implementation.
> No need to reinvent any wheels.


Thank you, Richard. I will take a look at that.

Tao Stein / 石涛 / تاو شتاين

On 11 April 2017 at 22:18, Gabriel Scherer <gabriel.scherer@gmail.com>
wrote:

> > I find that it makes it considerably easier to search for error messages
>
> On this specific topic, I would be interested in having OCaml compiler
> error messages numbered, just as warnings already are, precisely because it
> makes it much easier to reference them (is robust to change of wording),
> and for example look up a specific error in the manual for further
> explanations -- we recently started doing this for warnings, see
> http://caml.inria.fr/pub/docs/manual-ocaml/comp.html#sec270 .

[-- Attachment #2: Type: text/html, Size: 4858 bytes --]

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-11 14:59     ` Tao Stein
@ 2017-04-11 17:17       ` Allan Wegan
  2017-04-11 19:07         ` Glen Mével
  0 siblings, 1 reply; 24+ messages in thread
From: Allan Wegan @ 2017-04-11 17:17 UTC (permalink / raw)
  To: caml-list


[-- Attachment #1.1: Type: text/plain, Size: 875 bytes --]

> Numbering the messages is a great idea, even if it is using Latin
> script. I don't know about education in Arabic countries, but in
> China learners generally learn math and arithmetic using Latin
> numbers as opposed to the Chinese script numbers (一, 二, 三, 四, etc)
> which is used in written text.

Actually, the arabic digits are commonly used in languages using Latin
letters.

The Arabic countries should have no problem with the use of their digits
for error numbers. ;)

https://en.wikipedia.org/wiki/Arabic_numerals



-- 
Allan Wegan
<http://www.allanwegan.de/>
Jabber: allanwegan@ffnord.net
 OTR-Fingerprint: E4DCAA40 4859428E B3912896 F2498604 8CAA126F
Jabber: allanwegan@jabber.ccc.de
 OTR-Fingerprint: A1AAA1B9 C067F988 4A424D33 98343469 29164587
ICQ: 209459114
 OTR-Fingerprint: 71DE5B5E 67D6D758 A93BF1CE 7DA06625 205AC6EC


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-11 17:17       ` Allan Wegan
@ 2017-04-11 19:07         ` Glen Mével
  2017-04-11 23:04           ` Allan Wegan
  0 siblings, 1 reply; 24+ messages in thread
From: Glen Mével @ 2017-04-11 19:07 UTC (permalink / raw)
  To: caml-list; +Cc: Allan Wegan

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

Allan Wegan wrote (on 11/04/2017 at 19:17):

> Actually, the arabic digits are commonly used in languages using Latin
> letters.
>
> The Arabic countries should have no problem with the use of their
> digits for error numbers. ;)
>
> https://en.wikipedia.org/wiki/Arabic_numerals

careful here, the “(hindu‐)arabic digits” used in European languages
(0123456789) are similar, but not identical to, the symbols that actual
arabic languages use nowadays (“eastern arabic digits”,
٠‎١‎٢‎٣‎٤‎٥‎٦‎٧‎٨‎٩). there even are false friends (e·g· the eastern 4
looks like a reversed western 3, the eastern 5 looks like a western 0,
the eastern 6 looks like a western 7).

yeah. confusing.

-- 
غلين ميفيل,
helpful as always.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-11 19:07         ` Glen Mével
@ 2017-04-11 23:04           ` Allan Wegan
  2017-04-12  0:12             ` Tao Stein
  0 siblings, 1 reply; 24+ messages in thread
From: Allan Wegan @ 2017-04-11 23:04 UTC (permalink / raw)
  To: caml-list


[-- Attachment #1.1: Type: text/plain, Size: 1165 bytes --]

> careful here, the “(hindu‐)arabic digits” used in European languages
> (0123456789) are similar, but not identical to, the symbols that actual
> arabic languages use nowadays (“eastern arabic digits”,
> ٠‎١‎٢‎٣‎٤‎٥‎٦‎٧‎٨‎٩). there even are false friends (e·g· the eastern 4
> looks like a reversed western 3, the eastern 5 looks like a western 0,
> the eastern 6 looks like a western 7).
> 
> yeah. confusing.

Indeed. Must have been wishful thinking on my part.

Not translating the thing at all may be the wiser option. It might serve
the greater goal of finally establishing one universal world script and
language that everyone has to learn in order to participate in the global
tech community (and written English is at least somewhat easy to learn)...



Greetings from Germany
-- 
Allan Wegan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-11 23:04           ` Allan Wegan
@ 2017-04-12  0:12             ` Tao Stein
  2017-04-16 22:37               ` Evgeny Roubinchtein
  0 siblings, 1 reply; 24+ messages in thread
From: Tao Stein @ 2017-04-12  0:12 UTC (permalink / raw)
  To: Allan Wegan; +Cc: OCaml Mailing List

[-- Attachment #1: Type: text/plain, Size: 4713 bytes --]

German and French are closer to English than Arabic or Chinese, especially
in the script.

As an experiment in empathy, I encourage folks to examine this working
OCaml code where I've replaced the Latin tokens and identifiers with
Chinese ones: https://github.com/taostein/hanma/blob/master/example.hm .
Chinese lacks capital letters [1], so I use the prefix "卜" instead. The
mapping of tokens is here (in the parsing/lexer.mll diff):
https://github.com/taostein/hanma/blob/master/lexer.mll.diff
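OCaml's lexer tells constructors and module names apart from values and type names by whether the first letter is a capital. A minimal sketch of how a 卜 prefix could play that role, assuming UTF-8 source text; `ident_kind`, `classify`, and `has_prefix` are hypothetical names, not code from the hanma lexer diff.

```ocaml
(* Hypothetical stand-in for an initial capital letter (UTF-8 encoded). *)
let upper_prefix = "卜"

let has_prefix ~prefix s =
  let n = String.length prefix in
  String.length s >= n && String.sub s 0 n = prefix

(* Capitalized identifiers (constructors, modules) vs. lowercase ones
   (values, type names), as the OCaml lexer distinguishes them. *)
type ident_kind = Uident | Lident

let classify s =
  if s = "" then invalid_arg "classify: empty identifier"
  else if has_prefix ~prefix:upper_prefix s then Uident
  else
    match s.[0] with
    | 'A' .. 'Z' -> Uident
    | _ -> Lident
```

Identifiers written only in Chinese characters fall into the lowercase class unless they carry the prefix, which mirrors how an unprefixed name would be read as a value.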

Reading code is hard when the script model isn't functioning in the fast
processing part of your brain. Granted, Chinese has more characters than
Latin, but training a brain to do fast processing of script takes years,
even if it's Latin. Sometimes we forget that it took us years to learn to read;
for most of us that was a long time ago.

I've taught Chinese students OCaml programming using Latin tokens and I've
taught the same replacing those Latin tokens with Chinese ones. I tried
this as an experiment and I was surprised at the outcome. Previously, I
thought as most of you probably do -- come on, it's just a few tokens plus
logic -- not hard. How many tokens are there in C, like 30? I could
memorize those in a day! I WAS WRONG. The students were markedly more
motivated and enthusiastic when coding in their own script. And these are
smart people, among China's brightest. Motivated learners learn better and
are also more fun to teach. This teaching experience is what inspired me to
undertake this translation project.

My observations are qualitative, because I've been focused on the teaching
part, as opposed to the research about teaching part, but I hope to gather
more data in future semesters and write a report about these findings. The
qualitative results were strong -- script matters. I believe it's about
script, not language. Parsing a foreign script quickly is really hard on
the brain. We need the brain for the hard parts of programming.

There are obviously many pieces of OCaml that need translation: manuals,
errors and warnings, libraries, the core code, comments. I think error
messages are a good place to start. We can work on different pieces in
parallel. And hopefully we can build something useful for scripts other
than Chinese, like Arabic and Russian. If you are interested in helping
with this project, please get in touch with me directly.

Yes, we want to build a global tech community. We must start from empathy.
Maybe the Arabs and Chinese (and Russians and Koreans and Japanese)
"should" or "shouldn't" learn English (or German or French or Latin or some
other Western European language), under some definition of "should" (refer
to various moral theories). But "should" is academic -- they're NOT going
to learn English. If anything, the trend is moving in the other direction.
China, for example, is lowering its university-level English requirements.
So the question is: how global and how big do we want this so-called
"global" tech community to be? Empathy and good translation tools can help
us make it a real global (no scare quotes) community.

Tao Stein / 石涛 / تاو شتاين

Yes, by Arabic numbers I meant the numeric script used by Arabs, not what
the Oxford English Dictionary calls arabic (lower-case) numbers.

[1] Chinese also lacks a plural form, which does somewhat ease error
messaging.

[-- Attachment #2: Type: text/html, Size: 5889 bytes --]

* Re: [Caml-list] error messages in multiple languages ?
  2017-04-12  0:12             ` Tao Stein
@ 2017-04-16 22:37               ` Evgeny Roubinchtein
  0 siblings, 0 replies; 24+ messages in thread
From: Evgeny Roubinchtein @ 2017-04-16 22:37 UTC (permalink / raw)
  To: Tao Stein, OCaml Mailing List

[-- Attachment #1: Type: text/plain, Size: 5781 bytes --]

I will point out that reading Java documentation suggests to me that it
solves the "locale-appropriate digits" problem.  See, for example, the
bottom of the following page, where Thai digits are being used to print out
a number:
https://docs.oracle.com/javase/tutorial/i18n/locale/create.html#constants.
The relevance of Java is that most of the JDK is under an open source license
(though I cannot comment on whether the license would allow lifting that
portion of the implementation into OCaml).  The important point here is that
this is a problem that has been solved in at least one largely open-source
technology.
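A minimal OCaml sketch of locale-appropriate digit substitution, assuming the locale's digits form a contiguous block of ten code points (true for Eastern Arabic-Indic digits, U+0660..U+0669, and Thai digits, U+0E50..U+0E59). This is not the JDK's mechanism; `localize_digits` is a hypothetical helper, and it needs OCaml 4.06 or later for `Buffer.add_utf_8_uchar`.

```ocaml
let eastern_arabic_zero = 0x0660 (* ARABIC-INDIC DIGIT ZERO *)
let thai_zero = 0x0E50 (* THAI DIGIT ZERO *)

(* Replace each ASCII digit in [s] with the code point [zero + d], encoded
   as UTF-8; every other byte is copied unchanged. Byte-wise iteration is
   safe on UTF-8 input: bytes inside a multi-byte sequence are always
   >= 0x80, so they can never be mistaken for '0'..'9'. *)
let localize_digits ~zero s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | '0' .. '9' ->
          Buffer.add_utf_8_uchar buf
            (Uchar.of_int (zero + Char.code c - Char.code '0'))
      | _ -> Buffer.add_char buf c)
    s;
  Buffer.contents buf
```

The design keeps error numbers stored as plain integers and only changes their rendering, so a numbered error stays searchable in its ASCII form.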

I am deliberately choosing not to comment on your other points, because I
view them as only tangentially related to the issue at hand, which is how
to handle translation of OCaml error messages.

-- 
Best,
Zhenya

[-- Attachment #2: Type: text/html, Size: 7442 bytes --]

* Re: [Caml-list] error messages in multiple languages ?
@ 2017-04-09 17:15 Андрей Бергман
  0 siblings, 0 replies; 24+ messages in thread
From: Андрей Бергман @ 2017-04-09 17:15 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

> and at the i18n engine itself, which was just a wrapper around "printf" that
> used the english format message as an index into the translations:
>
> https://github.com/camllight/camllight/blob/master/sources/src/compiler/interntl.ml
>
> This implementation was pretty short and sweet, if I may say so myself, and
> possibly easier to use than gettext because by construction the english
> message was always available, even if translations were missing by mistake.
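A minimal OCaml sketch of the scheme described in the quote: a printf wrapper that uses the English format string as the key into a translation table and falls back to English when no translation exists. It is not the Caml Light `interntl.ml` code; the `translations` table, `current_lang`, and the `sprintf` wrapper are hypothetical. `Scanf.format_from_string` rejects a translation whose conversions do not match the English format, so a mistyped translation also falls back to English.

```ocaml
(* Hypothetical translation table keyed by (language, English format). *)
let translations : (string * string, string) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace translations ("fr", "Unbound value %s") "Valeur non liée %s";
  (* Deliberately wrong: %s where the English format has %d. *)
  Hashtbl.replace translations ("fr", "Expected %d arguments")
    "Attendu %s arguments"

let current_lang = ref "en"

(* Return the translated format if one exists and has the same conversions
   as [fmt]; otherwise return the English [fmt] unchanged. *)
let translate (fmt : ('a, 'b, 'c, 'd, 'e, 'f) format6) :
    ('a, 'b, 'c, 'd, 'e, 'f) format6 =
  match Hashtbl.find_opt translations (!current_lang, string_of_format fmt) with
  | None -> fmt
  | Some s -> (
      try Scanf.format_from_string s fmt
      with Scanf.Scan_failure _ | Failure _ -> fmt)

(* Drop-in replacement for Printf.sprintf that translates its format first. *)
let sprintf fmt = Printf.sprintf (translate fmt)
```

As in the quoted description, the English message is always available by construction: a missing or ill-typed translation silently yields the original format.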

Why not forward-port it? The only problems I foresee are:

1. Keeping translations up to date. This is actually quite minor: it can be done by Linux distribution maintainers and professors like Tao Stein, who know the correct terminology.

All you have to provide is clear and detailed instructions (in English) on how to create or update a translation and submit a patch upstream. This is common practice: many projects have a README.i18n or a similar file, or instructions online. See, for instance, https://github.com/doxygen/doxygen/blob/master/LANGUAGE.HOWTO

2. Support for languages with non-Latin code pages on different OSes, especially Windows. That is, the initial testing and/or contribution would best be done by people from Asian countries. ;-)

For Russian this problem is quite complicated. There are five different encodings still in use (cp866, cp1251, koi8-r, UTF-8, UTF-16). On Linux it is more or less possible to limit ourselves to UTF-8. On Windows, very unfortunately, we still have to use almost all of them.

For instance, the default console code page is cp866, but if you redirect the output to a file and open it with Notepad, Notepad will read it using the cp1251 code page, and you'll get garbage. I expect similar problems for Asian languages.

So it is doable on Linux, but on Windows the only reliably working solution is to output messages in English. OS X, if I remember correctly, uses UTF-8.
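A small OCaml sketch of why the redirected file turns to garbage: the same bytes decode to different letters under cp866 and cp1251. To stay short, the decoders below are deliberately partial (an assumption for illustration, not full code-page tables): they map only ASCII and the basic Cyrillic letters А..я, and turn every other byte into `?`. It needs OCaml 4.06 or later for `Buffer.add_utf_8_uchar`.

```ocaml
(* cp1251: 0xC0..0xFF are А..я (U+0410..U+044F). Partial table. *)
let decode_cp1251_byte b =
  if b < 0x80 then b
  else if b >= 0xC0 then 0x0410 + (b - 0xC0)
  else Char.code '?'

(* cp866: 0x80..0xAF are А..п, 0xE0..0xEF are р..я. Partial table. *)
let decode_cp866_byte b =
  if b < 0x80 then b
  else if b <= 0xAF then 0x0410 + (b - 0x80)
  else if b >= 0xE0 && b <= 0xEF then 0x0440 + (b - 0xE0)
  else Char.code '?'

(* Decode a byte string with a per-byte table and re-encode it as UTF-8. *)
let decode_with f s =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun c -> Buffer.add_utf_8_uchar buf (Uchar.of_int (f (Char.code c))))
    s;
  Buffer.contents buf

(* "Ошибка" ("error") as written by a cp866 console program. *)
let oshibka_cp866 = "\x8E\xE8\xA8\xA1\xAA\xA0"
```

With these partial tables, `decode_with decode_cp866_byte oshibka_cp866` gives back "Ошибка", while `decode_with decode_cp1251_byte oshibka_cp866` gives "?и????": the same file, read with the wrong code page, becomes unreadable.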

---------------------------
On the other hand, I recently read about a project to translate the Harris & Harris book. The motivation was to reduce the complexity of the subject (CPU pipeline + foreign language) so that students would understand it better. Experiments proved that the translation really did help.

end of thread

Thread overview: 24+ messages
2017-04-08 14:22 [Caml-list] error messages in multiple languages ? Tao Stein
2017-04-08 14:43 ` Gabriel Scherer
2017-04-08 15:03   ` Sébastien Hinderer
2017-04-08 16:38 ` Xavier Leroy
2017-04-08 16:51   ` Sébastien Hinderer
2017-04-08 16:56     ` Xavier Leroy
2017-04-09 19:50       ` Adrien Nader
2017-04-10  6:14         ` Ian Zimmerman
2017-04-10 13:20           ` Tao Stein
2017-04-10 13:45             ` Evgeny Roubinchtein
2017-04-10 14:04               ` Tao Stein
2017-04-10 18:07                 ` Adrien Nader
2017-04-10 19:45                   ` Hendrik Boom
2017-04-10 19:49                     ` Dušan Kolář
2017-04-11  0:38                       ` Tao Stein
2017-04-11 14:05 ` Richard W.M. Jones
2017-04-11 14:18   ` Gabriel Scherer
2017-04-11 14:59     ` Tao Stein
2017-04-11 17:17       ` Allan Wegan
2017-04-11 19:07         ` Glen Mével
2017-04-11 23:04           ` Allan Wegan
2017-04-12  0:12             ` Tao Stein
2017-04-16 22:37               ` Evgeny Roubinchtein
2017-04-09 17:15 Андрей Бергман
