I will point out that, judging from its documentation, Java appears to
solve the "locale-appropriate digits" problem.  See, for example, the
bottom of the following page, where Thai digits are used to print a
number:
https://docs.oracle.com/javase/tutorial/i18n/locale/create.html#constants
Java is relevant because most of the JDK is under an open-source license
(though I cannot comment on whether that license would allow lifting the
relevant portion of the implementation into OCaml).  The important point
is that this problem has already been solved in at least one largely
open-source technology.
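
To make the idea concrete, here is a minimal sketch of digit substitution
using classes from java.text.  It is only an illustration: the class name
LocalDigits and its format method are hypothetical, and I set the zero digit
by hand instead of relying on the JDK's locale data, so that the result does
not depend on which locale provider a given JDK ships with.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or of OCaml.
class LocalDigits {
    // Formats n with the run of ten decimal digits that starts at zeroDigit.
    // Assumption: the target script encodes its digits 0-9 contiguously,
    // which holds for the Thai and Eastern Arabic digits in Unicode.
    static String format(long n, char zeroDigit) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setZeroDigit(zeroDigit);
        // The pattern "0" means: integer digits only, no grouping separators.
        return new DecimalFormat("0", symbols).format(n);
    }

    public static void main(String[] args) {
        System.out.println(format(573, '\u0E50')); // Thai digits
        System.out.println(format(573, '\u0660')); // Eastern Arabic digits
    }
}
```

An OCaml port would presumably need little more than a table of zero-digit
code points per locale; where that table should come from is the open
question.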

I am deliberately not commenting on your other points, because I view
them as only tangentially related to the issue at hand: how to handle
translation of OCaml error messages.

-- 
Best,
Zhenya

On Tue, Apr 11, 2017 at 8:12 PM, Tao Stein <taostein@gmail.com> wrote:

>
> German and French are closer to English than Arabic or Chinese, especially
> in the script.
>
> As an experiment in empathy, I encourage folks to examine this working
> OCaml code where I've replaced the Latin tokens and identifiers with
> Chinese ones: https://github.com/taostein/hanma/blob/master/example.hm .
> Chinese lacks capital letters [1], so I use the prefix "卜" instead. The
> mapping of tokens is here (in the parsing/lexer.mll diff):
> https://github.com/taostein/hanma/blob/master/lexer.mll.diff
>
> Reading code is hard when the script model isn't functioning in the fast
> processing part of your brain. Granted, Chinese has more characters than
> Latin, but training a brain to do fast processing of script takes years,
> even if it's Latin. Sometimes we forget it took us years to learn to read;
> for most of us that was a long time ago.
>
> I've taught Chinese students OCaml programming using Latin tokens and I've
> taught the same replacing those Latin tokens with Chinese ones. I tried
> this as an experiment and I was surprised at the outcome. Previously, I
> thought as most of you probably do -- come on, it's just a few tokens plus
> logic -- not hard. How many tokens are there in C, like 30? I could
> memorize those in a day! I WAS WRONG. The students were markedly more
> motivated and enthusiastic when coding in their own script. And these are
> smart people, among China's brightest. Motivated learners learn better and
> are also more fun to teach. This teaching experience is what inspired me to
> undertake this translation project.
>
> My observations are qualitative, because I've been focused on the teaching
> part, as opposed to the research about teaching part, but I hope to gather
> more data in future semesters and write a report about these findings. The
> qualitative results were strong -- script matters. I believe it's about
> script, not language. Parsing a foreign script quickly is really hard on
> the brain. We need the brain for the hard parts of programming.
>
> There are obviously many pieces of OCaml that need translation; manuals,
> errors and warnings, libraries, the core code, comments. I think error
> messages are a good place to start. We can work on different pieces in
> parallel. And hopefully we can build something useful for scripts other
> than Chinese, like Arabic and Russian. If you are interested in helping
> with this project, please get in touch with me directly.
>
> Yes, we want to build a global tech community. We must start from empathy.
> Maybe the Arabs and Chinese (and Russians and Koreans and Japanese)
> "should" or "shouldn't" learn English (or German or French or Latin or some
> other Western European language), under some definition of "should" (refer
> to various moral theories). But "should" is academic -- they're NOT going
> to learn English. If anything, the trend is moving in the other direction.
> China, for example, is lowering its university-level English requirements.
> So the question is: how global and how big do we want this so-called
> "global" tech community to be? Empathy and good translation tools can help
> us make it a real global (no scare quotes) community.
>
> Tao Stein / 石涛 / تاو شتاين
>
> Yes, by Arabic numbers I meant the numeric script used by Arabs, not what
> the Oxford English Dictionary calls arabic (lower-case) numbers.
>
> [1] Chinese also lacks a plural form, which does somewhat ease error
> messaging.
>
> On 12 April 2017 at 07:04, Allan Wegan <allanwegan@allanwegan.de> wrote:
>
>> > careful here, the “(hindu‐)arabic digits” used in European languages
>> > (0123456789) are similar, but not identical to, the symbols that actual
>> > arabic languages use nowadays (“eastern arabic digits”,
>> > ٠‎١‎٢‎٣‎٤‎٥‎٦‎٧‎٨‎٩). there even are false friends (e·g· the eastern 4
>> > looks like a reversed western 3, the eastern 5 looks like a western 0,
>> > the eastern 6 looks like a western 7).
>> >
>> > yeah. confusing.
>>
>> Indeed. Must have been wishful thinking on my side.
>>
>> Not translating the thing at all may be the wiser option. It might serve
>> the greater goal of finally establishing one universal world script and
>> language that everyone has to learn to be able to participate in the global
>> tech community (and written English is at least somewhat easy to learn)...
>>
>>
>>
>> Greetings from Germany
>> --
>> Allan Wegan
>> <http://www.allanwegan.de/>
>> Jabber: allanwegan@ffnord.net
>>  OTR-Fingerprint: E4DCAA40 4859428E B3912896 F2498604 8CAA126F
>> Jabber: allanwegan@jabber.ccc.de
>>  OTR-Fingerprint: A1AAA1B9 C067F988 4A424D33 98343469 29164587
>> ICQ: 209459114
>>  OTR-Fingerprint: 71DE5B5E 67D6D758 A93BF1CE 7DA06625 205AC6EC
>>
>>
>