I will point out that, judging from the Java documentation, Java appears to solve the "locale-appropriate digits" problem. See, for example, the bottom of the following page, where Thai digits are used to print a number:
https://docs.oracle.com/javase/tutorial/i18n/locale/create.html#constants

Java is relevant because most of the JDK is under an open-source license (though I cannot say whether the license would allow lifting that portion of the implementation into OCaml). The important point is that this problem has already been solved in at least one largely open-source technology.

I am deliberately not commenting on your other points, because I view them as only tangentially related to the issue at hand: how to handle translation of OCaml error messages.

--
Best,
Zhenya

On Tue, Apr 11, 2017 at 8:12 PM, Tao Stein wrote:

> > German and French are closer to English than Arabic or Chinese, especially in the script.
>
> As an experiment in empathy, I encourage folks to examine this working OCaml code where I've replaced the Latin tokens and identifiers with Chinese ones: https://github.com/taostein/hanma/blob/master/example.hm . Chinese lacks capital letters [1], so I use the prefix "卜" instead. The mapping of tokens is here (in the parsing/lexer.mll diff): https://github.com/taostein/hanma/blob/master/lexer.mll.diff
>
> Reading code is hard when the script model isn't functioning in the fast-processing part of your brain. Granted, Chinese has more characters than Latin, but training a brain to process a script quickly takes years, even if it's Latin. Sometimes we forget it took us years to learn to read; for most of us that was a long time ago.
>
> I've taught Chinese students OCaml programming using Latin tokens, and I've taught the same material with those Latin tokens replaced by Chinese ones. I tried this as an experiment and I was surprised at the outcome.
> Previously, I thought as most of you probably do -- come on, it's just a few tokens plus logic -- not hard. How many tokens are there in C, like 30? I could memorize those in a day! I WAS WRONG. The students were markedly more motivated and enthusiastic when coding in their own script. And these are smart people, among China's brightest. Motivated learners learn better and are also more fun to teach. This teaching experience is what inspired me to undertake this translation project.
>
> My observations are qualitative, because I've been focused on the teaching itself rather than on research about teaching, but I hope to gather more data in future semesters and write a report about these findings. The qualitative results were strong -- script matters. I believe it's about script, not language. Parsing a foreign script quickly is really hard on the brain, and we need the brain for the hard parts of programming.
>
> There are obviously many pieces of OCaml that need translation: manuals, errors and warnings, libraries, the core code, comments. I think error messages are a good place to start. We can work on different pieces in parallel. And hopefully we can build something useful for scripts other than Chinese, like Arabic and Russian. If you are interested in helping with this project, please get in touch with me directly.
>
> Yes, we want to build a global tech community. We must start from empathy. Maybe the Arabs and Chinese (and Russians and Koreans and Japanese) "should" or "shouldn't" learn English (or German or French or Latin or some other Western European language), under some definition of "should" (refer to various moral theories). But "should" is academic -- they're NOT going to learn English. If anything, the trend is moving in the other direction. China, for example, is lowering its university-level English requirements.
>
> So the question is: how global and how big do we want this so-called "global" tech community to be? Empathy and good translation tools can help us make it a real global (no scare quotes) community.
>
> Tao Stein / 石涛 / تاو شتاين
>
> Yes, by Arabic numbers I meant the numeric script used by Arabs, not what the Oxford English Dictionary calls arabic (lower-case) numbers.
>
> [1] Chinese also lacks a plural form, which does somewhat ease error messaging.
>
> On 12 April 2017 at 07:04, Allan Wegan wrote:
>
>> > careful here, the “(hindu‐)arabic digits” used in European languages (0123456789) are similar, but not identical to, the symbols that actual arabic languages use nowadays (“eastern arabic digits”, ٠‎١‎٢‎٣‎٤‎٥‎٦‎٧‎٨‎٩). there even are false friends (e·g· the eastern 4 looks like a reversed western 3, the eastern 5 looks like a western 0, the eastern 6 looks like a western 7).
>> >
>> > yeah. confusing.
>>
>> Indeed. Must have been wishful thinking on my side.
>>
>> Not translating the thing at all may be the wiser option. It might serve the greater goal of finally establishing one universal world script and language that everyone has to learn to be able to participate in the global tech community (and written English is at least somewhat easy to learn)...
>>
>> Greetings from Germany
>> --
>> Allan Wegan
>>
>> Jabber: allanwegan@ffnord.net
>> OTR-Fingerprint: E4DCAA40 4859428E B3912896 F2498604 8CAA126F
>> Jabber: allanwegan@jabber.ccc.de
>> OTR-Fingerprint: A1AAA1B9 C067F988 4A424D33 98343469 29164587
>> ICQ: 209459114
>> OTR-Fingerprint: 71DE5B5E 67D6D758 A93BF1CE 7DA06625 205AC6EC
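The digit question raised in this thread can be made concrete with a small example. What follows is a minimal OCaml sketch, not taken from the OCaml compiler or from any project mentioned here. The function name `to_eastern_arabic_digits` is hypothetical, and the sketch assumes UTF-8 strings and the Arabic-Indic digit block U+0660–U+0669 (the "eastern arabic digits" quoted above). It only substitutes digit characters in an already-formatted message. It does not select digits per locale the way the Java locale support linked above does, and it ignores bidirectional-text layout and the separate Persian/Urdu digit forms (U+06F0–U+06F9).

```ocaml
(* Hypothetical helper, not part of the OCaml standard library or compiler:
   replace each ASCII digit '0'..'9' in a UTF-8 string with the corresponding
   Arabic-Indic digit U+0660..U+0669. All other bytes are copied unchanged.
   This is safe for UTF-8 input because bytes 0x30..0x39 never occur inside
   a multi-byte UTF-8 sequence. *)
let to_eastern_arabic_digits (s : string) : string =
  (* Each Arabic-Indic digit takes 2 bytes in UTF-8, so 2x is an upper bound. *)
  let buf = Buffer.create (String.length s * 2) in
  String.iter
    (fun c ->
      match c with
      | '0' .. '9' ->
          let d = Char.code c - Char.code '0' in
          (* The digits are contiguous: U+0660 is zero, U+0669 is nine. *)
          Buffer.add_utf_8_uchar buf (Uchar.of_int (0x0660 + d))
      | _ -> Buffer.add_char buf c)
    s;
  Buffer.contents buf

let () =
  (* A compiler-style location, as it might appear in a translated error message. *)
  print_endline (to_eastern_arabic_digits "line 42, characters 4-10")
```

`Buffer.add_utf_8_uchar` requires OCaml 4.06 or later. A real translation layer would more likely format numbers in the target digits at the point where the message is built, rather than rewriting finished strings, since a blanket rewrite would also change digits inside quoted source code or file names.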