German and French are closer to English than Arabic or Chinese, especially in the script.

As an experiment in empathy, I encourage folks to examine this working OCaml code where I've replaced the Latin tokens and identifiers with Chinese ones: https://github.com/taostein/hanma/blob/master/example.hm . Chinese lacks capital letters [1], so I use the prefix "卜" instead. The mapping of tokens is here (in the parsing/lexer.mll diff): https://github.com/taostein/hanma/blob/master/lexer.mll.diff

Reading code is hard when the script model isn't functioning in the fast-processing part of your brain. Granted, Chinese has more characters than Latin, but training a brain to do fast processing of script takes years, even if it's Latin. Sometimes we forget it took us years to learn to read; for most of us that was a long time ago.

I've taught Chinese students OCaml programming using Latin tokens, and I've taught the same material with those Latin tokens replaced by Chinese ones. I tried this as an experiment and I was surprised at the outcome. Previously, I thought as most of you probably do -- come on, it's just a few tokens plus logic -- not hard. How many tokens are there in C, like 30? I could memorize those in a day! I WAS WRONG. The students were markedly more motivated and enthusiastic when coding in their own script. And these are smart people, among China's brightest. Motivated learners learn better and are also more fun to teach. This teaching experience is what inspired me to undertake this translation project.

My observations are qualitative, because I've been focused on the teaching itself rather than on research about teaching, but I hope to gather more data in future semesters and write a report about these findings. The qualitative results were strong -- script matters. I believe it's about script, not language. Parsing a foreign script quickly is really hard on the brain. We need the brain for the hard parts of programming.

There are obviously many pieces of OCaml that need translation: manuals, errors and warnings, libraries, the core code, and comments. I think error messages are a good place to start. We can work on different pieces in parallel. And hopefully we can build something useful for scripts other than Chinese, like Arabic and Russian. If you are interested in helping with this project, please get in touch with me directly.

Yes, we want to build a global tech community. We must start from empathy. Maybe the Arabs and Chinese (and Russians and Koreans and Japanese) "should" or "shouldn't" learn English (or German or French or Latin or some other Western European language), under some definition of "should" (refer to various moral theories). But "should" is academic -- they're NOT going to learn English. If anything, the trend is moving in the other direction. China, for example, is lowering its university-level English requirements. So the question is: how global and how big do we want this so-called "global" tech community to be? Empathy and good translation tools can help us make it a real global (no scare quotes) community.

Tao Stein / 石涛 / تاو شتاين

Yes, by Arabic numbers I meant the numeric script used by Arabs, not what the Oxford English Dictionary calls arabic (lower-case) numbers.

[1] Chinese also lacks a plural form, which does somewhat ease error messaging.

On 12 April 2017 at 07:04, Allan Wegan <allanwegan@allanwegan.de> wrote:
> careful here, the “(hindu‐)arabic digits” used in European languages
> (0123456789) are similar, but not identical to, the symbols that actual
> arabic languages use nowadays (“eastern arabic digits”,
> ٠‎١‎٢‎٣‎٤‎٥‎٦‎٧‎٨‎٩). there even are false friends (e·g· the eastern 4
> looks like a reversed western 3, the eastern 5 looks like a western 0,
> the eastern 6 looks like a western 7).
>
> yeah. confusing.

Indeed. Must have been wishful thinking on my side.

Not translating the thing at all may be the wiser option. It might serve
the greater goal of finally establishing one universal world script and
language that everyone has to learn in order to participate in the global
tech community (and written English is at least somewhat easy to learn)...



Greetings from Germany
--
Allan Wegan