From: Evgeny Roubinchtein
Date: Sun, 16 Apr 2017 18:37:17 -0400
To: Tao Stein, OCaml Mailing List
Subject: Re: [Caml-list] error messages in multiple languages ?

I will point out that reading the Java documentation suggests to me that Java solves the "locale-appropriate digits" problem. See, for example, the bottom of the following page, where Thai digits are used to print out a number: https://docs.oracle.com/javase/tutorial/i18n/locale/create.html#constants. Java is relevant because most of the JDK is under an open-source license (though I cannot comment on whether the license would allow lifting that portion of the implementation into OCaml). The important point is that this problem has been solved in at least one largely open-source technology.

I am deliberately choosing not to comment on your other points, because I view them as only tangentially related to the issue at hand, which is how to handle translation of OCaml error messages.

--
Best,
Zhenya

On Tue, Apr 11, 2017 at 8:12 PM, Tao Stein wrote:
>
> German and French are closer to English than Arabic or Chinese are,
> especially in their script.
>
> As an experiment in empathy, I encourage folks to examine this working
> OCaml code where I've replaced the Latin tokens and identifiers with
> Chinese ones: https://github.com/taostein/hanma/blob/master/example.hm .
> Chinese lacks capital letters [1], so I use the prefix "卜" instead.
> The mapping of tokens is here (in the parsing/lexer.mll diff):
> https://github.com/taostein/hanma/blob/master/lexer.mll.diff
>
> Reading code is hard when the script model isn't functioning in the fast
> processing part of your brain. Granted, Chinese has more characters than
> Latin, but training a brain to do fast processing of a script takes years,
> even if it's Latin. Sometimes we forget it took us years to learn to read;
> for most of us that was a long time ago.
>
> I've taught Chinese students OCaml programming using Latin tokens, and I've
> taught the same material replacing those Latin tokens with Chinese ones. I
> tried this as an experiment and I was surprised at the outcome. Previously,
> I thought as most of you probably do: come on, it's just a few tokens plus
> logic, not hard. How many tokens are there in C, like 30? I could memorize
> those in a day! I WAS WRONG. The students were markedly more motivated and
> enthusiastic when coding in their own script. And these are smart people,
> among China's brightest. Motivated learners learn better and are also more
> fun to teach. This teaching experience is what inspired me to undertake
> this translation project.
>
> My observations are qualitative, because I've been focused on the teaching
> part, as opposed to the research-about-teaching part, but I hope to gather
> more data in future semesters and write a report about these findings. The
> qualitative results were strong: script matters. I believe it's about
> script, not language. Parsing a foreign script quickly is really hard on
> the brain, and we need the brain for the hard parts of programming.
>
> There are obviously many pieces of OCaml that need translation: manuals,
> errors and warnings, libraries, the core code, comments. I think error
> messages are a good place to start. We can work on different pieces in
> parallel, and hopefully we can build something useful for scripts other
> than Chinese, like Arabic and Russian.
> If you are interested in helping with this project, please get in touch
> with me directly.
>
> Yes, we want to build a global tech community. We must start from empathy.
> Maybe the Arabs and Chinese (and Russians and Koreans and Japanese)
> "should" or "shouldn't" learn English (or German or French or Latin or some
> other Western European language), under some definition of "should" (refer
> to various moral theories). But "should" is academic: they're NOT going
> to learn English. If anything, the trend is moving in the other direction.
> China, for example, is lowering its university-level English requirements.
> So the question is: how global and how big do we want this so-called
> "global" tech community to be? Empathy and good translation tools can help
> us make it a truly global (no scare quotes) community.
>
> Tao Stein / 石涛 / تاو شتاين
>
> Yes, by Arabic numbers I meant the numeric script used by Arabs, not what
> the Oxford English Dictionary calls arabic (lower-case) numbers.
>
> [1] Chinese also lacks a plural form, which does somewhat ease error
> messaging.
>
> On 12 April 2017 at 07:04, Allan Wegan wrote:
>
>> > careful here, the “(hindu-)arabic digits” used in European languages
>> > (0123456789) are similar, but not identical, to the symbols that actual
>> > arabic languages use nowadays (“eastern arabic digits”,
>> > ٠١٢٣٤٥٦٧٨٩). there are even false friends (e.g. the eastern 4
>> > looks like a reversed western 3, the eastern 5 looks like a western 0,
>> > the eastern 6 looks like a western 7).
>> >
>> > yeah. confusing.
>>
>> Indeed. That must have been wishful thinking on my part.
>>
>> Not translating the thing at all may be the wiser option.
>> It might serve the greater goal of finally establishing one universal
>> world script and language that everyone has to learn to be able to
>> participate in the global tech community (and written English is at
>> least somewhat easy to learn)...
>>
>> Greetings from Germany
>> --
>> Allan Wegan
>>
>> Jabber: allanwegan@ffnord.net
>> OTR-Fingerprint: E4DCAA40 4859428E B3912896 F2498604 8CAA126F
>> Jabber: allanwegan@jabber.ccc.de
>> OTR-Fingerprint: A1AAA1B9 C067F988 4A424D33 98343469 29164587
>> ICQ: 209459114
>> OTR-Fingerprint: 71DE5B5E 67D6D758 A93BF1CE 7DA06625 205AC6EC
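To make the "locale-appropriate digits" problem from this thread concrete, here is a minimal OCaml sketch. It is not taken from Java, the OCaml compiler, or any library; the names `digits_with`, `eastern_arabic` and `thai` are hypothetical. It renders an integer with Eastern Arabic-Indic digits (U+0660 to U+0669) or Thai digits (U+0E50 to U+0E59), under two assumptions: the output is UTF-8, and each script's ten digits are contiguous code points. It deliberately ignores grouping separators, locale-specific minus signs, and any bidirectional formatting marks a real Arabic-locale formatter might need. `Buffer.add_utf_8_uchar` requires OCaml 4.06 or later.

```ocaml
(* Hypothetical helper, not part of OCaml or any library: render an
   integer using a script's own decimal digits.  Assumes the ten digits
   are contiguous code points starting at [zero], which holds for the
   Arabic-Indic (U+0660) and Thai (U+0E50) digit blocks. *)
let digits_with ~zero n =
  let s = string_of_int n in
  (* Each target digit takes at most 3 bytes in UTF-8. *)
  let buf = Buffer.create (3 * String.length s) in
  String.iter
    (fun c ->
      match c with
      | '0' .. '9' ->
          (* Map ASCII digit d to code point [zero + d], UTF-8 encoded. *)
          let d = Char.code c - Char.code '0' in
          Buffer.add_utf_8_uchar buf (Uchar.of_int (zero + d))
      | other ->
          (* Only the ASCII minus sign can reach this branch; it is copied
             unchanged, which a real locale may not want. *)
          Buffer.add_char buf other)
    s;
  Buffer.contents buf

(* U+0660 ARABIC-INDIC DIGIT ZERO .. U+0669 ARABIC-INDIC DIGIT NINE *)
let eastern_arabic n = digits_with ~zero:0x0660 n

(* U+0E50 THAI DIGIT ZERO .. U+0E59 THAI DIGIT NINE *)
let thai n = digits_with ~zero:0x0E50 n

let () =
  print_endline (eastern_arabic 2017); (* prints ٢٠١٧ *)
  print_endline (thai 42)               (* prints ๔๒ *)
```

Offsetting from a single "zero digit" is similar in spirit to the zero digit that Java's `java.text.DecimalFormatSymbols` exposes through `getZeroDigit()`. A production version would more plausibly take the digit set from locale data than hard-code code points. The hard-coded values here just keep the sketch self-contained.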