From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: weis Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA03304 for caml-redistribution; Thu, 21 Oct 1999 19:11:42 +0200 (MET DST) Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id RAA03898 for ; Thu, 21 Oct 1999 17:41:50 +0200 (MET DST) Received: from ruby (pm1-3.triode.net.au [203.63.235.19]) by nez-perce.inria.fr (8.8.7/8.8.7) with ESMTP id RAA02724 for ; Thu, 21 Oct 1999 17:41:45 +0200 (MET DST) Received: from maxtal.com.au (IDENT:root@localhost [127.0.0.1]) by ruby (8.9.3/8.9.3) with ESMTP id BAA25586; Fri, 22 Oct 1999 01:35:00 +1000 Sender: weis Message-ID: <380F32A4.8E5ECC5F@maxtal.com.au> Date: Fri, 22 Oct 1999 01:35:00 +1000 From: skaller Organization: Maxtal X-Mailer: Mozilla 4.51 [en] (X11; I; Linux 2.2.12 i586) X-Accept-Language: en MIME-Version: 1.0 To: matias@k-bell.com CC: caml-list@inria.fr, Gerd.Stolpmann@darmstadt.netsurf.de Subject: Re: localization, internationalization and Caml References: <380CB30E.56D1A8A2@maxtal.com.au> <99102100543400.15513@ice> <380F0157.CDBBAD7D@k-bell.com> Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Matías Giovannini wrote: > OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll > agree that my view is chauvinistic (and selfish, perhaps: I already have > "¿¡áéíóúuñÁÉÍÓÚÜÑ" for writing in Spanish, why should I ask for more?), > I see no restriction in that (well, If I were Chinese, or Egiptian, I > would see things differently). Exactly. There are quite a lot of Chinese, Indian, Russian ... and non-Latin people in the world: more than Latins. And many are faced with a barrier, participating in the computing world because of language problems. >What's more, the whole syntactic > apparatus of a programming language *assumes* a Latin setting, where > things make sense when read from left to right, from top to bottom; and > where punctuation is what we're used to. Programming languages suited > for a Han, or Arab, or even a Hebrew audience would have to be rethinked > from the grounds up. Actually, no. Most of these peoples learn English and learn computing, if they are to work with computers. But they still wish to use comments, strings, and identifiers in their native script. Have you ever seen a Japanese program? I have. Quite an interesting challenge: normal C/C++ code, with Latin characters encoding Japanese character names in identifiers, and actual Japanese characters in comments and strings. I had no idea what the code did. My point: for a non-native speaker, being forced to use a foreign language for identifiers and comments is a serious impediment, not having native characters in string is not an impediment, but a complete disaster (how will the users of the program understand it -- they may not know any Latin language) > On the other hand, OCaml provides a String type that *can be* seen as a > variable-length sequence of uninterpreted bytes. Yes. What ocaml does not provide is a way of encoding extended characters -- \uXXXX \UXXXXXXXXX in strings, or in identifiers. >We have uninterpreted > bytes! It's all we need to build whatever I18NString type we may need. > What is missing is *library* facilities to abstract that view into a > full-fledged i18n machinery. I agree. >Of course, there's a problem with the > manipulation of 32-bit integer values, but if used with care, the Nat > datatype could serve perfectly well as the underlying, low-level datatype. > > Which makes me think, John, you already have variable-length int arrays. But they're not standard (yet). Actually, ocaml 'int' is 31 bits, which is enough bits for ISO10646 (with some careful fiddling to avoid problems with the sign?). So there are TWO issues -- one is to make ocaml itself ISO10646 aware (i.e., the compiler), and the other is to provide users with libraries to manipulate extended characters. Please note: neither of these features would be optional, were ocaml to be submitted for ISO standardisation. ISO directives require all ISO languages to upgrade to provide international support. I know ocaml isn't an ISO language, but I think the basic intent is sound. [In some sense, ocaml is already a leader, accepting Latin-1 characters when other languages only allowed ASCII] -- John Skaller, mailto:skaller@maxtal.com.au 1/10 Toxteth Rd Glebe NSW 2037 Australia homepage: http://www.maxtal.com.au/~skaller downloads: http://www.triode.net.au/~skaller