caml-list - the Caml user's mailing list
From: skaller <skaller@maxtal.com.au>
To: matias@k-bell.com
Cc: caml-list@inria.fr, Gerd.Stolpmann@darmstadt.netsurf.de
Subject: Re: localization, internationalization and Caml
Date: Fri, 22 Oct 1999 01:35:00 +1000
Message-ID: <380F32A4.8E5ECC5F@maxtal.com.au>
In-Reply-To: <380F0157.CDBBAD7D@k-bell.com>

Matías Giovannini wrote:

> OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll
> agree that my view is chauvinistic (and selfish, perhaps: I already have
> "¿¡áéíóúuñÁÉÍÓÚÜÑ" for writing in Spanish, why should I ask for more?),
> I see no restriction in that (well, if I were Chinese, or Egyptian, I
> would see things differently). 

	Exactly. There are quite a lot of Chinese, Indian,
Russian ... and other non-Latin people in the world: more than Latins.
And many of them face a barrier to participating in the computing world
because of language problems.

> What's more, the whole syntactic
> apparatus of a programming language *assumes* a Latin setting, where
> things make sense when read from left to right, from top to bottom; and
> where punctuation is what we're used to. Programming languages suited
> for a Han, or Arab, or even a Hebrew audience would have to be rethought
> from the ground up.

	Actually, no. Most of these peoples learn English and learn
computing, if they are to work with computers. But they still wish
to use comments, strings, and identifiers in their native script.

	Have you ever seen a Japanese program? I have.
Quite an interesting challenge: normal C/C++ code, with 
Latin characters encoding Japanese character names in identifiers,
and actual Japanese characters in comments and strings.
 
	I had no idea what the code did. My point: for a non-native
speaker, being forced to use a foreign language for identifiers and
comments is a serious impediment. Not having native characters
in strings is not just an impediment, but a complete disaster (how will
the users of the program understand it -- they may not know any
Latin language).

> On the other hand, OCaml provides a String type that *can be* seen as a
> variable-length sequence of uninterpreted bytes. 

	Yes. What ocaml does not provide is a way of encoding
extended characters -- \uXXXX \UXXXXXXXX in strings, or in identifiers.
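
	As a rough illustration of what such an escape would have to
expand to, here is a minimal sketch that turns one code point into the
bytes of an ordinary ocaml string. It assumes UTF-8 as the byte encoding
and limits the range to 0..0x10FFFF; neither choice is settled in this
thread, and utf8_of_code_point is a made-up name, not an existing
library function.

```ocaml
(* Encode one code point as UTF-8 bytes in a plain string.
   Hypothetical helper: assumes UTF-8 and the range 0 .. 0x10FFFF,
   and does not reject surrogate values. *)
let utf8_of_code_point (cp : int) : string =
  let b = Buffer.create 4 in
  let add n = Buffer.add_char b (Char.chr n) in
  if cp < 0 || cp > 0x10FFFF then invalid_arg "utf8_of_code_point"
  else if cp < 0x80 then add cp
  else if cp < 0x800 then begin
    (* two bytes: 110xxxxx 10xxxxxx *)
    add (0xC0 lor (cp lsr 6));
    add (0x80 lor (cp land 0x3F))
  end else if cp < 0x10000 then begin
    (* three bytes: 1110xxxx 10xxxxxx 10xxxxxx *)
    add (0xE0 lor (cp lsr 12));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end else begin
    (* four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx *)
    add (0xF0 lor (cp lsr 18));
    add (0x80 lor ((cp lsr 12) land 0x3F));
    add (0x80 lor ((cp lsr 6) land 0x3F));
    add (0x80 lor (cp land 0x3F))
  end;
  Buffer.contents b
```

With this, a lexer that met \u65E5 in a string literal could splice in
utf8_of_code_point 0x65E5, which is three bytes long.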

> We have uninterpreted
> bytes! It's all we need to build whatever I18NString type we may need.
> What is missing is *library* facilities to abstract that view into a
> full-fledged i18n machinery. 

	I agree.
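
	To make the library side concrete, a sketch of the lowest layer
such a library might have: reading an uninterpreted byte string as a
sequence of code points. It assumes the bytes are UTF-8, which is only
one possible choice; code_points_of_utf8 is a hypothetical name, and the
function does not check for overlong forms or surrogates.

```ocaml
(* Decode a UTF-8 byte string into an array of code points.
   Hypothetical helper: raises Invalid_argument on truncated or
   malformed input, but does not reject overlong encodings. *)
let code_points_of_utf8 (s : string) : int array =
  let n = String.length s in
  (* Payload (low 6 bits) of the continuation byte at index i. *)
  let cont i =
    if i >= n then invalid_arg "code_points_of_utf8: truncated";
    let c = Char.code s.[i] in
    if c land 0xC0 <> 0x80 then
      invalid_arg "code_points_of_utf8: bad continuation byte";
    c land 0x3F
  in
  let rec go i acc =
    if i >= n then Array.of_list (List.rev acc)
    else begin
      let c = Char.code s.[i] in
      if c < 0x80 then go (i + 1) (c :: acc)
      else if c land 0xE0 = 0xC0 then begin
        let a = cont (i + 1) in
        go (i + 2) ((((c land 0x1F) lsl 6) lor a) :: acc)
      end else if c land 0xF0 = 0xE0 then begin
        let a = cont (i + 1) in
        let b = cont (i + 2) in
        go (i + 3) ((((c land 0x0F) lsl 12) lor (a lsl 6) lor b) :: acc)
      end else if c land 0xF8 = 0xF0 then begin
        let a = cont (i + 1) in
        let b = cont (i + 2) in
        let d = cont (i + 3) in
        go (i + 4)
          ((((c land 0x07) lsl 18) lor (a lsl 12) lor (b lsl 6) lor d) :: acc)
      end else invalid_arg "code_points_of_utf8: bad lead byte"
    end
  in
  go 0 []
```

Everything above this layer (length in characters, indexing, case
mapping, collation) could then work on the int array rather than on
raw bytes.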

> Of course, there's a problem with the
> manipulation of 32-bit integer values, but if used with care, the Nat
> datatype could serve perfectly well as the underlying, low-level datatype.
> 
> Which makes me think, John, you already have variable-length int arrays.

	But they're not standard (yet). Actually, ocaml 'int' is 31 bits,
which is enough bits for ISO10646 (with some careful fiddling to avoid
problems with the sign?).
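
	One possible form of that fiddling, as a sketch only: if a
31-bit code value is stored in a 31-bit int, values with the top bit set
come out negative, so the ordinary comparison misorders them. Flipping
the sign bit before comparing restores the unsigned order.
compare_unsigned is a hypothetical helper name.

```ocaml
(* Bits in a native int on this platform: 31 on the 32-bit systems
   discussed here, 63 on 64-bit ones. *)
let int_bits = Sys.word_size - 1

(* Order two ints as if they were unsigned bit patterns.
   Flipping the sign bit (lxor min_int) maps unsigned order onto
   signed order, so the polymorphic compare can be reused. *)
let compare_unsigned (a : int) (b : int) : int =
  compare (a lxor min_int) (b lxor min_int)
```

Under this ordering a value whose sign bit is set sorts after every
non-negative int, which is what an unsigned reading of the bits requires.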

	So there are TWO issues -- one is to make ocaml itself
ISO10646 aware (i.e., the compiler), and the other is to provide
users with libraries to manipulate extended characters.

	Please note: neither of these features would be optional,
were ocaml to be submitted for ISO standardisation. ISO directives
require all ISO languages to upgrade to provide international
support. I know ocaml isn't an ISO language, but I think the 
basic intent is sound. [In some sense, ocaml is already a leader,
accepting Latin-1 characters when other languages only allowed ASCII]

-- 
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller



