caml-list - the Caml user's mailing list
* Re: localization, internationalization and Caml
@ 1999-10-15 13:53 Gerard Huet
  1999-10-15 20:28 ` Gerd Stolpmann
  1999-10-17 14:29 ` localization, internationalization and Caml Xavier Leroy
  0 siblings, 2 replies; 30+ messages in thread
From: Gerard Huet @ 1999-10-15 13:53 UTC
  To: Francis Dupont, skaller; +Cc: STARYNKEVITCH Basile, caml-list

Just to put my 2 cents on this issue...

At 10:26 15/10/99 +0200, Francis Dupont wrote:
> In your previous mail you wrote:
>
>   	The current 'support' for 8 bit characters in ocaml should be
>   deprecated immediately. It is an extremely bad thing to have, since
>   Latin-1 et al are archaic 8 bit standards incompatible with the
>   international standard for ISO10646 communication, namely
>   the UTF-8 encoding.

I do not agree. What we need is not ayatollah-style diktats, but careful
thinking about how standards evolve.

First of all, ISO-Latin is as much an international standard as ISO10646,
only a bit more mature. By their nature, international standards do not
become obsolete overnight; they are here to stay, because sound engineering
needs some stability, as opposed to the permanent hype to which our
discipline is subjected.

Secondly, the string data type of Ocaml is not about ASCII or ISO-Latin or
whatever. It is a low-level data type for sequences of bytes, represented
efficiently in machine memory. These bytes may be used to encode elements
of various finite sets such as ASCII or ISO-Latin, but the string library
does not care about such intentions.
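To make this concrete, here is a small illustration written against a current Ocaml standard library (the variable names are ours). The letter "é" is one byte in ISO-Latin 1 and two bytes in UTF-8, and String.length simply counts bytes in both cases.

```ocaml
(* "é" encoded in ISO-Latin 1: a single byte with code 233. *)
let e_latin1 = "\233"

(* "é" encoded in UTF-8: the two bytes 0xC3 0xA9 (195, 169). *)
let e_utf8 = "\195\169"

(* String.length counts bytes; it knows nothing about the encoding. *)
let () =
  Printf.printf "Latin-1: %d byte(s), UTF-8: %d byte(s)\n"
    (String.length e_latin1) (String.length e_utf8)
```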

When such strings are used to represent natural language sentences, there
is a natural tendency toward sophistication, from the UPPER CASE letters of
the computer printers of old, to ASCII, to ISO-Latin 1, 2, etc., and on to
Unicode. Beyond 256 codes, these character sets can no longer be mapped one
to one onto bytes, so multi-byte representations such as UTF-8 must be
designed.
Such multi-byte representations necessarily conflict with the ISO-Latin
convention somewhere: the upper half of ISO-Latin (codes 128 to 255) must be
shifted out of its usual one-byte representation, since the 8th bit is
needed for the multi-byte encoding.
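The shift can be sketched in a few lines. This is a minimal illustration, not a library function (the name latin1_to_utf8 is ours): codes below 128 keep their one-byte ASCII form, while codes 128 to 255 become two bytes of the form 110xxxxx 10xxxxxx, because in UTF-8 a set 8th bit marks a byte as part of a multi-byte sequence.

```ocaml
(* Hypothetical helper: re-encode an ISO-Latin 1 string as UTF-8.
   Every Latin-1 code c (0..255) is also the Unicode code point c. *)
let latin1_to_utf8 (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun ch ->
      let c = Char.code ch in
      if c < 128 then
        (* 7-bit ASCII: identical in both encodings *)
        Buffer.add_char buf ch
      else begin
        (* lead byte 110xxxxx carries the top 2 bits of c *)
        Buffer.add_char buf (Char.chr (0xC0 lor (c lsr 6)));
        (* continuation byte 10xxxxxx carries the low 6 bits *)
        Buffer.add_char buf (Char.chr (0x80 lor (c land 0x3F)))
      end)
    s;
  Buffer.contents buf

let () = print_endline (latin1_to_utf8 "caf\233")
```

ASCII text comes out unchanged, which is why the move costs nothing for plain English source.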

So, for instance, engineers designing natural language interfaces must
choose between sticking to the old convention in purely local software and
upgrading their software to the international standard, typically for Web
applications. At some point I am sure some brave soul from the Ocaml
implementation team will write a Unicode library implementing the
non-trivial manipulations of lists of Unicode characters, so that these
engineers will have a generic tool to use. Such libraries will typically
implement a NEW datatype of "unistring" or whatever, with proper conversion
to string representations of course, but the string data type is surely
here to stay, because bytes are not going to become obsolete overnight. :-)
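To illustrate what such a "unistring" type might look like, here is a small sketch, assuming a representation as an array of Unicode code points with conversions from and to UTF-8 strings. The module name Unistring and its functions are hypothetical; the decoder assumes well-formed UTF-8 input and performs no validation.

```ocaml
(* Hypothetical "unistring": an array of Unicode code points. *)
module Unistring = struct
  type t = int array

  (* Number of characters (code points), not bytes. *)
  let length (u : t) = Array.length u

  (* Decode a UTF-8 string; assumes the input is well formed. *)
  let of_utf8 (s : string) : t =
    let n = String.length s in
    let byte i = Char.code s.[i] in
    let rec go i acc =
      if i >= n then Array.of_list (List.rev acc)
      else
        let b = byte i in
        (* the lead byte tells how many bytes the sequence uses *)
        let len, init =
          if b < 0x80 then (1, b)
          else if b < 0xE0 then (2, b land 0x1F)
          else if b < 0xF0 then (3, b land 0x0F)
          else (4, b land 0x07)
        in
        (* fold in 6 bits from each continuation byte *)
        let cp = ref init in
        for k = 1 to len - 1 do
          cp := (!cp lsl 6) lor (byte (i + k) land 0x3F)
        done;
        go (i + len) (!cp :: acc)
    in
    go 0 []

  (* Encode back to a UTF-8 byte string; assumes code points <= 0x10FFFF. *)
  let to_utf8 (u : t) : string =
    let buf = Buffer.create (Array.length u) in
    let add c = Buffer.add_char buf (Char.chr c) in
    Array.iter
      (fun cp ->
        if cp < 0x80 then add cp
        else if cp < 0x800 then begin
          add (0xC0 lor (cp lsr 6));
          add (0x80 lor (cp land 0x3F))
        end else if cp < 0x10000 then begin
          add (0xE0 lor (cp lsr 12));
          add (0x80 lor ((cp lsr 6) land 0x3F));
          add (0x80 lor (cp land 0x3F))
        end else begin
          add (0xF0 lor (cp lsr 18));
          add (0x80 lor ((cp lsr 12) land 0x3F));
          add (0x80 lor ((cp lsr 6) land 0x3F));
          add (0x80 lor (cp land 0x3F))
        end)
      u;
    Buffer.contents buf
end
```

The plain string type stays untouched: the unistring is a separate type that converts to and from ordinary strings at the boundaries.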

>=> there is a rather strong opposition against UTF-8 in France
>because it is not a natural encoding (ie. if ASCII maps to ASCII
>it is not the case for ISO 8859-* characters, imagine a new UTF-X
>encoding maps ASCII to strange things and you'd be able to understand
>our concern).

I do not share Francis' pessimism. The ISO committees are not entirely
stupid, and care has been taken to make the move as painless as possible.
ISO-Latin has just been shifted by a mere translation. Here is my Ocaml
code for translating ISO-Latin 1 characters to HTML that can be served as
UTF-8 (non-ASCII characters become numeric character references):

let print_unicode c =
        let code = int_of_char c in (* ISO-Latin 1 code, 0..255 *)
        if code < 128 then print_char c (* 7-bit ASCII: printed as is *)
        else print_string ("&#" ^ string_of_int code ^ ";") (* e.g. &#233; *)

This is hardly mysterious or complicated or inefficient.
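The function above prints one character at a time. A variant that converts a whole string and returns the result is sketched below with the standard Buffer module (the name latin1_to_html is ours, not from the original message); it makes the conversion easy to test and reuse. Like the original, it does not escape HTML special characters such as < or &.

```ocaml
(* Convert an ISO-Latin 1 string to HTML: ASCII characters pass through,
   codes 128..255 become numeric character references such as &#233;.
   Note: <, > and & are NOT escaped, as in the per-character version. *)
let latin1_to_html (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 128 then Buffer.add_char buf c
      else begin
        Buffer.add_string buf "&#";
        Buffer.add_string buf (string_of_int code);
        Buffer.add_char buf ';'
      end)
    s;
  Buffer.contents buf

let () = print_endline (latin1_to_html "G\233rard")
```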

>=> my problem is the output of the filter will be no more readable when
>I've put too much French in the program (in comments for instance).

Come on, Francis, we do not read core dumps nowadays; we read through the
eyes of HTML or TeX or whatever!

>=> I believe internationalization should not be done by countries
>where English is the only used language: this is at least awkward...

I simply do not understand this remark in a WWW world.

Cheers
Gérard

* localization, internationalization and Caml
@ 1999-10-13 12:12 STARYNKEVITCH Basile
  1999-10-14 22:20 ` skaller
  0 siblings, 1 reply; 30+ messages in thread
From: STARYNKEVITCH Basile @ 1999-10-13 12:12 UTC
  To: caml-list



Hello All,

Just a small remark about localization and internationalization (see the
setlocale, printf and strtod man pages), which means adapting software
to culturally different users. Problems include the representation of
dates and numbers, error messages, and even character sets and
left-to-right or right-to-left reading direction. For example, some
French users want "Taux d'inflation = 3,14% - TROP" instead of
"TOO MUCH inflation 3.14%": the message is in French rather than
English, the number uses a decimal comma instead of a decimal point, and
the argument 3.14 and the locale-dependent string "TOO MUCH" or "TROP"
appear in a different order.
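A small sketch of what this example involves, assuming a hypothetical locale type with just the two cases above. Each locale supplies both the number format and the whole message template, so the word order can change and not only the words.

```ocaml
(* Hypothetical locale type: only the two locales of the example. *)
type locale = English | French

(* Format a rate with two decimals, using the locale's decimal mark. *)
let format_rate locale x =
  let s = Printf.sprintf "%.2f" x in
  match locale with
  | English -> s
  | French -> String.map (fun c -> if c = '.' then ',' else c) s

(* The whole message is chosen per locale, so the order of the pieces
   (label, number, warning word) can differ, not just the words. *)
let inflation_message locale rate =
  let r = format_rate locale rate in
  match locale with
  | English -> Printf.sprintf "TOO MUCH inflation %s%%" r
  | French -> Printf.sprintf "Taux d'inflation = %s%% - TROP" r

let () =
  print_endline (inflation_message English 3.14);
  print_endline (inflation_message French 3.14)
```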

I am not at all a fan of localization. But I do have some wishes, should
it ever come to Ocaml:

* do not depend on C localization (this means Printf.printf should not
  depend on the LC_NUMERIC environment variable; is this true now?)

* make the locale an explicit argument, or at least a property bound
  to a channel. Several channels may need different locales (for
  instance, an HTTP socket needs the C locale, while the user's stderr
  could use a French locale).

  so 

    lprintf Locale.French "%d %g" 2 3.14

  is much better than

    set_locale LC_ALL "FR"
    printf "%d %g" 2 3.14
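One way to read the second wish is sketched below, assuming a hypothetical record that pairs an out_channel with a locale. The names lchannel, string_of_float_in and output_float are ours and not an existing Ocaml API. The HTTP socket (here stdout) uses the C locale, stderr uses French, and no global locale is ever set.

```ocaml
(* Hypothetical: a locale carried explicitly, never set globally. *)
type locale = C | French

(* An output channel bound to its own locale. *)
type lchannel = { oc : out_channel; locale : locale }

(* Render a float with the decimal mark of the given locale ("%g" as above). *)
let string_of_float_in locale x =
  let s = Printf.sprintf "%g" x in
  match locale with
  | C -> s
  | French -> String.map (fun c -> if c = '.' then ',' else c) s

(* Each channel formats numbers according to its own locale. *)
let output_float lc x = output_string lc.oc (string_of_float_in lc.locale x)

let () =
  let http = { oc = stdout; locale = C } in
  let user = { oc = stderr; locale = French } in
  output_float http 3.14; print_newline ();
  output_float user 3.14; prerr_newline ()
```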



By the way, I increasingly believe that the printf interface is a big
mistake, in C as in Ocaml (one that could easily be avoided in Ocaml,
thanks to its typing).

We should code

  print [Int 2; String " < "; Float 3.14]

instead of 

  printf "%d < %g" 2 3.14
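The proposed interface takes only a few lines of Ocaml. A minimal sketch: the constructors Int, String and Float come from the example above, while the type name item, the helper to_string, and the use of "%g" for floats are assumptions.

```ocaml
(* The pieces of a message, checked by the ordinary type system. *)
type item = Int of int | String of string | Float of float

let string_of_item = function
  | Int n -> string_of_int n
  | String s -> s
  | Float f -> Printf.sprintf "%g" f

(* Concatenate the pieces; no format string is involved. *)
let to_string items = String.concat "" (List.map string_of_item items)

let print items = print_string (to_string items)

let () = print [Int 2; String " < "; Float 3.14]; print_newline ()
```

Since the pieces are ordinary values, a locale-aware version would only need to change string_of_item.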


Again, I am *not* asking for localization in Ocaml, but if somebody
needs it (I don't) I still hope it would be implemented better than in
C. And I think that Unicode would be more useful than localization.


I'm saying all this because C localization is currently giving me a
headache, so I hope that Ocaml will avoid that mistake.

################

Short summary: I think that localization in Ocaml, which I do not feel a
need for, should not be done the way it is in C.

N.B. Any opinions expressed here are my own and do not commit my
organization, the CEA.

---------------------------------------------------------------------
Basile STARYNKEVITCH   ----  Commissariat à l'Energie Atomique
DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France
phone: 1,69.08.60.55; fax: 1.69.08.83.95 home: 1,46.65.45.53
email: Basile point Starynkevitch at cea point fr 

Thread overview: 30+ messages
1999-10-15 13:53 localization, internationalization and Caml Gerard Huet
1999-10-15 20:28 ` Gerd Stolpmann
1999-10-19 18:06   ` skaller
1999-10-20 21:05     ` Gerd Stolpmann
1999-10-21  4:42       ` skaller
1999-10-21 12:05       ` Matías Giovannini
1999-10-21 15:35         ` skaller
1999-10-21 16:27           ` Matías Giovannini
1999-10-21 16:36             ` skaller
1999-10-21 17:21               ` Matías Giovannini
1999-10-23  9:53               ` Benoit Deboursetty
1999-10-25 21:06                 ` Jan Skibinski
1999-10-26 18:02                 ` skaller
1999-10-25  0:54               ` How to format a float? skaller
1999-10-26  0:53                 ` Michel Quercia
1999-10-26  4:36         ` Go for ultimate localization! Benoit Deboursetty
1999-10-28 17:04           ` Pierre Weis
1999-10-28 17:41           ` Matías Giovannini
1999-10-28 17:59           ` Matías Giovannini
1999-10-29  9:44             ` Francois Pottier
1999-10-28 21:00           ` Gerd Stolpmann
1999-10-29  4:29           ` skaller
1999-10-17 14:29 ` localization, internationalization and Caml Xavier Leroy
1999-10-19 18:36   ` skaller
  -- strict thread matches above, loose matches on Subject: below --
1999-10-13 12:12 STARYNKEVITCH Basile
1999-10-14 22:20 ` skaller
1999-10-15  8:26   ` Francis Dupont
1999-10-17 11:27     ` skaller
1999-10-17 15:54       ` Francis Dupont
1999-10-19 18:48         ` skaller
