From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: weis Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id MAA17641 for caml-redistribution; Fri, 15 Oct 1999 12:06:15 +0200 (MET DST) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA28908 for ; Fri, 15 Oct 1999 10:26:23 +0200 (MET DST) Received: from givry.inria.fr (givry.inria.fr [193.51.193.144]) by concorde.inria.fr (8.8.7/8.8.7) with ESMTP id KAA16825; Fri, 15 Oct 1999 10:26:14 +0200 (MET DST) Received: from givry.inria.fr (givry.inria.fr [193.51.193.144]) by givry.inria.fr (8.7.6/8.7.3) with ESMTP id KAA12903; Fri, 15 Oct 1999 10:26:13 +0200 (MET DST) Message-Id: <199910150826.KAA12903@givry.inria.fr> From: Francis Dupont To: skaller cc: STARYNKEVITCH Basile , caml-list@inria.fr Subject: Re: localization, internationalization and Caml In-reply-to: Your message of Fri, 15 Oct 1999 08:20:20 +1000. <38065724.1CEC055F@maxtal.com.au> Date: Fri, 15 Oct 1999 10:26:12 +0200 Sender: weis In your previous mail you wrote: The current 'support' for 8 bit characters in ocaml should be deprecated immediately. It is an extremely bad thing to have, since Latin-1 et al are archaic 8 bit standards incompatible with the international standard for ISO10646 communication, namely the UTF-8 encoding. => there is a rather strong opposition against UTF-8 in France because it is not a natural encoding (ie. if ASCII maps to ASCII it is not the case for ISO 8859-* characters, imagine a new UTF-X encoding maps ASCII to strange things and you'd be able to understand our concern). Yes, I know Latin-1 is useful now for French. => it is more than useful, Latin-1 (soon ISO IS 8859-15) is necessary if you need really readable texts in French. The way forward may well be to provide an input filter to convert Latin-1 (or any other encoding) to UTF8, and have ocaml process that. => my problem is the output of the filter will be no more readable when I've put too much French in the program (in comments for instance). This requires almost no changes to the compiler: the design should open the set of characters acceptable in identifiers, probably to some subset of the set recommended in one of the ISO10646 related documents; the other change required is to accept \uXXXX and \UXXXXXXXX escapes in strings. String processing functions should generally continue to be 8 bit [per octet]: full internationalisation of client string handling functions is a very complex, non-trivial, task] => I believe internationalization should not be done by countries where English is the only used language: this is at least awkward... Regards Francis.Dupont@inria.fr