* localization, internationalization and Caml
@ 1999-10-13 12:12 STARYNKEVITCH Basile
  1999-10-14 22:20 ` skaller
  0 siblings, 1 reply; 21+ messages in thread
From: STARYNKEVITCH Basile @ 1999-10-13 12:12 UTC
To: caml-list

Hello All,

Just a small remark about localization and internationalization (see
your setlocale, printf, and strtod man pages), which means adapting
software to culturally different users. Problems include date
representation, number representation, error messages, and even
character sets and left-to-right or right-to-left reading order. For
example, some French users want "Taux d'inflation = 3,14% - TROP"
instead of "TOO MUCH inflation 3.14%": the message is in English/French,
the numbers use a decimal point/comma, and the argument 3.14 and the
locale-dependent string "TOO MUCH"/"TROP" appear in a different order.

I am not at all a fan of localization. But I do have a wish if it ever
occurs in Ocaml:

* do not depend on C localization (this means Printf.printf should not
  depend on the LC_NUMERIC environment variable; is this true now?)

* make the locale an explicit argument, or at least a property bound to
  a channel. Several channels may need different locales (for instance
  an HTTP socket needs the C locale, while the user's stderr could be in
  a French locale), so

    lprintf Locale.French "%d %g" 2 3.14

  is much better than

    set_locale LC_ALL "FR"
    printf "%d %g" 2 3.14

By the way, I believe more and more that the printf interface is (in C
as in Ocaml) a big mistake (one which could easily be avoided in Ocaml,
thanks to its typing). We should code

    print [Int 2; String " < "; Float 3.14]

instead of

    printf "%d < %g" 2 3.14

Again, I am *not* asking for localization in Ocaml, but if somebody
needs it (I don't), I still hope it would be implemented better than in
C. And I think that Unicode would be more useful than localization.
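Basile's variant-based alternative to printf can be sketched in a few lines of Ocaml. This is only an illustrative sketch: the `item` type and the function names are hypothetical, not an existing library API.

```ocaml
(* A sketch of printing via an ordinary list of tagged values instead of
   a format string; the type and the names are illustrative only. *)
type item = Int of int | Float of float | String of string

let string_of_item = function
  | Int n -> string_of_int n
  | Float x -> string_of_float x
  | String s -> s

(* Build the output string, then print it; a wrongly typed list element
   is rejected by the ordinary type checker, with no format-string magic. *)
let print items =
  print_string (String.concat "" (List.map string_of_item items))

let () = print [Int 2; String " < "; Float 3.14]
```

A locale-aware variant would simply thread an explicit locale value through `string_of_item`, which is exactly the "locale as an explicit argument" design asked for above.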
I'm saying all this because I now have a headache regarding C
localization, so I hope that Ocaml will avoid that mistake.

################

Short summary (originally in French): I think that localization in Ocaml
(which I do not feel the need for) should not be done as in C.

N.B. Any opinions expressed here are only mine, and not of my
organization.

---------------------------------------------------------------------
Basile STARYNKEVITCH ---- Commissariat à l'Energie Atomique
DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France
phone: 1.69.08.60.55; fax: 1.69.08.83.95; home: 1.46.65.45.53
email: Basile point Starynkevitch at cea point fr

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml
  1999-10-13 12:12 localization, internationalization and Caml STARYNKEVITCH Basile
@ 1999-10-14 22:20 ` skaller
  1999-10-15  8:26   ` Francis Dupont
  0 siblings, 1 reply; 21+ messages in thread
From: skaller @ 1999-10-14 22:20 UTC
To: STARYNKEVITCH Basile; +Cc: caml-list

STARYNKEVITCH Basile wrote:
>
> By the way, I believe more and more that the printf interface is (in C
> as in Ocaml) a big mistake (one which could easily be avoided in
> Ocaml, thanks to its typing)

I agree but ..

> We should code
>
>     print [Int 2; String " < "; Float 3.14]
>
> instead of
>
>     printf "%d < %g" 2 3.14

However, I do not agree with the solution. The correct method, IMHO, is
to provide some proper formatting functions (Ocaml's are plain WRONG!)
such as

    formatted_string_of_int justify width value

[where justify is LeftSpace | RightSpace | LeftZero] and then use the
power of functional programming to create output strings. [The above is
only a quick exemplary interface, not a well-considered one.]

> Again, I am *not* asking for localization in Ocaml, but if somebody
> needs it (I don't), I still hope it would be implemented better than
> in C. And I think that Unicode would be more useful than localization.

Please, ISO10646, not Unicode. We have International Standards.

There is a lot of work to be done in internationalisation. If it is
worth doing, it is worth doing right. The current 'support' for 8 bit
characters in ocaml should be deprecated immediately. It is an extremely
bad thing to have, since Latin-1 et al. are archaic 8 bit standards
incompatible with the international standard for ISO10646 communication,
namely the UTF-8 encoding.

Yes, I know Latin-1 is useful now for French. The way forward may well
be to provide an input filter to convert Latin-1 (or any other encoding)
to UTF-8, and have ocaml process that.
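skaller's exemplary interface can be sketched directly. The `justify` type and function name follow his illustrative suggestion and are not a real Ocaml API; note that the `LeftZero` case is deliberately naive about negative numbers.

```ocaml
(* Sketch of the suggested formatting primitive; names follow the post's
   exemplary interface, not any existing library. *)
type justify = LeftSpace | RightSpace | LeftZero

let formatted_string_of_int justify width value =
  let s = string_of_int value in
  let pad = max 0 (width - String.length s) in
  match justify with
  | LeftSpace  -> String.make pad ' ' ^ s   (* right-align with spaces *)
  | RightSpace -> s ^ String.make pad ' '   (* left-align with spaces *)
  | LeftZero   -> String.make pad '0' ^ s   (* naive: pads "-5" to "00-5" *)

(* Output strings are then built by ordinary functional composition: *)
let line = String.concat " " [formatted_string_of_int LeftZero 3 7; "items"]
```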
This requires almost no changes to the compiler: the design should open
up the set of characters acceptable in identifiers, probably to some
subset of the set recommended in one of the ISO10646-related documents;
the other change required is to accept \uXXXX and \UXXXXXXXX escapes in
strings. String processing functions should generally continue to be
8 bit [per octet]: full internationalisation of client string handling
functions is a very complex, non-trivial task.
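The escape processing mentioned above could be prototyped entirely outside the compiler. The sketch below expands \uXXXX escapes (code points up to U+FFFF) into UTF-8 bytes; the function names are hypothetical and \UXXXXXXXX handling is omitted for brevity.

```ocaml
(* Encode one code point (up to U+FFFF) as UTF-8 bytes. *)
let utf8_of_code cp =
  let buf = Buffer.create 3 in
  if cp < 0x80 then Buffer.add_char buf (Char.chr cp)
  else if cp < 0x800 then begin
    Buffer.add_char buf (Char.chr (0xC0 lor (cp lsr 6)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end else begin
    Buffer.add_char buf (Char.chr (0xE0 lor (cp lsr 12)));
    Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr 6) land 0x3F)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end;
  Buffer.contents buf

(* Replace each \uXXXX escape in [s] with its UTF-8 byte sequence;
   all other bytes pass through unchanged. *)
let expand_u_escapes s =
  let n = String.length s in
  let buf = Buffer.create n in
  let i = ref 0 in
  while !i < n do
    if !i + 5 < n && s.[!i] = '\\' && s.[!i + 1] = 'u' then begin
      let cp = int_of_string ("0x" ^ String.sub s (!i + 2) 4) in
      Buffer.add_string buf (utf8_of_code cp);
      i := !i + 6
    end else begin
      Buffer.add_char buf s.[!i];
      incr i
    end
  done;
  Buffer.contents buf
```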
* Re: localization, internationalization and Caml
  1999-10-14 22:20 ` skaller
@ 1999-10-15  8:26   ` Francis Dupont
  1999-10-17 11:27     ` skaller
  0 siblings, 1 reply; 21+ messages in thread
From: Francis Dupont @ 1999-10-15 8:26 UTC
To: skaller; +Cc: STARYNKEVITCH Basile, caml-list

In your previous mail you wrote:

   The current 'support' for 8 bit characters in ocaml should be
   deprecated immediately. It is an extremely bad thing to have, since
   Latin-1 et al. are archaic 8 bit standards incompatible with the
   international standard for ISO10646 communication, namely the UTF-8
   encoding.

=> there is rather strong opposition against UTF-8 in France because it
is not a natural encoding (i.e. while ASCII maps to ASCII, that is not
the case for ISO 8859-* characters; imagine a new UTF-X encoding that
mapped ASCII to strange things and you would understand our concern).

   Yes, I know Latin-1 is useful now for French.

=> it is more than useful: Latin-1 (soon ISO IS 8859-15) is necessary if
you need really readable texts in French.

   The way forward may well be to provide an input filter to convert
   Latin-1 (or any other encoding) to UTF-8, and have ocaml process
   that.

=> my problem is that the output of the filter will no longer be
readable once I have put too much French in the program (in comments,
for instance).

   This requires almost no changes to the compiler: the design should
   open up the set of characters acceptable in identifiers, probably to
   some subset of the set recommended in one of the ISO10646-related
   documents; the other change required is to accept \uXXXX and
   \UXXXXXXXX escapes in strings. String processing functions should
   generally continue to be 8 bit [per octet]: full internationalisation
   of client string handling functions is a very complex, non-trivial
   task.

=> I believe internationalization should not be done by countries where
English is the only language used: this is at least awkward...
Regards

Francis.Dupont@inria.fr
* Re: localization, internationalization and Caml
  1999-10-15  8:26 ` Francis Dupont
@ 1999-10-17 11:27   ` skaller
  1999-10-17 15:54     ` Francis Dupont
  0 siblings, 1 reply; 21+ messages in thread
From: skaller @ 1999-10-17 11:27 UTC
To: Francis Dupont; +Cc: STARYNKEVITCH Basile, caml-list

Francis Dupont wrote:
>
> In your previous mail you wrote:
>
>    The current 'support' for 8 bit characters in ocaml should be
>    deprecated immediately. It is an extremely bad thing to have, since
>    Latin-1 et al. are archaic 8 bit standards incompatible with the
>    international standard for ISO10646 communication, namely the UTF-8
>    encoding.
>
> => there is rather strong opposition against UTF-8 in France because
> it is not a natural encoding (i.e. while ASCII maps to ASCII, that is
> not the case for ISO 8859-* characters; imagine a new UTF-X encoding
> that mapped ASCII to strange things and you would understand our
> concern).

I do understand the concern, but the decision on the International
Standards has been made. The transition for ISO 8859-x clients will
involve some pain. Better to start going through the pain now :-(

>    Yes, I know Latin-1 is useful now for French.
>
> => it is more than useful: Latin-1 (soon ISO IS 8859-15) is necessary
> if you need really readable texts in French.

No, what you mean is that with _current technology_ there is plenty of
support for 8 bit characters, using code pages, so that Latin-1 is well
supported. For example, there are a lot of text editors that accept
8 bit characters, and even permit switching code pages. There are almost
none that work with ISO10646 or Unicode, let alone accept the UTF-8
encoding. (Yudit is the only one I know of.)

>    The way forward may well be to provide an input filter to convert
>    Latin-1 (or any other encoding) to UTF-8, and have ocaml process
>    that.
>
> => my problem is that the output of the filter will no longer be
> readable once I have put too much French in the program (in comments,
> for instance).
You will have no problem with the right tools: the difference will be
transparent. Of course, you will need the right tools. For example, you
will need a browser like Internet Explorer 5, which processes the UTF-8
encoding correctly. I agree that this is a problem, but supporting
Latin-1, or any other archaic standard, is not going to help move
forward. It is bad enough that most vendors only support Unicode, which
is a small, almost filled, 16 bit subset of the full 31 bit ISO-10646
Standard.

>    This requires almost no changes to the compiler: the design should
>    open up the set of characters acceptable in identifiers, probably
>    to some subset of the set recommended in one of the
>    ISO10646-related documents; the other change required is to accept
>    \uXXXX and \UXXXXXXXX escapes in strings. String processing
>    functions should generally continue to be 8 bit [per octet]: full
>    internationalisation of client string handling functions is a very
>    complex, non-trivial task.
>
> => I believe internationalization should not be done by countries
> where English is the only language used: this is at least awkward...

I believe people with international concerns can work together no matter
what their native language. Some English speakers may be concerned;
some, like me, are somewhat embarrassed to be non-fluent in _any_ other
language. [I speak a smattering of high school German.] However,
Australia, where I live, has migrants from all over the world, and
support for many languages is an important issue here, particularly
Asian languages. And ISO-8859-x is not much help there :-)

--
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller
* Re: localization, internationalization and Caml
  1999-10-17 11:27 ` skaller
@ 1999-10-17 15:54   ` Francis Dupont
  1999-10-19 18:48     ` skaller
  0 siblings, 1 reply; 21+ messages in thread
From: Francis Dupont @ 1999-10-17 15:54 UTC
To: skaller; +Cc: STARYNKEVITCH Basile, caml-list

In your previous mail you wrote:

   > => there is rather strong opposition against UTF-8 in France
   > because it is not a natural encoding ...

   I do understand the concern, but the decision on the International
   Standards has been made.

=> this is not so obvious, because there are other encodings (UTF-X)
without this kind of problem. I'll send this thread to a colleague who
tried to get something better than UTF-8 at the IETF (but he was too
late).

   No, what you mean is that with _current technology_ there is plenty
   of support for 8 bit characters, using code pages, so that Latin-1 is
   well supported.

=> yes, for instance you have a reasonable set of fonts.

   For example, there are a lot of text editors that accept 8 bit
   characters, and even permit switching code pages. There are almost
   none that work with ISO10646 or Unicode, let alone accept the UTF-8
   encoding. (Yudit is the only one I know of.)

=> I'd like to get some free ISO10646/Unicode fonts. I believe that
without them ISO10646/Unicode will not be accepted by users.
   I agree that this is a problem, but supporting Latin-1, or any other
   archaic standard, is not going to help move forward.

=> Latin-1 is not so archaic (it should be old enough in order to become
archaic :-).

   It is bad enough that most vendors only support Unicode, which is a
   small, almost filled, 16 bit subset of the full 31 bit ISO-10646
   Standard.

=> Unicode is not so supported...

   I believe people with international concerns can work together no
   matter what their native language. Some English speakers may be
   concerned; some, like me, are somewhat embarrassed to be non-fluent
   in _any_ other language. [I speak a smattering of high school
   German.]

=> It is great that English speakers support internationalization, but
we need speakers of other languages in order to make it as complete as
possible. For instance, where is the first character of a string? An
Arabic speaker can easily show that this is not so obvious.

   However, Australia, where I live, has migrants from all over the
   world and support for many languages is an important issue here.
   Particularly Asian languages.

=> Asian languages seem hard, and we can't ignore one third of the
world...

Regards

Francis.Dupont@inria.fr
* Re: localization, internationalization and Caml
  1999-10-17 15:54 ` Francis Dupont
@ 1999-10-19 18:48   ` skaller
  0 siblings, 0 replies; 21+ messages in thread
From: skaller @ 1999-10-19 18:48 UTC
To: Francis Dupont; +Cc: STARYNKEVITCH Basile, caml-list

Francis Dupont wrote:
> I believe people with international concerns can work together no
> matter what their native language. Some English speakers may be
> concerned; some, like me, are somewhat embarrassed to be non-fluent in
> _any_ other language. [I speak a smattering of high school German.]
>
> => It is great that English speakers support internationalization, but
> we need speakers of other languages in order to make it as complete as
> possible. For instance, where is the first character of a string? An
> Arabic speaker can easily show that this is not so obvious.

Yes. And ocaml developers cannot go through all the pain of this complex
field, and do not need to: workers in the field have produced an
International Standard which guides how programming language developers
should proceed. Not all choices are fixed, by any means, but the
documents exist, and _are_ being worked on by people speaking many
different languages.

> => Asian languages seem hard, and we can't ignore one third of the
> world...

Not nearly so hard as Arabic and Indic languages, in which the usual
categorical composition of strings by concatenation fails. Similarly,
things like collation sequences are a serious nightmare. But they can be
implemented with (perhaps complex) functions, so it is possible to
extend the support a programming language gives clients as time goes on.

My initial point was that very few changes are required to prepare for
internationalisation, if the UTF-8 encoding of ISO10646 is adopted, but
one of the key things that is required, urgently, is to deprecate
Latin-1.
--
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller
* Re: localization, internationalization and Caml
@ 1999-10-15 13:53 Gerard Huet
  1999-10-15 20:28 ` Gerd Stolpmann
  1999-10-17 14:29 ` Xavier Leroy
  0 siblings, 2 replies; 21+ messages in thread
From: Gerard Huet @ 1999-10-15 13:53 UTC
To: Francis Dupont, skaller; +Cc: STARYNKEVITCH Basile, caml-list

Just to put my 2 cents on this issue...

At 10:26 15/10/99 +0200, Francis Dupont wrote:
> In your previous mail you wrote:
>
>    The current 'support' for 8 bit characters in ocaml should be
>    deprecated immediately. It is an extremely bad thing to have, since
>    Latin-1 et al. are archaic 8 bit standards incompatible with the
>    international standard for ISO10646 communication, namely the UTF-8
>    encoding.

I do not agree. What we need is not ayatollah diktats, but careful
thinking about the evolution of standards.

First of all, ISO-Latin is as international a standard as ISO10646, only
a bit more mature. In essence, international standards are not
immediately obsolete; they are here to stay, because we need some
stability in a world of sound engineering, as opposed to the permanent
hype which our discipline is subjected to.

Secondly, the string data type of Ocaml is not about ASCII or ISO-Latin
or whatever. It is a low-level data type implementing lists of bytes of
data efficiently represented in machine memory. These bytes may be used
for encoding elements of various finite sets such as ASCII or ISO-Latin,
but the string library does not care about such intentions. When such
strings are used to represent natural language sentences, there is a
natural tendency to sophistication, from the UPPER CASE letters of the
computer printers of old, to ASCII, to ISO-Latin 1, 2, etc., to Unicode.
At some point (beyond 256) these sets of codes can no longer be mapped
one to one onto bytes, and so multi-byte representations must be
designed, such as UTF-8.
Such multi-byte representations are inconsistent with the ISO-Latin
convention somewhere, and thus the ISO-Latin character set must be
shifted out of its usual representation, since the 8th bit is needed for
the multi-byte encoding. So, for instance, engineers designing natural
language interfaces must choose between sticking to the old convention
in a purely local piece of software, or upgrading their software to the
international standard, typically for Web applications.

At some point I am sure some brave soul from the Ocaml implementation
team will write a Unicode library implementing the non-trivial
manipulations of lists of Unicode characters, so that the above
engineers will have a generic tool to use. Such libraries will typically
implement a NEW datatype of "unistring" or whatever, with proper
conversion to string representations of course, but the string data type
is surely here to stay, because bytes are not going to become obsolete
overnight. :-)

> => there is rather strong opposition against UTF-8 in France because
> it is not a natural encoding (i.e. while ASCII maps to ASCII, that is
> not the case for ISO 8859-* characters; imagine a new UTF-X encoding
> that mapped ASCII to strange things and you would understand our
> concern).

I do not share Francis' pessimism. The ISO committees are not entirely
stupid, and care has been taken to make the move as painless as
possible. ISO-Latin has just been shifted by a mere translation. Here is
my Ocaml code for translating strings of ISO-Latin 1 characters to UTF-8
HTML:

    let print_unicode c =
      let ascii = int_of_char c in
      (* test for ISO-LATIN *)
      if ascii < 128 then print_char c  (* 7 bit ascii *)
      else print_string ("&#" ^ string_of_int ascii ^ ";")

This is hardly mysterious or complicated or inefficient.

> => my problem is that the output of the filter will no longer be
> readable once I have put too much French in the program (in comments,
> for instance).
Come on, Francis, we do not read core dumps nowadays; we read through
the eyes of HTML or TeX or whatever!

> => I believe internationalization should not be done by countries
> where English is the only language used: this is at least awkward...

I simply do not understand this remark in a WWW world.

Cheers

Gérard
* Re: localization, internationalization and Caml
  1999-10-15 13:53 Gerard Huet
@ 1999-10-15 20:28 ` Gerd Stolpmann
  1999-10-19 18:06   ` skaller
  1999-10-17 14:29 ` Xavier Leroy
  1 sibling, 1 reply; 21+ messages in thread
From: Gerd Stolpmann @ 1999-10-15 20:28 UTC
To: caml-list

I agree that Unicode or even ISO-10646 support would be a nice thing. I
also agree that for many users (including myself) the Latin-1 character
set suffices. Luckily, the two character sets are strongly related: the
first 256 character positions of Unicode (which is the 16-bit subset of
ISO-10646) are exactly the same as in Latin-1.

UTF-8 is a special encoding of the bigger character sets. Every 16- or
31-bit character is represented by one to six bytes; the higher the
character code, the more bytes are needed. This encoding is mainly
interesting for I/O, and not for internal processing, because it is
impossible to access the characters of a string by their position.
Internally, you must represent the characters as 16- or 32-bit numbers
(this is called UCS-2 and UCS-4, respectively), even if memory is wasted
(this is the price for the enhanced possibilities).

UTF-8 is designed for compatibility, because the following holds:

- Every ASCII character (i.e. with codes 0 to 127) is represented as
  before, and every non-ASCII character is represented by a byte
  sequence in which the eighth bit is set. Old, non-UTF-aware programs
  can at least interpret the ASCII characters. (Note that there is a
  variant of UTF-8 which encodes the 0 character differently, by two
  bytes.)

- If you sort UTF-8 strings alphabetically (more precisely, using the
  byte values of the encoding as the criterion), you get the same result
  as if you sorted the strings by their character codes.

This means that we need at least three types of strings: Latin-1 strings
for compatibility, UCS-2 or UCS-4 strings for internal processing, and
UTF-8 strings for I/O.
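The variable-width scheme Gerd describes is visible in the first byte of each sequence alone. A minimal sketch (the function name is illustrative) that reports how many bytes a UTF-8 sequence occupies under the original one-to-six-byte scheme for 31-bit ISO-10646:

```ocaml
(* Length in bytes of a UTF-8 sequence, judged from its first byte.
   Covers the original 1..6 byte scheme for 31-bit ISO-10646. *)
let utf8_seq_length first_byte =
  if first_byte < 0x80 then 1          (* plain ASCII *)
  else if first_byte < 0xC0 then invalid_arg "continuation byte"
  else if first_byte < 0xE0 then 2
  else if first_byte < 0xF0 then 3
  else if first_byte < 0xF8 then 4
  else if first_byte < 0xFC then 5
  else 6
```

The rejected 0x80..0xBF range is exactly why characters cannot be accessed by position: a byte in that range is the middle of some character, and finding the n-th character means walking the string from the start.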
For simplicity, I suggest representing both Latin-1 and UTF-8 strings by
the same language type "string", and providing "wchar" and "wstring" for
the extended character set.

Of course, the current "string" data type is mainly an implementation of
byte sequences, which is independent of the underlying interpretation.
Only the following functions seem to introduce a character set:

- The String.uppercase and String.lowercase functions

- The channels, if opened in text mode (because they specially recognize
  the line endings if needed for the operating system, and newline
  characters are part of the character set)

The best solution would be to have internationalized versions of these
functions (and perhaps of some more functions) which still operate on
the "string" type but allow the user to select the encoding and the
locale. This means we would have something like

    type encoding = UTF8 | Latin1 | ...
    type locale = ...

    val String.i18n_uppercase : encoding -> locale -> string -> string
    val String.i18n_lowercase : encoding -> locale -> string -> string

    val String.recode : encoding -> encoding -> string -> string
        (* changes the encoding if possible *)

For "wstring" it is simpler:

    val Wstring.uppercase : string -> string
    val Wstring.i18n_uppercase : locale -> string -> string

New opening mode for channels:

    Text of encoding

This encoding specifies the encoding of the file. It must be possible to
change it later (e.g. to process XML's "encoding" declaration).

New input/output functions:

    val output_i18n_string : out_channel -> encoding -> string -> unit
    val input_i18n_line : in_channel -> encoding -> string

Here, the encoding argument specifies the encoding of the internal
representation.
The other I/O functions need I18N versions as well, and of course we
need to operate on "wstring"s directly:

    val output_wstring : out_channel -> wstring -> unit
    val input_wstring_line : in_channel -> wstring

This all means that the number of string functions explodes: we need
functions for compatibility (Latin-1), functions for arbitrary 8 bit
encodings, and functions for wide strings. I think this is the main
argument against it, and it is very difficult to get around. (Any
ideas?)

Francis Dupont:
> => my problem is that the output of the filter will no longer be
> readable once I have put too much French in the program (in comments,
> for instance).

The enlarged character sets become more and more important, and it is
only a matter of time until every piece of software which wants to be
taken seriously can process them, even a dumb terminal or a simple text
editor. So you will be able to put accented characters into your
comments, and you will see them as such even if you 'cat' the program
text to the terminal or printer; this will work everywhere...

> => I believe internationalization should not be done by countries
> where English is the only language used: this is at least awkward...

... but in the USA (a general prejudice not worth discussing; there is
only some "personal experience" behind it).

Gerd

----------------------------------------------------------------------------
Gerd Stolpmann      Telefon: +49 6151 997705 (privat)
Viktoriastr. 100
64293 Darmstadt     EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat)
Germany
----------------------------------------------------------------------------
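Gerd's proposed `String.recode` could be prototyped for the one conversion that is trivial, Latin-1 to UTF-8, since every Latin-1 byte maps to the identical ISO-10646 code point. The type and functions below follow his sketched signatures and are illustrative only, not a real library.

```ocaml
type encoding = UTF8 | Latin1

(* Latin-1 bytes 0x00..0xFF map to the identical code points
   U+0000..U+00FF, so recoding to UTF-8 needs at most two bytes per
   character. *)
let latin1_to_utf8 s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      let code = Char.code c in
      if code < 0x80 then Buffer.add_char buf c
      else begin
        Buffer.add_char buf (Char.chr (0xC0 lor (code lsr 6)));
        Buffer.add_char buf (Char.chr (0x80 lor (code land 0x3F)))
      end)
    s;
  Buffer.contents buf

(* A cut-down recode in the spirit of the proposed signature; only the
   Latin-1 -> UTF-8 direction is implemented in this sketch. *)
let recode src dst s =
  match src, dst with
  | Latin1, UTF8 -> latin1_to_utf8 s
  | UTF8, UTF8 | Latin1, Latin1 -> s
  | UTF8, Latin1 -> failwith "not implemented in this sketch"
```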
* Re: localization, internationalization and Caml
  1999-10-15 20:28 ` Gerd Stolpmann
@ 1999-10-19 18:06   ` skaller
  1999-10-20 21:05     ` Gerd Stolpmann
  0 siblings, 1 reply; 21+ messages in thread
From: skaller @ 1999-10-19 18:06 UTC
To: Gerd.Stolpmann; +Cc: caml-list

Gerd Stolpmann wrote:
>
> I agree that Unicode or even ISO-10646 support would be a nice thing.
> I also agree that for many users (including myself) the Latin-1
> character set suffices.

Generally, for me, 7 bit ASCII 'suffices'. But that is irrelevant: the
world is bigger than my country, or Europe.

> Luckily, both character sets are strongly related: the first 256
> character positions of Unicode (which is the 16-bit subset of
> ISO-10646) are exactly the same as in Latin-1.

.. of course, this is not luck ..

> UTF-8 is a special encoding of the bigger character sets. Every 16- or
> 31-bit character is represented by one to six bytes; the higher the
> character code the more bytes are needed. This encoding is mainly
> interesting for I/O, and not for internal processing, because it is
> impossible to access the characters of a string by their position.
> Internally, you must represent the characters as 16- or 32-bit numbers
> (this is called UCS-2 and -4, respectively), even if memory is wasted
> (this is the price for the enhanced possibilities).

I don't agree. If you read ISO10646 carefully, you will find that you
must STILL parse sequences of code points to obtain the equivalent of a
'character', if, indeed, such a concept exists; and furthermore, the
sequences are not unique. For example, many diacritic marks such as
accents may be appended to a code point, and act in conjunction with the
preceding code point to represent a character. This is permitted EVEN if
there is a single code point for that character; and worse, if there are
TWO such marks, the order is not fixed.

And that's just simple European languages.
Now try Arabic or Thai :-)

> This means that we need at least three types of strings: Latin-1
> strings for compatibility, UCS-2 or -4 strings for internal
> processing, and UTF-8 strings for I/O. For simplicity, I suggest
> representing both Latin-1 and UTF-8 strings by the same language type
> "string", and providing "wchar" and "wstring" for the extended
> character set.

I'd like to suggest we forget the 'wchar' string, at least initially. I
think you will find UTF-8 encoding requires very few changes. For
example, genuine regular expressions work out of the box. String
searching works out of the box. What doesn't work efficiently is
indexing. And it is never necessary to do it for human script. Why would
you ever want to, say, replace the 10th character of a string?? [You
could, if you were analysing, say, a stock code, but in that case the
n-th byte would do: it isn't natural language script.]

The way to handle Latin-1, or Big-5, or KSC, or ShiftJis, is to
translate it with an input filter, or internally, if the client is
reading the codes directly.

> Of course, the current "string" data type is mainly an implementation
> of byte sequences, which is independent of the underlying
> interpretation. Only the following functions seem to introduce a
> character set:
>
> - The String.uppercase and String.lowercase functions

It is best to get rid of these functions. They belong in a more
sophisticated natural language processing package.

> - The channels, if opened in text mode (because they specially
>   recognize the line endings if needed for the operating system, and
>   newline characters are part of the character set)

This is a serious problem. It is also partly ill formed: what is a
'line' in Chinese, which writes characters top down?
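skaller's claim that searching works out of the box rests on a property of UTF-8: no character's byte sequence occurs inside another character's byte sequence. A naive byte-level search (a sketch with an illustrative function name) is therefore already correct on UTF-8 text, even though it knows nothing about character boundaries:

```ocaml
(* Naive byte-level substring search. On UTF-8 input this is already
   correct: continuation bytes (0x80..0xBF) never begin a character, so
   a match can only start on a real character boundary. *)
let find_sub haystack needle =
  let nh = String.length haystack and nn = String.length needle in
  let rec go i =
    if i + nn > nh then None
    else if String.sub haystack i nn = needle then Some i
    else go (i + 1)
  in
  go 0

(* "é" in UTF-8 is the two bytes 0xC3 0xA9. *)
let pos = find_sub "caf\xC3\xA9 au lait" "\xC3\xA9"
```

Indexing is what this does not give you: `pos` is a byte offset, not a character count, which is exactly the trade-off discussed above.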
What is a 'line' in a Web page :-)

> The best solution would be to have internationalized versions of these
> functions (and perhaps of some more functions) which still operate on
> the "string" type but allow the user to select the encoding and the
> locale.

There is more. The compiler must be modified to accept identifiers in
the extended character set. This should work out of the box (since
characters with the 8th bit set are already accepted); in fact, it is
too permissive. Secondly, literals need to be processed, to expand
\uXXXX and \UXXXXXXXX escapes.

> This means we would have something like
>
>     type encoding = UTF8 | Latin1 | ...

Be careful to distinguish CODE SET from ENCODING. See the Unicode home
page for more details: Latin-1 is NOT an encoding, but a character set.
There is a MAPPING from Latin-1 to ISO-10646. This is not the same thing
as an encoding.

I think this is the wrong approach: we do not want built-in cases for
every possible encoding/character set. Instead, we want an open-ended
set of conversions from (and perhaps to) the internally used
representation. There are a LOT of such combinations; we need to be able
to add new ones without breaking into a module. We should do it
functionally; this should work well in ocaml :-)

>     type locale = ...
>     val String.i18n_uppercase : encoding -> locale -> string -> string
>     val String.i18n_lowercase : encoding -> locale -> string -> string

Not in the String module. This belongs in a different package which
handles the complex vagaries of human script. [This particular function
is relatively simple. Others are 'get_digit' and 'isdigit'. Whitespace
is much harder. Collation is a nightmare :-]

>     val String.recode : encoding -> encoding -> string -> string
>         (* changes the encoding if possible *)

This isn't quite right. The way to do this is to have a function

    LATIN1_to_ISO10646 code

which does the mapping (from a SINGLE code point in Latin-1 to
ISO-10646). The code point is an int.
Separately, we handle encodings:

    UCS4_to_UTF8 code

converts an int to a UTF-8 string, and

    UTF8_to_UCS4 string position

parses the string from position, returning a code point and a position.
There are other encodings, such as DBCS encodings, which generally are
tied to a single character set. [UCS4/UTF8 are less dependent.]

[....]

> This all means that the number of string functions explodes:

Exactly. And we don't want that. So I suggest we continue to use the
existing strings of 8 bit bytes ONLY, and represent ALL foreign [non
ISO10646] character sets using ISO-10646 code points, encoded as UTF-8,
and provide an input filter for the compiler. In addition, some extra
functions to convert other character sets and encodings to
ISO-10646/UTF-8 are provided, and, if you like, they can be plugged into
the I/O system. This means a lot of conversion functions, but ONE
internal representation only: the one we already have.

> We need functions for compatibility (Latin-1), functions for arbitrary
> 8 bit encodings, and functions for wide strings. I think this is the
> main argument against it, and it is very difficult to get around.
> (Any ideas?)

I've been trying to tell you how to do it. The solution is simple: adopt
ISO-10646 as the SOLE character set, and UTF-8 as the SOLE encoding of
it, and provide conversions from other character sets and encodings. All
the code that needs to manipulate strings can then be provided NOW as
additional functions manipulating the existing string type. The apparent
loss of indexing is a mirage. The gain is huge: ISO-10646 Level 1
compliance without any explosion of data types.

Yes, some more _functions_ are needed to do extra processing, such as
normalisation, comparisons of various kinds, capitalisation, etc.
Regular expressions will need to be enhanced to handle the special
features (like case-insensitive searching), but basic regular
expressions will work out of the box.
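The two layers skaller separates (code-set mapping versus encoding) can be sketched as follows. For Latin-1 the code-set mapping is the identity, and the encoder/decoder pair below covers code points up to U+FFFF for brevity (the historical scheme continues to 31-bit values with longer sequences). The names are illustrative, not an existing API.

```ocaml
(* Code-set mapping: Latin-1 code points coincide with the first 256
   ISO-10646 code points, so this mapping is the identity. *)
let latin1_to_iso10646 (code : int) = code

(* Encoding layer: one code point to its UTF-8 bytes (up to U+FFFF
   here; the full scheme extends to 31-bit values). *)
let ucs4_to_utf8 cp =
  let buf = Buffer.create 3 in
  if cp < 0x80 then Buffer.add_char buf (Char.chr cp)
  else if cp < 0x800 then begin
    Buffer.add_char buf (Char.chr (0xC0 lor (cp lsr 6)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end else begin
    Buffer.add_char buf (Char.chr (0xE0 lor (cp lsr 12)));
    Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr 6) land 0x3F)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end;
  Buffer.contents buf

(* Decoding layer: parse one code point starting at [pos], returning
   the code point and the position just past it. *)
let utf8_to_ucs4 s pos =
  let b = Char.code s.[pos] in
  if b < 0x80 then (b, pos + 1)
  else if b < 0xE0 then
    (((b land 0x1F) lsl 6) lor (Char.code s.[pos + 1] land 0x3F), pos + 2)
  else
    (((b land 0x0F) lsl 12)
       lor ((Char.code s.[pos + 1] land 0x3F) lsl 6)
       lor (Char.code s.[pos + 2] land 0x3F),
     pos + 3)
```

Converting any Latin-1 byte to the internal representation is then the composition `ucs4_to_utf8 (latin1_to_iso10646 code)`, which is the "lots of conversion functions, one internal representation" design argued for above.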
> The enlarged character sets become more and more important, and it is only a > matter of time until every piece of software which wants to be taken seriously > can process them, even a dumb terminal or simple text editor. So you will be > able to put accented characters into your comments, and you will see them as > such even if you 'cat' the program text to the terminal or printer; this will > work everywhere... Yes. That time is not here yet, but it will soon be the case that international support is mandatory for all large software purchases by governments and large corporations. -- John Skaller, mailto:skaller@maxtal.com.au 1/10 Toxteth Rd Glebe NSW 2037 Australia homepage: http://www.maxtal.com.au/~skaller downloads: http://www.triode.net.au/~skaller ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml 1999-10-19 18:06 ` skaller @ 1999-10-20 21:05 ` Gerd Stolpmann 1999-10-21 4:42 ` skaller 1999-10-21 12:05 ` Matías Giovannini 0 siblings, 2 replies; 21+ messages in thread From: Gerd Stolpmann @ 1999-10-20 21:05 UTC (permalink / raw) To: skaller; +Cc: caml-list On Tue, 19 Oct 1999, John Skaller wrote: >Gerd Stolpmann wrote: >> UTF-8 is a special encoding of the bigger character sets. Every 16- or 31-bit >> character is represented by one to six bytes; the higher the character code the >> more bytes are needed. This encoding is mainly interesting for I/O, and not for >> internal processing, because it is impossible to access the characters of a >> string by their position. Internally, you must represent the characters as 16- >> or 32-bit numbers (this is called UCS-2 and -4, respectively), even if memory is >> wasted (this is the price for the enhanced possibilities). > > I don't agree. If you read ISO10646 carefully, you will find >that you must STILL parse sequences of code points to obtain the >equivalent >of a 'character', if, indeed, such a concept exists, >and furthermore, the sequences are not unique. For example, >many diacritic marks such as accents may be appended to a code point, >and act in conjunction with the preceding code point to represent >a character. This is permitted EVEN if there is a single code point >for that character; and worse, if there are TWO such marks, the order is >not >fixed. > > And that's just simple European languages. Now try Arabic or Thai :-) Let's begin with languages we know. As far as I know, ISO10646 allows it not to implement the combining characters. I think, a programming language should only provide the basic means by which you can operate with characters, but should not solve it completely. >> This means that we need at least three types of strings: Latin1 strings for >> compatibility, UCS-2 or -4 strings for internal processing, and UTF-8 strings >> for I/O. 
For simplicity, I suggest representing both Latin1 and UTF-8 strings by >> the same language type "string", and providing "wchar" and "wstring" for the >> extended character set. > > I'd like to suggest we forget the 'wchar' string, at least initially. >I think you will find UTF-8 encoding requires very few changes. For >example, >genuine regular expressions work out of the box. String searching >works out of the box. > > What doesn't work efficiently is indexing. >And it is never necessary to do it for human script. >Why would you ever want to, say, replace the 10'th character >of a string?? [You could: if you were analysing, say, a stock code, >but in that case the n'th byte would do: it isn't natural language >script] Because I have an algorithm operating on the characters of a string. Such algorithms use indexes as pointers to parts of a string, and in most cases the indexes are only incremented or decremented. On a UTF-8 string, you could define an index as type index = { index_position : int; byte_position : int } and define the operations "increment", "decrement" (only working together with the string), "add", "subtract", "compare" (to calculate string lengths). Such indexes have strange properties; they can only be interpreted together with the string to which they refer. You cannot avoid such an index type really; you can only avoid giving the thing a name, programming the index operations anew every time. Perhaps your suggestion works; but string manipulation will then be much slower. For example, an "increment" must be implemented by finding the next beginning of a character (instead of just incrementing a numeric index). >> Of course, the current "string" data type is mainly an implementation of byte >> sequences, which is independent of the underlying interpretation. Only the >> following functions seem to introduce a character set: >> >> - The String.uppercase and .lowercase functions > >It is best to get rid of these functions. 
They belong in a more >sophisticated >natural language processing package. There will always be a difference between natural languages and sophisticated packages. Even the current String.uppercase is wrong (in Latin1 there is a lower-case character without a corresponding capital character (\223), but WORDS containing this character can be capitalized by applying a semantic rule). I would suppose that String.upper/lowercase are part of the library because the compiler itself needs them. Currently, ocaml depends on languages that know the distinction of character cases. In my opinion such case functions can only approximate the semantic meaning, and a simple approximation is better than no approximation. >> - The channels if opened in text mode (because they specially recognize the >> line endings if needed for the operating system, and newline characters are >> part of the character set) > > This is a serious problem. It is also partly ill formed: >what is a 'line' in Chinese, which writes characters top down? But lines exist. For example, your message is divided into lines. The concept of lines is too important to be dropped although it is simple (much of its success has to do with its simplicity). Other writing traditions also have a writing direction. >What is a 'line' in a Web page :-) What is a 'line' in the sky? >> This means we would have something like >> >> type encoding = UTF8 | Latin1 | ... > > Be careful to distinguish CODE SET from ENCODING. >See the Unicode home page for more details: Latin-1 is NOT >an encoding, but a character set. There is a MAPPING from >Latin-1 to ISO-10646. This is not the same thing as an encoding. >I think this is the wrong approach: we do not want built-in >cases for every possible encoding/character set. Character sets and encodings are both artificial concepts. When I program, I always have to deal with a combination of both. 
The distinction is irrelevant for most applications; it is important if you want to convert texts from (cs1,enc1) to (cs2,enc2) because conversion is not always possible. My idea is that the type "encoding" enumerates all supported combinations; I expect only a few. > Instead, we want an open ended set of conversions from (and perhaps to) >the internally used representation. There are a LOT of such >combinations, >we need to add new ones without breaking into a module. We should do it >functionally; this should work well in ocaml :-) What kind of problem do you want to solve with an open ended set of conversions? Isn't this the task of a specialized program? >> type locale = ... >> val String.i18n_uppercase : encoding -> locale -> string -> string >> val String.i18n_lowercase : encoding -> locale -> string -> string > > Not in the String module. This belongs in a different package >which handles complex vagaries of human script. [This particular >function, is relatively simple. See above. >Another is 'get_digit', 'isdigit'. >Whitespace is much harder. Collation is a nightmare :-] I think collation should be left out of a basic library. Even for a single language, there are often several traditions for how to sort, and it also depends on the kind of strings you are sorting (for example, think of personal names). Members of those traditions can contribute special modules for collation. > >> val String.recode : encoding -> encoding -> string -> string >> (* changes the encoding if possible *) > > This isn't quite right. The way to do this is to have a function: > > LATIN1_to_ISO10646 code > >which does the mapping (from a SINGLE code point in LATIN1 to ISO10646). >The code point is an int. Separately, we handle encodings: > > UCS4_to_UTF8 code > >converts an int to a UTF8 string, and > > UTF8_to_UCS4 string position > >parses the string from position, returning a code point and position. 
>There are other encodings, such as DBCS encodings, which generally >are tied to a single character set. [UCS4/UTF8 are less depenent] > The most correct interface is not always the best. >[....] >> This all means that the number of string functions explodes: > > Exactly. And we don't want that. So I suggest, we continue >to use the existing strings of 8 bit bytes ONLY, and represent >ALL foreign [non ISO10646] character sets using ISO-10646 code points, >encoded as UTF-8, and provide an input filter for the compiler. > > In addition, some extra functions to convert other >character sets and encodings to ISO-10646/UTF-8 are provided, >and, if you like, they can be plugged into the I/O system. > > This means a lot of conversion functions, but ONE >internal representation only: the one we already have. There will be a significant slow-down of all ocaml programs if the strings are encoded as UTF-8. I think the user of a language should be able to choose what is more important: time or space or reduced functionality. UTF-8 saves space, and costs time; UCS-4 wastes space, but saves time; UCS-2 is a compromise and bad because it is a compromise; Latin 1 (or another 8 bit cs) saves time and space but has less functionality. >> We need functions >> for compatibility (Latin1), functions for arbitrary 8 bit encodings, and >> functions for wide strings. I think this is the main argument against it, >> and it is very difficult to get around this. (Any ideas?) > > I've been trying to tell you how to do it. The solution is simple, >to adopt ISO-10646 as the SOLE character set, and UTF-8 as the SOLE >encoding of it; and provide conversions from other character sets and >encodings. It looks simple but I suppose it is not what the ocaml users want. >All the code that needs to manipulate strings can then be provided NOW >as additional functions manipulating the existing string type. And because compatibility is lost, the whole current code base has to be worked through. 
> The apparent loss of indexing is a mirage. The gain is huge: >ISO-10646 Level 1 compliance without any explosion of data types. >Yes, some more _functions_ are needed to do extra processing, >such as normalisation, comparisons of various kinds, >capitalisation, etc. Regular expressions will need to be >enhanced, to fix the special features (like case insensitive searching), >but the basic regular expressions will work out of the box. > >> The enlarged character sets become more and more important, and it is only a >> matter of time until every piece of software which wants to be taken seriously >> can process them, even a dumb terminal or simple text editor. So you will be >> able to put accented characters into your comments, and you will see them as >> such even if you 'cat' the program text to the terminal or printer; this will >> work everywhere... > > Yes. That time is not here yet, but it will soon be the case that >international support is mandatory for all large software purchases >by governments and large corporations. I do not believe that this will be the driving force because the current solutions exist, and it is VERY expensive to replace them. It is even cheaper to replace a language than a character set/encoding. Looks like another Year 2000 but without a deadline. The first field where some progress will be made is data exchange, because ISO10646 can bridge several character sets. At that time, tools will be available to view and edit such data, and of course to convert them; ISO10646 will be used in parallel with the "traditional" character set. These tools will be low-level, and perhaps operating systems will then support the tools with fonts, input methods, and conventions for how to indicate the encoding. (The current environment-variable solution is a pain if you try to use two encodings in parallel. For example, I can imagine that Unix terminal drivers could allow one to select the encoding directly, in the same way as you can set other terminal properties.) 
In contrast to this, many applications need not be replaced, and won't be. Perhaps they will have an ISO10646 import/export filter. -- ---------------------------------------------------------------------------- Gerd Stolpmann Telefon: +49 6151 997705 (privat) Viktoriastr. 100 64293 Darmstadt EMail: Gerd.Stolpmann@darmstadt.netsurf.de (privat) Germany ---------------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml 1999-10-20 21:05 ` Gerd Stolpmann @ 1999-10-21 4:42 ` skaller 1999-10-21 12:05 ` Matías Giovannini 1 sibling, 0 replies; 21+ messages in thread From: skaller @ 1999-10-21 4:42 UTC (permalink / raw) To: Gerd.Stolpmann; +Cc: caml-list Gerd Stolpmann wrote: > On Tue, 19 Oct 1999, John Skaller wrote: > > > > I don't agree. If you read ISO10646 carefully, you will find > >that you must STILL parse sequences of code points > Let's begin with languages we know. As far as I know, ISO10646 allows it not to > implement the combining characters. From memory, you are correct: there are three specified levels of compliance. Level 1 compliance does not require processing combining characters. > I think, a programming language should only > provide the basic means by which you can operate with characters, but should not > solve it completely. Yes, I agree, at least at this time. > > What doesn't work efficiently is indexing. > >And it is never necessary to do it for human script. > >Why would you ever want to, say, replace the 10'th character > >of a string?? > > Because I have an algorithm operating on the characters of a string. If the string represents human script, it is then wrong because it makes incorrect assumptions about the nature of human script. You will need to rewrite it, if you want it to work in an international setting. > Such > algorithms use indexes as pointers to parts of a string, and in most cases the > indexes are only incremented or decremented. On a UTF-8 string, you could > define an index as > > type index = { index_position : int; byte_position : int } > > and define the operations "increment", "decrement" (only working together with > the string), "add", "substract", "compare" (to calculate string lengths). Such > indexes have strange properties; they can only be interpreted together with the > string to which they refer. 
> > You cannot avoid such an index type really; you can only avoid giving the > thing a name, programming the index operations anew every time. I agree. But my point is: you should change your code _anyhow_, to use the new and correct parsing method, because it is necessary for Level 2 and Level 3 compliance. Your code will then work correctly at those levels when the 'increment' function is upgraded. What you will find is something which by chance, perhaps, is natural in Python: there is no such thing as a character. A string is NOT an array of characters. Strings can be composed from strings, and decomposed into arrays of strings, but there is not really any character type. > Perhaps your suggestion works; but string manipulation will then be much > slower. For example, an "increment" must be implemented by finding the next > beginning of a character (instead of just incrementing a numeric index). Yes, but this is a fact: it is actually required for correct processing of human script. You cannot 'magic' away the facts. What you can do is: if you are programming with a known subset, such as the characters for a stock code, then you can use indexing anyhow, perhaps with the ASCII subset. That is, you can use the byte strings as character strings. > There will always be a difference between natural languages and sophisticated > packages. Yes. However, there is an important point here. Natural languages are quirky and behaviour is variant: each human uses language differently in each sentence, varying with region, context, etc. Obviously, computer systems only use some abstracted representation. While there are many levels and ways of abstracting this, there is one that is worthy of special interest here: the ISO10646 Standard. So I guess my suggestion is that in the _standard_ language libraries we will eventually need to implement the algorithms required for compliance with that Standard. 
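Gerd's index type, with an `increment` that finds the start of the next UTF-8 character, might be sketched like this (illustrative only; a UTF-8 continuation byte is any byte of the form 10xxxxxx, so the next character starts at the next non-continuation byte):

```ocaml
(* The index type proposed earlier in the thread. *)
type index = { index_position : int; byte_position : int }

(* Advance one character: skip the current lead byte, then any
   continuation bytes, stopping at the next lead byte or end of string. *)
let increment (s : string) (i : index) : index =
  let is_continuation c = Char.code c land 0xC0 = 0x80 in
  let n = String.length s in
  let rec next b =
    if b >= n || not (is_continuation s.[b]) then b else next (b + 1)
  in
  { index_position = i.index_position + 1;
    byte_position = next (i.byte_position + 1) }
```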
In my opinion, that naturally breaks into two parts: a) (byte/word) string management: this is an issue of storage allocation and manipulation, not natural language processing b) basic natural language processing >Even the current String.uppercase is wrong (in Latin1 there is a > lower case character without corresponding capital character (\223), but WORDS > containing this character can be capitalized by applying a semantical rule). > > I would suppose that String.upper/lowercase are part of the library because the > compiler itself needs them. Currently, ocaml depends on languages that know the > distinction of character cases. AH! you are right! > In my opinion such case functions can only approximate the semantical meaning, > and a simple approximation is better than no approximation. No. That is, I agree entirely, but make a different point: an arbitrary simple approximation is worthless, the one that is useful is the ISO Standardised one. > My idea is that the type "encoding" enumerates all supported combinations; I > expect only a few. Please no. Leave the type open to external augmentation. Just consider: my Interscript literate programming tool ALREADY supports something like 30 "encodings" -- all those present on the unicode.org website. Your 'type' is already a joke. I already support a lot more encodings than that. > What kind of problem do you want to solve with an open ended set of > conversions? Isn't this the task of a specialized program? No. It allows a generalised ISO10646 compliant program to read and perhaps write any file encoded in any supported encoding, but manipulate it internally in one format. If there is an encoding that is missed, it is easy to add a new pair of conversion functions, without breaking the standard library. That is, it is the task of specialised _functions_. It makes sense to provide some as standard like the ones your type suggests -- but not represent the cases with a type. 
Ocaml variants are not amenable to extension. Function parameters are. That is, I think there are exactly two cases: a) no conversion required b) user supplied conversion function > I think collation should be left out by a basic library. Probably right. Level 1 compliance is a good start, and does not require collation. > The most correct interface is not always the best. What do you mean 'most correct'? Either the interface supports the (ISO10646) required behaviour or not. > There will be a significant slow-down of all ocaml programs if the strings are > encoded as UTF-8. No. On the contrary, most existing programs will be unaffected. Those which actually care about internationalisation can only be made faster ( by providing native support). >I think the user of a language should be able to choose what > is more important: time or space or reduced functionality. UTF-8 saves space, > and costs time; UCS-4 wastes space, but saves time; UCS-2 is a compromise > and bad because it is a compromise; Latin 1 (or another 8 bit cs) saves > time and space but has less functionality. Sure, but, this leads to multiple interfaces. Was that not the original problem? Let me put the argument for UTF-8 differently. Processing UTF-8 'as is' is non-trivial and should be done in low level system functions for speed. Processing arrays of 31 bit integers is _already_ well supported in ocaml, and will be better supported by adding variable length arrays with functions that are designed with some view of use for string processing. So we don't actually need a wide character string type or supporting functions, precisely because in the simplest cases a standard data type not really specialised to script processing will do the job. What is actually required (in both cases) are some 'data tables' to support things like case mapping. For example, a function convert_to_upper i which takes an ocaml integer argument would be useful, and it is easy enough to 'map' this over an array. Sigh. 
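A sketch of that idea, with a hypothetical convert_to_upper that covers only the ASCII range as a stand-in for the real data table:

```ocaml
(* Case-map a single code point; only ASCII a..z handled here,
   as a placeholder for a table derived from the Unicode data files. *)
let convert_to_upper (i : int) : int =
  if i >= Char.code 'a' && i <= Char.code 'z' then i - 32 else i

(* 'Mapping' it over an array of code points is then one line. *)
let uppercase_codes (codes : int array) : int array =
  Array.map convert_to_upper codes
```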
See next post. I will post my code, so it can be torn up by experts. -- John Skaller, mailto:skaller@maxtal.com.au 1/10 Toxteth Rd Glebe NSW 2037 Australia homepage: http://www.maxtal.com.au/~skaller downloads: http://www.triode.net.au/~skaller ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml 1999-10-20 21:05 ` Gerd Stolpmann 1999-10-21 4:42 ` skaller @ 1999-10-21 12:05 ` Matías Giovannini 1999-10-21 15:35 ` skaller 1 sibling, 1 reply; 21+ messages in thread From: Matías Giovannini @ 1999-10-21 12:05 UTC (permalink / raw) To: caml-list; +Cc: Gerd.Stolpmann, skaller Gerd Stolpmann wrote: > > On Tue, 19 Oct 1999, John Skaller wrote: > >Gerd Stolpmann wrote: > >> The enlarged character sets become more and more important, and it is only a > >> matter of time until every piece of software which wants to be taken seriously > >> can process them, even a dumb terminal or simple text editor. So you will be > >> able to put accented characters into your comments, and you will see them as > >> such even if you 'cat' the program text to the terminal or printer; this will > >> work everywhere... > > > > Yes. This time is not here yet, but it will come soon that > >international support is mandatory for all large software purchases > >by governments and large corporations. > > I do not believe that this will be the driving force because the current > solutions exist, and it is VERY expensive to replace them. It is even cheaper > to replace a language than a character set/encoding. Looks like another Year > 2000 but without deadline. I still don't understand the point of this discussion. As a MacOS programmer of many years, I tend to view localization and internationalization as tasks best performed by the operating system, or at least by pluggable modules. This discussion of patching l10n and i18n functions *into* OCaml is, to me at least, losing direction. OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll agree that my view is chauvinistic (and selfish, perhaps: I already have "¿¡áéíóúüñÁÉÍÓÚÜÑ" for writing in Spanish, why should I ask for more?), I see no restriction in that (well, if I were Chinese, or Egyptian, I would see things differently). 
What's more, the whole syntactic apparatus of a programming language *assumes* a Latin setting, where things make sense when read from left to right, from top to bottom; and where punctuation is what we're used to. Programming languages suited for a Han, or Arab, or even a Hebrew audience would have to be rethought from the ground up. On the other hand, OCaml provides a String type that *can be* seen as a variable-length sequence of uninterpreted bytes. We have uninterpreted bytes! It's all we need to build whatever I18NString type we may need. What is missing is *library* facilities to abstract that view into a full-fledged i18n machinery. Of course, there's a problem with the manipulation of 32-bit integer values, but if used with care, the Nat datatype could serve perfectly well as the underlying, low-level datatype. Which makes me think, John, you already have variable-length int arrays. Nat's are as unsafe as they get :-) Regards, Matías. -- I got your message. I couldn't read it. It was a cryptogram. -- Laurie Anderson ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml 1999-10-21 12:05 ` Matías Giovannini @ 1999-10-21 15:35 ` skaller 1999-10-21 16:27 ` Matías Giovannini 0 siblings, 1 reply; 21+ messages in thread From: skaller @ 1999-10-21 15:35 UTC (permalink / raw) To: matias; +Cc: caml-list, Gerd.Stolpmann Matías Giovannini wrote: > OCaml uses Latin1 for its *internal* encoding of identifiers. While I'll > agree that my view is chauvinistic (and selfish, perhaps: I already have > "¿¡áéíóúuñÁÉÍÓÚÜÑ" for writing in Spanish, why should I ask for more?), > I see no restriction in that (well, If I were Chinese, or Egiptian, I > would see things differently). Exactly. There are quite a lot of Chinese, Indian, Russian ... and non-Latin people in the world: more than Latins. And many are faced with a barrier, participating in the computing world because of language problems. >What's more, the whole syntactic > apparatus of a programming language *assumes* a Latin setting, where > things make sense when read from left to right, from top to bottom; and > where punctuation is what we're used to. Programming languages suited > for a Han, or Arab, or even a Hebrew audience would have to be rethinked > from the grounds up. Actually, no. Most of these peoples learn English and learn computing, if they are to work with computers. But they still wish to use comments, strings, and identifiers in their native script. Have you ever seen a Japanese program? I have. Quite an interesting challenge: normal C/C++ code, with Latin characters encoding Japanese character names in identifiers, and actual Japanese characters in comments and strings. I had no idea what the code did. 
My point: for a non-native speaker, being forced to use a foreign language for identifiers and comments is a serious impediment; not having native characters in strings is not merely an impediment, but a complete disaster (how will the users of the program understand it -- they may not know any Latin language) > On the other hand, OCaml provides a String type that *can be* seen as a > variable-length sequence of uninterpreted bytes. Yes. What ocaml does not provide is a way of encoding extended characters -- \uXXXX \UXXXXXXXX in strings, or in identifiers. >We have uninterpreted > bytes! It's all we need to build whatever I18NString type we may need. > What is missing is *library* facilities to abstract that view into a > full-fledged i18n machinery. I agree. >Of course, there's a problem with the > manipulation of 32-bit integer values, but if used with care, the Nat > datatype could serve perfectly well as the underlying, low-level datatype. > > Which makes me think, John, you already have variable-length int arrays. But they're not standard (yet). Actually, ocaml 'int' is 31 bits, which is enough bits for ISO10646 (with some careful fiddling to avoid problems with the sign?). So there are TWO issues -- one is to make ocaml itself ISO10646 aware (i.e., the compiler), and the other is to provide users with libraries to manipulate extended characters. Please note: neither of these features would be optional, were ocaml to be submitted for ISO standardisation. ISO directives require all ISO languages to upgrade to provide international support. I know ocaml isn't an ISO language, but I think the basic intent is sound. [In some sense, ocaml is already a leader, accepting Latin-1 characters when other languages only allowed ASCII] -- John Skaller, mailto:skaller@maxtal.com.au 1/10 Toxteth Rd Glebe NSW 2037 Australia homepage: http://www.maxtal.com.au/~skaller downloads: http://www.triode.net.au/~skaller ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml 1999-10-21 15:35 ` skaller @ 1999-10-21 16:27 ` Matías Giovannini 1999-10-21 16:36 ` skaller 0 siblings, 1 reply; 21+ messages in thread From: Matías Giovannini @ 1999-10-21 16:27 UTC (permalink / raw) To: caml-list; +Cc: skaller skaller wrote: > > Matías Giovannini wrote: > >What's more, the whole syntactic > > apparatus of a programming language *assumes* a Latin setting, where > > things make sense when read from left to right, from top to bottom; and > > where punctuation is what we're used to. Programming languages suited > > for a Han, or Arab, or even a Hebrew audience would have to be rethinked > > from the grounds up. > > Actually, no. Most of these peoples learn English and learn > computing, if they are to work with computers. But they still wish > to use comments, strings, and identifiers in their native script. Strings can be localized with a package mechanism, à la Java. I don't like hardwired strings in code, they're a maintenance nightmare (not that I always abide by my own rule :-) > Have you ever seen a Japanese program? I have. > Quite an interesting challenge: normal C/C++ code, with > Latin characters encoding Japanese character names in identifiers, > and actual Japanese characters in comments and strings. I agree that comments should be written in the language most suited to the intended audience (I normally comment my code in English, unless I know I want someone else to maintain it, in which case I comment it in Spanish.) > > On the other hand, OCaml provides a String type that *can be* seen as a > > variable-length sequence of uninterpreted bytes. > > Yes. What ocaml does not provide is a way of encoding > extended characters -- \uXXXX \UXXXXXXXXX in strings, or in identifiers. No need to. Use \HH\LL. Again, what OCaml does is sensible, if crude. 
> >Of course, there's a problem with the > > manipulation of 32-bit integer values, but if used with care, the Nat > > datatype could serve perfectly well as the underlying, low-level datatype. > > > > Which makes me think, John, you already have variable-length int arrays. > > But they're not standard (yet). They are! Don't be put off by its status as "experimental feature". Nat's been around since CamlLight. You could even use it as a template implementation of unsafe longint varlen arrays and link a custom toplevel. Yet again, OCaml provides the tools. > So there are TWO issues -- one is to make ocaml itself > ISO10646 aware (i.e., the compiler), and the other is to provide > users with libraries to manipulate extended characters. I think a more realistic goal would be making OCaml ISO10646-tolerant in comments. Perhaps adding real conditional compilation and transparent comments would suffice. Again, anyone can download the source code and modify OCaml to suit his tastes. OCaml's goal is not to be a model of i18n awareness, but a platform for experimenting with types in a functional setting. It happens that OCaml is open enough, and extensible enough, and efficient enough to make a good i18n effort possible, and that is a tribute to its success as a strongly-typed, imperative, fast functional language. > Please note: neither of these features would be optional, > were ocaml to be submitted for ISO standardisation. ISO directives > require all ISO languages to upgrade to provide international > support. I know ocaml isn't an ISO language, but I think the > basic intent is sound. [In some sense, ocaml is already a leader, > accepting Latin-1 characters when other languages only allowed ASCII] The implementors have made clear on more than one occasion that they're not interested in making OCaml a standard language (remember the thread "How to convince management?"). But don't take my word for it, ask Pierre. -- I got your message. I couldn't read it. 
It was a cryptogram. -- Laurie Anderson ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml 1999-10-21 16:27 ` Matías Giovannini @ 1999-10-21 16:36 ` skaller 1999-10-21 17:21 ` Matías Giovannini 1999-10-23 9:53 ` Benoit Deboursetty 0 siblings, 2 replies; 21+ messages in thread From: skaller @ 1999-10-21 16:36 UTC (permalink / raw) To: matias; +Cc: caml-list Matías Giovannini wrote: > Strings can be localized with a package mechanism, à la Java. I don't > like hardwired strings in code, they're a maintenance nightmare (not > that I always abide by my own rule :-) It doesn't matter what you like (or what I like). > > Have you ever seen a Japanese program? I have. > > Yes. What ocaml does not provide is a way of encoding > > extended characters -- \uXXXX \UXXXXXXXXX in strings, or in identifiers. > > No need to. Use \HH\LL. Again, what OCaml does is sensible, if crude. Irrelevant. The \u \U escapes are ISO recommended, used in C and C++, and must be supported. > > > Which makes me think, John, you already have variable-length int arrays. > > > > But they're not standard (yet). > > They are! Don't be put off by its status as "experimental feature". > Nat's been around since CamlLight. Oh, I must have misunderstood your comment: Nat is standard, I'm using it in Viper, but 'a Varray -- a variable length array of 'a, is not. > Again, anyone can download the source code and modify OCaml to suit his > tastes. OCaml's goal is not to be a model of i18n awareness, but a > platform for experimenting with types in a functional setting. Ocaml is a tool, it doesn't have a goal. :-) Humans have goals. The problem is that the designers of ocaml have been too successful: ocaml is so good that other people now want to use it, and _their_ goals are important too. >It > happens that OCaml is open enough, and extensible enough, and efficient > enough to make a good i18n effort possible, and that is a tribute to its > success as strongly-typed, imperative, fast functional language. I agree. 
It could easily become a leader in this field, since implementing complex
stuff is relatively easy in ocaml :-)

> The implementors have made clear on more than one occasion that they're
> not interested in making OCaml a standard language (remember the thread
> "How to convince management?"). But don't take my word for it, ask Pierre.

My point was simply that the ISO internationalisation requirements are not
unreasonable, and that other languages will be doing this work, some
because they have to, and some because they want to stay part of the real
world -- and encourage non-English (whoops, I mean, non-Latin :-) clients,
who, after all, may well make significant contributions.

--
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller

^ permalink raw reply	[flat|nested] 21+ messages in thread
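The disagreement above -- ISO \uXXXX escapes versus hand-written \HH\LL byte pairs -- is narrower than it looks if an escape simply expands to UTF-8 bytes inside an ordinary string. A hedged sketch of that expansion (the function name is mine; it handles only code points up to U+FFFF):

```ocaml
(* Sketch of what a \uXXXX escape could expand to if the compiler
   translated it into UTF-8 bytes in an ordinary string, so no new
   string type is needed. Illustrative only; covers U+0000..U+FFFF. *)
let utf8_of_code_point cp =
  let buf = Buffer.create 4 in
  if cp < 0x80 then
    (* 1-byte sequence: plain ASCII *)
    Buffer.add_char buf (Char.chr cp)
  else if cp < 0x800 then begin
    (* 2-byte sequence: 110xxxxx 10xxxxxx *)
    Buffer.add_char buf (Char.chr (0xC0 lor (cp lsr 6)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end else begin
    (* 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx *)
    Buffer.add_char buf (Char.chr (0xE0 lor (cp lsr 12)));
    Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr 6) land 0x3F)));
    Buffer.add_char buf (Char.chr (0x80 lor (cp land 0x3F)))
  end;
  Buffer.contents buf
```

For example, `utf8_of_code_point 0xE9` (é, i.e. \u00E9) yields the two bytes "\195\169" -- exactly the byte pair one would otherwise write by hand.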
* Re: localization, internationalization and Caml
1999-10-21 16:36 ` skaller
@ 1999-10-21 17:21 ` Matías Giovannini
1999-10-23 9:53 ` Benoit Deboursetty
1 sibling, 0 replies; 21+ messages in thread
From: Matías Giovannini @ 1999-10-21 17:21 UTC (permalink / raw)
To: caml-list; +Cc: skaller

skaller wrote:
>
> Matías Giovannini wrote:
> > Strings can be localized with a package mechanism, à la Java. I don't
> > like hardwired strings in code, they're a maintenance nightmare (not
> > that I always abide by my own rule :-)
>
> It doesn't matter what you like (or what I like).

It doesn't; my point is that the functionality for localized strings can
be had, only through an indirect route, such as "string packages". As an
aside, let's keep the tone light, ok?

> > > > Have you ever seen a Japanese program? I have.
> > > Yes. What ocaml does not provide is a way of encoding
> > > extended characters -- \uXXXX \UXXXXXXXX in strings, or in identifiers.
> >
> > No need to. Use \HH\LL. Again, what OCaml does is sensible, if crude.
>
> Irrelevant. The \u \U escapes are ISO recommended, used in
> C and C++, and must be supported.

Well, OCaml is *not* ISO recommended, is *not* C, and it is certainly
*not* C++. Let's learn to live with languages other than ISO-mandated,
ISO-validated, ISO-standardized and whatnot. In fact, now that I think of
it, standardization is driven by market pressure. If OCaml were a
commercial product, I guess things would be different. But it's not (thank
Pete), see below.

> > > > Which makes me think, John, you already have variable-length int arrays.
> > >
> > > But they're not standard (yet).
> >
> > They are! Don't be put off by its status as "experimental feature".
> > Nat's been around since CamlLight.
>
> Oh, I must have misunderstood your comment: Nat is standard,
> I'm using it in Viper, but 'a Varray -- a variable-length array of 'a --
> is not.
And it's not going to be, unless someone comes up with a sound typing
strategy *and* an efficient implementation for them.

> > Again, anyone can download the source code and modify OCaml to suit his
> > tastes. OCaml's goal is not to be a model of i18n awareness, but a
> > platform for experimenting with types in a functional setting.
>
> Ocaml is a tool, it doesn't have a goal. :-)
> Humans have goals. The problem is that the designers of ocaml
> have been too successful: ocaml is so good that other people now
> want to use it, and _their_ goals are important too.

Let me restate it: OCaml is the intellectual property of INRIA, developed
under a specific project (Projet Cristal, if I remember correctly) with
very definite goals. The project *has* goals; anything outside those goals
is a gift (what is more, everything falling *within* those goals is
already a gift), and must be accepted as such. If INRIA decides that,
since OCaml is useful to many, many people around the world, one of its
goals should be to turn OCaml into a platform for experimenting in the
implementation of programming languages with strong i18n support, well,
bring the champagne. In the meantime, we'll have to build upon what's
there.

Suppose the following scenario: INRIA decides that the MacOS platform is
not nearly significant enough to justify the porting effort, and so it is
dropped. What should I do? Plead, certainly, until I'm told "don't whine,
there's nothing we can do". What would be my options? Use a Wintel box, or
make my own port. This scenario is not unrealistic: there's no native
compiler under MacOS, and there won't be until someone ports it. I can't
do it, the implementors can't do it, and such is life.

> > It happens that OCaml is open enough, and extensible enough, and efficient
> > enough to make a good i18n effort possible, and that is a tribute to its
> > success as a strongly-typed, imperative, fast functional language.
>
> I agree.
> It could easily become a leader in this field,
> since implementing complex stuff is relatively easy in ocaml :-)
>
> > The implementors have made clear on more than one occasion that they're
> > not interested in making OCaml a standard language (remember the thread
> > "How to convince management?"). But don't take my word for it, ask Pierre.
>
> My point was simply that the ISO internationalisation requirements
> are not unreasonable, and that other languages will be doing this work,
> some because they have to, and some because they want to stay part of
> the real world -- and encourage non-English (whoops, I mean, non-Latin :-)
> clients, who, after all, may well make significant contributions.

Hm. I see your point. I don't necessarily agree, though.

--
I got your message. I couldn't read it. It was a cryptogram.
-- Laurie Anderson

^ permalink raw reply	[flat|nested] 21+ messages in thread
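The "string package" route mentioned above can be sketched in OCaml as a catalog keyed by locale and message key. Everything here -- the locale type, the keys, the messages, and the fallback rule -- is an illustrative assumption, not an existing library:

```ocaml
(* Sketch of the "string package" idea: messages are looked up by
   (locale, key) instead of being hardwired in code. Contents and
   names are illustrative assumptions only. *)
type locale = English | French

let catalog : (locale * string, string) Hashtbl.t = Hashtbl.create 16

let () =
  List.iter (fun (loc, key, text) -> Hashtbl.add catalog (loc, key) text)
    [ (English, "too_much", "TOO MUCH inflation");
      (French,  "too_much", "Taux d'inflation - TROP");
      (English, "greeting", "Hello") ]

(* Look up a message, falling back to English when no translation
   exists for the requested locale. *)
let message loc key =
  try Hashtbl.find catalog (loc, key)
  with Not_found -> Hashtbl.find catalog (English, key)
```

A caller writes `message French "too_much"` instead of a hardwired string; adding a locale means adding catalog entries, not editing code.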
* Re: localization, internationalization and Caml
1999-10-21 16:36 ` skaller
1999-10-21 17:21 ` Matías Giovannini
@ 1999-10-23 9:53 ` Benoit Deboursetty
1999-10-25 21:06 ` Jan Skibinski
1999-10-26 18:02 ` skaller
1 sibling, 2 replies; 21+ messages in thread
From: Benoit Deboursetty @ 1999-10-23 9:53 UTC (permalink / raw)
To: caml-list

This message just wants to raise a paradoxical point in this discussion
[though it may have already been posted?]. It seems to me that allowing
foreign characters to be used in a computer language, as identifiers or
comments, would reduce the exchange of contributions worldwide.

Here is my personal experience: I have used caml and ocaml for more than
two years now. From the beginning, it seemed really cool to be able to
have identifiers in French, with accents and everything. So I took up the
habit of using French in my programs. Now I'm writing a more substantial
program, which could become a small international "open project" --
*except* that I find myself with a program in French, and it's not so easy
to find qualified programming partners who understand French. The range of
people who could help with my program is terribly limited. You should
understand that I sometimes feel I should have written it in English.

I must however acknowledge that [O']Caml's ability to cope with Latin-1
characters is above all useful for educational purposes. Let me explain...
Perhaps it is a French thing, but in this country it sounds quite snobbish
for a French speaker to embed English words in a sentence with the right
accent and stress. Hence, almost every computer science teacher takes on
an exaggerated French accent to pronounce English words ("la fonction
'rimouve'"). [I shall not disclose the names of my teachers in CaML :)]
So, for educational purposes, it is much better if teachers can use French
identifiers ("la fonction 'enlève'"). Much easier to pronounce, isn't it?
I suppose it is the same in many other countries.
(I think especially of Japan: "biko-zu ingurisshu izu ha-do tsu
puronaonsu foa japani-zu pi-poru tsu-")

My point remains: encouraging people to write code in their own language
would reduce the possibilities of exchanging their work. This does not
mean, though, that I will translate the program I've written into English.
I consider it a sort of tribute to the preservation of the diversity of
languages, at my own humble scale... and I will write enough programs in
English when I work for a company, too.

Benoît de Boursetty
Benoit.de-Boursetty@polytechnique.org

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml
1999-10-23 9:53 ` Benoit Deboursetty
@ 1999-10-25 21:06 ` Jan Skibinski
1999-10-26 18:02 ` skaller
1 sibling, 0 replies; 21+ messages in thread
From: Jan Skibinski @ 1999-10-25 21:06 UTC (permalink / raw)
To: Benoit Deboursetty; +Cc: caml-list

On Sat, 23 Oct 1999, Benoit Deboursetty wrote:

> This message just wants to raise a paradoxical point in this discussion
> [yet it may have already been posted ?]. It seems to me that allowing
> foreign characters to be used in a computer language, as identifiers or
> comments, would reduce the exchange of contributions worldwide.

Yes, but it is nice to have error messages, prompts, etc. expressed in the
native language of a program's user. And the ability to process text in a
native language is also quite often desirable.

I have been reading this thread for some time and I've seen plenty of
references to Latin1, and many different attitudes towards its usefulness
(or lack thereof). Let me add my two cents here. Demanding support for
diacritical marks is often not a matter of being snobbish or a language
purist. I cannot speak for other languages that use the Latin alphabet,
but I can tell you what a mess it is with Polish (which has 8 diacritical
marks) and, I suppose, with other languages, such as Hungarian, etc., that
have been assigned to Latin2. Someone made that decision some time ago,
and now we pay the price, since Latin1 seems to be seen by some as some
sort of improvement over plain ASCII.

I am not whining here, because I can get by quite well with plain ASCII in
my email, etc., and I can even cope with all sorts of email that arrive
here formatted as either Latin1 or Latin2. But even so, I sometimes find
myself cornered by plain ASCII when the meaning of a sentence suddenly
becomes funny, or berserk, or senseless. One example to illustrate the
point:

1. z<.>a<;>danie - "a strong request". This is what I want to use.
2. zadanie - "a problem to solve, or a goal". Wrong!
   This is what I get from plain ASCII.
3. rzadanie - When pronounced it does not sound quite like "a request",
   but an intelligent recipient can guess my intention. They might as well
   consider me illiterate, though; Polish has two alternative spellings of
   the same (similar) sound: "z<.>" and "rz". In this case "rz" is very
   wrong.
4. rzondanie - Now it sounds almost OK ("on" sounds close to "a<;>"), but
   the spelling is even worse.

where
   z<.> stands for a dot over z
   a<;> stands for an "ogonek" (yes, this is the official name in
        Unicode), or "a tail", under a.

As you can see, this is not just a matter of some perky accents.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml
1999-10-23 9:53 ` Benoit Deboursetty
1999-10-25 21:06 ` Jan Skibinski
@ 1999-10-26 18:02 ` skaller
1 sibling, 0 replies; 21+ messages in thread
From: skaller @ 1999-10-26 18:02 UTC (permalink / raw)
To: Benoit Deboursetty; +Cc: caml-list

Benoit Deboursetty wrote:
>
> This message just wants to raise a paradoxical point in this discussion
> [yet it may have already been posted ?]. It seems to me that allowing
> foreign characters to be used in a computer language, as identifiers or
> comments, would reduce the exchange of contributions worldwide.

Excuse me, but exactly what do you mean by 'foreign' characters? Do you
mean non-Chinese characters? What? You aren't Chinese?

> You should understand I sometimes feel I should have written it in
> English.

I think that, at the moment, English is the 'lingua franca' <grin> of the
Internet. Spoken with an American accent :-) However, the Internet is
growing fast, and English speakers will soon enough be a minority. It will
probably remain true that most of the _programmers_ will be able to use
English.

> I must however acknowledge that [O']Caml's ability to cope with Latin-1
> characters is above all useful for educational purposes.

Yes. I think it is highly laudable that ocaml accepts more than just plain
'ASCII': many students are more fluent in their native language (even if
they speak some English and/or are learning it), and being able to program
in it will enhance learning. Internationalising software that is actually
worth sharing internationally is a lesser obstacle than writing good
software in the first place.

> My point remains: encouraging people to write code in their language would
> reduce the possibilities of exchanging their work.

In my opinion, a programming language should simply give clients a
_choice_. Cultures, people, and circumstances vary.
I don't think programming language designers should be in the business of
encouraging or discouraging the use of a particular language, but rather
of facilitating the implementation of the clients' own wishes or
requirements.

--
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: localization, internationalization and Caml
1999-10-15 13:53 Gerard Huet
1999-10-15 20:28 ` Gerd Stolpmann
@ 1999-10-17 14:29 ` Xavier Leroy
1999-10-19 18:36 ` skaller
1 sibling, 1 reply; 21+ messages in thread
From: Xavier Leroy @ 1999-10-17 14:29 UTC (permalink / raw)
To: caml-list

Wow, there's nothing like internationalization to spark lively
discussions. Since even Gérard Huet (oops, sorry for that 8859-1 accent,
couldn't resist) and Francis Dupont broke their vows of silence, I guess I
have to say something too.

The support for ISO-8859-1 in Caml Light and OCaml is essentially a
historical and geographical accident. The first books on Caml were written
in French, and it was nice to be able to use accented French words as
identifiers. Also, that was at a time (1991-1992) when Unicode and its kin
didn't even exist.

The choice of ISO-8859-1 is not that politically incorrect either: it
works not only for western Europe, but also for Latin America, many
Pacific countries, and large parts of Africa. If we were to choose an
8-bit character set based on the number of OCaml programmers that actually
need it, I guess ISO-8859-1 (or its newer incarnation with the Euro sign,
whose name I can't remember) would still win. (At least until we get OCaml
into the Chinese curriculum...)

Notice also that Caml doesn't prevent the programmer from putting any
character set that includes ASCII (ISO-8859-x, but also UTF8-encoded
Unicode) in character strings and in comments.

There are several ways to internationalize further. One is to support
other 8-bit character sets the POSIX way (the LC_CTYPE stuff). There are
several problems with this:
- It's not enough for Asian languages.
- The POSIX localization stuff isn't supported under Windows.
- It's badly supported on all Unixes I know (e.g. to get French, I need to
  set LC_CTYPE to different values under Linux, Solaris, and Digital Unix;
  it gets worse for other languages such as Japanese).
- Handling of mixed-language texts is a nightmare.

Unicode / ISO 10646 is probably a better approach. However, it has its own
problems:
- There's 16-bit Unicode and 32-bit Unicode. Early adopters of that
  technology (Windows, Java) chose 16-bit Unicode; late adopters (Unix)
  chose 32-bit Unicode. (That's the great thing about standards: there are
  so many to choose from...)
- Apparently, not everyone agrees on multi-byte encodings (UTF8) either.
  E.g. Java seems to have its own variant of UTF8. How are we going to
  interoperate?
- I/O is a nightmare. The API has to handle at least byte streams, wide
  character streams, and UTF8-encoded streams.
- Support for Unicode / UTF8 files in today's operating systems and GUIs
  is very low. When will I be able to do "more" on a UTF8 file and see my
  French accented letters?

My conclusion is that I18N is such a mess that I don't think we'll do much
about it in Caml anytime soon. Perhaps some basic support for wide
characters and wide character strings will be added at some point, if only
because COM interoperability requires it.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 21+ messages in thread
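The I/O point -- byte streams versus wide and UTF8-encoded streams -- has a minimal alternative: keep I/O byte-oriented and decode afterwards. A hedged sketch of such a decoder, from an ordinary byte string to an array of code points, assuming well-formed input and handling only 1- to 3-byte sequences (no validation, not a full ISO 10646 decoder):

```ocaml
(* Sketch of the "decode later" approach: I/O stays byte-oriented, and
   a UTF-8 string read from a channel is decoded afterwards into an
   array of code points represented as plain ints. Illustrative only:
   assumes well-formed 1- to 3-byte sequences and does no validation. *)
let decode_utf8 s =
  let n = String.length s in
  let out = ref [] in
  let i = ref 0 in
  while !i < n do
    let b0 = Char.code s.[!i] in
    let cp, len =
      if b0 < 0x80 then
        (* ASCII byte: the code point is the byte itself *)
        b0, 1
      else if b0 < 0xE0 then
        (* 2-byte sequence: 110xxxxx 10xxxxxx *)
        ((b0 land 0x1F) lsl 6) lor (Char.code s.[!i + 1] land 0x3F), 2
      else
        (* 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx *)
        ((b0 land 0x0F) lsl 12)
        lor ((Char.code s.[!i + 1] land 0x3F) lsl 6)
        lor (Char.code s.[!i + 2] land 0x3F), 3
    in
    out := cp :: !out;
    i := !i + len
  done;
  Array.of_list (List.rev !out)
```

With this, the channel API never needs to know about character sets: `decode_utf8 "a\195\169"` turns the three bytes of "aé" into the two code points 97 and 233.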
* Re: localization, internationalization and Caml
1999-10-17 14:29 ` Xavier Leroy
@ 1999-10-19 18:36 ` skaller
0 siblings, 0 replies; 21+ messages in thread
From: skaller @ 1999-10-19 18:36 UTC (permalink / raw)
To: Xavier Leroy; +Cc: caml-list

Xavier Leroy wrote:
> The support for ISO-8859-1 in Caml Light and OCaml is essentially an
> historical and geographical accident. The first books on Caml were
> written in French, and it was nice to be able to use accented french
> words as identifiers. Also, that was at a time (1991-1992) where
> Unicode and consorts didn't even exist.

And supporting ISO-8859-1 was a fine thing to do at the time!

> The choice of ISO-8859-1 is not that politically incorrect either: it
> works not only for western Europe, but also for Latin America, many
> Pacific countries, and large parts of Africa. If we were to choose an
> 8-bit character set based on the number of OCaml programmers that
> actually need it, I guess ISO-8859-1 (or its newer incarnation with
> the Euro sign whose name I can't remember) would still win. (At least
> until we get OCaml in the Chinese curriculum...)

While this is true, there is a circularity here: people not using 8-bit
character sets face an extra battle using ocaml.

> Notice also that Caml doesn't prevent the programmer from putting any
> character set that includes ASCII (ISO-8859-x, but also UTF8-encoded
> Unicode) in character strings and in comments.

Yes. This is one of the key points of my argument that UTF-8 is the
natural way to go: it provides ISO-10646 compliance without requiring any
new string kind.

> There are several ways to internationalize further. One is to support
> other 8-bit character sets the POSIX way (the LC_CTYPE stuff). There
> are several problems with this:
> - It's not enough for Asian languages.
> - The POSIX localization stuff isn't supported under Windows.
> - It's badly supported on all Unixes I know (e.g.
to get French, I
> need to set LC_CTYPE to different values under Linux, Solaris, and
> Digital Unix; it gets worse for other languages such as Japanese).
> - Handling of mixed-language texts is a nightmare.

If you are suggesting not using the C locale stuff -- I agree entirely.

> Unicode / ISO10646 is probably a better approach. However, it has its
> own problems:
> - There's 16-bit Unicode and 32-bit Unicode. Early adopters of that
> technology (Windows, Java) chose 16-bit Unicode; late adopters (Unix)
> chose 32-bit Unicode. (That's the great things about standards:
> there are so many to choose from...)

I cannot see the problem -- except for the 16-bit adopters, who must
eventually upgrade ... again.

> - Apparently, not everyone agrees on multi-byte encodings (UTF8) as well.
> E.g. Java seems to have its own variant of UTF8. How are we going
> to interoperate?

I do not understand: UTF-8 is a fixed, internationally standardised
encoding. If it is used, the ISO standard is followed. If Java doesn't do
that, that is Java's problem.

> - I/O is a nightmare. The API has to handle at least byte streams,
> wide character streams, and UTF8-encoded streams.

No, it doesn't. That is a possibility, but it is NOT necessary. It is
necessary only to read byte streams. Conversion can be done later using
strings. This is less efficient, but it is a sensible starting point (to
ignore internationalisation on I/O completely).

> - Support for Unicode / UTF8 files in today's operating systems and GUIs
> is very low. When will I be able to do "more" on an UTF8 file and see my
> French accented letters?

Yes. I agree. This is a major problem. One of the answers is "When
programming languages provide the support that applications programmers
need" :-)

> My conclusion is that I18N is such a mess that I don't think we'll do
> much about it in Caml anytime soon.

I agree.
The way forward is, I believe:

a) do not change the I/O system, but deprecate TEXT mode (all I/O should
   be done in binary)
b) do not change the String module, but deprecate the upper/lower case
   functions (and anything else that smacks of relating to natural
   language)
c) provide functions to support internationalisation
d) modify the ocaml compiler to process \uXXXX and \UXXXXXXXX escapes
   [everywhere]
e) provide a fast variable-length array type

(d) could be done easily using camlp4, I think.

> Perhaps some basic support for
> wide characters and wide character strings will be added at some
> point, if only because COM interoperability requires it.

I don't think that is necessary; a variable-length array of integers is
good enough.

--
John Skaller, mailto:skaller@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller

^ permalink raw reply	[flat|nested] 21+ messages in thread
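Point (e) above -- the fast variable-length array of integers that would stand in for a wide string -- might look like the following sketch; the type, its representation, and the function names are mine, not a proposed interface:

```ocaml
(* Hypothetical sketch of a variable-length int array ("wide string"):
   amortised O(1) append over a plain int array, doubling the backing
   store when full. Illustrative only. *)
type varray = { mutable data : int array; mutable len : int }

let create () = { data = Array.make 8 0; len = 0 }

let append v cp =
  if v.len = Array.length v.data then begin
    (* backing store is full: double it and copy the contents over *)
    let bigger = Array.make (2 * Array.length v.data) 0 in
    Array.blit v.data 0 bigger 0 v.len;
    v.data <- bigger
  end;
  v.data.(v.len) <- cp;
  v.len <- v.len + 1

(* Extract the used prefix as an ordinary fixed-size array. *)
let to_array v = Array.sub v.data 0 v.len
```

A decoder can append code points one at a time without knowing the final length in advance, which is exactly the property a fixed-size `int array` lacks.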
end of thread, other threads: [~1999-10-28 17:07 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-10-13 12:12 localization, internationalization and Caml STARYNKEVITCH Basile
1999-10-14 22:20 ` skaller
1999-10-15  8:26 ` Francis Dupont
1999-10-17 11:27 ` skaller
1999-10-17 15:54 ` Francis Dupont
1999-10-19 18:48 ` skaller
1999-10-15 13:53 Gerard Huet
1999-10-15 20:28 ` Gerd Stolpmann
1999-10-19 18:06 ` skaller
1999-10-20 21:05 ` Gerd Stolpmann
1999-10-21  4:42 ` skaller
1999-10-21 12:05 ` Matías Giovannini
1999-10-21 15:35 ` skaller
1999-10-21 16:27 ` Matías Giovannini
1999-10-21 16:36 ` skaller
1999-10-21 17:21 ` Matías Giovannini
1999-10-23  9:53 ` Benoit Deboursetty
1999-10-25 21:06 ` Jan Skibinski
1999-10-26 18:02 ` skaller
1999-10-17 14:29 ` Xavier Leroy
1999-10-19 18:36 ` skaller