From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA06631 for caml-red; Fri, 12 Jan 2001 19:57:20 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id KAA17595 for ; Fri, 12 Jan 2001 10:25:30 +0100 (MET) Received: from localhost.localdomain (kenny77.zip.com.au [61.8.18.205]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f0C9PRD19877; Fri, 12 Jan 2001 10:25:27 +0100 (MET) Received: from ozemail.com.au (IDENT:root@localhost [127.0.0.1]) by localhost.localdomain (8.9.3/8.8.7) with ESMTP id UAA07238; Fri, 12 Jan 2001 20:24:10 +1100 Message-ID: <3A5ECD3A.7B65D01B@ozemail.com.au> Date: Fri, 12 Jan 2001 20:24:10 +1100 From: John Max Skaller X-Mailer: Mozilla 4.7 [en] (X11; I; Linux 2.2.12-20 i686) X-Accept-Language: en MIME-Version: 1.0 To: Xavier Leroy CC: Dave Berry , Markus Mottl , OCAML Subject: Re: Unicode (was RE: JIT-compilation for OCaml?) References: <3145774E67D8D111BE6E00C0DF418B663AD720@nt.kal.com> <20010111194916.B4332@pauillac.inria.fr> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: weis@pauillac.inria.fr Xavier Leroy wrote: > Shall we "do it right" (for some notion of "right") or favor > interoperability? Hard question. My current answer is to > procrastinate... Actually, multi-byte encoded strings (UTF-8) are not > so bad and already have full support in OCaml :-) I personally think this is the first step, since no new data types are required. Instead, what is needed would seem to be simple. What I believe is required is 1. changes to the lexer to support \uXXXX and \UXXXXXXXX escapes (in strings, and probably in identifiers) 2. changes to the lexer to recognize the 'letters' which can be used in identifiers. The letters which should be allowed are specified in an ISO document. 3. Provide a codec to convert Latin-1 to UTF-8. [One can argue about whether it is applied by default or not :-] You might provide other codecs too, such as UCS-16 -> UTF-8 I guess most of the rest can be done in Ocaml or C without impacting the compiler/run-time, and when it is right, the compiler/run-time can be tuned to make more efficient representations possible. [For example, to generate inline code to compare 16/31 bit unsigned integers, rather than call a C routine] -- John (Max) Skaller, mailto:skaller@maxtal.com.au 10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850 checkout Vyper http://Vyper.sourceforge.net download Interscript http://Interscript.sourceforge.net