From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id KAA14016 for caml-red; Fri, 12 Jan 2001 10:02:03 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id TAA04904 for ; Thu, 11 Jan 2001 19:49:34 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f0BInMj00709; Thu, 11 Jan 2001 19:49:22 +0100 (MET) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA04292; Thu, 11 Jan 2001 19:49:16 +0100 (MET) Date: Thu, 11 Jan 2001 19:49:16 +0100 From: Xavier Leroy To: Dave Berry Cc: John Max Skaller , Markus Mottl , OCAML Subject: Re: Unicode (was RE: JIT-compilation for OCaml?) Message-ID: <20010111194916.B4332@pauillac.inria.fr> References: <3145774E67D8D111BE6E00C0DF418B663AD720@nt.kal.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <3145774E67D8D111BE6E00C0DF418B663AD720@nt.kal.com>; from dave@kal.com on Thu, Jan 11, 2001 at 12:58:53PM -0000 Sender: weis@pauillac.inria.fr > I thought Unicode was a recognised subset of ISO-10646, corresponding to the > range 0-2^16. Also, don't Windows NT/2000 use Unicode? Yes, Win32 (i.e. 95, 98, ME, NT, 2000 and whatnot) uses 16-bit characters. Java too. But Unix C libraries that support wide chars seem to prefer 32-bit characters. Remember: "Standards are great: there are so many to choose from." > (I realise this isn't directly on-topic, but it may be relevant for future > extensions to OCaml?) It is very relevant indeed. We've been contemplating adding some simple support for wide characters and wide strings, e.g. as two new library modules, but the stumbling point is whether to use 16-bit or 32-bit wide characters. While 32 bits is probably the wave of the future, 16 bits is what we need to interface easily with Java and with many Microsoft products (e.g. COM dispatch components, Visual Basic, various Win32 APIs). Shall we "do it right" (for some notion of "right") or favor interoperability? Hard question. My current answer is to procrastinate... Actually, multi-byte encoded strings (UTF-8) are not so bad and already have full support in OCaml :-) - Xavier Leroy