From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA19081; Thu, 17 Jan 2002 19:00:51 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id TAA19375 for caml-list@pauillac.inria.fr; Thu, 17 Jan 2002 19:00:51 +0100 (MET) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id LAA08848 for ; Thu, 17 Jan 2002 11:19:37 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g0HAJVX28651; Thu, 17 Jan 2002 11:19:31 +0100 (MET) Received: (from vouillon@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id LAA07365; Thu, 17 Jan 2002 11:19:31 +0100 (MET) Date: Thu, 17 Jan 2002 11:19:31 +0100 From: Jerome Vouillon To: Mattias Waldau Cc: caml-list@inria.fr, Xavier Leroy Subject: Re: [Caml-list] Non-mutable strings Message-ID: <20020117111931.B7420@pauillac.inria.fr> References: <20020110185619.A20606@pauillac.inria.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 1.0i In-Reply-To: ; from mattias.waldau@abc.se on Wed, Jan 16, 2002 at 08:22:36PM +0100 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Wed, Jan 16, 2002 at 08:22:36PM +0100, Mattias Waldau wrote: > A unicode char is between 1 and 4 bytes, that means that str[i] doesn't work > (unless you do as NT or Java, store it as wide chars internally, which of > course Ocaml could do too). You always have to start at the beginning of the > string to find the i:th char. Is this really a problem? It seems to me that you very rarely need to do this. NT uses internally the UTF-16 encoding, where a unicode character takes either 2 or 4 bytes, so you cannot easily find the i-th character either. Java is broken and only support Unicode characters that fit in two bytes. > Thus, introducing Unicode strings (or something similar, I heard that Asians > don't like Unicode at all) and introducing non-mutable strings should > preferrable be done simultaneously. Yes, Unicode support seems to be a good opportunity to introduce non-mutable strings. > In order to have 8-bit chars strings and unicode strings simultaneously we > need something like 'u"', and maybe the possibility to say that all strings > are unicode. Can this be done using a module just like 'open Float' > redefines '+' to '+.'? > > Or should Ocaml v 4 go the whole way and let all strings (also identifiers) > be Unicode? We can go a long way without specific support from the language. In my opinion, we should first write a good Unicode library and only then start to think about language support. -- Jérôme ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr