From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id UAA21512; Wed, 16 Jan 2002 20:22:39 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA21666 for ; Wed, 16 Jan 2002 20:22:38 +0100 (MET) Received: from maila.telia.com (maila.telia.com [194.22.194.231]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id g0GJMbv08578; Wed, 16 Jan 2002 20:22:37 +0100 (MET) Received: from gateway (h175n2fls34o849.telia.com [217.208.235.175]) by maila.telia.com (8.11.6/8.11.6) with SMTP id g0GJMb719794; Wed, 16 Jan 2002 20:22:37 +0100 (CET) From: "Mattias Waldau" To: Cc: "Xavier Leroy" Subject: RE: [Caml-list] Non-mutable strings Date: Wed, 16 Jan 2002 20:22:36 +0100 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal In-Reply-To: <20020110185619.A20606@pauillac.inria.fr> Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Nice to see that there is a general interest of non-mutable strings. However, as Xavier says, maybe it is a bit late. We have another string problem, namely handling non-ascii. As I understand it, one of the points of of nML (http://ropas.kaist.ac.kr/n/), with is a new ML language currently built using Ocaml, is that it handles asian characters. Also, their was an entry recently into this group about asian characters codings. I don't think any language can continue to be pure-ascii for ever. One of the reason of Ruby's success is that it handles non-ascii (I think it is made by an japanese). However, even we Swedes have problems, only 2 of our 3 special characters are in the lower 7 bits and sorting is always wrong. A unicode char is between 1 and 4 bytes, that means that str[i] doesn't work (unless you do as NT or Java, store it as wide chars internally, which of course Ocaml could do too). You always have to start at the beginning of the string to find the i:th char. Thus, introducing Unicode strings (or something similar, I heard that Asians don't like Unicode at all) and introducing non-mutable strings should preferrable be done simultaneously. In order to have 8-bit chars strings and unicode strings simultaneously we need something like 'u"', and maybe the possibility to say that all strings are unicode. Can this be done using a module just like 'open Float' redefines '+' to '+.'? Or should Ocaml v 4 go the whole way and let all strings (also identifiers) be Unicode? /mattias P.s. Microsoft NT, 2000, XP handles double byte chars everywhere, it is called BSTR and in order to make string comparasion etc library-routines are called all the time. However, since Unicode can be 4 byte, I don't know how that is encoded into 2 bytes. ------------------- Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr