From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id UAA21512; Wed, 16 Jan 2002 20:22:39 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA21666 for <caml-list@pauillac.inria.fr>; Wed, 16 Jan 2002 20:22:38 +0100 (MET)
Received: from maila.telia.com (maila.telia.com [194.22.194.231])
	by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id g0GJMbv08578;
	Wed, 16 Jan 2002 20:22:37 +0100 (MET)
Received: from gateway (h175n2fls34o849.telia.com [217.208.235.175])
	by maila.telia.com (8.11.6/8.11.6) with SMTP id g0GJMb719794;
	Wed, 16 Jan 2002 20:22:37 +0100 (CET)
From: "Mattias Waldau" <mattias.waldau@abc.se>
To: <caml-list@inria.fr>
Cc: "Xavier Leroy" <xavier.leroy@inria.fr>
Subject: RE: [Caml-list] Non-mutable strings
Date: Wed, 16 Jan 2002 20:22:36 +0100
Message-ID: <AAEBJHFJOIPMMIILCEPBKEPBDGAA.mattias.waldau@abc.se>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Importance: Normal
In-Reply-To: <20020110185619.A20606@pauillac.inria.fr>
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

Nice to see that there is a general interest of non-mutable strings.
However, as Xavier says, maybe it is a bit late.

We have another string problem, namely handling non-ascii. As I understand
it, one of the points of of nML (http://ropas.kaist.ac.kr/n/), with is a new
ML language currently built using Ocaml, is that it handles asian
characters. Also, their was an entry recently into this group about asian
characters codings.

I don't think any language can continue to be pure-ascii for ever. One of
the reason of Ruby's success is that it handles non-ascii (I think it is
made by an japanese). However, even we Swedes have problems, only 2 of our 3
special characters are in the lower 7 bits and sorting is always wrong.

A unicode char is between 1 and 4 bytes, that means that str[i] doesn't work
(unless you do as NT or Java, store it as wide chars internally, which of
course Ocaml could do too). You always have to start at the beginning of the
string to find the i:th char.

Thus, introducing Unicode strings (or something similar, I heard that Asians
don't like Unicode at all) and introducing non-mutable strings should
preferrable be done simultaneously.

In order to have 8-bit chars strings and unicode strings simultaneously we
need something like 'u"', and maybe the possibility to say that all strings
are unicode. Can this be done using a module just like 'open Float'
redefines '+' to '+.'?

Or should Ocaml v 4 go the whole way and let all strings (also identifiers)
be Unicode?

/mattias

P.s. Microsoft NT, 2000, XP handles double byte chars everywhere, it is
called BSTR and in order to make string comparasion etc library-routines are
called all the time. However, since Unicode can be 4 byte, I don't know how
that is encoded into 2 bytes.

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr