From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p1SECMYL001801 for ; Mon, 28 Feb 2011 15:12:22 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgYCAMM7a03RVdI2kGdsb2JhbACEJKIaCBUBAQEBCQkMBxEEIaICiWSCWYQjiQkBAQMFgSKDRHYEjB2ITzqBFYFz X-IronPort-AV: E=Sophos;i="4.62,239,1297033200"; d="scan'208";a="76741105" Received: from mail-pz0-f54.google.com ([209.85.210.54]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 28 Feb 2011 15:12:16 +0100 Received: by pzk32 with SMTP id 32so903742pzk.27 for ; Mon, 28 Feb 2011 06:12:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Y3Go2jxGxsungPtkFWULFVB4NyoLXn9IqIPo6QlRAO8=; b=veMWMCF5+NOYnRzE+FPTfKiSVhfubFrPo8qtTzfzTIZt6vE3MA5cPb0/6jILfb9eix lween4wCY9qW9M0onS5wquFPdlDQCLhMQ3l9I05wJfuyh7FxW1nitkMa1zG4wTvJQxGW XOdGgVTKu1rYt37W0QxtWU9SfriySZFiIYBPI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=SXoU18eGKw2zDj0/bBML9IJIQt/NJYfhlP35alXrQ14vzuWRb+LTxeh9qlks5eTQkN YTeIjp50wOEzuCeLYnluzlWyEo06ODa1/5XNxjecRIySpR+uFQn1G4VbUqs+OKUJUtQ/ f3/tedb5zSOFOCtRATOJBKAk78Y6iPPWhs6fI= MIME-Version: 1.0 Received: by 10.142.238.9 with SMTP id l9mr4429599wfh.247.1298902316306; Mon, 28 Feb 2011 06:11:56 -0800 (PST) Sender: daniel.c.buenzli@gmail.com Received: by 10.142.127.3 with HTTP; Mon, 28 Feb 2011 06:11:56 -0800 (PST) In-Reply-To: <20110228.143157.1265982603697554449.Christophe.Troestler+ocaml@umons.ac.be> References: <20110228.093528.996524125295855263.Christophe.Troestler@umons.ac.be> <20110228.143157.1265982603697554449.Christophe.Troestler+ocaml@umons.ac.be> Date: Mon, 28 Feb 2011 15:11:56 +0100 X-Google-Sender-Auth: Kse9ukJNHi1e2ofarubfKXr-cRI Message-ID: From: =?UTF-8?Q?Daniel_B=C3=BCnzli?= To: Christophe TROESTLER Cc: OCaml Mailing List Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id p1SECMYL001801 Subject: Re: [Caml-list] GSoC: better UTF-8 support > Thinking more about this, one could introduce a new type (say “utf8” > or “ustring”) for these UTF-8 strings.  It should be compatible with > the way UTF-8 strings are handled on the C side for interoperability > but “optimized” — e.g. should they contain their length (number of > unicode chars)? > > Another thing: it could be a nice way to transition to *immutable* > unicode strings.  This is not possible for (standard) strings because, > as you all know, they are both used as strings and as buffers.  The > introduction of unicode strings may be the right opportunity to > distinguish both [1]. Frankly I see no benefit of introducing this half-baked UTF-8 support into the standard library (which is what this proposal is about). This will just bring in more noise in the interfaces. Even worse, developers will think they handle unicode properly while they do in fact not, bringing more confusion on already confusing topic (I'm always surprised the little programmers know about unicode). Again, pretending supporting unicode character level processing by replacing latin1 character level processing the way you suggest is just plain wrong. For me either you : 1) Provide full unicode support in the standard library with at least normal form and collation support in a new API, separate from the current, existing String and Char modules. 2) Leave full unicode support to a third party library and keep the current state with some improvements for coping with UTF-8 encoded string literals and for interacting with file systems correctly. Given signals given in the past by the ocaml dev team 2) seems more likely to be accepted. Best, Daniel