Date: Mon, 28 Feb 2011 12:21:32 +0100
References: <20110228.093528.996524125295855263.Christophe.Troestler@umons.ac.be>
From: Daniel Bünzli
To: David Allsopp
Cc: Christophe TROESTLER, OCaml Mailing List
Subject: Re: [Caml-list] GSoC: better UTF-8 support

> If it's to go into the standard library then yes, it should exactly replicate the interface of the Char and String modules, that way within the standard library UTF8 can be used as a drop-in replacement [...]

I'm not sure many programs would actually benefit from that. At a certain point, if you really want to process Unicode at the character level, you'll need a proper library. Using these ASCII/Latin-1 oriented interfaces to process Unicode at the character level would be debilitating and frustrating for your final users (e.g. no treatment of normal forms; you do realize that in Unicode there's more than one way of representing the character 'é'?). The current status quo already lets you handle UTF-8 encoded strings as long as you don't try to look into them at the character level, which is fine for many programs.

> Out of interest, what are your complaints against the String and Char modules - missing functions or something deeper?

Every time I have to explode a string at a given separator and want to use only the standard library, I complain.

> It is mildly amusing that you criticise the String and Char modules, yet have interest in this module given [...]

The thing is that this support could be included without changing the interfaces at all. Only the regexp language needs to be extended (and I guess the underlying implementation wouldn't have to be changed).

Best,

Daniel
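
The remark above that Unicode has more than one way of representing 'é' can be made concrete with a minimal sketch. The names `precomposed` and `decomposed` are illustrative only and do not come from the thread; the byte sequences are the standard UTF-8 encodings of U+00E9 and of U+0065 followed by U+0301. The sketch only shows that byte-oriented String functions see two different strings; it is not a proposal for how a UTF-8 module should behave.

```ocaml
(* Two UTF-8 byte sequences that both display as 'é'.
   The variable names are hypothetical, chosen for this illustration. *)

(* U+00E9 LATIN SMALL LETTER E WITH ACUTE: one code point, two bytes. *)
let precomposed = "\xC3\xA9"

(* U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT:
   two code points, three bytes. *)
let decomposed = "e\xCC\x81"

let () =
  (* String.length counts bytes, not characters. *)
  Printf.printf "precomposed: %d bytes\n" (String.length precomposed);
  Printf.printf "decomposed:  %d bytes\n" (String.length decomposed);
  (* Structural equality compares bytes, so the two forms differ
     unless both are first brought to the same normal form. *)
  Printf.printf "equal as byte strings: %b\n" (precomposed = decomposed)
```

A String-like interface applied directly to UTF-8 data would report these two strings as unequal and of different lengths, which is the kind of surprise that normalization (e.g. to NFC or NFD) in a dedicated Unicode library is meant to remove.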