From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 6CA7BBC6B for ; Tue, 9 Oct 2007 18:55:24 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ao8CAFNPC0fUVZgL/2dsb2JhbAA X-IronPort-AV: E=Sophos;i="4.21,249,1188770400"; d="scan'208";a="4273752" Received: from hades.snarc.org ([212.85.152.11]) by mail3-smtp-sop.national.inria.fr with ESMTP; 09 Oct 2007 18:55:23 +0200 Received: by hades.snarc.org (Postfix, from userid 1000) id A3BD31B482; Tue, 9 Oct 2007 18:55:20 +0200 (CEST) Date: Tue, 9 Oct 2007 18:55:20 +0200 To: Loup Vaillant Cc: Jon Harrop , caml-list@yquem.inria.fr Subject: Re: [Caml-list] Re: Rope is the new string Message-ID: <20071009165520.GA27507@snarc.org> References: <1191879429.28011.27.camel@rosella.wigram> <1191886669.26491.29.camel@rosella.wigram> <20071009.122004.251675959.Christophe.Troestler+ocaml@umh.ac.be> <200710091440.48381.jon@ffconsultancy.com> <20071009155702.GB26282@snarc.org> <6f9f8f4a0710090942u2afe6c5erc2d5b11ecfff2253@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6f9f8f4a0710090942u2afe6c5erc2d5b11ecfff2253@mail.gmail.com> X-Warning: Email may contain unsmilyfied humor and/or satire. User-Agent: Mutt/1.5.16 (2007-06-11) From: tab@snarc.org (Vincent Hanquez) X-Spam: no; 0.00; 0200,:01 0100,:01 arrays:01 byte:01 arrays:01 encodings:01 byte:01 appending:01 ocaml:01 atoms:98 wrote:01 wrote:01 caml-list:01 functions:01 strings:01 On Tue, Oct 09, 2007 at 06:42:32PM +0200, Loup Vaillant wrote: > 2007/10/9, Vincent Hanquez : > > On Tue, Oct 09, 2007 at 02:40:48PM +0100, Jon Harrop wrote: > > > Out of curiosity, do your ropes handle UTF-8 and UTF-16? > > > > Out of curiosity, why would a string implementation (has a handle of > > chars bundle together) has to handle UTF-X ? > > My 2 cents: > > It is more convenient to consider strings as characters arrays. Then, > these characters are handled as atoms, even if they take several bytes > in the chosen encoding. Of course, multi-byte characters must be > supported as well. > > Still, I can use byte arrays as strings. But it limits me to ASCII and > Latin-like encodings: if I want to do UTF-X, then I have to worry > about multi-bytes characters myself. Internationalization made hard... > > I would find very convenient to have plain unicode strings (and > chars), with appropriate scan, print, byte_array_from_string, and > string_from_byte_array functions, one bundle per supported encoding. > So I don't need to think about the internals of such a string. By my question i wasn't suggesting that everybody should do internationalization by hand. definitely we also need some UTFstring type library (which can use rope, string, whatever internally), with all common type of operations (appending, finding, ...), but it's a just a specific sub case and also a different type not compatible with strings (in OCaml terminology). -- Vincent Hanquez