Date: Tue, 9 Oct 2007 21:51:19 +0200
To: Loup Vaillant
Cc: Jon Harrop, caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: Rope is the new string
Message-ID: <20071009195119.GA29263@snarc.org>
In-Reply-To: <6f9f8f4a0710091032r3dace80bi4f9b584ae5056675@mail.gmail.com>
From: tab@snarc.org (Vincent Hanquez)

On Tue, Oct 09, 2007 at 07:32:25PM +0200, Loup Vaillant wrote:
> > definitely we also need some UTFstring type library (which can use rope,
> > string, whatever internally), with all common type of operations
> > (appending, finding, ...), but it's a just a specific sub case and also
> > a different type not compatible with strings (in OCaml terminology).
>
> Then, we should have both byte arrays (the native Ocaml strings), and
> unicode strings. We will also need proper syntactic sugar for unicode
> strings. Operators, and literal values (like #"example"). Only then,
> ropes could feel like native strings --and be useful as such.

I'm not sure I see your point here, since you are mixing rope and
unicode. However, I think we are missing some other kind of string
implementation (maybe rope) *alongside* the current implementation of
string. We also lack integrated unicode support, but which
implementation of the underlying basic byte string is used is
irrelevant to that.

> > [...] it's a just a specific sub case [...]
>
> Internationalization is, mere text crunching is not. (You meant that,
> right?) With properly interfaced unicode strings, I can do my text
> crunching without worrying about internationalization, and with no
> programming overhead. Then, when (if) I have to internationalize, it
> is much easier.

Absolutely. What I meant basically boils down to this: unicode strings
are just a subset of strings (as arrays of bytes). You can store a
unicode string in a byte string, whereas you can't store an arbitrary
byte string in a unicode string.
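
To make the subset relation concrete, here is a minimal sketch. It assumes OCaml 4.14 or later, where the standard library provides String.is_valid_utf_8; the sample values and the helper name describe are made up for illustration.

```ocaml
(* Assumes OCaml >= 4.14 (String.is_valid_utf_8). Sample data is made up. *)

(* "cafe" with an e-acute, as UTF-8: U+00E9 is the two bytes C3 A9. *)
let utf8_text = "caf\xc3\xa9"

(* The same word as Latin-1: a lone E9 byte is not valid UTF-8. *)
let latin1_bytes = "caf\xe9"

(* Both values have type string, so only a runtime check tells them apart. *)
let describe (s : string) : string =
  if String.is_valid_utf_8 s then "valid UTF-8" else "just bytes"

let () =
  List.iter
    (fun s -> Printf.printf "%d bytes: %s\n" (String.length s) (describe s))
    [ utf8_text; latin1_bytes ]
```

Every value here fits in a plain string, but only the first one would be acceptable as a unicode string.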

I want a UTF library to be able to do something like:

    type ustring = unicode_type * string
    of_string: string -> ustring (* raise if not unicode compliant *)
    to_string: ustring -> string
    append: ustring -> ustring -> ustring
    ...etc

That way, when I'm manipulating unicode strings, I won't try to append
a binary string to a unicode string. I can code safely with my unicode
strings (whatever the format, utf-{8..32}), and certainly expect the
type system to complain loudly when I do something that might break
unicode.

> About the incompatibility, the two types of strings are incompatible
> anyway.
>
> Maybe even more than ints and floats. Sure you once tried some
> "Obj.magic" conversions of an non-English text with emacs. :-)

I use vim ;), but heh, after using Obj.magic you're on your own :)

-- 
Vincent Hanquez
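
For illustration only, the ustring interface sketched in this message could look roughly like the module below. This is a hypothetical sketch, not an existing library: the module name Ustring is invented, it handles UTF-8 only (it drops the unicode_type tag), and it assumes OCaml 4.14 or later for String.is_valid_utf_8.

```ocaml
(* Hypothetical sketch; assumes OCaml >= 4.14 (String.is_valid_utf_8). *)
module Ustring : sig
  type t                        (* abstract: always holds valid UTF-8 *)
  val of_string : string -> t   (* raises Invalid_argument if not UTF-8 *)
  val to_string : t -> string
  val append : t -> t -> t
end = struct
  type t = string

  let of_string s =
    if String.is_valid_utf_8 s then s
    else invalid_arg "Ustring.of_string: not valid UTF-8"

  let to_string s = s

  (* Concatenating two valid UTF-8 strings yields valid UTF-8. *)
  let append a b = a ^ b
end

let hello = Ustring.of_string "h\xc3\xa9llo"
let greeting = Ustring.append hello (Ustring.of_string " world")

(* Ustring.append hello "\xff\xfe" does not type-check:
   a plain string is not a Ustring.t. *)
```

Because Ustring.t is abstract, the only way to obtain one from raw bytes is of_string, so the validity check cannot be skipped short of Obj.magic.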