From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p1SCW61O028649 for ; Mon, 28 Feb 2011 13:32:07 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgYCAFMka03RVdI2kGdsb2JhbACEJKIaCBUBAQEBCQkMBxEEIaIeiWSCWYQciQkBAQMFgSKDRHYEjB2ITzqBFQ X-IronPort-AV: E=Sophos;i="4.62,238,1297033200"; d="scan'208";a="76734449" Received: from mail-pz0-f54.google.com ([209.85.210.54]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 28 Feb 2011 13:32:04 +0100 Received: by pzk32 with SMTP id 32so884645pzk.27 for ; Mon, 28 Feb 2011 04:32:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=lP3eSsUw/q01l2dQIpMgzVAorDyZ8BJuK4pX62YtICs=; b=NZvZFl8msmr7cLlWGNj17N56XX+D0WhgP93Yp/kwjP6boPAEVGotRu2qoloTUpQvOo sV/2ywDnC6NV3uZEaoPSo1CQBbDrnWctNq8lCFMNfU67JZ/kDjtcNd5ocUfgw6/uewew DEPVMvYmdTCszC9pS5uTNVa4dC+dTYVYpjvvQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=n5rtuBfFA1aQkrFoK/jYvmrWs1VY49fO98d5oeeCgpHZHuHgbx8BA6yGd7f9P1QMmH 6FfMr8wiIf7UooAvhNRgAP4dzMTA7lggkGWR9K/ss1HUCnWClHIKwpD946dmwF0X7D1B tkq086RBJUR/ZWL+xA3YGAqQ/krPngVH6+uvQ= MIME-Version: 1.0 Received: by 10.143.13.4 with SMTP id q4mr4332540wfi.418.1298896322413; Mon, 28 Feb 2011 04:32:02 -0800 (PST) Sender: daniel.c.buenzli@gmail.com Received: by 10.142.127.3 with HTTP; Mon, 28 Feb 2011 04:32:02 -0800 (PST) In-Reply-To: References: <20110228.093528.996524125295855263.Christophe.Troestler@umons.ac.be> Date: Mon, 28 Feb 2011 13:32:02 +0100 X-Google-Sender-Auth: RBgx_5zDcJTj09ilCPpY0JLEb-Q Message-ID: From: =?UTF-8?Q?Daniel_B=C3=BCnzli?= To: David Allsopp Cc: Christophe TROESTLER , OCaml Mailing List Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by walapai.inria.fr id p1SCW61O028649 Subject: Re: [Caml-list] GSoC: better UTF-8 support > D:\>md "Paweł Łukaszewski" > D:\>cd "Paweł Łukaszewski" > D:\Paweł Łukaszewski>ocaml >        Objective Caml version 3.11.2 > > # Sys.getcwd();; > - : string = "D:\\Pawel Lukaszewski" 1) That's very different problem from defining new Char and String like modules for UTF-8 encoded strings. 2) Is that a windows problems ? Here on osx : > mkdir Łukaszewski > cd Łukaszewski > rlwrap ocaml Objective Caml version 3.12.0 # Sys.getcwd ();; - : string = "/private/tmp/?\129ukaszewski" # Char.code (Sys.getcwd ()).[13];; - : int = 197 # Char.code (Sys.getcwd ()).[14];; - : int = 129 so we have 0xC5 0x81 for Ł which is the right UTF-8 representation for it. I'm currently not up to date on the problem of unicode encoded filenames in ocaml but isn't that something that should be handled by the underlying libc ? Note, maybe a nice addition for the gsoc project would be to add an option to ocaml so that it doesn't escape the bytes 127 to 159 when it prints strings allowing your UTF-8 aware tty to display UTF-8 encoded strings correctly, not as above. > Fully aware - but just because you need to work with strings does *not* imply you ever need even to compare them. Granted, the documentation may note that to perform a canonical comparison you'll need a third party library but I still maintain that having basic support is better and more usable than having none [...] But then, if you only need byte-level comparison String.compare is fine. So I really don't see the benefits of these UTF-8 Char and String like modules. Best, Daniel