From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 3C831BBAF for ; Wed, 29 Sep 2010 09:57:08 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Av0EAKKOokyty1O7/2dsb2JhbACDHZ5+cbYFkmiBIoMudASNOQ X-IronPort-AV: E=Sophos;i="4.57,252,1283724000"; d="scan'208";a="73967338" Received: from elehack.net ([173.203.83.187]) by mail1-smtp-roc.national.inria.fr with ESMTP; 29 Sep 2010 09:56:51 +0200 Received: from [192.168.74.134] (82.159.190.241.static.user.ono.com [82.159.190.241]) by elehack.net (Postfix) with ESMTPSA id 0FD68C8A49; Wed, 29 Sep 2010 02:58:37 -0500 (CDT) Subject: Re: [Caml-list] Windows filenames and Unicode From: Michael Ekstrand To: Paul Steckler Cc: "caml-list@yquem.inria.fr" In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Date: Wed, 29 Sep 2010 09:56:45 +0200 Message-ID: <1285747005.5671.27.camel@knine> Mime-Version: 1.0 X-Mailer: Evolution 2.28.3 Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; steckler:01 hmmm:01 ocaml's:01 camomile:01 ocaml's:01 blog:98 wrote:01 unix:01 unix:01 caml-list:01 functions:01 strings:01 implemented:02 filenames:02 filenames:02 On Wed, 2010-09-29 at 17:26 +1000, Paul Steckler wrote: > Hmmm, I shouldn't have to do this. Are there plans afoot to modernize > OCaml's string-handling? The Batteries project aims to provide more modernized string handling, and we already go a long way with the UTF8 module (from Camomile) and ropes. That does not, however, affect the file opening routines, as the current Batteries design requires you to open files using platform strings for their names. It may be interesting to look at allowing files to be opened using unicode names. However, this is fraught with difficulties, particularly in cross-platform situation. Handling filenames in Unicode is incorrect on Unix and Linux systems where the locale encoding does not have an idempotent conversion to and from Unicode. Therefore, the correct way to handle filenames in a cross-platform fashion is to always store them in the system filename encoding (any Unicode encoding on Windows when the wide functions are supported, the current locale encoding on Unix) and convert them to Unicode only for display. IMO any enhanced string design covering filenames must encourage this. Fortunately, OCaml's type system makes such an API fairly natural, it just needs to be designed and implemented. - Michael -- Web/blog: http://elehack.net/michael Jabber/Google Talk: this e-mail address Twitter: http://twitter.com/elehack mouse, n: a device for pointing at the xterm in which you want to type