From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 85AE4BBAF for ; Wed, 29 Sep 2010 09:27:12 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvcBAJqHokxKfVIukGdsb2JhbACiEwgVAQEBAQkJDAcRAx+tTJtuhUQEijo X-IronPort-AV: E=Sophos;i="4.57,252,1283724000"; d="scan'208";a="73963313" Received: from mail-ww0-f46.google.com ([74.125.82.46]) by mail1-smtp-roc.national.inria.fr with ESMTP; 29 Sep 2010 09:26:52 +0200 Received: by wwb13 with SMTP id 13so516402wwb.3 for ; Wed, 29 Sep 2010 00:26:52 -0700 (PDT) MIME-Version: 1.0 Received: by 10.227.146.4 with SMTP id f4mr1129340wbv.14.1285745212219; Wed, 29 Sep 2010 00:26:52 -0700 (PDT) Received: by 10.216.137.84 with HTTP; Wed, 29 Sep 2010 00:26:52 -0700 (PDT) X-Originating-IP: [203.143.161.115] In-Reply-To: References: Date: Wed, 29 Sep 2010 17:26:52 +1000 Message-ID: Subject: Re: [Caml-list] Windows filenames and Unicode From: Paul Steckler To: David Allsopp Cc: "caml-list@yquem.inria.fr" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; steckler:01 steck:01 pathname:01 ocaml:01 runtime:01 translated:01 translated:01 ocaml:01 mingw:01 hmmm:01 ocaml's:01 foolproof:98 wrote:01 caml-list:01 functions:01 On Wed, Sep 29, 2010 at 4:23 PM, David Allsopp wro= te: > A way (but not foolproof on Windows 7 and Windows 2008 R2 because you can= disable it) would be to wrap the GetShortPathName Windows API function[1] = which will convert the pathname to its DOS 8.3 format which will not contai= n Unicode characters. Another way might be to wrap the Unicode version of C= reateFileEx and convert the result into a handle compatible with the standa= rd library functions but I reckon that could be tricky! For Linux, I was planning on enforcing the invariant that all strings inside my program are UTF-8. For Windows, I could use the same invariant, and modify the OCaml runtime so that all calls to Windows file primitives have those strings translated to UTF-16 (and return values translated back to UTF-8). That is, I'd have to build a custom version of OCaml and wrap CreateFile, etc. with such Unicode translation functions. All this is made slightly more complicated by the fact that I'm using the MinGW version of OCaml. Hmmm, I shouldn't have to do this. Are there plans afoot to modernize OCaml's string-handling? -- Paul