From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 1A746BBAF for ; Wed, 29 Sep 2010 09:59:56 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvcBAGaOokxRZ90wkWdsb2JhbACiGxUBAQEBCQsKBxEDH8hshUQEijo X-IronPort-AV: E=Sophos;i="4.57,252,1283724000"; d="scan'208";a="71684884" Received: from mtaout02-winn.ispmail.ntl.com ([81.103.221.48]) by mail4-smtp-sop.national.inria.fr with ESMTP; 29 Sep 2010 09:59:55 +0200 Received: from aamtaout03-winn.ispmail.ntl.com ([81.103.221.35]) by mtaout02-winn.ispmail.ntl.com (InterMail vM.7.08.04.00 201-2186-134-20080326) with ESMTP id <20100929075955.TBDS3192.mtaout02-winn.ispmail.ntl.com@aamtaout03-winn.ispmail.ntl.com>; Wed, 29 Sep 2010 08:59:55 +0100 Received: from romulus.metastack.com ([81.102.132.77]) by aamtaout03-winn.ispmail.ntl.com (InterMail vG.2.02.00.01 201-2161-120-102-20060912) with ESMTP id <20100929075954.FTOA1807.aamtaout03-winn.ispmail.ntl.com@romulus.metastack.com>; Wed, 29 Sep 2010 08:59:54 +0100 Received: from remus.metastack.local ([172.16.0.1]) by romulus.metastack.com (8.14.2/8.14.2) with ESMTP id o8T7xpkA012350 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL); Wed, 29 Sep 2010 08:59:51 +0100 Received: from Remus.metastack.local ([fe80::547c:3c42:e1da:eda2]) by Remus.metastack.local ([fe80::547c:3c42:e1da:eda2%11]) with mapi; Wed, 29 Sep 2010 08:58:32 +0100 From: David Allsopp To: Paul Steckler Cc: "caml-list@yquem.inria.fr" Subject: RE: [Caml-list] Windows filenames and Unicode Thread-Topic: [Caml-list] Windows filenames and Unicode Thread-Index: AQHLX5PPOwwrCzpT5EGTUsu4XyPtE5MoeleAgAAFHACAABVRoA== Date: Wed, 29 Sep 2010 07:58:32 +0000 Message-ID: References: In-Reply-To: Accept-Language: en-GB, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Organization: MetaStack Solutions Ltd. X-Scanned-By: MIMEDefang 2.65 on 81.102.132.77 X-Cloudmark-Analysis: v=1.1 cv=3ENABmdyEd/Fm7fR7+mZIuMDn6+IErAeEhlfWBImZFk= c=1 sm=0 a=6XaqXQm5RQQA:10 a=kj9zAlcOel0A:10 a=xqWC_Br6kY4A:10 a=SGJpDYsEAAAA:8 a=k8f4PJZe3w2m4hxSVgUA:9 a=C-JW6JDA_xRhnsXbT3AA:7 a=T1VFEcw32hQwsf3K-rI3HDQjyfYA:4 a=CjuIK1q_8ugA:10 a=dTth3pfvbawA:10 a=HpAAvcLHHh0Zw7uRqdWCyQ==:117 X-Spam: no; 0.00; steckler:01 pathname:01 ocaml:01 runtime:01 translated:01 translated:01 ocaml:01 runtime:01 byte:01 one-liner:01 mingw:01 mingw:01 stubs:01 cygwin:01 lib:01 Paul Steckler wrote: > On Wed, Sep 29, 2010 at 4:23 PM, David Allsopp > wrote: > > A way (but not foolproof on Windows 7 and Windows 2008 R2 because you > > can disable it) would be to wrap the GetShortPathName Windows API > > function[1] which will convert the pathname to its DOS 8.3 format which > > will not contain Unicode characters. Another way might be to wrap the > > Unicode version of CreateFileEx and convert the result into a handle > > compatible with the standard library functions but I reckon that could = be > > tricky! >=20 > For Linux, I was planning on enforcing the invariant that all strings > inside my program are UTF-8. > For Windows, I could use the same invariant, and modify the OCaml runtime > so that all calls to Windows file primitives have those strings translate= d > to UTF-16 (and return values translated back to UTF-8). That is, I'd hav= e > to build a custom version of OCaml and wrap CreateFile, etc. with such > Unicode translation functions. Rather than hacking the OCaml runtime (the relevant code is {byte,asm}run/s= ys.c, btw) personally I'd produce a separate module of my own with two impl= ementations - one for Linux which just uses the built-in primitives and the= n one for Windows using WinAPI functions directly. A cursory glance at the = runtime code suggests that hacking wide support onto the runtime is not a "= one-liner". > All this is made slightly more complicated by the fact that I'm using the > MinGW version of OCaml. Shouldn't make it (too much) harder - I use the MinGW build of OCaml withou= t issue for C stubs, after a slightly steep learning-curve. The w32api pack= age in Cygwin provides all of the headers and link libraries for Windows li= braries (it's installed by default with the gcc-mingw). If you ever have to= link with more exotic libraries, dlltool is (sort of) your friend (it with= a little bit of help allows you to generate the .a files needed for the DL= L you're linking with - 3rd party libraries on Windows tend only to ship th= e MSVC .lib files) > Hmmm, I shouldn't have to do this. Are there plans afoot to modernize > OCaml's string-handling? Can o' worms, I expect! Windows (by which I mean Windows NT) took the simpl= est route of 16-bit wchars for Unicode but it's not necessarily the best wa= y of programming in general... the problem with going Unicode is *how* you = go Unicode. David