Date: Thu, 14 Oct 2010 21:18:35 +1100
Subject: Unicode, update
From: Paul Steckler
To: caml-list@yquem.inria.fr

A couple of weeks ago or so, I asked about using OCaml file primitives with the Camomile library for Unicode on Windows. I thought I'd update people on the list about my resolution of these issues.

I decided to make the application UTF-8 throughout, so that the string type always means UTF-8 -- OK, there are a few exceptions to that rule. The SQLite3 library already deals with UTF-8 in a graceful way. The same is true for the C/C++ parsing library I'm using. That leaves the OCaml library procedures, like open_in and open_out, which definitely don't handle Unicode filenames on Windows.

I took the OCaml sources and made modified versions of functions like file_exists, open_in, and so on. The modified versions convert filenames from UTF-8 to UTF-16 and then call the "wide" versions of the underlying Win32 primitives. In some cases, I had to convert UTF-16 back to UTF-8. The Win32 functions MultiByteToWideChar and WideCharToMultiByte handle those conversions nicely. I link in these new functions, named file_exists_win32, open_in_win32, etc., and everything works a treat.

-- Paul
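
For readers who want to see the shape of this, here is a minimal sketch in C of the conversion step described above. It is not the actual modified OCaml sources. The names utf8_to_utf16, file_exists_utf8 and open_in_utf8 are hypothetical, and the non-Windows branch is there only so the sketch compiles on other systems. The OCaml stub glue (value conversions, channel creation) is left out.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 string. Returns NULL on invalid UTF-8 or allocation
   failure. The caller frees the result. */
static wchar_t *utf8_to_utf16(const char *s)
{
    /* With no output buffer, the call returns the required length in
       wchar_t units; the terminating NUL is counted because the input
       length is given as -1. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n == 0) return NULL;

    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w == NULL) return NULL;

    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) == 0) {
        free(w);
        return NULL;
    }
    return w;
}
#endif

/* Hypothetical: returns 1 if the UTF-8 encoded path exists, 0 otherwise. */
int file_exists_utf8(const char *path)
{
#ifdef _WIN32
    wchar_t *w = utf8_to_utf16(path);
    if (w == NULL) return 0;
    DWORD attrs = GetFileAttributesW(w);   /* "wide" Win32 primitive */
    free(w);
    return attrs != INVALID_FILE_ATTRIBUTES;
#else
    /* Fallback for other systems: approximate existence by trying to
       open the path for reading. */
    FILE *f = fopen(path, "rb");
    if (f == NULL) return 0;
    fclose(f);
    return 1;
#endif
}

/* Hypothetical: open a UTF-8 encoded path for binary reading. */
FILE *open_in_utf8(const char *path)
{
#ifdef _WIN32
    wchar_t *w = utf8_to_utf16(path);
    if (w == NULL) return NULL;
    FILE *f = _wfopen(w, L"rb");           /* wide-character fopen */
    free(w);
    return f;
#else
    return fopen(path, "rb");
#endif
}
```

The reverse direction (for example, turning a directory listing back into UTF-8 strings) would follow the same two-call pattern with WideCharToMultiByte: one call to get the size, one to fill the buffer.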