caml-list - the Caml user's mailing list
From: David Allsopp <dra-news@metastack.com>
To: Paul Steckler <steck@stecksoft.com>,
	"caml-list@yquem.inria.fr" <caml-list@yquem.inria.fr>
Subject: RE: [Caml-list] Windows filenames and Unicode
Date: Wed, 29 Sep 2010 06:23:04 +0000
Message-ID: <E51C5B015DBD1348A1D85763337FB6D92AEAD1@Remus.metastack.local>
In-Reply-To: <AANLkTikXYCdGHBzQ0G4mRbrWcA245K5oOp1CZay-OYoT@mail.gmail.com>

Paul Steckler wrote:
> In Windows, NTFS filenames are specified in Unicode (UTF-16).  Am I right
> in thinking that OCaml file primitives, like open_in, readdir, etc. cannot
> handle NTFS filenames containing characters with codepoints greater than
> 255?

Given that the WinAPI "wide" functions use UTF-16, you can of course fake UTF-16 on top of normal OCaml strings, but I think you'll hit a brick wall: the I/O primitives are built on the underlying C library functions, which ultimately call the ANSI versions of the Windows API functions, not the Unicode ones.
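To illustrate what "faking" UTF-16 on top of a normal OCaml string could look like, here is a minimal sketch that packs Unicode code points into a byte string as little-endian UTF-16, the layout the "wide" functions expect. The names add_utf16le and utf16le_of_code_points are made up for this example and are not part of any library; a real binding would also need a terminating NUL code unit and a C stub to pass the bytes on.

```ocaml
(* Append one Unicode scalar value to [buf] as UTF-16LE bytes.
   Hypothetical helper, not a library function. *)
let add_utf16le buf cp =
  if cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) then
    invalid_arg "add_utf16le: not a Unicode scalar value";
  (* Write one 16-bit code unit, low byte first. *)
  let add_unit u =
    Buffer.add_char buf (Char.chr (u land 0xFF));
    Buffer.add_char buf (Char.chr ((u lsr 8) land 0xFF))
  in
  if cp < 0x10000 then add_unit cp
  else begin
    (* Code points above the BMP become a surrogate pair. *)
    let c = cp - 0x10000 in
    add_unit (0xD800 lor (c lsr 10));
    add_unit (0xDC00 lor (c land 0x3FF))
  end

(* Encode a list of code points as a UTF-16LE byte string. *)
let utf16le_of_code_points cps =
  let buf = Buffer.create (2 * List.length cps) in
  List.iter (add_utf16le buf) cps;
  Buffer.contents buf
```

The resulting string is just bytes as far as OCaml is concerned, so open_in and friends would still hand it to the ANSI functions unchanged; the encoding only helps once something on the C side calls the wide API.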

> I'm aware of the Camomile library, which gives the ability to manipulate
> UTF-16 strings inside of OCaml.  But it looks like crucial points of
> OCaml's I/O, like Sys.argv and file primitives are strictly limited to 8-
> bit characters.
> 
> Is there a way around this limitation, other than rewriting the file I/O
> primitives?

One way (not foolproof on Windows 7 and Windows 2008 R2, because short-name generation can be disabled there) would be to wrap the GetShortPathName Windows API function[1], which converts a pathname to its DOS 8.3 form, which will not contain Unicode characters. Another way might be to wrap CreateFileW (the Unicode version of CreateFile) and convert the resulting handle into one compatible with the standard library functions, but I reckon that could be tricky!
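On the OCaml side, the short-path workaround might be shaped roughly like this. It is only a sketch: short_path_name stands in for a hypothetical binding to GetShortPathName (which would need a C stub, not shown), and is_ascii and ansi_safe_path are names invented for the example. The idea is to leave plain ASCII paths alone and only go through the 8.3 conversion when the path contains bytes the ANSI functions may not handle.

```ocaml
(* True when every byte of [s] is below 128, i.e. the path is plain ASCII. *)
let is_ascii s =
  let n = String.length s in
  let rec go i = i >= n || (Char.code s.[i] < 128 && go (i + 1)) in
  go 0

(* Return a path that is safe to pass to open_in and similar primitives.
   [short_path_name] is an assumed wrapper around GetShortPathName,
   supplied by the caller; it is not part of the standard library. *)
let ansi_safe_path ~short_path_name path =
  if is_ascii path then path else short_path_name path
```

With a real binding in place, a call such as open_in_bin (ansi_safe_path ~short_path_name path) would then reach the ANSI API with an 8.3 name. If short names are disabled on the volume, the binding has nothing useful to return, so the caller would still need an error path.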


David

[1] http://msdn.microsoft.com/en-us/library/aa364989(v=VS.85).aspx



