caml-list - the Caml user's mailing list
* 8-bit characters on command line
@ 2010-05-14  4:36 Paul Steckler
  2010-05-14  5:44 ` [Caml-list] " Dmitry Bely
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Steckler @ 2010-05-14  4:36 UTC (permalink / raw)
  To: caml-list

I have an OCaml 3.11 program that prints out the arguments on the command line:

  let main = Array.iter (Printf.printf "arg = %s\n") Sys.argv

On Linux, if I provide a command line argument containing 8-bit characters,
like é (an e with an acute accent), the program above, compiled with ocamlopt
or ocamlc, prints them faithfully.

For Windows, I can compile the program above with ocamlc on Windows, or cross-compile
it with MinGW-ocaml on Linux.  In both cases, any 8-bit characters in the command
line are printed as garbage.  I've tried running the program from rxvt (a shell for
Cygwin) and Windows cmd.exe.

Why does the behavior differ?
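
A quick way to narrow this down is to print the raw bytes of each argument next to the text, so that the encoding actually reaching the program is visible regardless of how the console renders it. A minimal sketch follows; the helper name hex_of_string is made up for illustration, and String.to_seq assumes OCaml 4.07 or later rather than the 3.11 mentioned above:

```ocaml
(* Hypothetical helper: render each byte of a string as two hex digits. *)
let hex_of_string (s : string) : string =
  String.to_seq s
  |> Seq.map (fun c -> Printf.sprintf "%02X" (Char.code c))
  |> List.of_seq
  |> String.concat " "

(* Print every command-line argument followed by its raw bytes. *)
let () =
  Array.iter
    (fun arg -> Printf.printf "arg = %s [%s]\n" arg (hex_of_string arg))
    Sys.argv
```

If é shows up as E9, the argument arrived in a single-byte code page such as Latin-1 or Windows-1252; C3 A9 would mean UTF-8, and 82 would point to code page 437 or 850.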

Although it's not a particular concern to me, the OCaml interpreter handles 8-bit characters
on Linux and Windows differently.  From the earlier part of my message, you'd think that
Linux ocaml handles such characters well, and Windows ocaml, not well -- but just the
opposite holds!  In Windows, if I enter a string containing an 8-bit character, the interpreter spits
it back faithfully:

  # "é";;
  - : string = "é"

But in Linux:

  # "é";;
  - : string = "\195\169"

Why this inconsistency?
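
Part of the Linux result can be reproduced without a terminal. Assuming the Linux terminal sends UTF-8 (the message does not say so), é arrives as the two bytes 195 and 169, and each of them is outside printable ASCII. The sketch below uses String.escaped from a recent standard library, which renders such bytes as decimal escapes; its behaviour in 3.11, and on Windows, may well differ:

```ocaml
(* "é" encoded as UTF-8 is the two bytes 0xC3 0xA9, i.e. 195 and 169. *)
let e_acute_utf8 = "\xC3\xA9"

let () =
  (* The string holds two bytes even though it displays as one character. *)
  Printf.printf "length  = %d\n" (String.length e_acute_utf8);
  (* Bytes outside printable ASCII come back as \ddd decimal escapes. *)
  Printf.printf "escaped = %s\n" (String.escaped e_acute_utf8)
```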

-- Paul
--
Paul Steckler
National ICT Australia
paul DOT steckler AT nicta.com.au

The information in this e-mail may be confidential and subject to legal professional privilege and/or copyright. National ICT Australia Limited accepts no liability for any damage caused by this email or its attachments.



* Re: [Caml-list] 8-bit characters on command line
  2010-05-14  4:36 8-bit characters on command line Paul Steckler
@ 2010-05-14  5:44 ` Dmitry Bely
  2010-05-14  6:20   ` Paul Steckler
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Bely @ 2010-05-14  5:44 UTC (permalink / raw)
  To: Paul Steckler; +Cc: caml-list

On Fri, May 14, 2010 at 8:36 AM, Paul Steckler
<Paul.Steckler@nicta.com.au> wrote:
> I have an OCaml 3.11 program that prints out the arguments on the command line:
>
>  let main = Array.iter (Printf.printf "arg = %s\n") Sys.argv
>
> On Linux, if I provide a command line argument containing 8-bit characters,
> like é (an e with an acute accent), the program above, compiled with ocamlopt
> or ocamlc, prints them faithfully.
>
> For Windows, I can compile the program above with ocamlc on Windows, or cross-compile
> it with MinGW-ocaml on Linux.  In both cases, any 8-bit characters in the command
> line are printed as garbage.  I've tried running the program from rxvt (a shell for
> Cygwin) and Windows cmd.exe.

I believe that's because there are actually two current code pages in
Windows: "OEM" code page for console input/output and "ANSI" one for
everything else. In more detail:

http://msdn.microsoft.com/en-us/library/dd317752%28VS.85%29.aspx

E.g. in my system ANSI/OEM code pages are 1251/866. In your case they
are probably 1252/437.

Program arguments and any 8-bit character strings inside an
application are considered to have ANSI encoding (as that's what
non-Unicode Windows API functions expect), but console output
functions perform ANSI->OEM code page translation. So you see
garbage.
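
To make the mismatch concrete, assuming the 1252/437 pair guessed above: é is byte 0xE9 in code page 1252 but byte 0x82 in code page 437, and byte 0xE9 read as code page 437 is a Greek capital theta. The sketch below remaps a few such bytes by hand. The names ansi_to_oem and oem_of_ansi are hypothetical, the three-entry table is only an excerpt of the real code pages, and List.assoc_opt assumes OCaml 4.05 or later:

```ocaml
(* Illustrative excerpt only: (byte in CP1252, byte in CP437) for a few
   accented letters. A real conversion needs the full tables. *)
let ansi_to_oem = [
  (0xE9, 0x82);  (* e acute *)
  (0xFC, 0x81);  (* u diaeresis *)
  (0xF1, 0xA4);  (* n tilde *)
]

(* Hypothetical helper: translate the bytes listed above and leave all
   others, including plain ASCII, unchanged. *)
let oem_of_ansi (s : string) : string =
  String.map
    (fun c ->
      match List.assoc_opt (Char.code c) ansi_to_oem with
      | Some b -> Char.chr b
      | None -> c)
    s

(* Show the bytes of a CP1252 string after remapping to CP437. *)
let () =
  let ansi = "caf\xE9" in
  String.iter (fun c -> Printf.printf "%02X " (Char.code c)) (oem_of_ansi ansi);
  print_newline ()
```

This is not how a real program would do it on Windows, where the system provides its own conversion routines; it only shows that the same character has different byte values in the two code pages.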

- Dmitry Bely



* RE: [Caml-list] 8-bit characters on command line
  2010-05-14  5:44 ` [Caml-list] " Dmitry Bely
@ 2010-05-14  6:20   ` Paul Steckler
  2010-05-14  8:13     ` Dmitry Bely
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Steckler @ 2010-05-14  6:20 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

> From: Dmitry Bely [dmitry.bely@gmail.com]
> I believe that's because there are actually two current code pages in
> Windows: "OEM" code page for console input/output and "ANSI" one for
> everything else.

I'm not sure that's the issue, because

  1) 8-bit characters are read and written OK to the OCaml interpreter
via the Windows console, and

  2) if I write the equivalent C program and compile it on Windows, 8-bit
characters are passed as arguments and spat back just fine

Instead, there seems to be something in the way OCaml handles command-line
arguments.

-- Paul
--
Paul Steckler
National ICT Australia
paul DOT steckler AT nicta.com.au


* Re: [Caml-list] 8-bit characters on command line
  2010-05-14  6:20   ` Paul Steckler
@ 2010-05-14  8:13     ` Dmitry Bely
  2010-05-17  2:55       ` Paul Steckler
  0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Bely @ 2010-05-14  8:13 UTC (permalink / raw)
  To: Paul Steckler; +Cc: caml-list

On Fri, May 14, 2010 at 10:20 AM, Paul Steckler
<Paul.Steckler@nicta.com.au> wrote:

>  2) if I write the equivalent C program and compile it on Windows, 8-bit
> characters are passed as arguments and spat back just fine

Just tested with MSVC 9.0 - exactly the same problem. Try this

#include <stdio.h>
int main(int argc, char** argv)
{
  for (int i=0; i < argc; ++i) {
    puts(argv[i]);
  }
  return 0;
}

and you'll see.

- Dmitry Bely



* RE: [Caml-list] 8-bit characters on command line
  2010-05-14  8:13     ` Dmitry Bely
@ 2010-05-17  2:55       ` Paul Steckler
  0 siblings, 0 replies; 5+ messages in thread
From: Paul Steckler @ 2010-05-17  2:55 UTC (permalink / raw)
  To: Dmitry Bely; +Cc: caml-list

> From: Dmitry Bely [dmitry.bely@gmail.com]
> Just tested with MSVC 9.0 - exactly the same problem.

Yes, MSVC 10.0 has the same issue -- though gcc 4.3 installed via Cygwin
does not.

-- Paul

