From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 43144BC57 for ; Fri, 14 May 2010 06:37:04 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AlYCAIZw7EvLj65Ge2dsb2JhbACeAhUBARYiBRcGvGeFEAQ X-IronPort-AV: E=Sophos;i="4.53,226,1272837600"; d="scan'208";a="59338229" Received: from atp-es2.it.nicta.com.au ([203.143.174.70]) by mail1-smtp-roc.national.inria.fr with ESMTP/TLS/AES256-SHA; 14 May 2010 06:37:03 +0200 Received: from atp-mbx1.it.nicta.com.au ([221.199.216.123] helo=atp-mbx1.in.nicta.com.au) by atp-es2.it.nicta.com.au with esmtp (Exim 4.69) (envelope-from ) id 1OCmdu-0001a2-Ic for caml-list@yquem.inria.fr; Fri, 14 May 2010 14:36:58 +1000 Received: from atp-mbx1.in.nicta.com.au ([221.199.216.123]) by atp-mbx1.in.nicta.com.au ([221.199.216.123]) with mapi; Fri, 14 May 2010 14:36:59 +1000 From: Paul Steckler To: "caml-list@yquem.inria.fr" Date: Fri, 14 May 2010 14:36:58 +1000 Thread-Topic: 8-bit characters on command line Thread-Index: AQHK8xw+GiWGJa1p0UOQdf0V78pW8Q== Message-ID: <2EB36A07AAE8C44BBB1986425E7A22D00FA659E379@atp-mbx1.in.nicta.com.au> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US x-tm-as-product-ver: SMEX-8.0.0.4125-6.000.1038-17382.000 x-tm-as-result: No--32.533700-0.000000-31 x-tm-as-user-approved-sender: Yes x-tm-as-user-blocked-sender: No Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-SA-Exim-Connect-IP: 221.199.216.123 X-SA-Exim-Mail-From: Paul.Steckler@nicta.com.au X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on atp-es2.it.nicta.com.au X-Spam-Level: Subject: 8-bit characters on command line X-SA-Exim-Version: 4.2.1 (built Wed, 25 Jun 2008 17:20:07 +0000) X-SA-Exim-Scanned: Yes (on atp-es2.it.nicta.com.au) X-Spam: no; 0.00; steckler:01 steckler:01 ocaml:01 iter:01 printf:01 printf:01 argv:01 ocamlopt:01 ocamlc:01 ocamlc:01 cygwin:01 ocaml:01 garbage:01 compile:01 argument:02 I have an OCaml 3.11 program that prints out the arguments on the command l= ine: let main =3DArray.iter (Printf.printf "arg =3D %s\n") Sys.argv On Linux, if I provide a command line argument containing 8-bit characters, like =E9 (an e with an acute accent), the program above, compiled with ocam= lopt or ocamlc, prints them faithfully. For Windows, I can compile the program above with ocamlc on Windows, or cro= ss-compile it with MinGW-ocaml on Linux. In both cases, any 8-bit characters in the c= ommand line are printed as garbage. I've tried running the program from rxvt (a s= hell for Cygwin) and Windows cmd.exe. Why does the behavior differ? Although it's not a particular concern to me, the OCaml interpreter handles= 8-bit characters on Linux and Windows differently. From the earlier part of my message, you= 'd think that Linux ocaml handles such characters well, and Windows ocaml, not well -- bu= t just the opposite holds! In Windows, if I enter a string containing an 8-bit charac= ter, the interpreter spits it back faithfully: # "=E9";; - : string =3D "=E9" But in Linux: # "=E9";; - : string =3D "\195\169" Why this inconsistency? -- Paul -- Paul Steckler National ICT Australia paul DOT steckler AT nicta.com.au The information in this e-mail may be confidential and subject to legal pro= fessional privilege and/or copyright. National ICT Australia Limited accept= s no liability for any damage caused by this email or its attachments.