From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 04A78BC57 for ; Fri, 14 May 2010 07:44:02 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: At8DAMN/7EtKfVI0gGdsb2JhbACdeggVAQEUJCKsaIIAhHguiE0BAQMFhQsEgzs X-IronPort-AV: E=Sophos;i="4.53,227,1272837600"; d="scan'208";a="59339543" Received: from mail-ww0-f52.google.com ([74.125.82.52]) by mail1-smtp-roc.national.inria.fr with ESMTP; 14 May 2010 07:44:01 +0200 Received: by wwb24 with SMTP id 24so1379833wwb.39 for ; Thu, 13 May 2010 22:44:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=JWO1vSr2pXm5PZau3YI4FdSCr88AuCIfJ06/MPDzLRw=; b=Ugj8WdUV0MgCMaLQz/2AY/zo7pP45ern4fE2YnI6V7dEDuQJj8htqJgqJSftvYlVnh rIJLlvi1ZGbZXKsEUXLFW41ErLQ52fglGFwhR3KK6t6qafsQIxjOcs64OsjUHHxQSbj6 LXFOya+JxlGOJF09RM+yBZ27XI8bD/ccSf8Pw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=tr2oi5w9SOR1kBkLr5GGJ29T2oS6IwzX34L/rWe+siE9L+f6nSOQlfp9dwz1qFLwBC kgpEXjt2IzoSYYVAGJxAfOkphZcVfmW28gkY802mnivCTNG0ZLo259HKm5Qi+WSpniKu LD0xsx00BFMkfcJyK6lLFFZrZaRDrNOL372HE= MIME-Version: 1.0 Received: by 10.216.86.84 with SMTP id v62mr505195wee.34.1273815841347; Thu, 13 May 2010 22:44:01 -0700 (PDT) Received: by 10.216.85.204 with HTTP; Thu, 13 May 2010 22:44:01 -0700 (PDT) In-Reply-To: <2EB36A07AAE8C44BBB1986425E7A22D00FA659E379@atp-mbx1.in.nicta.com.au> References: <2EB36A07AAE8C44BBB1986425E7A22D00FA659E379@atp-mbx1.in.nicta.com.au> Date: Fri, 14 May 2010 09:44:01 +0400 Message-ID: Subject: Re: [Caml-list] 8-bit characters on command line From: Dmitry Bely To: Paul Steckler Cc: "caml-list@yquem.inria.fr" Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; steckler:01 steckler:01 ocaml:01 iter:01 printf:01 printf:01 argv:01 ocamlopt:01 ocamlc:01 ocamlc:01 cygwin:01 1251:98 garbage:01 garbage:01 wrote:01 On Fri, May 14, 2010 at 8:36 AM, Paul Steckler wrote: > I have an OCaml 3.11 program that prints out the arguments on the command= line: > > =A0let main =3DArray.iter (Printf.printf "arg =3D %s\n") Sys.argv > > On Linux, if I provide a command line argument containing 8-bit character= s, > like =E9 (an e with an acute accent), the program above, compiled with oc= amlopt > or ocamlc, prints them faithfully. > > For Windows, I can compile the program above with ocamlc on Windows, or c= ross-compile > it with MinGW-ocaml on Linux. =A0In both cases, any 8-bit characters in t= he command > line are printed as garbage. =A0I've tried running the program from rxvt = (a shell for > Cygwin) and Windows cmd.exe. I believe that's because there are actually two current code pages in Windows: "OEM" code page for console input/output and "ANSI" one for everything else. In mode detail: http://msdn.microsoft.com/en-us/library/dd317752%28VS.85%29.aspx E.g. in my system ANSI/OEM code pages are 1251/866. In your case they are probably 1252/437. Program arguments and any 8-bit character strings inside an application are considered to have ANSI encoding (as that's what non-Unicode Windows API functions expect), but console output functions perform ANSI->OEM code page translation. So you see a garbage. - Dmitry Bely