Subject: Re: [Caml-list] int_of_string bug
From: skaller
To: caml-list
Date: Fri, 30 Mar 2007 16:22:54 +1000

On Fri, 2007-03-30 at 15:59 +1000, Erik de Castro Lopo wrote:
> skaller wrote:
>
> > On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> > > # int_of_string "1073741824";;
> > > - : int = -1073741824
> > > # int_of_string "1073741825";;
> > > Exception: Failure "int_of_string".
>
> Thats the behaviour on 32 bit systems.
>
> > # int_of_string "1073741824";;
> > - : int = 1073741824
> > # int_of_string "1073741825";;
> > - : int = 1073741825
>
> But 64 bit systems get it right.

The point being .. the behaviour for large values is platform
dependent anyhow, so in the abstract you can say the behaviour
is undefined for large values, where 'large' isn't specified.

If you want to get it RIGHT: if you have a user input string
possibly containing digits, and you want to convert it, you must
already write a parser to parse the input, so you won't be using
int_of_string anyhow. If the input was generated (say by another
Ocaml program), then it will already be correct.

In the Felix compiler, after lexing a 'string of digits' I use the
Big_int module to convert it to an integer: that behaviour is
platform independent.

If I really want an int (say for indexing) and there's a risk of
the conversion overflowing, then there's also a risk that even
without overflow the data is wrong and will blow up: I'm not going
to be indexing arrays with max_int elements :)

If I really want to check, I'll use an application specific bound
such as 16000, and check the big_int against that before converting.

Thus, all the operations are deterministic and platform independent
if you do things properly. So the 'bug' in int_of_string is just
an inconvenience.

IMHO there is a 'bug' in some Ocaml documentation, where the abstract
language is not clearly distinguished from the implementation.
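The bounded conversion described above (check the value against an
application specific bound such as 16000 before producing an int) might
look roughly like the sketch below. This is not the Felix code: the name
bounded_int_of_digits is made up for illustration, and a plain digit fold
stands in for Big_int so that only the standard library is needed. The
bound is restricted so the accumulator can never overflow, even with
31 bit ints.

```ocaml
(* Hypothetical helper, not from the Felix sources: convert a string of
   decimal digits to an int, rejecting anything above an application
   specific bound.  The post does this with Big_int; this sketch uses a
   plain digit fold instead so it depends only on the standard library. *)
let bounded_int_of_digits ~bound s =
  (* Require a bound small enough that acc * 10 + 9 cannot overflow,
     whatever the native int width is. *)
  if bound < 0 || bound > (max_int - 9) / 10 then
    invalid_arg "bounded_int_of_digits: bound out of range";
  let n = String.length s in
  let rec go i acc =
    if i = n then Some acc
    else
      match s.[i] with
      | '0' .. '9' as c ->
        let acc = acc * 10 + (Char.code c - Char.code '0') in
        (* Stop as soon as the bound is exceeded, so acc stays small. *)
        if acc > bound then None else go (i + 1) acc
      | _ -> None  (* wildcard: any non-digit is rejected explicitly *)
  in
  if n = 0 then None else go 0 0
```

With a bound of 16000, "1234" should convert and "1073741824" should be
rejected in the same way on 32 bit and 64 bit systems, since the result
never depends on where the native int wraps.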

Throwing exceptions on error should generally NOT be considered
a specified part of the language. Undefined behaviour is sometimes
the right specification because it allows superior optimisation and
prevents programmers relying on exceptions. This doesn't prevent the
implementation throwing them, it just means catching them locally in
your code is a bug (because you can't be sure they will be thrown).

Bounds violations are a good example of this, and indeed since Ocaml
allows the -unsafe switch to disable bounds checks you'd better NOT
rely on catching them. The same applies to match failures -- use a
wildcard if you want to catch unmatched cases (otherwise be willing
to sketch a proof to your boss that there can't be a violation :)

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net