caml-list - the Caml user's mailing list
* int_of_string bug
@ 2007-03-29 16:27 Yaron Minsky
  2007-03-29 21:29 ` [Caml-list] " Oliver Bandel
  2007-03-30  1:21 ` Brian Hurt
  0 siblings, 2 replies; 15+ messages in thread
From: Yaron Minsky @ 2007-03-29 16:27 UTC (permalink / raw)
  To: caml-list

So, there's a weird int_of_string bug where positive decimal numbers are
sometimes read in as negative numbers without error.  Here's the bug:

http://caml.inria.fr/mantis/view.php?id=0004210

This has been marked as "wontfix" in the bug database because apparently
there's some weird spot in the lexer that depends on the wrong behavior of
int_of_string.

First of all, people should be aware of this behavior and should defend
against it in their code.  Secondly, the justification for not fixing it
seems really thin.  The behavior seems obviously wrong, and it's hard to see
why one wouldn't simply fix the lexer (perhaps by providing an alternate
broken implementation of int_of_string) and leave the ordinary int_of_string
where it is.
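
One way to defend against this can be sketched as follows. The helper name
checked_int_of_string is hypothetical, int_of_string_opt assumes OCaml 4.05
or later (it did not exist when this thread was written), and the check is
meant for decimal input only. The idea is simply that a literal without a
leading minus sign must never produce a negative int.

```ocaml
(* Hypothetical helper, not part of the standard library.
   Intended for decimal input only. *)
let checked_int_of_string (s : string) : int option =
  match int_of_string_opt s with
  | None -> None  (* malformed, or rejected as out of range *)
  | Some n ->
    let has_minus = String.length s > 0 && s.[0] = '-' in
    (* A sign mismatch means the conversion wrapped around. *)
    if n <> 0 && (n < 0) <> has_minus then None else Some n
```

Callers then pattern-match on the option instead of trusting the sign of
the result.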

y


* Re: [Caml-list] int_of_string bug
  2007-03-29 16:27 int_of_string bug Yaron Minsky
@ 2007-03-29 21:29 ` Oliver Bandel
  2007-03-30  0:26   ` Yaron Minsky
  2007-03-30  1:21 ` Brian Hurt
  1 sibling, 1 reply; 15+ messages in thread
From: Oliver Bandel @ 2007-03-29 21:29 UTC (permalink / raw)
  To: caml-list

On Thu, Mar 29, 2007 at 12:27:05PM -0400, Yaron Minsky wrote:
> So, there's a weird int_of_string bug where positive decimal numbers are
> sometimes read in as negative numbers without error.  Here's the bug:
> 
> http://caml.inria.fr/mantis/view.php?id=0004210
> 
> This has been marked as "wontfix" in the bug database because apparently
> there's some weird spot in the lexer that depends on the wrong behavior of
> int_of_string.
[...]

Oh, that's bad. :(

But btw, it's also bad that no exception is raised when an int
overflows. :(



Ciao,
   Oliver Bandel


* Re: [Caml-list] int_of_string bug
  2007-03-29 21:29 ` [Caml-list] " Oliver Bandel
@ 2007-03-30  0:26   ` Yaron Minsky
  2007-03-30  7:30     ` Florian Weimer
  0 siblings, 1 reply; 15+ messages in thread
From: Yaron Minsky @ 2007-03-30  0:26 UTC (permalink / raw)
  To: Oliver Bandel; +Cc: caml-list

On 3/29/07, Oliver Bandel <oliver@first.in-berlin.de> wrote:
>
> On Thu, Mar 29, 2007 at 12:27:05PM -0400, Yaron Minsky wrote:
> > So, there's a weird int_of_string bug where positive decimal numbers are
> > sometimes read in as negative numbers without error.  Here's the bug:
> >
> > http://caml.inria.fr/mantis/view.php?id=0004210
> >
> > This has been marked as "wontfix" in the bug database because apparently
> > there's some weird spot in the lexer that depends on the wrong behavior
> of
> > int_of_string.
> [...]
>
> Oh, that's bad. :(
>
> But btw, it's also bad that no exception is raised when an int
> overflows. :(


That's a problem too, but there is at least a defensible reason for that,
which is that it is expensive to get integer overflow to throw an exception.
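
For a sense of what a software overflow check on addition involves, here
is a minimal sketch. The exception Overflow and the function checked_add
are hypothetical names, not part of the standard library. Addition can
only overflow when both operands have the same sign, and it has
overflowed exactly when the wrapped sum has the opposite sign.

```ocaml
(* Hypothetical names; the standard library provides neither. *)
exception Overflow

let checked_add (a : int) (b : int) : int =
  let s = a + b in  (* wraps silently on overflow *)
  (* Same-sign operands whose sum changed sign: the addition wrapped. *)
  if (a >= 0) = (b >= 0) && (s >= 0) <> (a >= 0) then raise Overflow
  else s
```

The cost is a few comparisons and a branch per addition, which is the
overhead being debated here.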

> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


* Re: [Caml-list] int_of_string bug
  2007-03-29 16:27 int_of_string bug Yaron Minsky
  2007-03-29 21:29 ` [Caml-list] " Oliver Bandel
@ 2007-03-30  1:21 ` Brian Hurt
  2007-03-30  1:26   ` Yaron Minsky
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Hurt @ 2007-03-30  1:21 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: caml-list



On Thu, 29 Mar 2007, Yaron Minsky wrote:

> So, there's a weird int_of_string bug where positive decimal numbers are
> sometimes read in as negative numbers without error.  Here's the bug:
>
> http://caml.inria.fr/mantis/view.php?id=0004210

I'm actually not sure this is a bug either.  Note that ocaml will quite 
happily compute max_int+1 without an error either.

Whether this behavior (silent wrap around) is correct or not is another
argument.  Elsewhere I have opined that the only purpose for having
more than one type of integer in your programming language is so that 
programmers can pick the wrong one.  But I'm widely known to be a heretic.

Ocaml's behavior is, at least, *consistent*.
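
The silent wrap-around described above can be checked directly. This
small sketch uses only max_int and min_int from the standard library and
holds on both 32-bit and 64-bit builds.

```ocaml
let () =
  (* Native int arithmetic is modular: no exception is raised. *)
  assert (max_int + 1 = min_int);
  assert (min_int - 1 = max_int);
  Printf.printf "max_int = %d, max_int + 1 = %d\n" max_int (max_int + 1)
```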

Brian


* Re: [Caml-list] int_of_string bug
  2007-03-30  1:21 ` Brian Hurt
@ 2007-03-30  1:26   ` Yaron Minsky
  2007-03-30  4:23     ` skaller
  0 siblings, 1 reply; 15+ messages in thread
From: Yaron Minsky @ 2007-03-30  1:26 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On 3/29/07, Brian Hurt <bhurt@spnz.org> wrote:
>
>
> Whether this behavior (silent wrap around) is correct or not is another
> argument.  Elsewhere I have opined that the only purpose for having
> more than one type of integer in your programming language is so that
> programmers can pick the wrong one.  But I'm widely known to be a heretic.
>
> Ocaml's behavior is, at least, *consistent*.


Not really all that consistent:

# int_of_string "1073741824";;
- : int = -1073741824
# int_of_string "1073741825";;
Exception: Failure "int_of_string".
#
y


* Re: [Caml-list] int_of_string bug
  2007-03-30  1:26   ` Yaron Minsky
@ 2007-03-30  4:23     ` skaller
  2007-03-30  5:59       ` Erik de Castro Lopo
  0 siblings, 1 reply; 15+ messages in thread
From: skaller @ 2007-03-30  4:23 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: Brian Hurt, caml-list

On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> On 3/29/07, Brian Hurt <bhurt@spnz.org> wrote:
>         
>         Whether this behavior (silent wrap around) is correct or not is
>         another
>         argument.  Elsewhere I have opined that the only purpose
>         for having
>         more than one type of integer in your programming language is
>         so that 
>         programmers can pick the wrong one.  But I'm widely known to
>         be a heretic.
>         
>         Ocaml's behavior is, at least, *consistent*.
> 
> Not really all that consistent:
> 
> # int_of_string "1073741824";;
> - : int = -1073741824
> # int_of_string "1073741825";;
> Exception: Failure "int_of_string".
> #  

skaller@rosella:/work/felix/svn/felix/felix/trunk$ ledit ocaml
        Objective Caml version 3.10+dev25 (2007-03-26)

# int_of_string "1073741824";;
- : int = 1073741824
# int_of_string "1073741825";;
- : int = 1073741825



-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: [Caml-list] int_of_string bug
  2007-03-30  4:23     ` skaller
@ 2007-03-30  5:59       ` Erik de Castro Lopo
  2007-03-30  6:22         ` skaller
  2007-03-30 13:38         ` Markus Mottl
  0 siblings, 2 replies; 15+ messages in thread
From: Erik de Castro Lopo @ 2007-03-30  5:59 UTC (permalink / raw)
  To: caml-list

skaller wrote:

> On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> > # int_of_string "1073741824";;
> > - : int = -1073741824
> > # int_of_string "1073741825";;
> > Exception: Failure "int_of_string".

That's the behaviour on 32 bit systems.

> # int_of_string "1073741824";;
> - : int = 1073741824
> # int_of_string "1073741825";;
> - : int = 1073741825

But 64 bit systems get it right.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"Java, the best argument for Smalltalk since C++." -- Frank Winkler


* Re: [Caml-list] int_of_string bug
  2007-03-30  5:59       ` Erik de Castro Lopo
@ 2007-03-30  6:22         ` skaller
  2007-03-30 13:38         ` Markus Mottl
  1 sibling, 0 replies; 15+ messages in thread
From: skaller @ 2007-03-30  6:22 UTC (permalink / raw)
  To: caml-list

On Fri, 2007-03-30 at 15:59 +1000, Erik de Castro Lopo wrote:
> skaller wrote:
> 
> > On Thu, 2007-03-29 at 21:26 -0400, Yaron Minsky wrote:
> > > # int_of_string "1073741824";;
> > > - : int = -1073741824
> > > # int_of_string "1073741825";;
> > > Exception: Failure "int_of_string".
> 
> That's the behaviour on 32 bit systems.
> 
> > # int_of_string "1073741824";;
> > - : int = 1073741824
> > # int_of_string "1073741825";;
> > - : int = 1073741825
> 
> But 64 bit systems get it right.

The point being .. the behaviour for large values is
platform dependent anyhow, so in the abstract
you can say the behaviour is undefined for large values,
where 'large' isn't specified.

If you want to get it RIGHT: if you have a user input string
possibly containing digits, and you want to convert it,
you must already write a parser to parse the input,
so you won't be using int_of_string anyhow.

If the input was generated (say by another Ocaml program),
then it will already be correct.

In the Felix compiler, after lexing 'string of digits'
I use the Big_int module to convert to an integer:
that behaviour is platform independent.

If I really want an int (say for indexing), and there's
a risk of the conversion overflowing .. there's a risk
that even without overflowing the data is wrong and will
blow up, eg .. I'm not going to be indexing arrays
with max_int elements .. :)

If I really want to check, I'll use an application specific
bound such as 16000, and check the big_int against that
before converting. Thus, all the operations are deterministic
and platform independent if you do things properly.
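
The application-specific bound can be sketched without Big_int as well.
Note that this swaps in a different technique from the one described
above: instead of converting to a big_int and comparing, the hypothetical
function parse_bounded below reads decimal digits one at a time and
rejects the input as soon as the value would exceed the bound, so it
never overflows on any platform. It assumes bound >= 0 and accepts
unsigned decimal digits only.

```ocaml
(* Hypothetical helper: parse an unsigned decimal string, accepting only
   values in [0, bound].  Assumes bound >= 0. *)
let parse_bounded ~(bound : int) (s : string) : int option =
  let len = String.length s in
  let rec go i acc =
    if i = len then Some acc
    else
      match s.[i] with
      | '0' .. '9' as c ->
        let d = Char.code c - Char.code '0' in
        (* Reject before computing acc * 10 + d if it would exceed bound. *)
        if d > bound || acc > (bound - d) / 10 then None
        else go (i + 1) (acc * 10 + d)
      | _ -> None
  in
  if len = 0 then None else go 0 0
```

With ~bound:16000, the string "16000" is accepted and "16001" is
rejected, whatever the width of int.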

So the 'bug' in int_of_string is just an inconvenience.

IMHO there is a 'bug' in some Ocaml documentation, where the
abstract language is not clearly distinguished from the
implementation. Throwing exceptions on error should generally
NOT be considered a specified part of the language.

Undefined behaviour is sometimes the right specification because it
allows superior optimisation and prevents programmers
relying on exceptions. This doesn't prevent the implementation
throwing them, it just means catching them locally in your
code is a bug (because you can't be sure they will be thrown).

Bounds violations are a good example of this, and indeed
since Ocaml allows the -unsafe switch to disable bounds checks
you'd better NOT rely on catching them. The same applies
to match failures -- use a wildcard if you want to catch
unmatched cases (otherwise be willing to sketch a proof
to your boss that there can't be a violation :)
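
As an illustration of the advice above, the following sketch tests the
index explicitly instead of catching Invalid_argument, and uses a
wildcard case instead of relying on Match_failure. The names get_opt and
describe are hypothetical.

```ocaml
(* Explicit bounds test: behaves the same with or without -unsafe. *)
let get_opt (arr : 'a array) (i : int) : 'a option =
  if i >= 0 && i < Array.length arr then Some arr.(i) else None

(* The wildcard makes the fallback explicit, so no match can fail. *)
let describe (n : int) : string =
  match n with
  | 0 -> "zero"
  | 1 -> "one"
  | _ -> "many"
```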


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: [Caml-list] int_of_string bug
  2007-03-30  0:26   ` Yaron Minsky
@ 2007-03-30  7:30     ` Florian Weimer
  2007-03-30  8:44       ` skaller
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Weimer @ 2007-03-30  7:30 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: Oliver Bandel, caml-list

* Yaron Minsky:

> That's a problem too, but there is at least a defensible reason for
> that, which is that it is expensive to get integer overflow to throw
> an exception.

i386 and amd64 have hardware support for that, so it's not very
expensive.  There are pretty short RISC sequences for the checks, too.

MLton uses the i386 hardware support, and I think you can disable the
checks, so measuring the overhead shouldn't be too hard.


* Re: [Caml-list] int_of_string bug
  2007-03-30  7:30     ` Florian Weimer
@ 2007-03-30  8:44       ` skaller
  2007-03-30  8:59         ` Andreas Rossberg
  0 siblings, 1 reply; 15+ messages in thread
From: skaller @ 2007-03-30  8:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Yaron Minsky, Oliver Bandel, caml-list

On Fri, 2007-03-30 at 09:30 +0200, Florian Weimer wrote:
> * Yaron Minsky:
> 
> > That's a problem too, but there is at least a defensible reason for
> > that, which is that it is expensive to get integer overflow to throw
> > an exception.
> 
> i386 and amd64 have hardware support for that, so it's not very
> expensive.  There are pretty short RISC sequences for the checks, too.
> 
> MLton uses the i386 hardware support, and I think you can disable the
> checks, so measuring the overhead shouldn't be too hard.

But there is a difference you may have missed: Ocaml integers
are 31 or 63 bits, not 32 or 64 bits. 

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: [Caml-list] int_of_string bug
  2007-03-30  8:44       ` skaller
@ 2007-03-30  8:59         ` Andreas Rossberg
  2007-03-30  9:20           ` skaller
  0 siblings, 1 reply; 15+ messages in thread
From: Andreas Rossberg @ 2007-03-30  8:59 UTC (permalink / raw)
  To: skaller; +Cc: Florian Weimer, Oliver Bandel, Yaron Minsky, caml-list

skaller wrote:
>>
>>> That's a problem too, but there is at least a defensible reason for
>>> that, which is that it is expensive to get integer overflow to throw
>>> an exception.
>> i386 and amd64 have hardware support for that, so it's not very
>> expensive.  There are pretty short RISC sequences for the checks, too.
>>
>> MLton uses the i386 hardware support, and I think you can disable the
>> checks, so measuring the overhead shouldn't be too hard.
> 
> But there is a difference you may have missed: Ocaml integers
> are 31 or 63 bits, not 32 or 64 bits. 

But it uses the most significant 31/63 bits for ints, so that becomes a 
non-issue. ;-)

-- 
Andreas Rossberg, rossberg@ps.uni-sb.de


* Re: [Caml-list] int_of_string bug
  2007-03-30  8:59         ` Andreas Rossberg
@ 2007-03-30  9:20           ` skaller
  0 siblings, 0 replies; 15+ messages in thread
From: skaller @ 2007-03-30  9:20 UTC (permalink / raw)
  To: Andreas Rossberg; +Cc: Florian Weimer, Yaron Minsky, Oliver Bandel, caml-list

On Fri, 2007-03-30 at 10:59 +0200, Andreas Rossberg wrote:
> skaller wrote:
> >>
> >>> That's a problem too, but there is at least a defensible reason for
> >>> that, which is that it is expensive to get integer overflow to throw
> >>> an exception.
> >> i386 and amd64 have hardware support for that, so it's not very
> >> expensive.  There are pretty short RISC sequences for the checks, too.
> >>
> >> MLton uses the i386 hardware support, and I think you can disable the
> >> checks, so measuring the overhead shouldn't be too hard.
> > 
> > But there is a difference you may have missed: Ocaml integers
> > are 31 or 63 bits, not 32 or 64 bits. 
> 
> But it uses the most significant 31/63 bits for ints, so that becomes a 
> non-issue. ;-)

For addition maybe, certainly not for multiplication: one of the
operands has to be shifted right 1 place.
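
The difference between the two operations can be modelled in plain OCaml.
This is a simulation only: it represents a tagged machine word as an
ordinary int, using the fact that the runtime stores an int n as the
word 2n + 1. The function names are hypothetical, and the formulas are
the usual textbook ones rather than a claim about what any particular
code generator emits.

```ocaml
(* Simulation only: a tagged word is modelled by an ordinary OCaml int. *)
let tag (n : int) : int = (n lsl 1) lor 1   (* n is stored as 2n + 1 *)
let untag (w : int) : int = w asr 1

(* (2a+1) + (2b+1) - 1 = 2(a+b) + 1: no shift is needed. *)
let add_tagged (x : int) (y : int) : int = x + y - 1

(* (2a) * b + 1 = 2ab + 1: one operand has to be shifted right first. *)
let mul_tagged (x : int) (y : int) : int = (x - 1) * (y asr 1) + 1
```

In the addition, both operands stay in tagged form throughout; in the
multiplication, y is untagged before the multiply, which is the shift
referred to above.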

But it depends on the code generator internal details. You could
always shift BOTH operands, do the register calculation, then
shift back .. in which case you'd lose overflow detection.
The problem is you cannot use the carry bit after the shift back
because the bit will definitely be set if the result is negative.

From what I've seen Ocaml actually uses tricks which also might
defeat detection, for example I've seen some use of LEA
(load effective address) with the scale by 2 option to 
load and shift one bit in a single instruction.

Processors are quirky about flag bits .. some set sign bit
on loading and others don't, etc, so it could be quite messy.
This is why C doesn't specify what happens on overflow:
it would compromise performance on some processors.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


* Re: [Caml-list] int_of_string bug
  2007-03-30  5:59       ` Erik de Castro Lopo
  2007-03-30  6:22         ` skaller
@ 2007-03-30 13:38         ` Markus Mottl
  2007-04-03 17:51           ` Toby Kelsey
  1 sibling, 1 reply; 15+ messages in thread
From: Markus Mottl @ 2007-03-30 13:38 UTC (permalink / raw)
  To: Erik de Castro Lopo; +Cc: caml-list

On 3/30/07, Erik de Castro Lopo <mle+ocaml@mega-nerd.com> wrote:
> But 64 bit systems get it right.

Not really:

# int_of_string "4611686018427387903";;
- : int = 4611686018427387903
# int_of_string "4611686018427387904";;
- : int = -4611686018427387904
# int_of_string "4611686018427387905";;
Exception: Failure "int_of_string".

The problem is just shifted to bigger numbers.  This problem arises
with all integer conversion functions, i.e. Int64.of_string,
Int32.of_string, Nativeint.of_string, int_of_string.
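
Since the same issue affects every conversion function, one generic guard
is a round-trip test: convert the string, print the result back, and
compare. The sketch below is an assumption-laden illustration, not a
library facility. The name roundtrips is hypothetical, the
"match ... with exception" syntax assumes OCaml 4.02 or later, and the
test only accepts canonical decimal input (no leading zeros, no "+").

```ocaml
(* Hypothetical helper: true only if [s] converts and prints back
   unchanged.  Works with any of_string/to_string pair. *)
let roundtrips of_string to_string (s : string) : bool =
  match of_string s with
  | n -> to_string n = s
  | exception Failure _ -> false
```

For example, roundtrips Int64.of_string Int64.to_string applied to
"9223372036854775808" is false whether the conversion fails or wraps to a
negative number.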

Regards
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


* Re: [Caml-list] int_of_string bug
  2007-03-30 13:38         ` Markus Mottl
@ 2007-04-03 17:51           ` Toby Kelsey
  2007-04-03 22:37             ` ls-ocaml-developer-2006
  0 siblings, 1 reply; 15+ messages in thread
From: Toby Kelsey @ 2007-04-03 17:51 UTC (permalink / raw)
  To: caml-list

Markus Mottl wrote:

> The problem is just shifted to bigger numbers.  This problem arises
> with all integer conversion functions, i.e. Int64.of_string,
> Int32.of_string, Nativeint.of_string, int_of_string.
> 
> Regards
> Markus

This bug is not just a conversion problem:

# let x = 1073741824;;
val x : int = -1073741824
# (x < 0) && (x >= -x);;
- : bool = true

Toby


* Re: [Caml-list] int_of_string bug
  2007-04-03 17:51           ` Toby Kelsey
@ 2007-04-03 22:37             ` ls-ocaml-developer-2006
  0 siblings, 0 replies; 15+ messages in thread
From: ls-ocaml-developer-2006 @ 2007-04-03 22:37 UTC (permalink / raw)
  To: caml-list



Toby Kelsey <toby.kelsey@gmail.com> writes:

> Markus Mottl wrote:
>
>> The problem is just shifted to bigger numbers.  This problem arises
>> with all integer conversion functions, i.e. Int64.of_string,
>> Int32.of_string, Nativeint.of_string, int_of_string.
>> Regards
>> Markus
>
> This bug is not just a conversion problem:
>
> # let x = 1073741824;;
> val x : int = -1073741824
> # (x < 0) && (x >= -x);;
> - : bool = true


# let x = - 1073741824;;
val x : int = -1073741824
# -x;;
- : int = -1073741824

But this is as specified for modular ints. No surprise here ...
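
The same observation can be written portably with min_int, which is
-1073741824 on a 32-bit build. A minimal sketch:

```ocaml
let x = min_int

let () =
  (* In modular arithmetic, min_int is its own negation. *)
  assert (-x = x);
  (* Hence the comparison from the previous message holds. *)
  assert (x < 0 && x >= -x);
  (* For the same reason, abs min_int stays negative. *)
  assert (abs x < 0)
```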


Regards -- Markus


Thread overview: 15+ messages
2007-03-29 16:27 int_of_string bug Yaron Minsky
2007-03-29 21:29 ` [Caml-list] " Oliver Bandel
2007-03-30  0:26   ` Yaron Minsky
2007-03-30  7:30     ` Florian Weimer
2007-03-30  8:44       ` skaller
2007-03-30  8:59         ` Andreas Rossberg
2007-03-30  9:20           ` skaller
2007-03-30  1:21 ` Brian Hurt
2007-03-30  1:26   ` Yaron Minsky
2007-03-30  4:23     ` skaller
2007-03-30  5:59       ` Erik de Castro Lopo
2007-03-30  6:22         ` skaller
2007-03-30 13:38         ` Markus Mottl
2007-04-03 17:51           ` Toby Kelsey
2007-04-03 22:37             ` ls-ocaml-developer-2006
