mailing list of musl libc
 help / color / mirror / code / Atom feed
* Undefined behavior in atoi()
@ 2011-11-06 14:24 Pascal Cuoq
  2011-11-06 21:21 ` Rich Felker
  0 siblings, 1 reply; 6+ messages in thread
From: Pascal Cuoq @ 2011-11-06 14:24 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 595 bytes --]

Hello,

the attached patch against musl-0.8.3
removes an undefined behavior when atoi()
is applied to the representation of INT_MIN.

The undefined behavior is not observable
if the compiler implements 2's complement
for signed arithmetic overflows, but the compiler
doesn't have to.

On the other hand, C99 mandates either two's complement's
lopsided representation of integers or other,
symmetrical, representations (6.2.6.2), so I think the patch
is an overall improvement.

The patch applies in musl-0.8.3/src/stdlib/
and contains identical changes for atol() and atoll().

Regards,

Pascal

[-- Attachment #1.2: Type: text/html, Size: 868 bytes --]

[-- Attachment #2: patch_atoi --]
[-- Type: application/octet-stream, Size: 1224 bytes --]

--- atol.orig.c	2011-11-06 14:49:24.000000000 +0100
+++ atol.c	2011-11-06 14:52:16.000000000 +0100
@@ -10,7 +10,9 @@
 	case '-': neg=1;
 	case '+': s++;
 	}
+  /* Compute n as a negative number to avoid undefined behavior 
+     when s represents LONG_MIN on 2’s complement archs: */
 	while (isdigit(*s))
-		n = 10*n + *s++ - '0';
-	return neg ? -n : n;
+		n = 10*n - (*s++ - '0');
+	return neg ? n : -n;
 }
--- atoll.c~	2011-09-22 02:24:48.000000000 +0200
+++ atoll.c	2011-11-06 14:54:52.000000000 +0100
@@ -10,7 +10,9 @@
 	case '-': neg=1;
 	case '+': s++;
 	}
+  /* Compute n as a negative number to avoid undefined behavior 
+     when s represents LLONG_MIN on 2’s complement archs: */
 	while (isdigit(*s))
-		n = 10*n + *s++ - '0';
-	return neg ? -n : n;
+		n = 10*n - (*s++ - '0');
+	return neg ? n : -n;
 }
--- atoi.c~	2011-09-22 02:24:48.000000000 +0200
+++ atoi.c	2011-11-06 14:53:36.000000000 +0100
@@ -9,7 +9,9 @@
 	case '-': neg=1;
 	case '+': s++;
 	}
+ /* Compute n as a negative number to avoid undefined behavior 
+     when s represents INT_MIN on 2’s complement archs: */
 	while (isdigit(*s))
-		n = 10*n + *s++ - '0';
-	return neg ? -n : n;
+		n = 10*n - (*s++ - '0');
+	return neg ? n : -n;
 }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Undefined behavior in atoi()
  2011-11-06 14:24 Undefined behavior in atoi() Pascal Cuoq
@ 2011-11-06 21:21 ` Rich Felker
  2011-11-06 22:28   ` Pascal Cuoq
  0 siblings, 1 reply; 6+ messages in thread
From: Rich Felker @ 2011-11-06 21:21 UTC (permalink / raw)
  To: musl

On Sun, Nov 06, 2011 at 03:24:00PM +0100, Pascal Cuoq wrote:
> Hello,
> 
> the attached patch against musl-0.8.3
> removes an undefined behavior when atoi()
> is applied to the representation of INT_MIN.

Thanks. How did you manage to find this bug? Just browsing source?

> The undefined behavior is not observable
> if the compiler implements 2's complement
> for signed arithmetic overflows, but the compiler
> doesn't have to.

musl does assume/require a twos complement representation for signed
values, but of course even with this, signed overflow is UB.

> On the other hand, C99 mandates either two's complement's
> lopsided representation of integers or other,
> symmetrical, representations (6.2.6.2), so I think the patch
> is an overall improvement.
> 
> The patch applies in musl-0.8.3/src/stdlib/
> and contains identical changes for atol() and atoll().

I was thinking about some alternate fixes, but they turned out to be
wrong. The only other one I'm mildly considering is just making atoi,
etc. wrappers for strtol, etc. This should reduce the overall size of
the library at the cost of increasing the size of tiny programs that
just use atoi, but since atoi has UB when the value overflows, it's
second only to gets in unusability for correct programs...

Still leaning towards applying your patch as-is.

Rich


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Undefined behavior in atoi()
  2011-11-06 21:21 ` Rich Felker
@ 2011-11-06 22:28   ` Pascal Cuoq
  2011-11-08  3:44     ` Rich Felker
  0 siblings, 1 reply; 6+ messages in thread
From: Pascal Cuoq @ 2011-11-06 22:28 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On Sun, Nov 6, 2011 at 10:21 PM, Rich Felker <dalias@aerifal.cx> wrote:

>
> Thanks. How did you manage to find this bug? Just browsing source?
>

I was looking at implementations for strtod() (long story for another time)
and I noticed that this library was the most pleasant C code I had
seen in months (seriously), so I lingered a bit. So, the straightforward
answer is "yes, code review".

The slightly longer answer is that I have done the formal verification of
"string to number" C functions in the past, found that they were correct,
noticed the opposite trick, and somehow the relationship
between the two clicked. So I was looking for a classic pitfall here.

Since my previous e-mail, I have been analyzing a few more cases for fun,
and musl's original code also has this issue for the 40 or so values
below LONG_MAX.

The problem is then:

src/stdlib/atol.orig.c:14:[kernel] warning: Signed overflow.
                  assert (long)((long)10*n)+(long)*tmp_0 ≤
9223372036854775807LL;

That is, you should in any case subtract '0' from the digit before adding
it in
at line 14. In the case of atoll() on IA32, it will even be more efficient.

Pascal

[-- Attachment #2: Type: text/html, Size: 1772 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Undefined behavior in atoi()
  2011-11-06 22:28   ` Pascal Cuoq
@ 2011-11-08  3:44     ` Rich Felker
  2011-11-08  5:43       ` Pascal Cuoq
  0 siblings, 1 reply; 6+ messages in thread
From: Rich Felker @ 2011-11-08  3:44 UTC (permalink / raw)
  To: musl

On Sun, Nov 06, 2011 at 11:28:41PM +0100, Pascal Cuoq wrote:
> On Sun, Nov 6, 2011 at 10:21 PM, Rich Felker <dalias@aerifal.cx> wrote:
> 
> >
> > Thanks. How did you manage to find this bug? Just browsing source?
> >
> 
> I was looking at implementations for strtod() (long story for another time)

It should be noted that the current implementation is not correct.
There's lots of loss-of-precision.

> and I noticed that this library was the most pleasant C code I had
> seen in months (seriously), so I lingered a bit. So, the straightforward
> answer is "yes, code review".

Glad to hear you found the code pleasant, at least that part. Some
things (like floating point in vfprintf.c) are a bit ugly.. :)

> The slightly longer answer is that I have done the formal verification of
> "string to number" C functions in the past, found that they were correct,
> noticed the opposite trick, and somehow the relationship
> between the two clicked. So I was looking for a classic pitfall here.

Interesting, I'd never come across this one.

> Since my previous e-mail, I have been analyzing a few more cases for fun,
> and musl's original code also has this issue for the 40 or so values
> below LONG_MAX.
> 
> The problem is then:
> 
> src/stdlib/atol.orig.c:14:[kernel] warning: Signed overflow.
>                   assert (long)((long)10*n)+(long)*tmp_0 ≤
> 9223372036854775807LL;
> 
> That is, you should in any case subtract '0' from the digit before adding
> it in
> at line 14. In the case of atoll() on IA32, it will even be more efficient.

Thanks for catching that too. I'd certainly be glad to hear the
results of any more code review/testing you're up to doing. I've been
a little bit out of touch with working on musl lately due to being
very busy, but I'm reading the list/irc and making occasional
improvements. Development should speed up a lot in a month or so.

Rich


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Undefined behavior in atoi()
  2011-11-08  3:44     ` Rich Felker
@ 2011-11-08  5:43       ` Pascal Cuoq
  2011-11-08 14:12         ` Rich Felker
  0 siblings, 1 reply; 6+ messages in thread
From: Pascal Cuoq @ 2011-11-08  5:43 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

On Tue, Nov 8, 2011 at 4:44 AM, Rich Felker <dalias@aerifal.cx> wrote:

> On Sun, Nov 06, 2011 at 11:28:41PM +0100, Pascal Cuoq wrote:
>
> > I was looking at implementations for strtod() (long story for another
> time)
>
> It should be noted that the current implementation is not correct.
> There's lots of loss-of-precision.
>

So I have seen, but if it's possible to do better for the objectives of
musl,
I do not know how. David M. Gay's code is efficient but it's not portable,
and the other implementation I found (provided as part of Tcl and Ruby
for instance) is only to a few ULPs too.

I am going to write my own, with multi-precision integers (that I already
rely on for other things). I am hoping that by letting ldexp take care
of denormals and infinites, it will be simple.


> Glad to hear you found the code pleasant, at least that part. Some
> things (like floating point in vfprintf.c) are a bit ugly.. :)
>

I did look at the floating-point part in vfprintf.c, because of
the claim on musl's website that printing floating-point numbers
was correct. I have had trouble with floating-point
to decimal conversion too, but I cheated a little there by inventing
by own format:

http://blog.frama-c.com/index.php?post/2011/10/29/A-portable-OCaml-function-to-print-floats

Pascal

[-- Attachment #2: Type: text/html, Size: 2040 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Undefined behavior in atoi()
  2011-11-08  5:43       ` Pascal Cuoq
@ 2011-11-08 14:12         ` Rich Felker
  0 siblings, 0 replies; 6+ messages in thread
From: Rich Felker @ 2011-11-08 14:12 UTC (permalink / raw)
  To: musl

On Tue, Nov 08, 2011 at 06:43:45AM +0100, Pascal Cuoq wrote:
> On Tue, Nov 8, 2011 at 4:44 AM, Rich Felker <dalias@aerifal.cx> wrote:
> 
> > On Sun, Nov 06, 2011 at 11:28:41PM +0100, Pascal Cuoq wrote:
> >
> > > I was looking at implementations for strtod() (long story for another
> > time)
> >
> > It should be noted that the current implementation is not correct.
> > There's lots of loss-of-precision.
> >
> 
> So I have seen, but if it's possible to do better for the objectives of
> musl,
> I do not know how. David M. Gay's code is efficient but it's not portable,
> and the other implementation I found (provided as part of Tcl and Ruby
> for instance) is only to a few ULPs too.
> 
> I am going to write my own, with multi-precision integers (that I already
> rely on for other things). I am hoping that by letting ldexp take care
> of denormals and infinites, it will be simple.

I believe it's possible using a method similar to what's in
vfprintf.c, but in reverse. In principle this amounts to using
high-precision floats, but with only a couple operations permitted on
them and fixed-size buffers, the code simplifies down to a managable
level. Actually that's why mentioning it came to mind..

Rich


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-11-08 14:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-06 14:24 Undefined behavior in atoi() Pascal Cuoq
2011-11-06 21:21 ` Rich Felker
2011-11-06 22:28   ` Pascal Cuoq
2011-11-08  3:44     ` Rich Felker
2011-11-08  5:43       ` Pascal Cuoq
2011-11-08 14:12         ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).