mailing list of musl libc
* [musl] Bug in atoll strtoll, the output of then differ
@ 2022-12-18  9:32 Domingo Alvarez Duarte
  2022-12-18  9:58 ` Markus Wichmann
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Domingo Alvarez Duarte @ 2022-12-18  9:32 UTC
  To: musl

Hello !

While working with emscripten on this project,
https://github.com/mingodad/CG-SQL-Lua-playground, I was getting some
errors around the use of "atoll". I wrote the small program below to
compare the output of "musl" and "glibc", and found what seems to be a
bug in "atoll": with "musl" it gives a different output than "strtoll".

=====

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
     /* 2^63, one past the largest value of a 64-bit long long */
     const char *s = "9223372036854775808";
     long long ll = atoll(s);
     long long ll2 = strtoll(s, NULL, 10);
     int imax = 0x7fffffff;
     printf("%s : %lld : %lld : %d : %d\n", s, ll, ll2, imax, ll <= imax);
     return 0;
}

=====

Output from "glibc":

=====

9223372036854775808 : 9223372036854775807 : 9223372036854775807 : 2147483647 : 0

=====

Output from "musl":

=====

9223372036854775808 : -9223372036854775808 : 9223372036854775807 : 2147483647 : 1

=====

Cheers !



* Re: [musl] Bug in atoll strtoll, the output of then differ
  2022-12-18  9:32 [musl] Bug in atoll strtoll, the output of then differ Domingo Alvarez Duarte
@ 2022-12-18  9:58 ` Markus Wichmann
  2022-12-18 10:22   ` Domingo Alvarez Duarte
  2022-12-18 10:06 ` Quentin Rameau
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Markus Wichmann @ 2022-12-18  9:58 UTC
  To: musl

On Sun, Dec 18, 2022 at 10:32:10AM +0100, Domingo Alvarez Duarte wrote:
> Hello !
>
> Doing some work with emscripten with this project
> https://github.com/mingodad/CG-SQL-Lua-playground I was getting some errors
> with the usage of "atoll" and with this small program to compare the output
> of "musl" and "glibc" I found what seems to be a bug in "atoll" because with
> "musl" it gives a different output than "strtoll".
>
> =====
>
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char *argv[])
> {
>     const char *s = "9223372036854775808";
>     long  long ll = atoll(s);
>     long long ll2 = strtoll (s, (char **) NULL, 10);
>     int imax = 0x7fffffff;
>     printf("%s : %lld : %lld : %d : %d\n",  s, ll, ll2, imax, ll <= imax);
>     return 0;
> }
>
> =====
>
> Output from "glibc":
>
> =====
>
> 9223372036854775808 : 9223372036854775807 : 9223372036854775807 : 2147483647
> : 0
>
> =====
>
> Output from "musl":
>
> =====
>
> 9223372036854775808 : -9223372036854775808 : 9223372036854775807 :
> 2147483647 : 1
>
> =====
>
> Cheers !
>

Well, your problem here is that ato* behavior on error is not defined.
The C standard explicitly excepts behavior on error from the requirement
that these functions return the same thing as their strto* counterparts,
and §7.24.1 (of C23) explicitly states that behavior in that case is
undefined.

This means that the test case is wrong; no result is defined. Actually, a
crash would be acceptable behavior. This also means that when I return
to work next year, I should really go through my code base and replace
all ato* calls with their strto* counterparts for that reason alone.

Ciao,
Markus


* Re: [musl] Bug in atoll strtoll, the output of then differ
  2022-12-18  9:32 [musl] Bug in atoll strtoll, the output of then differ Domingo Alvarez Duarte
  2022-12-18  9:58 ` Markus Wichmann
@ 2022-12-18 10:06 ` Quentin Rameau
  2022-12-18 12:23 ` Szabolcs Nagy
  2022-12-18 15:25 ` Rich Felker
  3 siblings, 0 replies; 7+ messages in thread
From: Quentin Rameau @ 2022-12-18 10:06 UTC
  To: musl

> Hello !

Hi,

> Doing some work with emscripten with this project 
> https://github.com/mingodad/CG-SQL-Lua-playground I was getting some 
> errors with the usage of "atoll" and with this small program to compare 
> the output of "musl" and "glibc" I found what seems to be a bug in 
> "atoll" because with "musl" it gives a different output than "strtoll".
> 
> =====
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char *argv[])
> {
>      const char *s = "9223372036854775808";
>      long  long ll = atoll(s);
>      long long ll2 = strtoll (s, (char **) NULL, 10);
>      int imax = 0x7fffffff;
>      printf("%s : %lld : %lld : %d : %d\n",  s, ll, ll2, imax, ll <= imax);
>      return 0;
> }

This is not a bug in musl but a bug in the code:
9223372036854775808 is outside the range of long long,
so the behavior is undefined.

As recommended by the standard, ato* should only be used if the input
is known to always be in the target range;
otherwise use the strto* functions and do proper error handling.
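
For illustration, "proper error handling" around strtoll could look
roughly like the sketch below. The helper name parse_ll and its
bool-returning interface are hypothetical, chosen only for this
example; the errno, end-pointer, and ERANGE checks follow the
documented behavior of strtoll.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of any libc: parse a base-10 long long.
 * Returns true and stores the value in *out only if the whole string
 * is a valid number that fits in long long. */
bool parse_ll(const char *s, long long *out)
{
    assert(s != NULL && out != NULL);

    char *end;
    errno = 0;                    /* strtoll never resets errno itself */
    long long v = strtoll(s, &end, 10);

    if (end == s)                 /* no digits were consumed */
        return false;
    if (*end != '\0')             /* trailing characters after the number */
        return false;
    if (errno == ERANGE)          /* out of range: result was clamped */
        return false;

    *out = v;
    return true;
}
```

With this, the string from the original report is rejected with a
well-defined result instead of reaching atoll at all. Whether trailing
characters should count as an error is a design choice; this sketch
assumes they should.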


* Re: [musl] Bug in atoll strtoll, the output of then differ
  2022-12-18  9:58 ` Markus Wichmann
@ 2022-12-18 10:22   ` Domingo Alvarez Duarte
  2022-12-18 11:10     ` Markus Wichmann
  0 siblings, 1 reply; 7+ messages in thread
From: Domingo Alvarez Duarte @ 2022-12-18 10:22 UTC
  To: musl

Here is the "glibc" implementation of "atoll":

=====

/* Convert a string to a long long int.  */
long long int
atoll (const char *nptr)
{
   return strtoll (nptr, (char **) NULL, 10);
}

=====

With that there is no way to get different results from "atoll" and
"strtoll".

Cheers !



* Re: [musl] Bug in atoll strtoll, the output of then differ
  2022-12-18 10:22   ` Domingo Alvarez Duarte
@ 2022-12-18 11:10     ` Markus Wichmann
  0 siblings, 0 replies; 7+ messages in thread
From: Markus Wichmann @ 2022-12-18 11:10 UTC
  To: musl

On Sun, Dec 18, 2022 at 11:22:59AM +0100, Domingo Alvarez Duarte wrote:
> Here is the "glibc" implementation of "atoll":
>
> =====
>
> /* Convert a string to a long long int.  */
> long long int
> atoll (const char *nptr)
> {
>   return strtoll (nptr, (char **) NULL, 10);
> }
>
> =====
>
> With that there is no way to get different results from "atoll" and
> "strtoll".
>
> Cheers !
>

That's precisely why I would like for programmers to learn some Pascal
at some point in their lives, so they start to understand that interface
and implementation are two separate things.

Yes, that is a nice implementation up there. The interface for atoll
however comes from the C standard, and it says:

| 7.24.1 Numeric conversion functions
| 1 The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno
| on an error. If the value of the result cannot be represented, the behavior is undefined.
| [...]
| The atoi, atol, and atoll functions convert the initial portion of the string pointed to by nptr to
| int, long int, and long long int representation, respectively. Except for the behavior on error,
| they are equivalent to
| atoi: (int)strtol(nptr, nullptr, 10)
| atol: strtol(nptr, nullptr, 10)
| atoll: strtoll(nptr, nullptr, 10)

Passing 2^63 to atoll() (when long long int is a 64-bit type) is an
error that invokes undefined behavior.  Explicitly undefined behavior.
Some implementations choose to deal with this by implementing the
protections from strtoll() (for example by calling strtoll()). Some
implementations don't. This is an implementation detail. The application
cannot know what will happen, and should not make assumptions about what
will happen. It should only call the ato* functions on known valid
input. If it does not know that the input is valid, it should not call
ato*.

Ciao,
Markus


* Re: [musl] Bug in atoll strtoll, the output of then differ
  2022-12-18  9:32 [musl] Bug in atoll strtoll, the output of then differ Domingo Alvarez Duarte
  2022-12-18  9:58 ` Markus Wichmann
  2022-12-18 10:06 ` Quentin Rameau
@ 2022-12-18 12:23 ` Szabolcs Nagy
  2022-12-18 15:25 ` Rich Felker
  3 siblings, 0 replies; 7+ messages in thread
From: Szabolcs Nagy @ 2022-12-18 12:23 UTC
  To: Domingo Alvarez Duarte; +Cc: musl

* Domingo Alvarez Duarte <mingodad@gmail.com> [2022-12-18 10:32:10 +0100]:
> Doing some work with emscripten with this project
> https://github.com/mingodad/CG-SQL-Lua-playground I was getting some errors
> with the usage of "atoll" and with this small program to compare the output
> of "musl" and "glibc" I found what seems to be a bug in "atoll" because with
> "musl" it gives a different output than "strtoll".

As others pointed out, using ato* is a bug here.

glibc does not guarantee ato* to be compatible with strto* either,
and it just got around to fixing its internal ato* usage:
https://sourceware.org/pipermail/libc-alpha/2022-December/144147.html



* Re: [musl] Bug in atoll strtoll, the output of then differ
  2022-12-18  9:32 [musl] Bug in atoll strtoll, the output of then differ Domingo Alvarez Duarte
                   ` (2 preceding siblings ...)
  2022-12-18 12:23 ` Szabolcs Nagy
@ 2022-12-18 15:25 ` Rich Felker
  3 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2022-12-18 15:25 UTC
  To: Domingo Alvarez Duarte; +Cc: musl

On Sun, Dec 18, 2022 at 10:32:10AM +0100, Domingo Alvarez Duarte wrote:
> Hello !
> 
> Doing some work with emscripten with this project
> https://github.com/mingodad/CG-SQL-Lua-playground I was getting some
> errors with the usage of "atoll" and with this small program to
> compare the output of "musl" and "glibc" I found what seems to be a
> bug in "atoll" because with "musl" it gives a different output than
> "strtoll".

Everyone's already covered the reason this is not a bug, but to shed
some light on possible motivations for not implementing ato* as
wrappers around strto*:

Aside from making these functions somewhat smaller when statically
linked into tiny programs, there is a sanitizer benefit. Writing the
conversion with arithmetic that overflows on out-of-bounds inputs,
rather than handling overflow as an error case, means that a build of
libc with suitable sanitizers automatically makes ato* trap-and-crash
on inputs that have undefined behavior, via the undefinedness of the
underlying arithmetic. Doing this with strto* wrappers would require
manually checking error cases and manually aligning the trap cases
with the specification, which would need review and testing to get
the same benefit.
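
As a rough sketch of the style of conversion described above (an
assumption about the general shape for illustration, not a quotation
of musl's source; the name sketch_atoll is hypothetical), the loop
below does plain signed arithmetic with no range check:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical illustration of an atoll-style conversion that relies
 * on plain signed arithmetic instead of explicit range checks. */
long long sketch_atoll(const char *s)
{
    assert(s != NULL);

    long long n = 0;
    int neg = 0;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '-') {
        neg = 1;
        s++;
    } else if (*s == '+') {
        s++;
    }

    /* Accumulate as a negative number so that LLONG_MIN is reachable
     * without overflow. Out-of-range input overflows here or in the
     * final negation, which is undefined behavior. */
    while (isdigit((unsigned char)*s))
        n = 10 * n - (*s++ - '0');

    return neg ? n : -n;
}
```

For the input "9223372036854775808" the loop itself stays in range,
but the final negation overflows. In an ordinary build that may simply
wrap, which would be consistent with the negative value reported at
the start of the thread; in a build with something like GCC's or
Clang's -fsanitize=signed-integer-overflow, the same negation would be
expected to be reported or trapped with no extra checking code.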

That's not to say it *has to* be done this way. In lots of places in
musl, we do just implement "junk functions" similar to the ato* family
as wrappers around a modern "good function". But since we already
have it here, I see no reason to change to something that's worse in
most ways.

Rich
