mailing list of musl libc
* Handling of L and ll prefixes different from glibc
@ 2016-12-14 13:46 Nadav Har'El
  2016-12-14 16:13 ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Nadav Har'El @ 2016-12-14 13:46 UTC
  To: musl; +Cc: Nadav Har'El

Hi,

POSIX's printf manual suggests (see
http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html)
that the "ll" format prefix should only be used for integer types, and "L"
should only be used for the long double type. And it seems that indeed, this
is what musl's printf() supports - the test program

    long double d = 123.456;
    printf("Lf: %Lf\n", d);
    printf("llf %llf\n", d);
    long long int i = 123456;
    printf("Ld: %Ld\n", i);
    printf("lld: %lld\n", i);

produces with Musl's printf just two lines of output:

    Lf: 123.456000
    lld: 123456

The two other printf()s (with %Ld and %llf) are silently dropped.

However, in glibc, it seems that "ll" and "L" are synonyms, and both work
for both integer and floating types. The above program produces with glibc
four lines of output:

    Lf: 123.456000
    llf 123.456000
    Ld: 123456
    lld: 123456

If musl's intention is to be compatible with glibc, not POSIX, I guess this
behavior should be fixed, and L and ll should become synonyms, not
different flags?

Thanks,
Nadav.

--
Nadav Har'El
nyh@scylladb.com


* Re: Handling of L and ll prefixes different from glibc
  2016-12-14 13:46 Handling of L and ll prefixes different from glibc Nadav Har'El
@ 2016-12-14 16:13 ` Rich Felker
  2016-12-14 17:17   ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2016-12-14 16:13 UTC
  To: musl

On Wed, Dec 14, 2016 at 03:46:40PM +0200, Nadav Har'El wrote:
> Hi,
> 
> Posix's printf manual suggests (see
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html)
> that the "ll" format prefix should only be used for integer types, and "L"
> should only be used for long double type. And it seems that indeed, this is
> what Musl's printf() supports - the test program
> 
>     long double d = 123.456;
>     printf("Lf: %Lf\n", d);
>     printf("llf %llf\n", d);
>     long long int i = 123456;
>     printf("Ld: %Ld\n", i);
>     printf("lld: %lld\n", i);
> 
> produces with Musl's printf just two lines of output:
> 
>     Lf: 123.456000
>     lld: 123456
> 
> The two other printf()s (with %Ld and %llf) are silently dropped.

Not quite silently; printf is returning -1 with errno set to EINVAL.

> However, in glibc, it seems that "ll" and "L" are synonyms, and both work
> for both integer and floating types. The above program produces with glibc
> four lines of output:
> 
>     Lf: 123.456000
>     llf 123.456000
>     Ld: 123456
>     lld: 123456
> 
> If musl's intention is to be compatible with glibc, not POSIX, I guess this
> behavior should be fixed, and L and ll should become synonyms, not
> different flags?

There is no general "intention to be compatible with glibc". There are
a couple related topics you might be thinking of:

Widely-used and widely-available extensions: There are written-up
guidelines for the criteria for inclusion or exclusion of such
interfaces, balancing things like usefulness, cost, and whether
there's already a better way to do the same thing portably.

ABI compatibility: There is an intent to support use of some
glibc-linked code in binary form with musl. From a practical
standpoint, this is mainly for libraries without source that some
users depend on (like flash and eventually nvidia stuff, maybe). Aside
from practical needs like that, the scope of the compatibility goal is
purely to support fully POSIX-conforming programs, or programs which
use common extension APIs provided by musl, not programs relying on
unsupported glibc functionality, doing things gratuitously wrong (like
ll vs L here), or depending on glibc bugs.

As for printf formats specifically, musl avoids defining any of the
cases which are undefined behavior in order to avoid getting in a
situation where we conflict with future versions of the standard. This
happened with glibc's scanf, which took 'a' as an extension flag for
auto-allocation, only to have C99 later assign it for floating point
(to match printf hex formatting), and glibc had to use hackery of
remapping symbols in different conformance profiles to work around the
problem. musl does not do that kind of hackery, so we have to be
careful not to introduce such problems in the first place.

Note that there is one printf extension we have, %m, but this is
because POSIX requires %m to be supported by syslog(), making it
unlikely that the standards would assign a conflicting meaning in the
future. Also %m is very useful, whereas mismatched L/ll is just
programmer sloppiness.

One thing I'm not happy with now is the way printf returns an error
(which the caller usually ignores) on invalid format strings; this
hides lots of bugs (for example, a similar issue with some legacy
software using %qd for printing long long) and doesn't have any basis
in requirements, since invalid format strings invoke undefined
behavior. I'm mildly leaning towards causing a crash on invalid format
strings so that the location of the incorrect usage can be quickly
found with a debugger, but I'd like feedback from users who've
debugged this sort of thing on whether that'd actually be helpful.

Rich



* Re: Handling of L and ll prefixes different from glibc
  2016-12-14 16:13 ` Rich Felker
@ 2016-12-14 17:17   ` Szabolcs Nagy
  2016-12-14 22:37     ` A. Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2016-12-14 17:17 UTC
  To: musl

* Rich Felker <dalias@libc.org> [2016-12-14 11:13:48 -0500]:
> behavior. I'm mildly leaning towards causing a crash on invalid format
> strings so that the location of the incorrect usage can be quickly
> found with a debugger, but I'd like feedback from users who've
> debugged this sort of thing on whether that'd actually be helpful.

crashing sounds good to me.



* Re: Handling of L and ll prefixes different from glibc
  2016-12-14 17:17   ` Szabolcs Nagy
@ 2016-12-14 22:37     ` A. Wilcox
  2016-12-15  2:30       ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: A. Wilcox @ 2016-12-14 22:37 UTC
  To: musl

On 14/12/16 11:17, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2016-12-14 11:13:48 -0500]:
>> behavior. I'm mildly leaning towards causing a crash on invalid
>> format strings so that the location of the incorrect usage can be
>> quickly found with a debugger, but I'd like feedback from users
>> who've debugged this sort of thing on whether that'd actually be
>> helpful.
> 
> crashing sounds good to me.
> 

Would this be able to be configured in some way when building the libc
(-D_CRASH_ON_PRINTF_UB or such)?

This sounds like a great tool to use when doing conformance testing,
and in general once testing has been done.  However, it also sounds
like a great way to break packages already "working" on musl.

--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org



* Re: Handling of L and ll prefixes different from glibc
  2016-12-14 22:37     ` A. Wilcox
@ 2016-12-15  2:30       ` Rich Felker
  2016-12-15  4:01         ` A. Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2016-12-15  2:30 UTC
  To: musl

On Wed, Dec 14, 2016 at 04:37:55PM -0600, A. Wilcox wrote:
> On 14/12/16 11:17, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2016-12-14 11:13:48 -0500]:
> >> behavior. I'm mildly leaning towards causing a crash on invalid
> >> format strings so that the location of the incorrect usage can be
> >> quickly found with a debugger, but I'd like feedback from users
> >> who've debugged this sort of thing on whether that'd actually be
> >> helpful.
> > 
> > crashing sounds good to me.
> > 
> 
> Would this be able to be configured in some way when building the libc
> (-D_CRASH_ON_PRINTF_UB or such)?
> 
> This sounds like a great tool to use when doing conformance testing,
> and in general once testing has been done.  However, it also sounds
> like a great way to break packages already "working" on musl.

While that's possible, I _really_ prefer avoiding switches like this.
It's a path that leads to maintenance-death of a project.

It's true that some programs which are just misusing printf format
specifiers as part of unnecessary status/debug/junk output will fully
work now, despite having UB, and that they would stop working with
such a change. But in most cases, the lack of output now, even if it's
unnoticed, is a bug that could have serious consequences. For example
missing output in text that's parsed and used in a script can lead to
things like rm -rf'ing the wrong directory. So I tend to think always
failing hard and catching the bug is preferable.

BTW I wonder if gcc's -Wformat catches these errors.

Rich



* Re: Handling of L and ll prefixes different from glibc
  2016-12-15  2:30       ` Rich Felker
@ 2016-12-15  4:01         ` A. Wilcox
  2016-12-15 11:30           ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: A. Wilcox @ 2016-12-15  4:01 UTC
  To: musl

On 14/12/16 20:30, Rich Felker wrote:
> It's true that some programs which are just misusing printf format
> specifiers as part of unnecessary status/debug/junk output will
> fully work now, despite having UB, and that they would stop working
> with such a change. But in most cases, the lack of output now, even
> if it's unnoticed, is a bug that could have serious consequences.
> For example missing output in text that's parsed and used in a
> script can lead to things like rm -rf'ing the wrong directory. So I
> tend to think always failing hard and catching the bug is
> preferable.

Yeah, I can understand that.  Just makes me nervous as a package
maintainer is all :)

> BTW I wonder if gcc's -Wformat catches these errors.

It is meant to.  I know that clang whines loudly on mismatched format
specifiers, and I seem to recall it even whines on format specifiers
that don't exist, but it has been a while since I checked GCC's.

--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org



* Re: Handling of L and ll prefixes different from glibc
  2016-12-15  4:01         ` A. Wilcox
@ 2016-12-15 11:30           ` Szabolcs Nagy
  0 siblings, 0 replies; 7+ messages in thread
From: Szabolcs Nagy @ 2016-12-15 11:30 UTC
  To: musl

* A. Wilcox <awilfox@adelielinux.org> [2016-12-14 22:01:59 -0600]:
> On 14/12/16 20:30, Rich Felker wrote:
> > BTW I wonder if gcc's -Wformat catches these errors.
> 
> It is meant to.  I know that clang whines loudly on mismatched format
> specifiers, and I seem to recall it even whines on format specifiers
> that don't exist, but it has been a while since I checked GCC's.

despite clang propaganda, gcc actually has a more detailed model
of printf now and thus gives better warnings

https://godbolt.org/g/Z0nnEH

note that clang does not warn at all, while gcc caught two bugs.


