mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] swprintf: count returned by %n is wrong after conversion error
@ 2023-03-19 23:48 Bruno Haible
  2023-03-20 12:15 ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Bruno Haible @ 2023-03-19 23:48 UTC (permalink / raw)
  To: musl

Hi,

On musl-1.2.3 I see this violation of the POSIX specification of swprintf [1]:

==================================== foo1.c ====================================
#include <stdio.h>
#include <wchar.h>

int main ()
{
  static const wchar_t input[] = { (wchar_t) 1702057263, 114, 0 };
  wchar_t buf[12] = { 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF };
  int count = -1;
  int ret = swprintf (buf, 12, L"%ls%n", input, &count);
  printf ("ret = %d, count = %d, buf[0] = 0x%x, buf[1] = 0x%x, buf[2] = 0x%x\n",
          ret, count,
          (unsigned int) buf[0], (unsigned int) buf[1], (unsigned int) buf[2]);
  return 0;
}
/*
glibc:      ret = 2, count = 2, buf[0] = 0x6573552f, buf[1] = 0x72, buf[2] = 0x0
musl libc:  ret = -1, count = 2, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
FreeBSD 13: ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
Solaris OI: ret = 2, count = 2, buf[0] = 0x6573552f, buf[1] = 0x72, buf[2] = 0x0
*/
================================================================================

$ gcc -Wall foo1.c
$ ./a.out
ret = -1, count = 2, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef

The POSIX specification says:
  "The application shall ensure that the argument is a pointer to an integer
   into which is written the number of wide characters written to the output
   so far by this call to one of the fwprintf() functions."

From the values of buf[0], buf[1], buf[2] it can be seen that the number
of wide characters written after the %ls directive is 0, not 2. Therefore
the value of count should be 0 or — if the processing of the format string
stops right after the %ls directive, like it does on FreeBSD 13 — -1.

It is OK for the %ls directive to fail, because of the invalid wide characters
in the input[] arrary. What is not OK is for the %n directive to report 2
written wide characters, when in fact 0 wide characters have been written.

For comparison, in snprintf, this case is handled correctly:

==================================== foo2.c ====================================
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main ()
{
  static const wchar_t input[] = { (wchar_t) 1702057263, 114, 0 };
  char buf[12] = { 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD, 0xDD };
  int count = -1;
  int ret = snprintf (buf, 12, "%ls%n", input, &count);
  printf ("ret = %d, count = %d, buf[0] = 0x%x, buf[1] = 0x%x, buf[2] = 0x%x\n",
          ret, count,
          (unsigned char) buf[0], (unsigned char) buf[1], (unsigned char) buf[2]);
  return 0;
}
/*
glibc:      ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdd, buf[2] = 0xdd
musl libc:  ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdd, buf[2] = 0xdd
FreeBSD 13: ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdd, buf[2] = 0xdd
Solaris OI: ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdd, buf[2] = 0xdd
*/
================================================================================

$ gcc -Wall foo2.c
$ ./a.out
ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdd, buf[2] = 0xdd

Here, count remains unchanged, = -1.

Bruno

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/swprintf.html




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] swprintf: count returned by %n is wrong after conversion error
  2023-03-19 23:48 [musl] swprintf: count returned by %n is wrong after conversion error Bruno Haible
@ 2023-03-20 12:15 ` Rich Felker
  2023-03-20 18:08   ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2023-03-20 12:15 UTC (permalink / raw)
  To: Bruno Haible; +Cc: musl

On Mon, Mar 20, 2023 at 12:48:59AM +0100, Bruno Haible wrote:
> Hi,
> 
> On musl-1.2.3 I see this violation of the POSIX specification of swprintf [1]:
> 
> ==================================== foo1.c ====================================
> #include <stdio.h>
> #include <wchar.h>
> 
> int main ()
> {
>   static const wchar_t input[] = { (wchar_t) 1702057263, 114, 0 };
>   wchar_t buf[12] = { 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF };
>   int count = -1;
>   int ret = swprintf (buf, 12, L"%ls%n", input, &count);
>   printf ("ret = %d, count = %d, buf[0] = 0x%x, buf[1] = 0x%x, buf[2] = 0x%x\n",
>           ret, count,
>           (unsigned int) buf[0], (unsigned int) buf[1], (unsigned int) buf[2]);
>   return 0;
> }
> /*
> glibc:      ret = 2, count = 2, buf[0] = 0x6573552f, buf[1] = 0x72, buf[2] = 0x0
> musl libc:  ret = -1, count = 2, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
> FreeBSD 13: ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
> Solaris OI: ret = 2, count = 2, buf[0] = 0x6573552f, buf[1] = 0x72, buf[2] = 0x0
> */
> ================================================================================
> 
> $ gcc -Wall foo1.c
> $ ./a.out
> ret = -1, count = 2, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
> 
> The POSIX specification says:
>   "The application shall ensure that the argument is a pointer to an integer
>    into which is written the number of wide characters written to the output
>    so far by this call to one of the fwprintf() functions."
> 
> From the values of buf[0], buf[1], buf[2] it can be seen that the number
> of wide characters written after the %ls directive is 0, not 2. Therefore
> the value of count should be 0 or — if the processing of the format string
> stops right after the %ls directive, like it does on FreeBSD 13 — -1.
> 
> It is OK for the %ls directive to fail, because of the invalid wide characters
> in the input[] arrary. What is not OK is for the %n directive to report 2
> written wide characters, when in fact 0 wide characters have been written.

Thanks. It looks like the wide printf core is just missing any
validation logic for the wchar_t[] array. Probably it was assumed
writing it that any wchar_t would be accepted so that no error
handling would be needed, but the backend doesn't actually accept
them.

I'm not sure what the best fix is, but the simplest one is to iterate
and validate the entire range to be printed before printing anything
and error out on EILSEQ. But I'm not sure if this is actually
conforming either.

Since fwprintf is supposed to behave as if by repeated fputwc, and the
allowance (rather mandate) for EILSEQ comes from fputwc, the error
should only happen at the point the invalid character is written, with
all output up to that point being visible. So I guess we should drop
use of the out() helper function here and call fputc explicitly in a
loop where we can process the error. And in fact out() is only used
two places (also for the format string, where not being a valid wchar
string is UB so it doesn't really matter) so maybe we should just drop
the helper altogether and open-code it both places with proper error
handling...

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] swprintf: count returned by %n is wrong after conversion error
  2023-03-20 12:15 ` Rich Felker
@ 2023-03-20 18:08   ` Rich Felker
  2023-03-20 18:22     ` Bruno Haible
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2023-03-20 18:08 UTC (permalink / raw)
  To: Bruno Haible; +Cc: musl

On Mon, Mar 20, 2023 at 08:15:59AM -0400, Rich Felker wrote:
> On Mon, Mar 20, 2023 at 12:48:59AM +0100, Bruno Haible wrote:
> > Hi,
> > 
> > On musl-1.2.3 I see this violation of the POSIX specification of swprintf [1]:
> > 
> > ==================================== foo1.c ====================================
> > #include <stdio.h>
> > #include <wchar.h>
> > 
> > int main ()
> > {
> >   static const wchar_t input[] = { (wchar_t) 1702057263, 114, 0 };
> >   wchar_t buf[12] = { 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF };
> >   int count = -1;
> >   int ret = swprintf (buf, 12, L"%ls%n", input, &count);
> >   printf ("ret = %d, count = %d, buf[0] = 0x%x, buf[1] = 0x%x, buf[2] = 0x%x\n",
> >           ret, count,
> >           (unsigned int) buf[0], (unsigned int) buf[1], (unsigned int) buf[2]);
> >   return 0;
> > }
> > /*
> > glibc:      ret = 2, count = 2, buf[0] = 0x6573552f, buf[1] = 0x72, buf[2] = 0x0
> > musl libc:  ret = -1, count = 2, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
> > FreeBSD 13: ret = -1, count = -1, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
> > Solaris OI: ret = 2, count = 2, buf[0] = 0x6573552f, buf[1] = 0x72, buf[2] = 0x0
> > */
> > ================================================================================
> > 
> > $ gcc -Wall foo1.c
> > $ ./a.out
> > ret = -1, count = 2, buf[0] = 0x0, buf[1] = 0xdeadbeef, buf[2] = 0xdeadbeef
> > 
> > The POSIX specification says:
> >   "The application shall ensure that the argument is a pointer to an integer
> >    into which is written the number of wide characters written to the output
> >    so far by this call to one of the fwprintf() functions."
> > 
> > From the values of buf[0], buf[1], buf[2] it can be seen that the number
> > of wide characters written after the %ls directive is 0, not 2. Therefore
> > the value of count should be 0 or — if the processing of the format string
> > stops right after the %ls directive, like it does on FreeBSD 13 — -1.
> > 
> > It is OK for the %ls directive to fail, because of the invalid wide characters
> > in the input[] arrary. What is not OK is for the %n directive to report 2
> > written wide characters, when in fact 0 wide characters have been written.
> 
> Thanks. It looks like the wide printf core is just missing any
> validation logic for the wchar_t[] array. Probably it was assumed
> writing it that any wchar_t would be accepted so that no error
> handling would be needed, but the backend doesn't actually accept
> them.
> 
> I'm not sure what the best fix is, but the simplest one is to iterate
> and validate the entire range to be printed before printing anything
> and error out on EILSEQ. But I'm not sure if this is actually
> conforming either.
> 
> Since fwprintf is supposed to behave as if by repeated fputwc, and the
> allowance (rather mandate) for EILSEQ comes from fputwc, the error
> should only happen at the point the invalid character is written, with
> all output up to that point being visible. So I guess we should drop
> use of the out() helper function here and call fputc explicitly in a
> loop where we can process the error. And in fact out() is only used
> two places (also for the format string, where not being a valid wchar
> string is UB so it doesn't really matter) so maybe we should just drop
> the helper altogether and open-code it both places with proper error
> handling...

OK, it looks like we have some high-level wrong error handling logic
in vfwprintf.c. The top-level vfwprintf function saves and clears the
error flag for the stream, then tests it at the end to determine
whether it needs to replace the return value with -1. And the out()
helper function tests the flag to avoid producing further output when
an error has already been hit, but this neither prevents the side
effects of processing %n, nor does it prevent writes via pass thru to
fprintf for numeric formats, or writes in the %s case.

I think the logic to check the error flag should just be moved into
the core function to check it before processing each format specifier,
and bail out of it's set.

There is probably also a similar issue in non-wide printf, but only
for write errors, not for encoding errors, and only visibly wrong if
the write error is transient (e.g. EAGAIN).

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] swprintf: count returned by %n is wrong after conversion error
  2023-03-20 18:08   ` Rich Felker
@ 2023-03-20 18:22     ` Bruno Haible
  2023-03-20 21:19       ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Bruno Haible @ 2023-03-20 18:22 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Rich Felker wrote:
> There is probably also a similar issue in non-wide printf, but only
> for write errors, not for encoding errors, and only visibly wrong if
> the write error is transient (e.g. EAGAIN).

Glad that you are seeing this by code inspection. Because in my test
suite, I only test the behaviour in case of encoding errors, not of
write errors.

Bruno




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [musl] swprintf: count returned by %n is wrong after conversion error
  2023-03-20 18:22     ` Bruno Haible
@ 2023-03-20 21:19       ` Rich Felker
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2023-03-20 21:19 UTC (permalink / raw)
  To: Bruno Haible; +Cc: musl

On Mon, Mar 20, 2023 at 07:22:25PM +0100, Bruno Haible wrote:
> Rich Felker wrote:
> > There is probably also a similar issue in non-wide printf, but only
> > for write errors, not for encoding errors, and only visibly wrong if
> > the write error is transient (e.g. EAGAIN).
> 
> Glad that you are seeing this by code inspection. Because in my test
> suite, I only test the behaviour in case of encoding errors, not of
> write errors.

Thanks for reporting it. The underlying issue is somewhat larger than
what you found, but your report gave the lead for discovering the
rest.

Rich

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-20 21:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-19 23:48 [musl] swprintf: count returned by %n is wrong after conversion error Bruno Haible
2023-03-20 12:15 ` Rich Felker
2023-03-20 18:08   ` Rich Felker
2023-03-20 18:22     ` Bruno Haible
2023-03-20 21:19       ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).