mailing list of musl libc
* [musl] swprintf %lc directive does not work for some wide characters
From: Bruno Haible @ 2023-06-12 14:44 UTC
  To: musl

According to ISO C 11 § 7.29.2.1, in the *wprintf family of functions, the
%lc directive works like this:
  "[If an l length modifier is present,] the wint_t argument is converted to
   wchar_t and written."

Likewise in ISO C 17 § 7.29.2 and ISO C 23 § 7.31.2.1 and in POSIX:2018
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/fwprintf.html>.

In musl libc 1.2.4 (as part of Alpine Linux 3.18.0) this does not work for
some characters.

How to reproduce:
=================================== foo.c ===================================
#include <stdio.h>
#include <wchar.h>
int main ()
{
  static wint_t L_invalid = (wchar_t) 0x76543210;
  wchar_t buf[3];
  int ret = swprintf (buf, 3, L"%lc", L_invalid);
  if (ret >= 0)
    fprintf (stderr, "OK, %d characters\n", ret);
  else
    perror ("swprintf failed");
}
=============================================================================
$ gcc -Wall foo.c
$ ./a.out

Expected output:
OK, 1 characters

Actual output:
swprintf failed: Illegal byte sequence





* Re: [musl] swprintf %lc directive does not work for some wide characters
From: Rich Felker @ 2023-06-12 20:27 UTC
  To: Bruno Haible; +Cc: musl

On Mon, Jun 12, 2023 at 04:44:44PM +0200, Bruno Haible wrote:
> According to ISO C 11 § 7.29.2.1, in the *wprintf family of functions, the
> %lc directive works like this:
>   "[If an l length modifier is present,] the wint_t argument is converted to
>    wchar_t and written."
> 
> Likewise in ISO C 17 § 7.29.2 and ISO C 23 § 7.31.2.1 and in POSIX:2018
> <https://pubs.opengroup.org/onlinepubs/9699919799/functions/fwprintf.html>.
> 
> In musl libc 1.2.4 (as part of Alpine Linux 3.18.0) this does not work for
> some characters.
> 
> How to reproduce:
> =================================== foo.c ===================================
> #include <stdio.h>
> #include <wchar.h>
> int main ()
> {
>   static wint_t L_invalid = (wchar_t) 0x76543210;
>   wchar_t buf[3];
>   int ret = swprintf (buf, 3, L"%lc", L_invalid);
>   if (ret >= 0)
>     fprintf (stderr, "OK, %d characters\n", ret);
>   else
>     perror ("swprintf failed");
> }
> =============================================================================
> $ gcc -Wall foo.c
> $ ./a.out
> 
> Expected output:
> OK, 1 characters
> 
> Actual output:
> swprintf failed: Illegal byte sequence
> 
> 

Per my reading of the specification, this is not a bug but is the
expected behavior.

    In addition, all forms of fwprintf() shall fail if:

    [EILSEQ]
            A wide-character code that does not correspond to a valid
            character has been detected.

Since the language "has been detected" is used here, this seems to
allow an implementation not to treat the condition as an error when
it is not "detected". We make it an error because all wide stdio takes
place through a byte-oriented buffer, and the conversions back and
forth inherently "detect" the condition and have no way to pass the
invalid wchar_t through. There is no concept of directly writing wide
characters.

Note that the error here is happening not as part of the conversion
specifier, but the output operation.

Rich


* Re: [musl] swprintf %lc directive does not work for some wide characters
From: Bruno Haible @ 2023-06-12 20:53 UTC
  To: Rich Felker; +Cc: musl

Rich Felker wrote:
> Per my reading of the specification, this is not a bug but is the
> expected behavior.
> 
>     In addition, all forms of fwprintf() shall fail if:
> 
>     [EILSEQ]
>             A wide-character code that does not correspond to a valid
>             character has been detected.

From my reading of ISO C, it's a bug. Namely, in ISO C 23 § 7.31.2.3
the error conditions are specified as
  "The swprintf function returns the number of wide characters written
   in the array, not counting the terminating null wide character,
   or a negative value if
     an encoding error occurred
     or if n or more wide characters were requested to be written."

In swprintf, where "the wint_t argument converted to wchar_t" is written
and the output is to a wchar_t[], no "encoding error" should be possible.
That's obvious. The "encoding errors" occur in %c and %s directives,
AFAIU, not in %lc and %ls directives.

Bruno





* Re: [musl] swprintf %lc directive does not work for some wide characters
From: Rich Felker @ 2023-06-12 21:28 UTC
  To: Bruno Haible; +Cc: musl

On Mon, Jun 12, 2023 at 10:53:24PM +0200, Bruno Haible wrote:
> Rich Felker wrote:
> > Per my reading of the specification, this is not a bug but is the
> > expected behavior.
> > 
> >     In addition, all forms of fwprintf() shall fail if:
> > 
> >     [EILSEQ]
> >             A wide-character code that does not correspond to a valid
> >             character has been detected.
> 
> From my reading of ISO C, it's a bug. Namely, in ISO C 23 § 7.31.2.3
> the error conditions are specified as
>   "The swprintf function returns the number of wide characters written
>    in the array, not counting the terminating null wide character,
>    or a negative value if
>      an encoding error occurred
>      or if n or more wide characters were requested to be written."
> 
> In swprintf, where "the wint_t argument converted to wchar_t" is written
> and the output is to a wchar_t[], no "encoding error" should be possible.
> That's obvious. The "encoding errors" occur in %c and %s directives,
> AFAIU, not in %lc and %ls directives.

You're reading something into the specification that is not there,
however "obvious" it may seem. I don't have the exact text you're
looking at in front of me at the moment, but what I have from the
current standard (C11) is:

7.29.2.1 ¶14:

    "The fwprintf function returns the number of wide characters
    transmitted, or a negative value if an output or encoding error
    occurred."

7.29.2.3 ¶2:

    "The swprintf function is equivalent to fwprintf, except that the
    argument s specifies an array of wide characters into which the
    generated output is to be written, rather than written to a
    stream."

I read "equivalent to fwprintf..." as allowing swprintf to return an
error in any case where fwprintf would, unless the "except..." clause
explicitly forbids one or more of them (which it doesn't).

Since POSIX aims not to conflict with ISO C, I would think the POSIX
position is also that this requirement does not conflict, but is
intended to allow for both implementations that don't detect the
encoding error (ones which use a wchar_t[] buffer) and ones that do
(ones which use a char[] buffer).

Rich

