mailing list of musl libc
* [musl] mbsnrtowcs(3) behavior not compatible with POSIX.1-2024
From: Kang-Che Sung @ 2025-04-29  9:14 UTC
  To: musl; +Cc: Alejandro Colomar

Hi, musl libc developers,

I tested the mbsnrtowcs function in musl libc and found one behavior that
does not conform to the new POSIX.1-2024 standard.

POSIX.1-2017 stated:
"If the input buffer ends with an incomplete character, it is unspecified
whether conversion stops at the end of the previous character (if any), or at
the end of the input buffer.
[...] A future version may require that when the input buffer ends with an
incomplete character, conversion stops at the end of the input buffer."
(Reference: https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsrtowcs.html)

POSIX.1-2024 now requires that conversion stop at the end of the input buffer
in that case.
(https://pubs.opengroup.org/onlinepubs/9799919799/functions/mbsrtowcs.html)
(https://www.austingroupbugs.net/view.php?id=616)
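
To make the difference concrete, here is a small sketch that models only where
the source pointer ends up under each rule, for UTF-8 input. The helpers
utf8_seq_len() and stop_offset() are hypothetical names made up for this
illustration; they are not musl or glibc code, and they neither validate
continuation bytes nor treat a NUL byte specially.

```c
#include <stddef.h>

/* Length of a UTF-8 sequence judged from its lead byte; 0 if the byte
 * cannot start a sequence. (Hypothetical helper, illustration only.) */
static size_t utf8_seq_len(unsigned char b)
{
        if (b < 0x80) return 1;
        if (b >= 0xc2 && b <= 0xdf) return 2;
        if (b >= 0xe0 && b <= 0xef) return 3;
        if (b >= 0xf0 && b <= 0xf4) return 4;
        return 0;
}

/* Offset at which conversion of the first n bytes of s stops.
 * stop_at_end == 0: stop after the previous complete character
 *                   (one of the two choices POSIX.1-2017 allowed).
 * stop_at_end != 0: stop at the end of the input buffer
 *                   (the POSIX.1-2024 requirement).
 * Assumption: continuation bytes are taken on trust. */
static size_t stop_offset(const char *s, size_t n, int stop_at_end)
{
        size_t i = 0;

        while (i < n) {
                size_t len = utf8_seq_len((unsigned char)s[i]);

                if (len == 0)
                        return i;       /* invalid lead byte: stop in front of it */
                if (len > n - i)        /* buffer ends inside this character */
                        return stop_at_end ? n : i;
                i += len;
        }
        return i;
}
```

With the six bytes "\xe7\x95\x8c\xe7\xb7\x9a" and n = 5, stop_offset() gives 3
under the first rule and 5 under the second, which matches the two source
offsets discussed in this report.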

Test code:

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

wchar_t wcs[100];
char mbs[100];

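int main(void)
{
        mbstate_t state;
        const char *s;

        // The test is only meaningful in a UTF-8 locale.
        if (!setlocale(LC_CTYPE, "en_US.UTF-8"))
                return 1;

        // Case 1: U+754C U+7DDA; a limit of 5 bytes cuts the second
        // character short after two of its three bytes.
        memset(&state, 0, sizeof(state));
        memcpy(mbs, "\xe7\x95\x8c\xe7\xb7\x9a", 7);
        s = mbs;
        printf("%zu, ", mbsnrtowcs(wcs, &s, 5, 100, &state));
        printf("%td\n", s - mbs);
        // Expected output: "1, 5". Actual output in musl: "1, 3".

        // Case 2: the 6-byte limit includes the terminating NUL, so the
        // second sequence is invalid rather than merely incomplete.
        memset(&state, 0, sizeof(state));
        memcpy(mbs, "\xe7\x95\x8c\xe7\xb7", 6);
        s = mbs;
        printf("%zu, ", mbsnrtowcs(wcs, &s, 6, 100, &state));
        printf("%td\n", s - mbs);
        // Expected output: "18446744073709551615, 3"
        // (that is, (size_t)-1 where size_t is 64 bits wide)
        return 0;
}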
```

I have Cc'd the Linux man-pages maintainer, as I plan to propose a patch to the
mbsnrtowcs(3) man page. It would also be good for mbsnrtowcs to behave
consistently in glibc and musl libc.

* Re: [musl] mbsnrtowcs(3) behavior not compatible with POSIX.1-2024
From: Rich Felker @ 2025-04-29 19:31 UTC
  To: Kang-Che Sung; +Cc: musl, Alejandro Colomar

Does the attached patch (untested) fix it?

Rich

[-- Attachment #2: mbsnrtowcs.diff --]
[-- Type: text/plain, Size: 809 bytes --]

diff --git a/src/multibyte/mbsnrtowcs.c b/src/multibyte/mbsnrtowcs.c
index 931192e2..47cbdc00 100644
--- a/src/multibyte/mbsnrtowcs.c
+++ b/src/multibyte/mbsnrtowcs.c
@@ -2,11 +2,13 @@
 
 size_t mbsnrtowcs(wchar_t *restrict wcs, const char **restrict src, size_t n, size_t wn, mbstate_t *restrict st)
 {
+	static unsigned internal_state;
 	size_t l, cnt=0, n2;
 	wchar_t *ws, wbuf[256];
 	const char *s = *src;
 	const char *tmp_s;
 
+	if (!st) st = (void *)&internal_state;
 	if (!wcs) ws = wbuf, wn = sizeof wbuf / sizeof *wbuf;
 	else ws = wcs;
 
@@ -41,8 +43,8 @@ size_t mbsnrtowcs(wchar_t *restrict wcs, const char **restrict src, size_t n, si
 				s = 0;
 				break;
 			}
-			/* have to roll back partial character */
-			*(unsigned *)st = 0;
+			s += n;
+			n -= n;
 			break;
 		}
 		s += l; n -= l;
