mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Issues with conversion state handling in mbsnrtowcs
@ 2023-05-30 12:13 Alexey Izbyshev
  0 siblings, 0 replies; only message in thread
From: Alexey Izbyshev @ 2023-05-30 12:13 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3055 bytes --]

Hi,

I found the following issues with conversion state handling in 
mbsnrtowcs (also see the attached patches):

1) mbsnrtowcs may modify the internal state of mbrtowc, which is then 
observable by subsequent mbrtowc calls.

2) mbsnrtowcs resets the conversion state even if it stopped due to a 
partial (valid) sequence before converting a single character, making it 
impossible to call it again without saving/restoring mbstate_t manually.

One possible alternative to the attached patch is to change mbsnrtowcs 
to consume the partial sequence instead of rolling back. POSIX says the 
following[1]:

> If the input buffer ends with an incomplete character, it is 
> unspecified whether conversion stops at the end of the previous 
> character (if any), or at the end of the input buffer. In the latter 
> case, a subsequent call to mbsnrtowcs() with an input buffer that 
> starts with the remainder of the incomplete character shall correctly 
> complete the conversion of that character.

And in FUTURE DIRECTIONS:

> A future version may require that when the input buffer ends with an 
> incomplete character, conversion stops at the end of the input buffer.

musl currently does the former, but the latter seems more convenient for 
the caller. For example, it allows easy chunk-by-chunk conversion 
without the need for the caller to handle a partial sequence at the end 
of a chunk manually.

3) mbsnrtowcs updates the conversion state even when called with NULL 
destination buffer.

Here I find it hard to understand what the actual POSIX requirements 
are. The only clue that I found is for mbsrtowcs:

> If conversion stopped due to reaching a terminating null character, and 
> if dst is not a null pointer, the resulting state described shall be 
> the initial conversion state.

This could be understood as implying that the conversion state shouldn't 
be updated if dst is NULL, and this is what musl does for mbsrtowcs. I 
couldn't find additional requirements for mbsnrtowcs.

If that sentence was written with the intent to support the pattern 
"call with dst == NULL; allocate dst; call again", then probably it 
makes sense for both functions to behave in the same way (i.e. not to 
update the state).

But note that this pattern conflicts with "call with dst == NULL on the 
first input chunk, then call with dst == NULL on the second chunk" 
(because the conversion state on the second call could be wrong), which 
could be interpreted as conflicting with POSIX requirement for 
mbsnrtowcs implementations that consume partial sequences at the input 
end (also quoted above):

> In the latter case, a subsequent call to mbsnrtowcs with an input 
> buffer that starts with the remainder of the incomplete character shall 
> correctly complete the conversion of that character.

My patch assumes that this sentence doesn't apply if dst is NULL. (This 
only matters if musl decides to change mbsnrtowcs to consume partial 
sequences.)

Thanks,
Alexey

[1] 
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsnrtowcs.html

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-mbsnrtowcs-fix-observable-reuse-of-mbrtowc-s-interna.patch --]
[-- Type: text/x-diff; name=0001-mbsnrtowcs-fix-observable-reuse-of-mbrtowc-s-interna.patch, Size: 1345 bytes --]

From b61d676c5ce01ac6aef9a3563129af0429db41b5 Mon Sep 17 00:00:00 2001
From: Alexey Izbyshev <izbyshev@ispras.ru>
Date: Mon, 29 May 2023 21:37:47 +0300
Subject: [PATCH 1/3] mbsnrtowcs: fix observable reuse of mbrtowc's internal
 state
Mail-Followup-To: musl@lists.openwall.com

mbsnrtowcs can pass the caller-provided conversion state to mbsrtowcs
and to mbrtowc even if it is NULL. For mbsrtowcs it's fine because it
doesn't have any internal state, but mbrtowc does, and POSIX doesn't
allow other standard functions to observably modify it.

Moreover, if mbrtowc call returns -2, mbsnrtowcs currently dereferences
NULL and crashes.

Fix these issues by adding the internal conversion state to mbsnrtowcs.
---
 src/multibyte/mbsnrtowcs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/multibyte/mbsnrtowcs.c b/src/multibyte/mbsnrtowcs.c
index 931192e2..375e01d7 100644
--- a/src/multibyte/mbsnrtowcs.c
+++ b/src/multibyte/mbsnrtowcs.c
@@ -2,11 +2,14 @@
 
 size_t mbsnrtowcs(wchar_t *restrict wcs, const char **restrict src, size_t n, size_t wn, mbstate_t *restrict st)
 {
+	static unsigned internal_state;
 	size_t l, cnt=0, n2;
 	wchar_t *ws, wbuf[256];
 	const char *s = *src;
 	const char *tmp_s;
 
+	if (!st) st = (void *)&internal_state;
+
 	if (!wcs) ws = wbuf, wn = sizeof wbuf / sizeof *wbuf;
 	else ws = wcs;
 
-- 
2.39.2


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: 0002-mbsnrtowcs-fix-wrong-state-rollback-if-no-characters.patch --]
[-- Type: text/x-diff; name=0002-mbsnrtowcs-fix-wrong-state-rollback-if-no-characters.patch, Size: 1552 bytes --]

From 063179a55bdacb3d7b9f6ef66d44907078a73a45 Mon Sep 17 00:00:00 2001
From: Alexey Izbyshev <izbyshev@ispras.ru>
Date: Tue, 30 May 2023 00:05:54 +0300
Subject: [PATCH 2/3] mbsnrtowcs: fix wrong state rollback if no characters are
 converted
Mail-Followup-To: musl@lists.openwall.com

mbsnrtowcs always resets the conversion state to zero if mbrtowc can't
parse a complete multibyte sequence due to reaching the length limit.
However, if mbsnrtowcs started in a non-initial state and hasn't
produced even a single wide character, the state should be rolled back
to its original value instead.
---
 src/multibyte/mbsnrtowcs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/multibyte/mbsnrtowcs.c b/src/multibyte/mbsnrtowcs.c
index 375e01d7..c3c1f709 100644
--- a/src/multibyte/mbsnrtowcs.c
+++ b/src/multibyte/mbsnrtowcs.c
@@ -3,12 +3,14 @@
 size_t mbsnrtowcs(wchar_t *restrict wcs, const char **restrict src, size_t n, size_t wn, mbstate_t *restrict st)
 {
 	static unsigned internal_state;
+	unsigned st0;
 	size_t l, cnt=0, n2;
 	wchar_t *ws, wbuf[256];
 	const char *s = *src;
 	const char *tmp_s;
 
 	if (!st) st = (void *)&internal_state;
+	st0 = *(unsigned *)st;
 
 	if (!wcs) ws = wbuf, wn = sizeof wbuf / sizeof *wbuf;
 	else ws = wcs;
@@ -45,7 +47,7 @@ size_t mbsnrtowcs(wchar_t *restrict wcs, const char **restrict src, size_t n, si
 				break;
 			}
 			/* have to roll back partial character */
-			*(unsigned *)st = 0;
+			*(unsigned *)st = (s == *src ? st0 : 0);
 			break;
 		}
 		s += l; n -= l;
-- 
2.39.2


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: 0003-mbsnrtowcs-don-t-modify-conversion-state-if-dest-buf.patch --]
[-- Type: text/x-diff; name=0003-mbsnrtowcs-don-t-modify-conversion-state-if-dest-buf.patch, Size: 1502 bytes --]

From 836d348090eca4d96007639074f4c101388f6945 Mon Sep 17 00:00:00 2001
From: Alexey Izbyshev <izbyshev@ispras.ru>
Date: Tue, 30 May 2023 13:52:03 +0300
Subject: [PATCH 3/3] mbsnrtowcs: don't modify conversion state if dest buf is
 NULL
Mail-Followup-To: musl@lists.openwall.com

POSIX specifies mbsnrtowcs to be an input-length-limiting equivalent of
mbsrtowcs. The latter doesn't modify the passed conversion state if the
destination buffer is NULL. This behavior, in particular, makes it
possible to learn the required size of the destination buffer by passing
NULL on the first call, allocate the buffer, and call mbsrtowcs again
without the need to save/restore the conversion state manually.

Since we don't care about rolling back the conversion state in this
case, simply reuse the copy of the original state as a throw-away state.
---
 src/multibyte/mbsnrtowcs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/multibyte/mbsnrtowcs.c b/src/multibyte/mbsnrtowcs.c
index c3c1f709..bd73ff09 100644
--- a/src/multibyte/mbsnrtowcs.c
+++ b/src/multibyte/mbsnrtowcs.c
@@ -12,7 +12,10 @@ size_t mbsnrtowcs(wchar_t *restrict wcs, const char **restrict src, size_t n, si
 	if (!st) st = (void *)&internal_state;
 	st0 = *(unsigned *)st;
 
-	if (!wcs) ws = wbuf, wn = sizeof wbuf / sizeof *wbuf;
+	if (!wcs) {
+		ws = wbuf, wn = sizeof wbuf / sizeof *wbuf;
+		st = (void *)&st0;
+	}
 	else ws = wcs;
 
 	/* making sure output buffer size is at most n/4 will ensure
-- 
2.39.2


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-05-30 12:14 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-30 12:13 [musl] Issues with conversion state handling in mbsnrtowcs Alexey Izbyshev

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).