From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level:
X-Spam-Status: No, score=-1.5 required=5.0 tests=DKIM_INVALID,DKIM_SIGNED,
 MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW,T_SCC_BODY_TEXT_LINE autolearn=ham
 autolearn_force=no version=3.4.4
Received: (qmail 11262 invoked from network); 30 May 2023 12:14:01 -0000
Received: from second.openwall.net (193.110.157.125)
 by inbox.vuxu.org with ESMTPUTF8; 30 May 2023 12:14:01 -0000
Received: (qmail 15951 invoked by uid 550); 30 May 2023 12:13:58 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-ID:
Reply-To: musl@lists.openwall.com
Received: (qmail 15907 invoked from network); 30 May 2023 12:13:57 -0000
DKIM-Filter: OpenDKIM Filter v2.11.0 mail.ispras.ru 6454244C1014
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ispras.ru; s=default;
 t=1685448824; bh=YQNI6uPwu+VcWva3YuQQoPkxCp2PBu+Gs0xDLhQKHnE=;
 h=Date:From:To:Subject:Reply-To:From;
 b=Mia/jQ+2iKKHgz9cOZhnc+oFgNakMRvedNwXT60OIo8E2oDcqdWVaSKnHBb1farDM
 0V2QxbBfl6NTK2bukcugoaZfXBTA8hMdWjwp6AZH3qd/dUEnDndZO+9r3ToAKEjyRZ
 DtoRd6G2w+0O4/oBhfRvIe4d3K2vdljrTS/SHgYc=
MIME-Version: 1.0
Date: Tue, 30 May 2023 15:13:44 +0300
From: Alexey Izbyshev
To: musl@lists.openwall.com
Mail-Followup-To: musl@lists.openwall.com
User-Agent: Roundcube Webmail/1.4.4
Message-ID:
X-Sender: izbyshev@ispras.ru
Content-Type: multipart/mixed;
 boundary="=_3dddb5463b11d250a93b4af68b6f7eb7"
Subject: [musl] Issues with conversion state handling in mbsnrtowcs

--=_3dddb5463b11d250a93b4af68b6f7eb7
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII; format=flowed

Hi,

I found the following issues with conversion state handling in
mbsnrtowcs (also see the attached patches):

1) mbsnrtowcs may modify the internal state of mbrtowc, which is then
observable by subsequent mbrtowc calls.
2) mbsnrtowcs resets the conversion state even if it stopped due to a
partial (valid) sequence before converting a single character, making it
impossible to call it again without saving/restoring mbstate_t manually.

One possible alternative to the attached patch is to change mbsnrtowcs
to consume the partial sequence instead of rolling back. POSIX says the
following[1]:

> If the input buffer ends with an incomplete character, it is
> unspecified whether conversion stops at the end of the previous
> character (if any), or at the end of the input buffer. In the latter
> case, a subsequent call to mbsnrtowcs() with an input buffer that
> starts with the remainder of the incomplete character shall correctly
> complete the conversion of that character.

And in FUTURE DIRECTIONS:

> A future version may require that when the input buffer ends with an
> incomplete character, conversion stops at the end of the input buffer.

musl currently does the former, but the latter seems more convenient for
the caller. For example, it allows easy chunk-by-chunk conversion
without the need for the caller to handle a partial sequence at the end
of a chunk manually.

3) mbsnrtowcs updates the conversion state even when called with a NULL
destination buffer.

Here I find it hard to understand what the actual POSIX requirements
are. The only clue that I found is for mbsrtowcs:

> If conversion stopped due to reaching a terminating null character, and
> if dst is not a null pointer, the resulting state described shall be
> the initial conversion state.

This could be understood as implying that the conversion state shouldn't
be updated if dst is NULL, and this is what musl does for mbsrtowcs. I
couldn't find additional requirements for mbsnrtowcs. If that sentence
was written with the intent to support the pattern "call with dst ==
NULL; allocate dst; call again", then probably it makes sense for both
functions to behave in the same way (i.e. not to update the state).
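
For concreteness, the "call with dst == NULL; allocate dst; call again"
pattern can be sketched for mbsrtowcs roughly as follows. This is only
an illustration: the helper name mbs_to_wcs_alloc is hypothetical (it is
not part of musl or POSIX), and the sketch assumes the sizing call
leaves *st unmodified, which is the reading of the standard discussed
above.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of musl or POSIX): converts a
 * NUL-terminated multibyte string into a freshly allocated wide string
 * using the "size first, then convert" pattern. Returns NULL on an
 * invalid sequence or on allocation failure. */
wchar_t *mbs_to_wcs_alloc(const char *s, mbstate_t *st)
{
	assert(s != NULL && st != NULL);

	/* Pass 1: dst == NULL, so only the required length is computed.
	 * ASSUMPTION: *st is left untouched by this call. */
	const char *p = s;
	size_t len = mbsrtowcs(NULL, &p, 0, st);
	if (len == (size_t)-1) return NULL;

	wchar_t *ws = malloc((len + 1) * sizeof *ws);
	if (!ws) return NULL;

	/* Pass 2: same input and same *st, now with a real destination
	 * that has room for the terminating null wide character. */
	p = s;
	if (mbsrtowcs(ws, &p, len + 1, st) == (size_t)-1) {
		free(ws);
		return NULL;
	}
	return ws;
}
```

If the first call did update *st, the second call would start from the
wrong state unless the caller saved and restored it manually, which is
exactly the inconvenience this pattern is meant to avoid.
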
But note that this pattern conflicts with "call with dst == NULL on the
first input chunk, then call with dst == NULL on the second chunk"
(because the conversion state on the second call could be wrong), which
could be interpreted as conflicting with the POSIX requirement for
mbsnrtowcs implementations that consume partial sequences at the input
end (also quoted above):

> In the latter case, a subsequent call to mbsnrtowcs with an input
> buffer that starts with the remainder of the incomplete character shall
> correctly complete the conversion of that character.

My patch assumes that this sentence doesn't apply if dst is NULL. (This
only matters if musl decides to change mbsnrtowcs to consume partial
sequences.)

Thanks,
Alexey

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/mbsnrtowcs.html

--=_3dddb5463b11d250a93b4af68b6f7eb7
Content-Transfer-Encoding: base64
Content-Type: text/x-diff;
 name=0001-mbsnrtowcs-fix-observable-reuse-of-mbrtowc-s-interna.patch
Content-Disposition: attachment;
 filename=0001-mbsnrtowcs-fix-observable-reuse-of-mbrtowc-s-interna.patch;
 size=1345

RnJvbSBiNjFkNjc2YzVjZTAxYWM2YWVmOWEzNTYzMTI5YWYwNDI5ZGI0MWI1IE1vbiBTZXAgMTcg
MDA6MDA6MDAgMjAwMQpGcm9tOiBBbGV4ZXkgSXpieXNoZXYgPGl6YnlzaGV2QGlzcHJhcy5ydT4K
RGF0ZTogTW9uLCAyOSBNYXkgMjAyMyAyMTozNzo0NyArMDMwMApTdWJqZWN0OiBbUEFUQ0ggMS8z
XSBtYnNucnRvd2NzOiBmaXggb2JzZXJ2YWJsZSByZXVzZSBvZiBtYnJ0b3djJ3MgaW50ZXJuYWwK
IHN0YXRlCk1haWwtRm9sbG93dXAtVG86IG11c2xAbGlzdHMub3BlbndhbGwuY29tCgptYnNucnRv
d2NzIGNhbiBwYXNzIHRoZSBjYWxsZXItcHJvdmlkZWQgY29udmVyc2lvbiBzdGF0ZSB0byBtYnNy
dG93Y3MKYW5kIHRvIG1icnRvd2MgZXZlbiBpZiBpdCBpcyBOVUxMLiBGb3IgbWJzcnRvd2NzIGl0
J3MgZmluZSBiZWNhdXNlIGl0CmRvZXNuJ3QgaGF2ZSBhbnkgaW50ZXJuYWwgc3RhdGUsIGJ1dCBt
YnJ0b3djIGRvZXMsIGFuZCBQT1NJWCBkb2Vzbid0CmFsbG93IG90aGVyIHN0YW5kYXJkIGZ1bmN0
aW9ucyB0byBvYnNlcnZhYmx5IG1vZGlmeSBpdC4KCk1vcmVvdmVyLCBpZiBtYnJ0b3djIGNhbGwg
cmV0dXJucyAtMiwgbWJzbnJ0b3djcyBjdXJyZW50bHkgZGVyZWZlcmVuY2VzCk5VTEwgYW5kIGNy
YXNoZXMuCgpGaXggdGhlc2UgaXNzdWVzIGJ5IGFkZGluZyB0aGUgaW50ZXJuYWwgY29udmVyc2lv biBzdGF0ZSB0byBtYnNucnRvd2NzLgotLS0KIHNyYy9tdWx0aWJ5dGUvbWJzbnJ0b3djcy5jIHwg MyArKysKIDEgZmlsZSBjaGFuZ2VkLCAzIGluc2VydGlvbnMoKykKCmRpZmYgLS1naXQgYS9zcmMv bXVsdGlieXRlL21ic25ydG93Y3MuYyBiL3NyYy9tdWx0aWJ5dGUvbWJzbnJ0b3djcy5jCmluZGV4 IDkzMTE5MmUyLi4zNzVlMDFkNyAxMDA2NDQKLS0tIGEvc3JjL211bHRpYnl0ZS9tYnNucnRvd2Nz LmMKKysrIGIvc3JjL211bHRpYnl0ZS9tYnNucnRvd2NzLmMKQEAgLTIsMTEgKzIsMTQgQEAKIAog c2l6ZV90IG1ic25ydG93Y3Mod2NoYXJfdCAqcmVzdHJpY3Qgd2NzLCBjb25zdCBjaGFyICoqcmVz dHJpY3Qgc3JjLCBzaXplX3Qgbiwgc2l6ZV90IHduLCBtYnN0YXRlX3QgKnJlc3RyaWN0IHN0KQog eworCXN0YXRpYyB1bnNpZ25lZCBpbnRlcm5hbF9zdGF0ZTsKIAlzaXplX3QgbCwgY250PTAsIG4y OwogCXdjaGFyX3QgKndzLCB3YnVmWzI1Nl07CiAJY29uc3QgY2hhciAqcyA9ICpzcmM7CiAJY29u c3QgY2hhciAqdG1wX3M7CiAKKwlpZiAoIXN0KSBzdCA9ICh2b2lkICopJmludGVybmFsX3N0YXRl OworCiAJaWYgKCF3Y3MpIHdzID0gd2J1Ziwgd24gPSBzaXplb2Ygd2J1ZiAvIHNpemVvZiAqd2J1 ZjsKIAllbHNlIHdzID0gd2NzOwogCi0tIAoyLjM5LjIKCg== --=_3dddb5463b11d250a93b4af68b6f7eb7 Content-Transfer-Encoding: base64 Content-Type: text/x-diff; name=0002-mbsnrtowcs-fix-wrong-state-rollback-if-no-characters.patch Content-Disposition: attachment; filename=0002-mbsnrtowcs-fix-wrong-state-rollback-if-no-characters.patch; size=1552 RnJvbSAwNjMxNzlhNTViZGFjYjNkN2I5ZjZlZjY2ZDQ0OTA3MDc4YTczYTQ1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBBbGV4ZXkgSXpieXNoZXYgPGl6YnlzaGV2QGlzcHJhcy5ydT4K RGF0ZTogVHVlLCAzMCBNYXkgMjAyMyAwMDowNTo1NCArMDMwMApTdWJqZWN0OiBbUEFUQ0ggMi8z XSBtYnNucnRvd2NzOiBmaXggd3Jvbmcgc3RhdGUgcm9sbGJhY2sgaWYgbm8gY2hhcmFjdGVycyBh cmUKIGNvbnZlcnRlZApNYWlsLUZvbGxvd3VwLVRvOiBtdXNsQGxpc3RzLm9wZW53YWxsLmNvbQoK bWJzbnJ0b3djcyBhbHdheXMgcmVzZXRzIHRoZSBjb252ZXJzaW9uIHN0YXRlIHRvIHplcm8gaWYg bWJydG93YyBjYW4ndApwYXJzZSBhIGNvbXBsZXRlIG11bHRpYnl0ZSBzZXF1ZW5jZSBkdWUgdG8g cmVhY2hpbmcgdGhlIGxlbmd0aCBsaW1pdC4KSG93ZXZlciwgaWYgbWJzbnJ0b3djcyBzdGFydGVk IGluIGEgbm9uLWluaXRpYWwgc3RhdGUgYW5kIGhhc24ndApwcm9kdWNlZCBldmVuIGEgc2luZ2xl 
IHdpZGUgY2hhcmFjdGVyLCB0aGUgc3RhdGUgc2hvdWxkIGJlIHJvbGxlZCBiYWNrCnRvIGl0cyBv cmlnaW5hbCB2YWx1ZSBpbnN0ZWFkLgotLS0KIHNyYy9tdWx0aWJ5dGUvbWJzbnJ0b3djcy5jIHwg NCArKystCiAxIGZpbGUgY2hhbmdlZCwgMyBpbnNlcnRpb25zKCspLCAxIGRlbGV0aW9uKC0pCgpk aWZmIC0tZ2l0IGEvc3JjL211bHRpYnl0ZS9tYnNucnRvd2NzLmMgYi9zcmMvbXVsdGlieXRlL21i c25ydG93Y3MuYwppbmRleCAzNzVlMDFkNy4uYzNjMWY3MDkgMTAwNjQ0Ci0tLSBhL3NyYy9tdWx0 aWJ5dGUvbWJzbnJ0b3djcy5jCisrKyBiL3NyYy9tdWx0aWJ5dGUvbWJzbnJ0b3djcy5jCkBAIC0z LDEyICszLDE0IEBACiBzaXplX3QgbWJzbnJ0b3djcyh3Y2hhcl90ICpyZXN0cmljdCB3Y3MsIGNv bnN0IGNoYXIgKipyZXN0cmljdCBzcmMsIHNpemVfdCBuLCBzaXplX3Qgd24sIG1ic3RhdGVfdCAq cmVzdHJpY3Qgc3QpCiB7CiAJc3RhdGljIHVuc2lnbmVkIGludGVybmFsX3N0YXRlOworCXVuc2ln bmVkIHN0MDsKIAlzaXplX3QgbCwgY250PTAsIG4yOwogCXdjaGFyX3QgKndzLCB3YnVmWzI1Nl07 CiAJY29uc3QgY2hhciAqcyA9ICpzcmM7CiAJY29uc3QgY2hhciAqdG1wX3M7CiAKIAlpZiAoIXN0 KSBzdCA9ICh2b2lkICopJmludGVybmFsX3N0YXRlOworCXN0MCA9ICoodW5zaWduZWQgKilzdDsK IAogCWlmICghd2NzKSB3cyA9IHdidWYsIHduID0gc2l6ZW9mIHdidWYgLyBzaXplb2YgKndidWY7 CiAJZWxzZSB3cyA9IHdjczsKQEAgLTQ1LDcgKzQ3LDcgQEAgc2l6ZV90IG1ic25ydG93Y3Mod2No YXJfdCAqcmVzdHJpY3Qgd2NzLCBjb25zdCBjaGFyICoqcmVzdHJpY3Qgc3JjLCBzaXplX3Qgbiwg c2kKIAkJCQlicmVhazsKIAkJCX0KIAkJCS8qIGhhdmUgdG8gcm9sbCBiYWNrIHBhcnRpYWwgY2hh cmFjdGVyICovCi0JCQkqKHVuc2lnbmVkICopc3QgPSAwOworCQkJKih1bnNpZ25lZCAqKXN0ID0g KHMgPT0gKnNyYyA/IHN0MCA6IDApOwogCQkJYnJlYWs7CiAJCX0KIAkJcyArPSBsOyBuIC09IGw7 Ci0tIAoyLjM5LjIKCg== --=_3dddb5463b11d250a93b4af68b6f7eb7 Content-Transfer-Encoding: base64 Content-Type: text/x-diff; name=0003-mbsnrtowcs-don-t-modify-conversion-state-if-dest-buf.patch Content-Disposition: attachment; filename=0003-mbsnrtowcs-don-t-modify-conversion-state-if-dest-buf.patch; size=1502 RnJvbSA4MzZkMzQ4MDkwZWNhNGQ5NjAwNzYzOTA3NGY0YzEwMTM4OGY2OTQ1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBBbGV4ZXkgSXpieXNoZXYgPGl6YnlzaGV2QGlzcHJhcy5ydT4K RGF0ZTogVHVlLCAzMCBNYXkgMjAyMyAxMzo1MjowMyArMDMwMApTdWJqZWN0OiBbUEFUQ0ggMy8z XSBtYnNucnRvd2NzOiBkb24ndCBtb2RpZnkgY29udmVyc2lvbiBzdGF0ZSBpZiBkZXN0IGJ1ZiBp 
cwogTlVMTApNYWlsLUZvbGxvd3VwLVRvOiBtdXNsQGxpc3RzLm9wZW53YWxsLmNvbQoKUE9TSVgg c3BlY2lmaWVzIG1ic25ydG93Y3MgdG8gYmUgYW4gaW5wdXQtbGVuZ3RoLWxpbWl0aW5nIGVxdWl2 YWxlbnQgb2YKbWJzcnRvd2NzLiBUaGUgbGF0dGVyIGRvZXNuJ3QgbW9kaWZ5IHRoZSBwYXNzZWQg Y29udmVyc2lvbiBzdGF0ZSBpZiB0aGUKZGVzdGluYXRpb24gYnVmZmVyIGlzIE5VTEwuIFRoaXMg YmVoYXZpb3IsIGluIHBhcnRpY3VsYXIsIG1ha2VzIGl0CnBvc3NpYmxlIHRvIGxlYXJuIHRoZSBy ZXF1aXJlZCBzaXplIG9mIHRoZSBkZXN0aW5hdGlvbiBidWZmZXIgYnkgcGFzc2luZwpOVUxMIG9u IHRoZSBmaXJzdCBjYWxsLCBhbGxvY2F0ZSB0aGUgYnVmZmVyLCBhbmQgY2FsbCBtYnNydG93Y3Mg YWdhaW4Kd2l0aG91dCB0aGUgbmVlZCB0byBzYXZlL3Jlc3RvcmUgdGhlIGNvbnZlcnNpb24gc3Rh dGUgbWFudWFsbHkuCgpTaW5jZSB3ZSBkb24ndCBjYXJlIGFib3V0IHJvbGxpbmcgYmFjayB0aGUg Y29udmVyc2lvbiBzdGF0ZSBpbiB0aGlzCmNhc2UsIHNpbXBseSByZXVzZSB0aGUgY29weSBvZiB0 aGUgb3JpZ2luYWwgc3RhdGUgYXMgYSB0aHJvdy1hd2F5IHN0YXRlLgotLS0KIHNyYy9tdWx0aWJ5 dGUvbWJzbnJ0b3djcy5jIHwgNSArKysrLQogMSBmaWxlIGNoYW5nZWQsIDQgaW5zZXJ0aW9ucygr KSwgMSBkZWxldGlvbigtKQoKZGlmZiAtLWdpdCBhL3NyYy9tdWx0aWJ5dGUvbWJzbnJ0b3djcy5j IGIvc3JjL211bHRpYnl0ZS9tYnNucnRvd2NzLmMKaW5kZXggYzNjMWY3MDkuLmJkNzNmZjA5IDEw MDY0NAotLS0gYS9zcmMvbXVsdGlieXRlL21ic25ydG93Y3MuYworKysgYi9zcmMvbXVsdGlieXRl L21ic25ydG93Y3MuYwpAQCAtMTIsNyArMTIsMTAgQEAgc2l6ZV90IG1ic25ydG93Y3Mod2NoYXJf dCAqcmVzdHJpY3Qgd2NzLCBjb25zdCBjaGFyICoqcmVzdHJpY3Qgc3JjLCBzaXplX3Qgbiwgc2kK IAlpZiAoIXN0KSBzdCA9ICh2b2lkICopJmludGVybmFsX3N0YXRlOwogCXN0MCA9ICoodW5zaWdu ZWQgKilzdDsKIAotCWlmICghd2NzKSB3cyA9IHdidWYsIHduID0gc2l6ZW9mIHdidWYgLyBzaXpl b2YgKndidWY7CisJaWYgKCF3Y3MpIHsKKwkJd3MgPSB3YnVmLCB3biA9IHNpemVvZiB3YnVmIC8g c2l6ZW9mICp3YnVmOworCQlzdCA9ICh2b2lkICopJnN0MDsKKwl9CiAJZWxzZSB3cyA9IHdjczsK IAogCS8qIG1ha2luZyBzdXJlIG91dHB1dCBidWZmZXIgc2l6ZSBpcyBhdCBtb3N0IG4vNCB3aWxs IGVuc3VyZQotLSAKMi4zOS4yCgo= --=_3dddb5463b11d250a93b4af68b6f7eb7--
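
Issue 2 in the message above argues that consuming a trailing partial
sequence makes chunk-by-chunk conversion easier for the caller. As a
rough illustration of what that convenience looks like, the sketch
below uses mbrtowc, which already keeps an incomplete character in the
conversion state when it returns (size_t)-2. It is a stand-in only, not
the proposed mbsnrtowcs change, and the helper name convert_chunk is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of musl or POSIX): converts one input
 * chunk and returns the number of wide characters written, or
 * (size_t)-1 on an invalid sequence. An incomplete character at the
 * end of the chunk is kept in *st and completed by the next call. */
size_t convert_chunk(wchar_t *dst, size_t dst_len,
                     const char *chunk, size_t n, mbstate_t *st)
{
	size_t out = 0;

	/* Every wide character produced here consumes at least one byte
	 * of this chunk, so n output slots are always enough. */
	assert(dst_len >= n);

	while (n > 0) {
		size_t l = mbrtowc(&dst[out], chunk, n, st);
		if (l == (size_t)-1) return (size_t)-1; /* invalid sequence */
		if (l == (size_t)-2) break; /* rest of chunk absorbed into *st */
		if (l == 0) l = 1; /* null character stored; ASSUMES a 1-byte NUL */
		chunk += l;
		n -= l;
		out++;
	}
	return out;
}
```

In a UTF-8 locale (selected by the caller with setlocale), feeding the
two bytes of a two-byte character in separate calls should yield 0 wide
characters from the first call and 1 from the second, with no leftover
bytes for the caller to carry over between chunks.
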