From: Bruno Haible
To: musl@lists.openwall.com
Date: Tue, 18 Apr 2023 17:22:20 +0200
Message-ID: <5129919.jY9Djz4Zq0@nimes>
Subject: [musl] wmemcmp and wcscmp return incorrect results for some inputs, on most architectures

Hi,

---- Test program ----

==================================== foo.c ====================================
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
int main ()
{
  /* Report the signedness of wchar_t on this platform. */
  printf (" wchar_t is %s.\n", (wchar_t)-1 < 0 ? "signed" : "unsigned");
  /* a < b if wchar_t is unsigned, a > b if wchar_t is a signed 32-bit type. */
  wchar_t a[2] = { (wchar_t) 0x76543210, 0 };
  wchar_t b[2] = { (wchar_t) 0x9abcdef1, 0 };
  int cmp1 = wmemcmp (a, b, 1);
  int cmp2 = wcscmp (a, b);
  /* Only the sign of the result is specified, so normalize to -1, 0, 1. */
  cmp1 = (cmp1 > 0 ? 1 : cmp1 < 0 ? -1 : 0);
  cmp2 = (cmp2 > 0 ? 1 : cmp2 < 0 ? -1 : 0);
  printf (" wmemcmp (a, b, 1) = %d\n", cmp1);
  printf (" wcscmp (a, b) = %d\n", cmp2);
  return 0;
}
===============================================================================
$ gcc -Wall foo.c
$ ./a.out

This program has two possible correct results (for why, see below):

  wchar_t is unsigned.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

and

  wchar_t is signed.
  wmemcmp (a, b, 1) = 1
  wcscmp (a, b) = 1

---- Results on musl libc ----

On arm64, this program prints:

  wchar_t is unsigned.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

Which is correct.

On x86_64, i686, s390x, powerpc64le, it prints:

  wchar_t is signed.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

Which is incorrect.

Version: On x86_64 I tested musl libc 1.2.3 (in Alpine Linux); for the other
architectures some older versions of musl libc.

---- About wmemcmp ----

ISO C 17 describes wmemcmp (§ 7.29.4.4.5) like this:
  "The wmemcmp function compares the first n wide characters of the object
   pointed to by s1 to the first n wide characters of the object pointed to
   by s2."

So, it has to compare "wide characters". § 3.7.3 defines a "wide character"
as "value representable by an object of type wchar_t, capable of representing
any character in the current locale". The second part of this sentence
is merely an explanation of what wchar_t is, a wording similar to the one in
§ 7.19 paragraph 2. So, it is *not* a requirement that the value actually
represents a character in the current locale. Any wchar_t value is a
"wide character".

(Note that this definition of wide character is broader than the one in
POSIX:2018:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
section 3.443 defines it as "An integer value corresponding to a single
graphic symbol or control code".
But, in an apparent attempt to align with ISO C, the description of wmemcmp
in POSIX:2018
https://pubs.opengroup.org/onlinepubs/9699919799/functions/wmemcmp.html
has this wording:
  "This function shall not be affected by locale and all wchar_t values
   shall be treated identically. The null wide character and wchar_t values
   not corresponding to valid characters shall not be treated specially."
)

So, wmemcmp has to compare the array elements by comparing wchar_t values.
I.e. if wchar_t is unsigned, by an unsigned comparison; if wchar_t is signed,
by a signed comparison.

---- About wcscmp ----

Similarly, ISO C 17 describes wcscmp (§ 7.29.4.4.1) as
  "The wcscmp function compares the wide string pointed to by s1 to the
   wide string pointed to by s2."

The term "wide string" is defined in § 7.1.1 paragraph 4:
  "A wide string is a contiguous sequence of wide characters terminated by
   and including the first null wide character."

Regarding the term "wide character", see above.

So, wcscmp as well has to compare the array elements by comparing wchar_t
values. I.e. if wchar_t is unsigned, by an unsigned comparison; if wchar_t
is signed, by a signed comparison.

Bruno
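
---- Illustration ----

The requirement derived above (compare the elements as wchar_t values, with
whatever signedness wchar_t has) can be sketched as follows. This is only a
sketch: ref_wmemcmp and ref_wcscmp are hypothetical names chosen for this
example and are not musl's code. The sketch returns the result of a direct
"<" comparison instead of a difference such as *s1 - *s2. A difference can
overflow for inputs like 0x76543210 and 0x9abcdef1 when wchar_t is a signed
32-bit type, which would be one possible explanation for the wrong sign seen
above; that explanation is an assumption, not something verified here.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical reference comparison: the "<" operator acts on wchar_t
   values, so it is a signed comparison where wchar_t is signed and an
   unsigned comparison where wchar_t is unsigned. No subtraction is used,
   so the result cannot be spoiled by overflow. */
static int ref_wmemcmp (const wchar_t *s1, const wchar_t *s2, size_t n)
{
  for (; n > 0; s1++, s2++, n--)
    if (*s1 != *s2)
      return *s1 < *s2 ? -1 : 1;
  return 0;
}

/* Same idea for null-terminated wide strings: stop at the first
   difference, or return 0 when both strings end together. */
static int ref_wcscmp (const wchar_t *s1, const wchar_t *s2)
{
  for (; *s1 == *s2; s1++, s2++)
    if (*s1 == 0)
      return 0;
  return *s1 < *s2 ? -1 : 1;
}
```

Called with the arrays a and b from foo.c, both functions are expected to
return 1 where wchar_t is a signed 32-bit type and -1 where it is unsigned,
matching the two correct results listed above.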