mailing list of musl libc
From: Bruno Haible <bruno@clisp.org>
To: musl@lists.openwall.com
Subject: [musl] wmemcmp and wcscmp return incorrect results for some inputs, on most architectures
Date: Tue, 18 Apr 2023 17:22:20 +0200
Message-ID: <5129919.jY9Djz4Zq0@nimes>

Hi,

 ---- Test program ----

==================================== foo.c ====================================
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

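int main ()
{
  /* Which result is correct depends on whether wchar_t is signed.  */
  printf ("  wchar_t is %s.\n", (wchar_t)-1 < 0 ? "signed" : "unsigned");
  /* 0x9abcdef1 has its top bit set: it is negative if wchar_t is a
     signed 32-bit type, and larger than 0x76543210 if it is unsigned.  */
  wchar_t a[2] = { (wchar_t) 0x76543210, 0 };
  wchar_t b[2] = { (wchar_t) 0x9abcdef1, 0 };
  int cmp1 = wmemcmp (a, b, 1);
  int cmp2 = wcscmp (a, b);
  /* Only the sign of the result is specified; normalize it to -1, 0 or 1.  */
  cmp1 = (cmp1 > 0 ? 1 : cmp1 < 0 ? -1 : 0);
  cmp2 = (cmp2 > 0 ? 1 : cmp2 < 0 ? -1 : 0);
  printf ("  wmemcmp (a, b, 1) = %d\n", cmp1);
  printf ("  wcscmp (a, b) = %d\n", cmp2);
  return 0;
}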
===============================================================================
$ gcc -Wall foo.c
$ ./a.out

This program has two possible correct results (see below for why):

  wchar_t is unsigned.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

and

  wchar_t is signed.
  wmemcmp (a, b, 1) = 1
  wcscmp (a, b) = 1

 ---- Results on musl libc ----

On arm64, this program prints:

  wchar_t is unsigned.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

This is correct.

On x86_64, i686, s390x, and powerpc64le, it prints:

  wchar_t is signed.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

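This is incorrect: with a signed 32-bit wchar_t, a[0] = 0x76543210 is
positive and b[0] = (wchar_t) 0x9abcdef1 is negative, so both functions
should return a positive value, printed as 1.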

Version: On x86_64, I tested musl libc 1.2.3 (in Alpine Linux); on the other
architectures, I tested some older versions of musl libc.

 ---- About wmemcmp ----

ISO C 17 describes wmemcmp (§ 7.29.4.4.5) like this:
  "The wmemcmp function compares the first n wide characters of
   the object pointed to by s1 to the first n wide characters of
   the object pointed to by s2."

So, it has to compare "wide characters". § 3.7.3 defines a "wide character"
as "value representable by an object of type wchar_t, capable of
    representing any character in the current locale".
The second part of this sentence is merely an explanation of what wchar_t
is, a wording similar to the one in § 7.19 paragraph 2.
So, it is *not* a requirement that the value actually represents a
character in the current locale. Any wchar_t value is a "wide character".

(Note that this definition of wide character is broader than the one in
POSIX:2018:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
section 3.443 defines it as "An integer value corresponding to a
single graphic symbol or control code".
But, apparently in an attempt to align with ISO C, the description of
wmemcmp in POSIX:2018
https://pubs.opengroup.org/onlinepubs/9699919799/functions/wmemcmp.html
has this wording:
  "This function shall not be affected by locale and all wchar_t values
   shall be treated identically. The null wide character and wchar_t
   values not corresponding to valid characters shall not be treated
   specially."
)

So, wmemcmp has to compare the array elements by comparing wchar_t
values. I.e. if wchar_t is unsigned, by an unsigned comparison; if
wchar_t is signed, by a signed comparison.

 ---- About wcscmp ----

Similarly, ISO C 17 describes wcscmp (§ 7.29.4.4.1) as
  "The wcscmp function compares the wide string pointed to by s1
   to the wide string pointed to by s2."

The term "wide string" is defined in § 7.1.1 paragraph 4:
  "A wide string is a contiguous sequence of wide characters
   terminated by and including the first null wide character."

Regarding the term "wide character", see above.

So wcscmp, too, has to compare the array elements by comparing
wchar_t values. I.e. if wchar_t is unsigned, by an unsigned comparison;
if wchar_t is signed, by a signed comparison.

Bruno




Thread overview: 5+ messages
2023-04-18 15:22 Bruno Haible [this message]
2023-04-18 15:40 ` [musl] " Bruno Haible
2023-04-18 15:48   ` Gabriel Ravier
2023-04-18 15:50     ` Bruno Haible
2023-04-24 15:23     ` Rich Felker
