mailing list of musl libc
* [musl] wmemcmp and wcscmp returns incorrect results for some inputs, on most architectures
@ 2023-04-18 15:22 Bruno Haible
  2023-04-18 15:40 ` [musl] " Bruno Haible
  0 siblings, 1 reply; 5+ messages in thread
From: Bruno Haible @ 2023-04-18 15:22 UTC (permalink / raw)
  To: musl

Hi,

 ---- Test program ----

==================================== foo.c ====================================
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main ()
{
  printf ("  wchar_t is %s.\n", (wchar_t)-1 < 0 ? "signed" : "unsigned");
  wchar_t a[2] = { (wchar_t) 0x76543210, 0 };
  wchar_t b[2] = { (wchar_t) 0x9abcdef1, 0 };
  int cmp1 = wmemcmp (a, b, 1);
  int cmp2 = wcscmp (a, b);
  cmp1 = (cmp1 > 0 ? 1 : cmp1 < 0 ? -1 : 0);
  cmp2 = (cmp2 > 0 ? 1 : cmp2 < 0 ? -1 : 0);
  printf ("  wmemcmp (a, b, 1) = %d\n", cmp1);
  printf ("  wcscmp (a, b) = %d\n", cmp2);
  return 0;
}
===============================================================================
$ gcc -Wall foo.c
$ ./a.out

This program has two possible correct results (the reasons are given below):

  wchar_t is unsigned.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

and

  wchar_t is signed.
  wmemcmp (a, b, 1) = 1
  wcscmp (a, b) = 1
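
The two outcomes follow from how the same bit patterns are read. A minimal
sketch, assuming a 32-bit wchar_t and using uint32_t/int32_t as stand-ins;
the helper names are hypothetical, and the cast of 0x9abcdef1 to int32_t
assumes the usual two's-complement conversion:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: three-way comparison of the same 32-bit patterns,
   read once as unsigned values and once as signed values. */
int cmp_unsigned (uint32_t x, uint32_t y)
{
  return x < y ? -1 : x > y;
}

int cmp_signed (int32_t x, int32_t y)
{
  return x < y ? -1 : x > y;
}

/* Checks the orderings of the two values used in foo.c. */
void check_orderings (void)
{
  uint32_t a = 0x76543210u;
  uint32_t b = 0x9abcdef1u;
  /* Unsigned reading: 0x76543210 < 0x9abcdef1. */
  assert (cmp_unsigned (a, b) == -1);
  /* Signed reading: 0x9abcdef1 has its top bit set, so it is negative
     (assuming two's-complement conversion), and a > b. */
  assert (cmp_signed ((int32_t) a, (int32_t) b) == 1);
}
```

Calling check_orderings () from a main function exercises both readings.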

 ---- Results on musl libc ----

On arm64, this program prints:

  wchar_t is unsigned.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

Which is correct.

On x86_64, i686, s390x, powerpc64le, it prints:

  wchar_t is signed.
  wmemcmp (a, b, 1) = -1
  wcscmp (a, b) = -1

Which is incorrect.

Versions: On x86_64 I tested musl libc 1.2.3 (in Alpine Linux); on the other
architectures I tested some older versions of musl libc.

 ---- About wmemcmp ----

ISO C 17 describes wmemcmp (§ 7.29.4.4.5) like this:
  "The wmemcmp function compares the first n wide characters of
   the object pointed to by s1 to the first n wide characters of
   the object pointed to by s2."

So, it has to compare "wide characters". § 3.7.3 defines a "wide character"
as "value representable by an object of type wchar_t, capable of
    representing any character in the current locale".
The second part of this sentence is merely an explanation of what wchar_t
is, a wording similar to the one in § 7.19 paragraph 2.
So, it is *not* a requirement that the value actually represents a
character in the current locale. Any wchar_t value is a "wide character".

(Note that this definition of wide character is broader than the one in
POSIX:2018:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html
section 3.443 defines it as "An integer value corresponding to a
single graphic symbol or control code".
But, in an apparent attempt to align with ISO C, the description of
wmemcmp in POSIX:2018
https://pubs.opengroup.org/onlinepubs/9699919799/functions/wmemcmp.html
has this wording:
  "This function shall not be affected by locale and all wchar_t values
   shall be treated identically. The null wide character and wchar_t
   values not corresponding to valid characters shall not be treated
   specially."
)

So, wmemcmp has to compare the array elements by comparing wchar_t
values. I.e. if wchar_t is unsigned, by an unsigned comparison; if
wchar_t is signed, by a signed comparison.
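
A minimal sketch of a wmemcmp that follows this rule. The name
sketch_wmemcmp is hypothetical, and this is not musl's source. A
subtraction-based return value (returning *l - *r) is one common way to get
the wrong sign for signed wchar_t; that it is the cause here is an
assumption, since the thread does not show the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical name; not musl's source.  Elements are ordered by their
   wchar_t value, so the signedness of wchar_t decides the result. */
int sketch_wmemcmp (const wchar_t *l, const wchar_t *r, size_t n)
{
  /* Skip the common prefix. */
  for (; n && *l == *r; n--, l++, r++)
    ;
  if (n == 0)
    return 0;
  /* A relational comparison cannot overflow, unlike the difference of
     two signed values. */
  return *l < *r ? -1 : 1;
}

/* Checks the sketch against the two arrays from foo.c. */
void check_sketch_wmemcmp (void)
{
  wchar_t a[2] = { (wchar_t) 0x76543210, 0 };
  wchar_t b[2] = { (wchar_t) 0x9abcdef1, 0 };
  /* Signed wchar_t: b[0] is negative, so a > b.  Unsigned: a < b. */
  int expected = (wchar_t) -1 < 0 ? 1 : -1;
  assert (sketch_wmemcmp (a, b, 1) == expected);
  assert (sketch_wmemcmp (b, a, 1) == -expected);
  assert (sketch_wmemcmp (a, a, 2) == 0);
}
```

The expected sign is derived from the signedness of wchar_t rather than
hard-coded, so the same check applies on both kinds of platform.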

 ---- About wcscmp ----

Similarly, ISO C 17 describes wcscmp (§ 7.29.4.4.1) as
  "The wcscmp function compares the wide string pointed to by s1
   to the wide string pointed to by s2."

The term "wide string" is defined in § 7.1.1 paragraph 4:
  "A wide string is a contiguous sequence of wide characters
   terminated by and including the first null wide character."

Regarding the term "wide character", see above.

So, wcscmp as well has to compare the array elements by comparing
wchar_t values. I.e. if wchar_t is unsigned, by an unsigned comparison;
if wchar_t is signed, by a signed comparison.
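
The same rule for wide strings can be sketched as follows. The name
sketch_wcscmp is hypothetical and the code is an illustration under the
reading of the standard given above, not the implementation of any
particular C library.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical name.  Compares two wide strings element by element,
   ordering the first differing pair by wchar_t value. */
int sketch_wcscmp (const wchar_t *l, const wchar_t *r)
{
  /* Advance while the elements are equal and the terminator is not
     reached. */
  for (; *l == *r && *l; l++, r++)
    ;
  /* Yields -1, 0, or 1 without any arithmetic that could overflow. */
  return *l < *r ? -1 : *l > *r;
}

/* Checks the sketch against the two strings from foo.c. */
void check_sketch_wcscmp (void)
{
  wchar_t a[2] = { (wchar_t) 0x76543210, 0 };
  wchar_t b[2] = { (wchar_t) 0x9abcdef1, 0 };
  /* Signed wchar_t: b[0] is negative, so a > b.  Unsigned: a < b. */
  int expected = (wchar_t) -1 < 0 ? 1 : -1;
  assert (sketch_wcscmp (a, b) == expected);
  assert (sketch_wcscmp (b, a) == -expected);
  assert (sketch_wcscmp (a, a) == 0);
}
```

Unlike the wmemcmp case, the loop stops at the first null wide character,
so a string that is a proper prefix of the other is compared against the
other string's next element.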

Bruno





* [musl] Re: wmemcmp and wcscmp returns incorrect results for some inputs, on most architectures
  2023-04-18 15:22 [musl] wmemcmp and wcscmp returns incorrect results for some inputs, on most architectures Bruno Haible
@ 2023-04-18 15:40 ` Bruno Haible
  2023-04-18 15:48   ` Gabriel Ravier
  0 siblings, 1 reply; 5+ messages in thread
From: Bruno Haible @ 2023-04-18 15:40 UTC (permalink / raw)
  To: musl

PS: I see that for wcscmp a correction has been added on 2023-01-04.
    The bug still exists in wmemcmp.





* Re: [musl] Re: wmemcmp and wcscmp returns incorrect results for some inputs, on most architectures
  2023-04-18 15:40 ` [musl] " Bruno Haible
@ 2023-04-18 15:48   ` Gabriel Ravier
  2023-04-18 15:50     ` Bruno Haible
  2023-04-24 15:23     ` Rich Felker
  0 siblings, 2 replies; 5+ messages in thread
From: Gabriel Ravier @ 2023-04-18 15:48 UTC (permalink / raw)
  To: musl, Bruno Haible

On 4/18/23 17:40, Bruno Haible wrote:
> PS: I see that for wcscmp a correction has been added on 2023-01-04.
>      The bug still exists in wmemcmp.
>
>
>
Yup, I forgot about wmemcmp after finding the bug in wcscmp and wcsncmp 
(also, you might want to start testing wcsncmp if you're testing this 
across various C libraries).
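
A sketch of how such a wcsncmp check could look, reusing the two values
from the test program in the first message. All helper names are
hypothetical; the check reports whether the C library's wcsncmp orders the
strings by wchar_t value and makes no claim about any particular library.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: reduce a comparison result to -1, 0, or 1. */
int sign_of (int x)
{
  return x > 0 ? 1 : x < 0 ? -1 : 0;
}

/* Sign that a value-based comparison of a and b must produce:
   1 if wchar_t is signed, -1 if it is unsigned. */
int expected_sign (void)
{
  return (wchar_t) -1 < 0 ? 1 : -1;
}

/* Returns 1 if the C library's wcsncmp orders the two test strings by
   wchar_t value, 0 otherwise. */
int wcsncmp_orders_by_value (void)
{
  wchar_t a[2] = { (wchar_t) 0x76543210, 0 };
  wchar_t b[2] = { (wchar_t) 0x9abcdef1, 0 };
  /* Sanity check: a string always compares equal to itself. */
  assert (sign_of (wcsncmp (a, a, 1)) == 0);
  return sign_of (wcsncmp (a, b, 1)) == expected_sign ()
         && sign_of (wcsncmp (b, a, 1)) == -expected_sign ();
}
```

A main function can print the result of wcsncmp_orders_by_value () next to
the signedness of wchar_t, in the same style as foo.c.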



* Re: [musl] Re: wmemcmp and wcscmp returns incorrect results for some inputs, on most architectures
  2023-04-18 15:48   ` Gabriel Ravier
@ 2023-04-18 15:50     ` Bruno Haible
  2023-04-24 15:23     ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Bruno Haible @ 2023-04-18 15:50 UTC (permalink / raw)
  To: musl, Gabriel Ravier

Gabriel Ravier wrote:
> (also, you might want to start testing wcsncmp if you're testing this 
> across various C libraries).

Good point. Thank you. I will do that.





* Re: [musl] Re: wmemcmp and wcscmp returns incorrect results for some inputs, on most architectures
  2023-04-18 15:48   ` Gabriel Ravier
  2023-04-18 15:50     ` Bruno Haible
@ 2023-04-24 15:23     ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2023-04-24 15:23 UTC (permalink / raw)
  To: Gabriel Ravier; +Cc: musl, Bruno Haible

On Tue, Apr 18, 2023 at 05:48:25PM +0200, Gabriel Ravier wrote:
> On 4/18/23 17:40, Bruno Haible wrote:
> >PS: I see that for wcscmp a correction has been added on 2023-01-04.
> >     The bug still exists in wmemcmp.
> >
> >
> >
> Yup, I forgot about wmemcmp after finding the bug in wcscmp and
> wcsncmp (also, you might want to start testing wcsncmp if you're
> testing this across various C libraries).

I'll apply the same fix to wmemcmp.

