mailing list of musl libc
* Bug report on iswalpha
From: Alon Zakai @ 2014-08-05 20:35 UTC
  To: musl

I think we have encountered a bug in iswalpha, as shown by the following
program:

====
#include <locale.h>
#include <stdio.h>
#include <wctype.h>

int
main(int argc, char *argv[])
{
  const char * const locale = (argc > 1 ? argv[1] : "C");
  const char * const actual = setlocale(LC_ALL, locale);
  if(actual == NULL) {
    printf("%s locale not supported; skipped locale-dependent code\n",
           locale);
    return 0;
  }
  printf("locale set to %s: %s\n", locale, actual);

  const int result = iswalpha(0xf4); // ô
  printf("iswalpha(\"\xc3\xb4\") = %d\n", result);
  return 0;
}
====

It returns 1 in the final printf, saying that the character is alphabetic,
when I believe it is not. For comparison, glibc reports 0.

Tested on musl 1.0.3 (used in emscripten) and musl trunk on git, same
result.

- Alon

* Re: Bug report on iswalpha
From: Rich Felker @ 2014-08-05 21:02 UTC
  To: musl

On Tue, Aug 05, 2014 at 01:35:27PM -0700, Alon Zakai wrote:
> I think we have encountered a bug in iswalpha, as shown by the following
> program:

At least an inconsistency with glibc. Not necessarily a bug.

> It returns 1 in the final printf, saying that the character is alphabetic,
> when I believe it is not. For comparison, glibc reports 0.
> 
> Tested on musl 1.0.3 (used in emscripten) and musl trunk on git, same
> result.

Expecting iswalpha(0xf4) to return 0 in the C locale is wrong, since
0xf4 has not been established to be a valid wchar_t value in the current
locale, and the behavior of iswalpha is _undefined_ unless the
argument is either WEOF or a valid wchar_t in the current locale.

As documented, musl's C locale contains all of Unicode, and
additionally classifies all Unicode characters into the C classes like
"alpha", etc. based on their Unicode identities. This behavior is
definitely conforming to the requirements of ISO C and likely (though
the specification is not entirely clear) conforming to the current
requirements of POSIX, but is expected to be forbidden in future
issues of POSIX.

This is actually a topic of current discussion and possible change
(depending on what happens in POSIX), but I don't think the behavior
of iswalpha is likely to change in any case. If the C locale in musl
is changed not to include all of Unicode, then iswalpha(0xf4) would
just be undefined behavior in the C locale, and there would be no
reason to make it check the locale and return false. If the above code
is part of a test, I think it's an invalid test. With a better idea of
what it's trying to test, I could possibly suggest a fix that avoids
the UB.

Rich

* Re: Bug report on iswalpha
From: Alon Zakai @ 2014-08-05 21:10 UTC
  To: musl

I see what you mean, yes, this does seem like undefined behavior then, as
it's invalid in that locale. Thanks for the quick response!

And thanks for musl in general! We are very happy with it in the emscripten
project.

- Alon

* Re: Bug report on iswalpha
From: Rich Felker @ 2014-08-05 21:23 UTC
  To: musl

On Tue, Aug 05, 2014 at 02:10:25PM -0700, Alon Zakai wrote:
> I see what you mean, yes, this does seem like undefined behavior then, as
> it's invalid in that locale. Thanks for the quick response!

Just to be clear -- given musl's current C locale, it's not UB in
musl. In musl's current C locale, mbtowc() for "\xc3\xb4" produces
(wchar_t)0xf4, so the behavior is well-defined and the "true" result
is correct. On the other hand, the behavior is undefined for glibc.

Rich

Thread overview: 4 messages
2014-08-05 20:35 Bug report on iswalpha Alon Zakai
2014-08-05 21:02 ` Rich Felker
2014-08-05 21:10   ` Alon Zakai
2014-08-05 21:23     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
