mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Bug report on iswalpha
Date: Tue, 5 Aug 2014 17:02:38 -0400
Message-ID: <20140805210238.GL1674@brightrain.aerifal.cx>
In-Reply-To: <CAEX4NpRfJBsVCi1DvqHWDDpaD1DU2anj9m=-ry-2qN=6xxK7vQ@mail.gmail.com>

On Tue, Aug 05, 2014 at 01:35:27PM -0700, Alon Zakai wrote:
> I think we have encountered a bug in iswalpha, as shown by the following
> program:

At least an inconsistency with glibc. Not necessarily a bug.

> ====
> #include <locale.h>
> #include <stdio.h>
> #include <wctype.h>
> 
> int
> main(const int argc, const char * const * const argv)
> {
>   const char * const locale = (argc > 1 ? argv[1] : "C");
>   const char * const actual = setlocale(LC_ALL, locale);
>   if(actual == NULL) {
>     printf("%s locale not supported; skipped locale-dependent code\n",
>            locale);
>     return 0;
>   }
>   printf("locale set to %s: %s\n", locale, actual);
> 
>   const int result = iswalpha(0xf4); // ô
>   printf("iswalpha(\"\xc3\xb4\") = %d\n", result);
>   return 0;
> }
> ====
> 
> It returns 1 in the final printf, saying that that char is an walpha char,
> when I believe it is not. For comparison, glibc reports 0.
> 
> Tested on musl 1.0.3 (used in emscripten) and musl trunk on git, same
> result.

Expecting iswalpha(0xf4) to return 0 in the C locale is wrong, since
0xf4 has not been established to be a valid wchar_t value in the
current locale, and the behavior of iswalpha is _undefined_ unless the
argument is either WEOF or a valid wchar_t in the current locale.
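
One way to establish validity before classifying, as a minimal sketch:
obtain the wchar_t by converting from the multibyte encoding with
mbrtowc, and only call iswalpha if the conversion succeeds. The helper
name first_char_is_alpha is hypothetical; it is not part of musl or of
the test quoted above, and it assumes the caller has already called
setlocale as needed.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 or 0 for the "alpha" classification of the first
 * character of s, or -1 if s does not begin with a valid multibyte
 * character in the current locale. */
static int first_char_is_alpha(const char *s)
{
    mbstate_t st;
    wchar_t wc;
    size_t n;

    memset(&st, 0, sizeof st);      /* start in the initial state */

    /* Include the terminating null byte so a truncated sequence is
     * reported as an error rather than left pending. */
    n = mbrtowc(&wc, s, strlen(s) + 1, &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;                  /* not a valid character: skip iswalpha */

    /* wc came out of a successful conversion, so passing it to
     * iswalpha is well-defined in the current locale. */
    return iswalpha((wint_t)wc) != 0;
}
```

A test built this way would call first_char_is_alpha("\xc3\xb4") after
setlocale and treat -1 as "not a character in this locale" instead of
passing a raw 0xf4 to iswalpha. What it returns for non-ASCII input
still depends on the implementation and the locale.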

As documented, musl's C locale contains all of Unicode, and
additionally classifies all Unicode characters into the C classes like
"alpha", etc. based on their Unicode identities. This behavior is
definitely conforming to the requirements of ISO C and likely (though
the specification is not entirely clear) conforming to the current
requirements of POSIX, but is expected to be forbidden in future
issues of POSIX.

This is actually a topic of current discussion and possible change
(depending on what happens in POSIX), but I don't think the behavior
of iswalpha is likely to change in any case. If the C locale in musl
is changed not to include all of Unicode, then iswalpha(0xf4) would
just be undefined behavior in the C locale, and there would be no
reason to make it check the locale and return false. If the above code
is part of a test, I think it's an invalid test. With a better idea of
what it's trying to test, I could possibly suggest a fix that avoids
the UB.

Rich


Thread overview: 4+ messages
2014-08-05 20:35 Alon Zakai
2014-08-05 21:02 ` Rich Felker [this message]
2014-08-05 21:10   ` Alon Zakai
2014-08-05 21:23     ` Rich Felker
