From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Bug report on iswalpha
Date: Tue, 5 Aug 2014 17:02:38 -0400
Message-ID: <20140805210238.GL1674@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Aug 05, 2014 at 01:35:27PM -0700, Alon Zakai wrote:
> I think we have encountered a bug in iswalpha, as shown by the following
> program:

At least an inconsistency with glibc. Not necessarily a bug.

> ====
> #include <locale.h>
> #include <stdio.h>
> #include <wctype.h>
>
> int
> main(const int argc, const char * const * const argv)
> {
>   const char * const locale = (argc > 1 ?
>       argv[1] : "C");
>   const char * const actual = setlocale(LC_ALL, locale);
>   if(actual == NULL) {
>     printf("%s locale not supported; skipped locale-dependent code\n",
>            locale);
>     return 0;
>   }
>   printf("locale set to %s: %s\n", locale, actual);
>
>   const int result = iswalpha(0xf4); // ô
>   printf("iswalpha(\"\xc3\xb4\") = %d\n", result);
>   return 0;
> }
> ====
>
> It returns 1 in the final printf, saying that that char is an walpha char,
> when I believe it is not. For comparison, glibc reports 0.
>
> Tested on musl 1.0.3 (used in emscripten) and musl trunk on git, same
> result.

Expecting iswalpha(0xf4) to return 0 in the C locale is wrong, since
0xf4 has not been established to be a valid wchar_t value in the current
locale, and the behavior of iswalpha is _undefined_ unless the argument
is either WEOF or a valid wchar_t in the current locale.

As documented, musl's C locale contains all of Unicode, and
additionally classifies all Unicode characters into the C classes like
"alpha", etc. based on their Unicode identities. This behavior is
definitely conforming to the requirements of ISO C and likely (though
the specification is not entirely clear) conforming to the current
requirements of POSIX, but is expected to be forbidden in future
issues of POSIX.

This is actually a topic of current discussion and possible change
(depending on what happens in POSIX), but I don't think the behavior
of iswalpha is likely to change in any case. If the C locale in musl
is changed not to include all of Unicode, then iswalpha(0xf4) would
just be undefined behavior in the C locale, and there would be no
reason to make it check the locale and return false.

If the above code is part of a test, I think it's an invalid test.
With a better idea of what it's trying to test, I could possibly
suggest a fix that avoids the UB.

Rich
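
For illustration only, and not something proposed in this message: one
way a test might avoid the undefined behavior is to obtain the wchar_t
by converting a multibyte string in the current locale, and to call
iswalpha only when that conversion succeeds. The helper name
is_alpha_mb is hypothetical. The sketch assumes the input bytes are
encoded in the current locale's character encoding, so the UTF-8 bytes
"\xc3\xb4" are only meaningful in a UTF-8 locale; in other locales the
conversion may fail or decode a different character.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not part of musl or glibc).
 * Converts the first multibyte character of mb using the current
 * locale, then classifies it.
 * Returns 1 if alphabetic, 0 if not, and -1 if the bytes do not form
 * a complete, valid character in the current locale (in which case
 * iswalpha is never called). */
static int is_alpha_mb(const char *mb)
{
    mbstate_t st;
    wchar_t wc;
    size_t n;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    n = mbrtowc(&wc, mb, strlen(mb), &st);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;                      /* invalid or incomplete sequence */

    /* wc came from a successful conversion, so it is a valid wchar_t
     * in the current locale and iswalpha's result is defined. */
    return iswalpha((wint_t)wc) != 0;
}
```

A test would call setlocale(LC_ALL, ...) first, then treat a return of
-1 from is_alpha_mb("\xc3\xb4") as "character not available in this
locale" rather than comparing against a fixed expected value.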