From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: First feedback on new C locale problems
Date: Sat, 26 Sep 2015 15:35:42 -0400
Message-ID: <20150926193542.GO17773@brightrain.aerifal.cx>
References: <20150926045836.GA2341@nyan>
In-Reply-To: <20150926045836.GA2341@nyan>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Sat, Sep 26, 2015 at 06:58:36AM +0200, Felix Janda wrote:
> On 2015-09-09 05:56:48 GMT, Rich Felker wrote:
> > On Tue, Sep 01, 2015 at 02:32:35AM -0400, Rich Felker wrote:
> > > What I'd like to do to fix it is just always return "UTF-8" for
> > > nl_langinfo(CODESET) regardless of locale (rather than returning
> > > "UTF-8-CODE-UNITS" when in C locale). POSIX places no requirements on
> > > nl_langinfo that would preclude this, and it seems like it would
> > > restore the desired properties and fix all the regressions.
> >
> > Committed.
> >
> > Rich
>
> GNU sed seems to care about the output from nl_langinfo:
>
> https://bugs.gentoo.org/show_bug.cgi?id=560728
>
> More specifically, so does lib/localecharset.c, which is used in
> the replacement of re_compile_pattern.

I was able to reproduce this (with slightly different output, "a© a'")
on Alpine. Clearly this is some sort of bug in the gnulib code or sed
itself, since it's producing corrupt output. I think we should explore
why that's happening and whether it's possible to fix there. But if
there remain other reasons that returning "UTF-8" in the C locale is
not practical then perhaps we could resort to returning "ASCII".

Rich