From mboxrd@z Thu Jan 1 00:00:00 1970
From: He X
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Wed, 8 Feb 2017 18:13:30 +0800
References: <20170129133946.GT17692@port70.net> <20170129140747.GJ1533@brightrain.aerifal.cx> <20170129155507.GK1533@brightrain.aerifal.cx> <20170129163329.GL1533@brightrain.aerifal.cx>
In-Reply-To: <20170129163329.GL1533@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Xref: news.gmane.org gmane.linux.lib.musl.general:11019
Content-Type: text/plain; charset=UTF-8

Here is the patch: http://paste.ubuntu.com/23953329/

The code is tested, but maybe it sucks.

1. Stripping @xx and _TT: when mapping with the full name fails, we check
whether there's a '@' in locname. If so, we go back to the part that copies
catname, which overwrites and so skips '@xx'.

Then we check whether there's a '_'. If both '@xx' and '_TT' are there, we
point locname to '@xx', set a correct loclen, and go back to the part that
writes locname, so that '_TT' is replaced with '@xx'. If only '_TT' is
there, we skip that and simply overwrite '_TT'.

Because there's also a '_' in 'LC_xx', we may get into an infinite loop of
stripping '_TT'. So locname is checked: it's set to NULL once we have used
strchr to skip.

For the same reason, we may get into an infinite loop of overwriting '_TT'.
If we have already replaced it once, the first '/' comes before the first
'_', because the name then looks like 'zh@t/LC_xx'.

zh_CN@t (stripped by the first part) -> zh_CN (overwritten by the second
part) -> zh@t (stripped by the first part again) -> zh

2. About rewriting '.GBK': earlier I agreed with keeping the user's original
value and stripping it in gettext(). But then I thought that someone may use
setlocale() to validate whether libc set the correct charset. So we should
rewrite .XX to .UTF-8 in setlocale(); in principle we can't return a wrong
value.

2017-01-30 0:33 GMT+08:00 Rich Felker:

> On Mon, Jan 30, 2017 at 12:14:49AM +0800, He X wrote:
> > I can't wait, can i work on it and make a patch for these issues if
> > Masanori Ogino is busy now? I'd like to see that these issues could be
> > solved in official musl repo as soon as possible.
>
> I'm not saying you need to wait, just that you should be aware of past
> discussion of the topic, and if you want to propose patches they
> should either follow the behavior outlined before or come with
> discussion of why you think a different behavior is more appropriate.
>
> > And maybe rejection for NON-UTF-8, since 'LANG=zh_CN.GBK ./a.out(
> > setlocale(LC_*, "") )' showed me a segfault with glibc.
>
> I don't think "it crashes on glibc" is a good justification for
> anything. Rather there should probably be UX discussions of what
> different choices mean for different poor-configuration situations
> that are likely to arise in the wild (from things like LC_* getting
> copied over ssh).
>
> Rich
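
For reference, the lookup order from point 1 can be sketched roughly as below. This is Python rather than C, only to keep it short, and it is not the patch itself: `fallback_candidates` and `has_territory` are made-up names, and the sketch assumes any '.codeset' suffix has already been removed from locname. It also uses 'LC_MESSAGES' as a stand-in for 'LC_xx'.

```python
def fallback_candidates(locname):
    """Return the locale names to try, most specific first.

    Assumed order (taken from the message above): the full name, then
    without '@modifier', then without '_TERRITORY' but with the modifier
    put back, then the bare language.
    """
    base, at, modifier = locname.partition("@")
    lang, underscore, _territory = base.partition("_")

    candidates = [locname]
    if at:
        # first part: drop '@xx'            zh_CN@t -> zh_CN
        candidates.append(base)
    if underscore:
        if at:
            # second part: replace '_TT' with '@xx'   zh_CN -> zh@t
            candidates.append(lang + "@" + modifier)
        # finally only the language is left  zh@t -> zh (or zh_CN -> zh)
        candidates.append(lang)
    return candidates


def has_territory(name):
    """True if a name like 'zh_CN/LC_MESSAGES' still carries a '_TT' part.

    A '_' that only appears after the first '/' belongs to 'LC_xx', not
    to the locale name, so it must not trigger another stripping round.
    """
    underscore = name.find("_")
    slash = name.find("/")
    return underscore != -1 and (slash == -1 or underscore < slash)


if __name__ == "__main__":
    print(" -> ".join(fallback_candidates("zh_CN@t")))
    print(has_territory("zh_CN/LC_MESSAGES"), has_territory("zh@t/LC_MESSAGES"))
```

The second helper is only there to show the guard against looping: once '_TT' has been replaced, the first '/' comes before any '_', so there is nothing left to strip.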