From: He X
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Mon, 30 Jan 2017 08:37:42 +0800
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
In-Reply-To: <20170129163714.GM1533@brightrain.aerifal.cx>
References: <20170129133946.GT17692@port70.net> <20170129140747.GJ1533@brightrain.aerifal.cx> <20170129163714.GM1533@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:10987
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/10987

> I'm not saying you need to wait....

1. That thread is hard for me to read and I only glanced at it once. Thanks for your advice, I'll be more cautious next time! ;p

> Can I ask how .UTF-8 got in the locale name....

2. The '.UTF-8' was copied from glibc's locale table. I put it there; it is what a normal user would set. Looking into musl's source, I found that setting such a suffix is totally useless for musl: suffixes are meaningless. But in my view we should still be compatible with glibc, since the suffix seems to have become an unofficial but standard way to ask libc for a proper charset.

> I don't think "it crashes on glibc"...

3. Really sorry, I forgot to run locale-gen before testing, and that is why it segfaulted. It seems glibc only strips '.GBK' at translation load time; it showed me '»ỰѡÏî:'. In other words, it was using the real GBK set!

Still, I agree with rejection: musl is UTF-8, but '.GBK' asks for GBK rather than UTF-8, so conceptually we should just reject it. From the side of normal users, though, rewriting is nice because it avoids failure. And developers using musl should know that there are no non-UTF-8 sets in musl rather than depending on libc, so I like the idea of rewriting. Or we could put the responsibility for setting the right LC_* on users? Not so friendly...

Because users may want to validate the strings returned by setlocale(), I think the best time to rewrite is at translation time.

> Re: the original patch, it should probably...

4. Makes sense. I'm not a pro coder and hadn't thought about using strchr or strcmp! :)

With the idea above, I suggest using strchr to strip everything after the '.'. That is good, and we don't need to care what is placed after the '.', since whatever the user asked for, musl is using UTF-8.

2017-01-30 0:37 GMT+08:00 Rich Felker <dalias@libc.org>:

> On Sun, Jan 29, 2017 at 10:48:34PM +0800, He X wrote:
> > btw, with 'p-> to q->', 'strip .UTF-8' (these two in the first thread), and
> > these two patches, fcitx, chromium are working well.
>
> Can I ask how .UTF-8 got in the locale name to begin with? Did you put
> it there, or was it copied from another non-glibc system you logged in
> from, or did chromium itself add it?
>
> Re: the original patch, it should probably (depending on what we want
> to do with other invalid encodings) either use strchr to find the
> first '.' and strip everything after it, or something like:
>
>         if (loclen > 6 && !strcmp(locname+loclen-6, ".UTF-8"))
>
> There's no reason to pull strstr in here.
>
> Rich
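For reference, the two options discussed in this thread (strip everything after the first '.' with strchr, or only recognise a trailing ".UTF-8" with strcmp) could be sketched roughly as below. This is only an illustration under my own assumptions, not the actual musl patch: the helper names strip_codeset and has_utf8_suffix are hypothetical, and locname is assumed to be a NUL-terminated locale name such as "zh_CN.UTF-8".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not musl code): return the length of the locale
 * name up to, but not including, the first '.', so that any codeset
 * suffix such as ".UTF-8" or ".GBK" is ignored. If there is no '.',
 * the whole name is kept. */
static size_t strip_codeset(const char *locname)
{
    const char *dot = strchr(locname, '.');
    return dot ? (size_t)(dot - locname) : strlen(locname);
}

/* Hypothetical helper (not musl code): the narrower check quoted from
 * the thread. It is true only when the name ends in ".UTF-8" and has at
 * least one character before the suffix. */
static int has_utf8_suffix(const char *locname)
{
    size_t loclen = strlen(locname);
    return loclen > 6 && !strcmp(locname + loclen - 6, ".UTF-8");
}
```

With these helpers, strip_codeset gives 5 for "zh_CN.UTF-8", "zh_CN.GBK" and "zh_CN" alike, while has_utf8_suffix is true only for the first. One thing to keep in mind with the strchr variant: a name like "de_DE.UTF-8@euro" would lose its "@euro" modifier too, so a real patch may want to handle that case separately.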