From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/10987
Path: news.gmane.org!.POSTED!not-for-mail
From: He X <xw897002528@gmail.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Mon, 30 Jan 2017 08:37:42 +0800
Message-ID: <CAPG2z0-2iGiVXBhFnaMWF_wVPCs6AgNMqobLJoWHLrmeR=Uy+A@mail.gmail.com>
References: <CAPG2z08mpVnx8nDc1703Ejhb0pwQqjG1SynHcafHO-NcqaKTrg@mail.gmail.com>
 <CAPG2z08Sd0F50iiQohqjYQfJMZkqhTFCkRT59R91TiVG0wNiNQ@mail.gmail.com>
 <20170129133946.GT17692@port70.net> <20170129140747.GJ1533@brightrain.aerifal.cx>
 <CAPG2z0_EHU=U0=pkd31b5fPN__-Ly_qd9W9ftK=1e40SDJHX1w@mail.gmail.com> <20170129163714.GM1533@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=f40304361e2c3b206e0547450707
X-Trace: blaine.gmane.org 1485736701 2521 195.159.176.226 (30 Jan 2017 00:38:21 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Mon, 30 Jan 2017 00:38:21 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-11002-gllmg-musl=m.gmane.org@lists.openwall.com Mon Jan 30 01:38:15 2017
Return-path: <musl-return-11002-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-11002-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1cXzz2-0000NZ-Jj
	for gllmg-musl@m.gmane.org; Mon, 30 Jan 2017 01:38:12 +0100
Original-Received: (qmail 16113 invoked by uid 550); 30 Jan 2017 00:38:16 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 16092 invoked from network); 30 Jan 2017 00:38:15 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to;
        bh=tS4A7NMhaDu6Q8sx4/wSB/v7rKarFKFEPOqnDun0Qlo=;
        b=Qfn6ZytVk+w5R/KAJ1iCMP+98yynUmJD16P7KZfFovD33vxewj1Qo5yzcJSUNQ+oEd
         relprB8OJc4GzEIyj33dIwYA2RMAls1H1jYu2aJZQLLC4pTgt1B5fPQjX+AJFzFrQeGZ
         ngQKwY/7JSb1hZ2hKviS+QpLlff/fmidbLej6ShIYOmpiwDfrTZfF1ny4Dbr5WRnRQ/3
         qVVj6rV9cvoZgAwctbBTFzvGTSzHvlZe6vYDZWRApLxzYndD9nFockRP5s9YgCkFX11i
         g3DF8q6sRt3u2fEMVlMh7kGYeFURSCMvLOz6WRSwobRevaTdDQxV6Q9c3uKlL1H4K60a
         ZsKQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:in-reply-to:references:from:date
         :message-id:subject:to;
        bh=tS4A7NMhaDu6Q8sx4/wSB/v7rKarFKFEPOqnDun0Qlo=;
        b=qJZTm3xPsMMMJalXesU/MI0KVS+Q+inTLlVH72s4y0YcopCNnt/NJVJ/X8cVRH+vCN
         le7dZ+JujTEcgH0PMZL+b55nvmXtXrV6336QsfmXbiuyMe/GYc9qWlbI6PcWXt2r4Jn4
         OAx9CiwFFBPKe2/gLcvl79HM/A3pfy3ubczU0Fg0Dznu76NmmcuJwLYhuYVlYudUoQIW
         UiUKYwy4YPpbsL1JgJG6HXvqAvCWRM004qoLkBoGGVAy35MQw0qtt8rX64oOzwaWBckR
         dstkYQWka4pHJQoEGZ7iwuO8FPZtp45ZytBjOBvpmfHU39s8I1kGtPX29RCduxr9lBCA
         Fyhg==
X-Gm-Message-State: AIkVDXLQVehU6UElmXAja634zlzH4wH9sMR9ucZr3xsPMS7hB0hDsSon2aZOtPj4AFQcg2mkFU7OyVScOAXIrA==
X-Received: by 10.176.23.22 with SMTP id j22mr9620530uaf.168.1485736683182;
 Sun, 29 Jan 2017 16:38:03 -0800 (PST)
In-Reply-To: <20170129163714.GM1533@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:10987
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/10987>

--f40304361e2c3b206e0547450707
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

> I'm not saying you need to wait....
1. That thread is hard for me to read, so I only glanced at it once.
Thanks for your advice, I'll be more cautious next time! ;p

> Can I ask how .UTF-8 got in the locale name....
2. The '.UTF-8' is copied from glibc's locale table. I put it there; it
is what a normal user would set. Looking into musl's source, I found that
such a suffix is useless for musl: suffixes are meaningless to it. But in
my view we should still be compatible with glibc, since a suffix seems to
be the unofficial but standard way to ask libc for a particular charset.

> I don't think "it crashes on glibc"...
3. Really sorry, I forgot to run locale-gen before testing; that is why it
segfaulted. It seems glibc only strips '.GBK' at translation load time;
it showed me '=C2=BB=E1=BB=B0=D1=A1=C3=8F=C3=AE:'.
In other words, it was using the real GBK charset!

I do agree with rejecting it, though: musl is UTF-8 only, and '.GBK' asks
for GBK rather than UTF-8, so conceptually we should just reject it.
But from a normal user's point of view, rewriting is nice because it
avoids failure. Developers using musl should know that musl has no
non-UTF-8 charsets, rather than depending on libc for this, so I like the
idea of rewriting. Or we could make users responsible for setting the
right LC_*? Not so friendly...

Users may want to validate the string returned by setlocale(), so I think
the best time to rewrite the name is at translation load time.

> Re: the original patch, it should probably...
4. Makes sense. I'm not a pro coder, and I hadn't thought about using
strchr or strcmp! :)

Given the idea above, I suggest using strchr to strip everything after the
'.'. That works well, and we don't need to care about what follows the
'.', since whatever the user asked for, musl uses UTF-8.
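
For illustration, here is a minimal sketch of both options in C. It is
not musl's actual code: strip_codeset and has_utf8_suffix are hypothetical
names, and the caller-supplied output buffer is my assumption. The
stripped name goes into a separate buffer, so the string returned by
setlocale() would stay untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not musl code: copy a locale name into out,
 * dropping the first '.' and everything after it. Note that any
 * '@modifier' following the codeset is dropped too in this sketch.
 * Returns 0 on success, -1 if out is too small. */
static int strip_codeset(const char *name, char *out, size_t outsz)
{
	assert(name != NULL && out != NULL && outsz > 0);

	const char *dot = strchr(name, '.');
	size_t n = dot ? (size_t)(dot - name) : strlen(name);

	if (n >= outsz)
		return -1;
	memcpy(out, name, n);
	out[n] = '\0';
	return 0;
}

/* Narrower alternative built on the check Rich quoted: true only
 * when the name ends in ".UTF-8" and has something before it. */
static int has_utf8_suffix(const char *locname)
{
	size_t loclen = strlen(locname);

	return loclen > 6 && !strcmp(locname + loclen - 6, ".UTF-8");
}
```

With this sketch, strip_codeset("zh_CN.GBK", buf, sizeof buf) leaves
"zh_CN" in buf, while has_utf8_suffix("zh_CN.GBK") is false, so the
second variant would leave a '.GBK' name alone.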

2017-01-30 0:37 GMT+08:00 Rich Felker <dalias@libc.org>:

> On Sun, Jan 29, 2017 at 10:48:34PM +0800, He X wrote:
> > btw, with 'p-> to q->', 'strip .UTF-8' (these two in the first
> > thread), and these two patches, fcitx, chromium are working well.
>
> Can I ask how .UTF-8 got in the locale name to begin with? Did you put
> it there, or was it copied from another non-glibc system you logged in
> from, or did chromium itself add it?
>
> Re: the original patch, it should probably (depending on what we want
> to do with other invalid encodings) either use strchr to find the
> first '.' and strip everything after it, or something like:
>
>         if (loclen > 6 && !strcmp(locname+loclen-6, ".UTF-8"))
>
> There's no reason to pull strstr in here.
>
> Rich
>

--f40304361e2c3b206e0547450707--