From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11036 Path: news.gmane.org!.POSTED!not-for-mail From: He X Newsgroups: gmane.linux.lib.musl.general Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8' Date: Mon, 13 Feb 2017 22:06:49 +0800 Message-ID: References: <20170129155507.GK1533@brightrain.aerifal.cx> <20170129163329.GL1533@brightrain.aerifal.cx> <20170208143147.GY1533@brightrain.aerifal.cx> <20170211023610.GA1520@brightrain.aerifal.cx> <20170212023422.GE1520@brightrain.aerifal.cx> <20170213132816.GG1520@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=001a1144a44ea2fa1a054869f6e6 X-Trace: blaine.gmane.org 1486994848 12133 195.159.176.226 (13 Feb 2017 14:07:28 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Mon, 13 Feb 2017 14:07:28 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-11051-gllmg-musl=m.gmane.org@lists.openwall.com Mon Feb 13 15:07:24 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1cdHHj-0002g4-4a for gllmg-musl@m.gmane.org; Mon, 13 Feb 2017 15:07:19 +0100 Original-Received: (qmail 13466 invoked by uid 550); 13 Feb 2017 14:07:23 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 13445 invoked from network); 13 Feb 2017 14:07:22 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=r5WD71f7sgEVWQcTShnB3B7/HL107QPrgZjiaWw2K7g=; b=ESzbYMJuNvnoxJhfs9hgd0ybr16fW17/GQ6ZdrnYC8vZ59yUd0TCCmVw/2xW0zAPlh xSbPAr+eIBx6VxzxeQ6mxzYmSaq7buTh4hfz0S+7Y1gkjFhuuM4mOBsUhw5opUzmUXhW 
0jO6NzpguQdPkQ8xAfVZJhDIqZbVX23QSPZI7f5CaWQvD/WhoVBk0PCdaLf2tN93dOJ4 5YFmVdLoPLWSAIDZqmIkR81wcZSP6uPQMTMQpny0PLNAetKxzIVwHTy62DPYuthzzX1u /K5H7uMDx3hdW68zn6K3aIWaK+E2GG+6xLKlq2KCEFoQzvyU8JHc80Puejqotrx2jSuz ITYA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=r5WD71f7sgEVWQcTShnB3B7/HL107QPrgZjiaWw2K7g=; b=hKHWNu70wme2zWTkKZcnP+o2zd0QdjsWVzYBoWeEHouVzV7CApOfYrqvsVIHuMdGX9 BkfD2rPW4vE+Ejv+iqjd4n6nI6572Miebs51uq+JOOWeXxQ3Yao3h9Yc9aHfd7fj2cyt F3URAP6OekdIv35eXhP1EDbxDnEsr1aWGOSLKD/i/tcKr5b+vdQG1GXNQybR0Drhc2ZD mR61o8kuNxgLNWtN6/roAlUHxs9OLM7GSktIJ94+qbGH5exewWTAk0jn1pSQl4eXyRkm lje6RhVef5KdwnPCjNluNZWUoHHklZuFDR6Kj/q1nAHN68ODbPp/hQBP/BFEMz2XA6Qu fIig==
X-Gm-Message-State: AMke39kNmw8BjFObIxQ5oF2qAHXNNvRB/LvXw13OS86VzYUIflkzEsLr78pGEzVR0HvtqbNowhOj+uVMA5l2VQ==
X-Received: by 10.31.51.68 with SMTP id z65mr11254749vkz.40.1486994830191; Mon, 13 Feb 2017 06:07:10 -0800 (PST)
In-Reply-To: <20170213132816.GG1520@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:11036
Archived-At:

--001a1144a44ea2fa1a054869f6e6
Content-Type: text/plain; charset=UTF-8

No, it's on musl. I just tested it with my patches: with vim, stripping '.UTF-8' leads to unknown characters. I mean, vim's .mo files under zh_CN/ are GBK-encoded, while the .mo files under zh_CN/ for other apps are UTF-8. That means there may be other apps like vim, so we should be more cautious: add a check before mapping the .mo files, and fail non-UTF-8 charsets in setlocale.
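
A minimal sketch of what such a check could look like. It assumes the catalog's metadata entry (the translation of the empty msgid, which normally carries a "Content-Type: text/plain; charset=..." line) is already available as a NUL-terminated string. The helper name mo_header_is_utf8 is hypothetical and not an existing musl function, and where the check would be called from is left open:

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper, not part of musl: given the metadata string stored
 * in a .mo catalog as the translation of the empty msgid "", report
 * whether it declares a UTF-8 charset. Returns 1 for UTF-8 and 0
 * otherwise, including when no charset is declared at all. */
static int mo_header_is_utf8(const char *header)
{
	static const char key[] = "charset=";
	const char *p;
	size_t len;

	if (!header) return 0;
	p = strstr(header, key);
	if (!p) return 0;
	p += sizeof key - 1;

	/* The charset name runs up to the next delimiter or end of string. */
	len = strcspn(p, " \t\r\n;");
	return (len == 5 && !strncasecmp(p, "UTF-8", 5))
	    || (len == 4 && !strncasecmp(p, "UTF8", 4));
}
```

With a check like this, a GBK catalog such as the vim one would be rejected rather than mapped, and the caller would fall back to the untranslated strings.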
Btw, _nl_msg_cat_cntr and _nl_domain_bindings block apps from compiling against musl's native intl. After I added a dummy for these two symbols, GNU tar segfaulted, because it passes a null msgid1, which makes __mo_lookup crash. We should add a check in dcngettext to avoid it (if (!msgid1) goto notrans;):

#2  0x00007ffff7d82a6f in dcngettext (domainname=0x6737a0 "tar", msgid1=0x0, msgid2=0x0, n=1,
    category=5) at src/locale/dcngettext.c:211

2017-02-13 21:28 GMT+08:00 Rich Felker <dalias@libc.org>:

> On Mon, Feb 13, 2017 at 04:01:31PM +0800, He X wrote:
> > New find, as you can see, zh_CN is different from zh_CN.UTF-8, it's GBK
> > codeset, we can't strip .UTF-8 easily, or we will get a lot of junk:
>
> That's on glibc; your "finding" is irrelevant to musl, where the
> encoding for all locales is UTF-8.
>
> Rich
>

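For the dcngettext crash described above, the proposed guard can be sketched as follows. This is a simplified stand-in, not musl's actual dcngettext: lookup_stub and ngettext_sketch are hypothetical names, and the stub only imitates the fact that the catalog lookup dereferences its key.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the catalog lookup. Like a real lookup, it
 * reads through msgid, so calling it with NULL would crash. */
static const char *lookup_stub(const char *msgid)
{
	return strcmp(msgid, "hello") == 0 ? "bonjour" : NULL;
}

/* Sketch of the guarded control flow: bail out to the untranslated path
 * before the lookup ever sees a null msgid1. */
static const char *ngettext_sketch(const char *msgid1, const char *msgid2,
                                   unsigned long n)
{
	const char *trans;

	if (!msgid1) goto notrans;   /* the proposed guard */

	trans = lookup_stub(msgid1);
	if (!trans) goto notrans;
	return trans;

notrans:
	/* No translation: return the caller's own singular or plural form. */
	return n == 1 ? msgid1 : msgid2;
}
```

With the guard in place, a call shaped like the one in the backtrace (msgid1 and msgid2 both null, n=1) returns a null pointer instead of faulting inside the lookup.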
--001a1144a44ea2fa1a054869f6e6--