From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11034
Path: news.gmane.org!.POSTED!not-for-mail
From: He X <xw897002528@gmail.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Mon, 13 Feb 2017 16:01:31 +0800
Message-ID: <CAPG2z0-_RPsjSRGMQPKYcMXnfSt_Q=hNwoe9OEP0PRZC8XAXuQ@mail.gmail.com>
References: <20170129140747.GJ1533@brightrain.aerifal.cx> <CAPG2z0_EHU=U0=pkd31b5fPN__-Ly_qd9W9ftK=1e40SDJHX1w@mail.gmail.com>
 <20170129155507.GK1533@brightrain.aerifal.cx> <CAPG2z09NR77cEgUVZD4JkOt0yUyNFrbeoKDvCpD6UQiUi8UBkA@mail.gmail.com>
 <20170129163329.GL1533@brightrain.aerifal.cx> <CAPG2z08tv19E8VzDUYSG6uJqBBiHHuEUnK84WHudjSdQC-6vLg@mail.gmail.com>
 <20170208143147.GY1533@brightrain.aerifal.cx> <CAPG2z0860oCsim2uvw_6je=vPXk_vPBkYqYV2qSMmcVCnoqSOQ@mail.gmail.com>
 <20170211023610.GA1520@brightrain.aerifal.cx> <CAPG2z08yePs-6pqHcoBbMfWRPyXunuT-2Ge_JDWH8E5Y+_0wtw@mail.gmail.com>
 <20170212023422.GE1520@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=94eb2c0b771039d79d054864dc78
X-Trace: blaine.gmane.org 1486972926 25069 195.159.176.226 (13 Feb 2017 08:02:06 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Mon, 13 Feb 2017 08:02:06 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-11049-gllmg-musl=m.gmane.org@lists.openwall.com Mon Feb 13 09:02:01 2017
Return-path: <musl-return-11049-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-11049-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1cdBaD-0006At-O5
	for gllmg-musl@m.gmane.org; Mon, 13 Feb 2017 09:02:01 +0100
Original-Received: (qmail 17663 invoked by uid 550); 13 Feb 2017 08:02:05 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 17637 invoked from network); 13 Feb 2017 08:02:04 -0000
X-Received: by 10.159.32.195 with SMTP id 61mr9429140uaa.147.1486972912258;
 Mon, 13 Feb 2017 00:01:52 -0800 (PST)
In-Reply-To: <20170212023422.GE1520@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:11034
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/11034>

--94eb2c0b771039d79d054864dc78
Content-Type: text/plain; charset=UTF-8

New finding: as the listing below shows, zh_CN is different from zh_CN.UTF-8
(it uses the GBK codeset), so we can't simply strip .UTF-8, or we will get a
lot of junk:
```
[xhe@xhe-PC ~]$ ls /share/vim/lang/zh_
zh_CN/       zh_CN.UTF-8/ zh_CN.cp936/ zh_TW/       zh_TW.UTF-8/
```
I added this to the loop and removed the stripping from setlocale:
```
+			if (locp = strchr(locbuf, '.')) {
+				*locp = 0;
+			} else if (locp = strchr(locbuf, '@')) {
```
Now I think that, to be safe, setlocale should simply fail for non-UTF-8
codesets, and we should check the codeset before mapping that file.
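A minimal sketch of such a codeset check, assuming names follow the POSIX
form language[_territory][.codeset][@modifier] and that both "UTF-8" and
"utf8" (ASCII case-insensitive) should count as UTF-8. `codeset_is_utf8` and
`ascii_ieq` are hypothetical helpers, not existing musl functions. A name with
no codeset is accepted, on the assumption that setlocale treats such names as
UTF-8:

```c
#include <stddef.h>
#include <string.h>

/* ASCII-only case-insensitive comparison of exactly len bytes. */
static int ascii_ieq(const char *a, const char *b, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char x = a[i], y = b[i];
		if (x - 'A' < 26u) x |= 32;  /* fold A-Z to a-z */
		if (y - 'A' < 26u) y |= 32;
		if (x != y) return 0;
	}
	return 1;
}

/* Hypothetical helper: returns 1 if name has no codeset or its codeset
 * is UTF-8 (spelled "UTF-8" or "utf8"), 0 otherwise (e.g. "zh_CN.GBK"). */
static int codeset_is_utf8(const char *name)
{
	size_t base = strcspn(name, ".@");
	if (name[base] != '.') return 1;      /* no codeset given */
	const char *cs = name + base + 1;
	size_t len = strcspn(cs, "@");        /* codeset ends at @modifier */
	return (len == 5 && ascii_ieq(cs, "utf-8", 5))
	    || (len == 4 && ascii_ieq(cs, "utf8", 4));
}
```

setlocale could return NULL whenever this returns 0. Note that a name-based
check alone cannot catch the bare zh_CN/ directory in the listing above, whose
catalogs are GBK even though its name carries no codeset.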

2017-02-12 10:34 GMT+08:00 Rich Felker <dalias@libc.org>:

> On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote:
> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > @@ -100,7 +100,8 @@
> >       size_t map_size;
> >       void *volatile plural_rule;
> >       volatile int nplurals;
> > -     char name[];
> > +     struct binding *binding;
> > +     struct __locale_map *lm;
> >  };
>
> As stated in my reply to the message body, I think you need the category
> in the keying too, because there can be different .mo files loaded
> depending on which category was requested.
>
> >  static char *dummy_gettextdomain()
> > @@ -120,58 +122,87 @@
> >       struct msgcat *p;
> >       struct __locale_struct *loc = CURRENT_LOCALE;
> >       const struct __locale_map *lm;
> > -     const char *dirname, *locname, *catname;
> > -     size_t dirlen, loclen, catlen, domlen;
> > +     size_t domlen;
> > +     struct binding *q;
> >
> >       if ((unsigned)category >= LC_ALL) goto notrans;
> >
> >       if (!domainname) domainname = __gettextdomain();
> >
> >       domlen = strnlen(domainname, NAME_MAX+1);
> >       if (domlen > NAME_MAX) goto notrans;
> >
> > -     dirname = gettextdir(domainname, &dirlen);
> > -     if (!dirname) goto notrans;
> > +     for (q=bindings; q; q=q->next)
> > +             if (!strcmp(q->domainname, domainname) && q->active)
> > +                     break;
> > +     if (!q) goto notrans;
>
> Looks ok. I had said this should be a function but it really doesn't
> need to be; it's plenty simple inline.
>
> >       lm = loc->cat[category];
> >       if (!lm) {
> >  notrans:
> >               return (char *) ((n == 1) ? msgid1 : msgid2);
> >       }
> > -     locname = lm->name;
> > -
> > -     catname = catnames[category];
> > -     catlen = catlens[category];
> > -     loclen = strlen(locname);
> > -
> > -     size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> > -     char name[namelen+1], *s = name;
> > -
> > -     memcpy(s, dirname, dirlen);
> > -     s[dirlen] = '/';
> > -     s += dirlen + 1;
> > -     memcpy(s, locname, loclen);
> > -     s[loclen] = '/';
> > -     s += loclen + 1;
> > -     memcpy(s, catname, catlen);
> > -     s[catlen] = '/';
> > -     s += catlen + 1;
> > -     memcpy(s, domainname, domlen);
> > -     s[domlen] = '.';
> > -     s[domlen+1] = 'm';
> > -     s[domlen+2] = 'o';
> > -     s[domlen+3] = 0;
> >
> >       for (p=cats; p; p=p->next)
> > -             if (!strcmp(p->name, name))
> > +             if (p->binding == q && p->lm == lm)
> >                       break;
>
> && p->cat == category
>
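To make the suggested key concrete, a minimal sketch of the cache lookup with
the category included. The struct definitions are simplified stand-ins (the
real musl `struct binding` and `struct __locale_map` carry more fields), and
`find_cat` is a hypothetical helper. The point is only that two lookups that
differ solely in category must not share an entry, since they may resolve to
different .mo files:

```c
#include <stddef.h>

/* Simplified stand-ins, for illustration only. */
struct binding { const char *domainname; };
struct locmap { const char *name; };

/* Cache entry keyed on everything that selects a .mo file. */
struct msgcat {
	struct msgcat *next;
	const struct binding *binding;
	const struct locmap *lm;
	int cat;
};

/* Hypothetical lookup: match binding, locale map AND category. */
static struct msgcat *find_cat(struct msgcat *cats, const struct binding *q,
                               const struct locmap *lm, int category)
{
	for (struct msgcat *p = cats; p; p = p->next)
		if (p->binding == q && p->lm == lm && p->cat == category)
			return p;
	return NULL;
}
```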
> >       if (!p) {
> > +             const char *dirname, *locname, *catname;
> > +             size_t dirlen, loclen, catlen;
> >               void *old_cats;
> >               size_t map_size;
> > +
> > +             dirname = q->dirname;
> > +             locname = lm->name;
> > +             catname = catnames[category];
> > +
> > +             dirlen = q->dirlen;
> > +             loclen = strlen(locname);
> > +             catlen = catlens[category];
>
> Now that these are only computed once rather than per-call, optimizing
> out strlen is probably not worthwhile anymore, but it doesn't really
> hurt either. Not something you need to change, just a comment.
>
> > +
> > +             size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> > +             char name[namelen+1], *s = name;
> > +             char *str = name;
> > +
> > +             memcpy(s, dirname, dirlen);
> > +             s[dirlen] = '/';
> > +             s += dirlen + 1;
> > +             memcpy(s, locname, loclen);
> > +             s[loclen] = '/';
> > +             s += loclen + 1;
> > +skip_loc:
> > +             memcpy(s, catname, catlen);
> > +             s[catlen] = '/';
> > +             s += catlen + 1;
> > +             memcpy(s, domainname, domlen);
> > +             s[domlen] = '.';
> > +             s[domlen+1] = 'm';
> > +             s[domlen+2] = 'o';
> > +             s[domlen+3] = 0;
>
> Actually, now that this code is not a hot path, it should just be
> using snprintf to construct the pathname, I think. It would be a lot
> simpler and easier to ensure correctness.
>
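For comparison, a sketch of building the pathname with snprintf, assuming the
caller supplies a fixed-size buffer; `build_mo_path` is a hypothetical name.
Because snprintf returns the length the full string would have had, comparing
that value against the buffer size detects truncation, and the manual
`namelen` arithmetic and the separate '.', 'm', 'o' stores go away:

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "dirname/locname/catname/domainname.mo" into buf.
 * Returns 0 on success, -1 if the result would not fit. */
static int build_mo_path(char *buf, size_t size,
                         const char *dirname, const char *locname,
                         const char *catname, const char *domainname)
{
	int n = snprintf(buf, size, "%s/%s/%s/%s.mo",
	                 dirname, locname, catname, domainname);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```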
> > +
> >               const void *map = __map_file(name, &map_size);
> > -             if (!map) goto notrans;
> > +             if (!map) {
> > +                     if (s = strchr(name+dirlen+1, '@')) {
> > +                             *s++ = '/';
> > +                             goto skip_loc;;
> > +                     }
> > +                     if ( str && (s = strchr(name+dirlen+1, '_')) && (s < strchr(name+dirlen+1, '/')) ) {
> > +                             if (str = strchr(locname, '@')) {
> > +                                     loclen += locname - str;
> > +                                     memcpy(s, str, loclen);
> > +                                     s[loclen] = '/';
> > +                                     s += loclen + 1;
> > +                                     str = 0;
> > +                                     goto skip_loc;
> > +                             } else {
> > +                                     *s++ = '/';
> > +                                     goto skip_loc;
> > +                             }
> > +                     }
> > +                     goto notrans;
> > +             }
>
> Using snprintf should also make it easy to get rid of the goto/retry
> logic here, perhaps even with a 4-iteration loop and array of which
> format modifications happen on each iteration.
>
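One way the 4-iteration loop might look, as a sketch only. It assumes the
locale name reaching this code has no codeset (as with the setlocale change in
this patch) and reproduces the patch's fallback order: ll_CC@mod, ll_CC,
ll@mod, ll. `variants`, `mo_candidate` and `find_mo` are hypothetical names,
and `try_map` is a stand-in for __map_file():

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Parts of "ll_CC@mod" kept on each attempt, in the order the patch
 * under review tries them: ll_CC@mod, ll_CC, ll@mod, ll. */
static const struct { int terr, mod; } variants[4] = {
	{1, 1}, {1, 0}, {0, 1}, {0, 0},
};

/* Build candidate path number i (0..3) into buf. Returns 1 if a path
 * was built, 0 if the variant would repeat an earlier path (there is no
 * territory or no modifier to drop), and -1 if buf is too small. */
static int mo_candidate(char *buf, size_t size, int i,
                        const char *dirname, const char *locname,
                        const char *catname, const char *domainname)
{
	size_t langlen = strcspn(locname, "_@");      /* "ll" */
	const char *at = strchr(locname, '@');
	size_t fulllen = at ? (size_t)(at - locname)  /* "ll_CC" */
	                    : strlen(locname);
	const char *mod = at ? at : "";               /* "@mod" or "" */

	if (variants[i].terr && fulllen == langlen) return 0;
	if (variants[i].mod && !*mod) return 0;

	int n = snprintf(buf, size, "%s/%.*s%s/%s/%s.mo", dirname,
	                 (int)(variants[i].terr ? fulllen : langlen), locname,
	                 variants[i].mod ? mod : "", catname, domainname);
	return (n < 0 || (size_t)n >= size) ? -1 : 1;
}

/* The retry loop itself. try_map is a hypothetical stand-in for
 * __map_file() that returns nonzero once a file has been mapped.
 * Returns the index of the variant that worked, or -1. */
int find_mo(char *buf, size_t size, const char *dirname,
            const char *locname, const char *catname,
            const char *domainname, int (*try_map)(const char *))
{
	for (int i = 0; i < 4; i++) {
		int r = mo_candidate(buf, size, i, dirname, locname,
		                     catname, domainname);
		if (r < 0) return -1;              /* path too long */
		if (r > 0 && try_map(buf)) return i;
	}
	return -1;
}
```

Variants that would only repeat an earlier path are skipped rather than
mapped twice, so "zh_CN" tries zh_CN then zh, and "de" tries only de.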
> >               p = calloc(sizeof *p + namelen + 1, 1);
> >               if (!p) {
> >                       __munmap((void *)map, map_size);
> >                       goto notrans;
> > @@ -209,7 +209,6 @@
> >               }
> >               p->map = map;
> >               p->map_size = map_size;
> > -             memcpy(p->name, name, namelen+1);
> >               do {
> >                       old_cats = cats;
> >                       p->next = old_cats;
> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > @@ -49,8 +49,8 @@
> >       }
> >
> >       /* Limit name length and forbid leading dot or any slashes. */
> > -     for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
> > -     if (val[0]=='.' || val[n]) val = "C.UTF-8";
> > +     for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/' && val[n]!='.'; n++);
> > +     if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8";
> >       int builtin = (val[0]=='C' && !val[1])
> >               || !strcmp(val, "C.UTF-8")
> >               || !strcmp(val, "POSIX");
>
> This looks ok but might still need some tweaks. Should an input like
> "zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF-8 that might
> appear as junk on the user's terminal) or as "C" (no localization)
> with only ASCII characters (safe for the user's terminal), or even
> cause setlocale to fail and return an error so that the application
> can decide what to do? These are not technical comments on your patch
> but policy matters the community should weigh in on.
>
> Rich
>
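On the policy question, a sketch of what each of the three options would hand
back for a request like "zh_CN.GBK". `apply_policy` and the enum names are
hypothetical; the function assumes the caller has already determined that the
codeset is not UTF-8, and a NULL return stands for setlocale failing:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The three options raised for a non-UTF-8 request like "zh_CN.GBK". */
enum codeset_policy {
	CODESET_AS_BASE, /* use "zh_CN": localized, but output is UTF-8 */
	CODESET_AS_C,    /* use "C": ASCII only, safe for the terminal */
	CODESET_FAIL,    /* make setlocale return NULL */
};

/* Hypothetical helper: write the name setlocale would actually use into
 * buf (or return a constant), or return NULL if the request should fail
 * (or, for CODESET_AS_BASE, if the result does not fit in buf). */
static const char *apply_policy(enum codeset_policy pol, const char *name,
                                char *buf, size_t size)
{
	switch (pol) {
	case CODESET_AS_BASE: {
		/* Drop ".codeset" but keep any "@modifier". */
		size_t base = strcspn(name, ".");
		const char *at = strchr(name + base, '@');
		int n = snprintf(buf, size, "%.*s%s", (int)base, name,
		                 at ? at : "");
		return (n < 0 || (size_t)n >= size) ? NULL : buf;
	}
	case CODESET_AS_C:
		return "C";
	default:
		return NULL;
	}
}
```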

--94eb2c0b771039d79d054864dc78--