From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11034
From: He X
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Mon, 13 Feb 2017 16:01:31 +0800
References: <20170129140747.GJ1533@brightrain.aerifal.cx> <20170129155507.GK1533@brightrain.aerifal.cx> <20170129163329.GL1533@brightrain.aerifal.cx> <20170208143147.GY1533@brightrain.aerifal.cx> <20170211023610.GA1520@brightrain.aerifal.cx> <20170212023422.GE1520@brightrain.aerifal.cx>
In-Reply-To: <20170212023422.GE1520@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=94eb2c0b771039d79d054864dc78
Xref: news.gmane.org gmane.linux.lib.musl.general:11034

--94eb2c0b771039d79d054864dc78
Content-Type: text/plain; charset=UTF-8

New finding: as you can see, zh_CN is different from zh_CN.UTF-8; it uses
the GBK codeset. So we can't simply strip ".UTF-8", or we will get a lot of
junk:

```
[xhe@xhe-PC ~]$ ls /share/vim/lang/zh_
zh_CN/       zh_CN.UTF-8/ zh_CN.cp936/ zh_TW/       zh_TW.UTF-8/
```

I added this to the loop, and removed the stripping from setlocale:

+	if (locp = strchr(locbuf, '.')) {
+		*locp = 0;
+	} else if (locp = strchr(locbuf, '@')) {

Now I think setlocale should simply fail for non-UTF-8 codesets, to be safe,
and we should check the codeset before mapping that file.

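To make the idea concrete, here is a minimal sketch, not the actual patch. It assumes the lookup tries the full locale name first and then progressively shorter names, builds each path with snprintf as suggested in the review quoted below, and rejects explicit non-UTF-8 codesets up front. The helper names (codeset_is_utf8, next_candidate, mo_path) are hypothetical; the '_' fallback step and accepting the "utf8" spelling are assumptions of mine.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: returns 1 if the locale name carries no codeset
 * or an explicit UTF-8 codeset, 0 for any other codeset (GBK, cp936, ...).
 * Treating a missing codeset as acceptable is an assumption. */
static int codeset_is_utf8(const char *locname)
{
	const char *dot = strchr(locname, '.');
	if (!dot) return 1;
	size_t n = strcspn(dot + 1, "@"); /* codeset ends at '@' or NUL */
	return (n == 5 && !strncasecmp(dot + 1, "UTF-8", 5))
	    || (n == 4 && !strncasecmp(dot + 1, "UTF8", 4));
}

/* Hypothetical helper: shorten locbuf in place to the next fallback name.
 * Cutting at '.' also drops any modifier that follows the codeset.
 * Returns 1 if the name was shortened, 0 if nothing is left to strip. */
static int next_candidate(char *locbuf)
{
	char *locp;
	if ((locp = strchr(locbuf, '.'))) *locp = 0;      /* drop codeset */
	else if ((locp = strchr(locbuf, '@'))) *locp = 0; /* drop modifier */
	else if ((locp = strchr(locbuf, '_'))) *locp = 0; /* drop territory (assumed step) */
	else return 0;
	return 1;
}

/* Hypothetical helper: format dirname/locale/category/domain.mo.
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int mo_path(char *buf, size_t size, const char *dirname,
                   const char *locname, const char *catname,
                   const char *domainname)
{
	int n = snprintf(buf, size, "%s/%s/%s/%s.mo",
	                 dirname, locname, catname, domainname);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Starting from zh_CN.UTF-8, repeated calls to next_candidate give zh_CN and then zh, so the lookup would try the zh_CN.UTF-8 directory before the zh_CN one. This only fixes the order; it does not by itself tell whether the file found under zh_CN is really UTF-8.
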
2017-02-12 10:34 GMT+08:00 Rich Felker <dalias@libc.org>:

> On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote:
> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > @@ -100,7 +100,8 @@
> >  	size_t map_size;
> >  	void *volatile plural_rule;
> >  	volatile int nplurals;
> > -	char name[];
> > +	struct binding *binding;
> > +	struct __locale_map *lm;
> >  };
>
> As stated in the reply to message body, I think you need the category
> in the keying too, because there can be different .mo files loaded
> depending on which category was requested.
>
> >  static char *dummy_gettextdomain()
> > @@ -120,58 +122,87 @@
> >  	struct msgcat *p;
> >  	struct __locale_struct *loc = CURRENT_LOCALE;
> >  	const struct __locale_map *lm;
> > -	const char *dirname, *locname, *catname;
> > -	size_t dirlen, loclen, catlen, domlen;
> > +	size_t domlen;
> > +	struct binding *q;
> >
> >  	if ((unsigned)category >= LC_ALL) goto notrans;
> >
> >  	if (!domainname) domainname = __gettextdomain();
> >
> >  	domlen = strnlen(domainname, NAME_MAX+1);
> >  	if (domlen > NAME_MAX) goto notrans;
> >
> > -	dirname = gettextdir(domainname, &dirlen);
> > -	if (!dirname) goto notrans;
> > +	for (q=bindings; q; q=q->next)
> > +		if (!strcmp(q->domainname, domainname) && q->active)
> > +			break;
> > +	if (!q) goto notrans;
>
> Looks ok. I had said this should be a function but it really doesn't
> need to be; it's plenty simple inline.
>
> >  	lm = loc->cat[category];
> >  	if (!lm) {
> >  notrans:
> >  		return (char *) ((n == 1) ? msgid1 : msgid2);
> >  	}
> > -	locname = lm->name;
> > -
> > -	catname = catnames[category];
> > -	catlen = catlens[category];
> > -	loclen = strlen(locname);
> > -
> > -	size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> > -	char name[namelen+1], *s = name;
> > -
> > -	memcpy(s, dirname, dirlen);
> > -	s[dirlen] = '/';
> > -	s += dirlen + 1;
> > -	memcpy(s, locname, loclen);
> > -	s[loclen] = '/';
> > -	s += loclen + 1;
> > -	memcpy(s, catname, catlen);
> > -	s[catlen] = '/';
> > -	s += catlen + 1;
> > -	memcpy(s, domainname, domlen);
> > -	s[domlen] = '.';
> > -	s[domlen+1] = 'm';
> > -	s[domlen+2] = 'o';
> > -	s[domlen+3] = 0;
> >
> >  	for (p=cats; p; p=p->next)
> > -		if (!strcmp(p->name, name))
> > +		if (p->binding == q && p->lm == lm)
> >  			break;
>
> && p->cat == category
>
> >  	if (!p) {
> > +		const char *dirname, *locname, *catname;
> > +		size_t dirlen, loclen, catlen;
> >  		void *old_cats;
> >  		size_t map_size;
> > +
> > +		dirname = q->dirname;
> > +		locname = lm->name;
> > +		catname = catnames[category];
> > +
> > +		dirlen = q->dirlen;
> > +		loclen = strlen(locname);
> > +		catlen = catlens[category];
>
> Now that these are only computed once rather than per-call, optimizing
> out strlen is probably not worthwhile anymore, but it doesn't really
> hurt either. Not something you need to change, just a comment.
>
> > +
> > +		size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> > +		char name[namelen+1], *s = name;
> > +		char *str = name;
> > +
> > +		memcpy(s, dirname, dirlen);
> > +		s[dirlen] = '/';
> > +		s += dirlen + 1;
> > +		memcpy(s, locname, loclen);
> > +		s[loclen] = '/';
> > +		s += loclen + 1;
> > +skip_loc:
> > +		memcpy(s, catname, catlen);
> > +		s[catlen] = '/';
> > +		s += catlen + 1;
> > +		memcpy(s, domainname, domlen);
> > +		s[domlen] = '.';
> > +		s[domlen+1] = 'm';
> > +		s[domlen+2] = 'o';
> > +		s[domlen+3] = 0;
>
> Actually, now that this code is not a hot path, it should just be
> using snprintf to construct the pathname, I think. It would be a lot
> simpler and easier to ensure correctness.
>
> > +
> >  		const void *map = __map_file(name, &map_size);
> > -		if (!map) goto notrans;
> > +		if (!map) {
> > +			if (s = strchr(name+dirlen+1, '@')) {
> > +				*s++ = '/';
> > +				goto skip_loc;;
> > +			}
> > +			if ( str && (s = strchr(name+dirlen+1, '_')) && (s < strchr(name+dirlen+1, '/')) ) {
> > +				if (str = strchr(locname, '@')) {
> > +					loclen += locname - str;
> > +					memcpy(s, str, loclen);
> > +					s[loclen] = '/';
> > +					s += loclen + 1;
> > +					str = 0;
> > +					goto skip_loc;
> > +				} else {
> > +					*s++ = '/';
> > +					goto skip_loc;
> > +				}
> > +			}
> > +			goto notrans;
> > +		}
>
> Using snprintf should also make it easy to get rid of the goto/retry
> logic here, perhaps even with a 4-iteration loop and array of which
> format modifications happen on each iteration.
>
> >  		p = calloc(sizeof *p + namelen + 1, 1);
> >  		if (!p) {
> >  			__munmap((void *)map, map_size);
> >  			goto notrans;
> > @@ -209,7 +209,6 @@
> >  		}
> >  		p->map = map;
> >  		p->map_size = map_size;
> > -		memcpy(p->name, name, namelen+1);
> >  		do {
> >  			old_cats = cats;
> >  			p->next = old_cats;
> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > @@ -49,8 +49,8 @@
> >  	}
> >
> >  	/* Limit name length and forbid leading dot or any slashes. */
> > -	for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
> > -	if (val[0]=='.' || val[n]) val = "C.UTF-8";
> > +	for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/' && val[n]!='.'; n++);
> > +	if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8";
> >  	int builtin = (val[0]=='C' && !val[1])
> >  		|| !strcmp(val, "C.UTF-8")
> >  		|| !strcmp(val, "POSIX");
>
> This looks ok but might still need some tweaks. Should an input like
> "zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF-8 that might
> appear as junk on the user's terminal) or as "C" (no localization)
> with only ASCII characters (safe for the user's terminal), or even
> cause setlocale to fail and return an error so that the application
> can decide what to do? These are not technical comments on your patch
> but policy matters the community should weigh in on.
>
> Rich
>

--94eb2c0b771039d79d054864dc78
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--94eb2c0b771039d79d054864dc78--