From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11030 Path: news.gmane.org!.POSTED!not-for-mail From: He X Newsgroups: gmane.linux.lib.musl.general Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8' Date: Sun, 12 Feb 2017 15:11:08 +0800 Message-ID: References: <20170129140747.GJ1533@brightrain.aerifal.cx> <20170129155507.GK1533@brightrain.aerifal.cx> <20170129163329.GL1533@brightrain.aerifal.cx> <20170208143147.GY1533@brightrain.aerifal.cx> <20170211023610.GA1520@brightrain.aerifal.cx> <20170212023422.GE1520@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=001a114324fe34466c0548500a7b X-Trace: blaine.gmane.org 1486883504 32230 195.159.176.226 (12 Feb 2017 07:11:44 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 12 Feb 2017 07:11:44 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-11045-gllmg-musl=m.gmane.org@lists.openwall.com Sun Feb 12 08:11:39 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1ccoJu-000844-JL for gllmg-musl@m.gmane.org; Sun, 12 Feb 2017 08:11:38 +0100 Original-Received: (qmail 8050 invoked by uid 550); 12 Feb 2017 07:11:41 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 8032 invoked from network); 12 Feb 2017 07:11:41 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=czmwJ9Tmm0PC7rvdLVMLO7RUcGjrkidpC5CbK4X86r8=; b=ii/Lrip2n9wbzcGNWYOCKHewVP/EdxlzxubxEU2OMRY3XWKL42O95pjUygAkFVzKvq yt3vD1NnjYxUQS/x/+NHk+ioZu097prGzlAd3qHJQzhDAdY6TtAUpVsuGWCCIwccaoLK E+G6xAxXcJJaxucn35BD4pDFBrsqs0zkUfk3sLWWqotc0rjwATUj8fBxMjqLsui8oMFv tCpW9asAiJukDPLXtUUp23nQb5cECMYCPkvbrbzLfix57mDVMIvbbir81y/bHtwhOare YRUTFVxW90PtJ+xitLdk6fjSZYNOUk9smnMPyt9ZpKlkOr15Xn6E70ApxdqO8o2GujaF Nkxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=czmwJ9Tmm0PC7rvdLVMLO7RUcGjrkidpC5CbK4X86r8=; b=UT72JFQ5+KONcqA2GOWtCdl81Ip1GJRedcY6okwKHL40AyYzzaRBHKG0O5oD7pStL9 9ql9KJaNgpNDPWYUJ5bjK8aHTp0m9cGtahVRzNUqZF6PLApSkZRId+oAADhubUie0CGp 1uZYPUQaL94rOMZfu4srT6AMovmY3CvyX1/PcQABvd+btFYNNvjeA8FojjIrZ8fTEExP ZB8hYQV3Cr3mrmVEk/p/6LexWF/NxRLMLP/AWtNbMHSWmUlxYl/nkIe0tCuWQl8UBLpa 5iz/fOccW2pjpmvC68PlrMr7JLQ9LixQK7e+sIY8EvbYZwx1BpiNgc3zK8CqP+BtImE1 2E1A== X-Gm-Message-State: AMke39lJAILmqIilulblgi3STLQI9mzTGVmsv4iBiQN2MrbYVHi9dpi6ZeNkigQKUvTZdAdwI8hqI67WFCEUOg== X-Received: by 10.31.160.3 with SMTP id j3mr8392166vke.92.1486883489331; Sat, 11 Feb 2017 23:11:29 -0800 (PST) In-Reply-To: Xref: news.gmane.org gmane.linux.lib.musl.general:11030 Archived-At: --001a114324fe34466c0548500a7b Content-Type: text/plain; charset=UTF-8 sry, i uploaded the wront patch, it should be int cat, rather than struct msgcat :/ 2017-02-12 14:56 GMT+08:00 He X : > 1. cat is added to the keys, also do a validate > 2. so we what do we deal with the gettextdir() exactly? inline it or > construct a gettextpointer()? > 3. i added a extra locbuf array, and goto is replaced by a loop, memcpy is > replaced by snprintf, compiled, and working well with fcitx > 4. i just found that i forgot to store the keys to new buffer, it's ok to > just use normal expression? or we need atomic operations? > ``` > + p->cat = category; > + p->binding = q; > + p->lm = lm; > ``` > 5. I do want to rewrite all to .UTF8, but it's a bit annoying as your > words, then i changed the code to simply strip. > > > (safe for the user's terminal) > LANG is set by users who are using musl and it's modified to zh_CN at > setlocale(), app will use UTF8 directly, there's no such situation where > charset will cause troubles to users' terminal, except apps which get the > LANG manually by getenv(). I have not seen such strange applications so > far, and most apps only have the UTF8 translation files. > > For moving from glibc to musl, i think doing this way is good for now, we > could delete it later, or just keep it forever. And most people won't use > non-UTF8 at all, if they do use GBK, their app will even fallback to UTF8, > because no translation files for GBK. So, it's not so dagerous, i think > :). > > And for developers, they should not use setlocale to detect the charset, > this is wrong, nl_langinfo is the correct way. If they use, stripping will > let their app know something went wrong. > > Strip .GBK or .UTF-8, so users would be happy that their old settings are > working, developers will notice their mistakes that using setlocale() to > validate charset is wrong. We get a lot more than failing the setlocale() > and return C, the only bad thing is we need to care about a almost > impossible event: an app directly getenv(). > > 2017-02-12 10:34 GMT+08:00 Rich Felker : > >> On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote: >> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000 >> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000 >> > @@ -100,7 +100,8 @@ >> > size_t map_size; >> > void *volatile plural_rule; >> > volatile int nplurals; >> > - char name[]; >> > + struct binding *binding; >> > + struct __locale_map *lm; >> > }; >> >> As stated in the reply to message body, I think you need the category >> in the keying too, because there can be different .mo files loaded >> depending on which category was requested. >> >> > static char *dummy_gettextdomain() >> > @@ -120,58 +122,87 @@ >> > struct msgcat *p; >> > struct __locale_struct *loc = CURRENT_LOCALE; >> > const struct __locale_map *lm; >> > - const char *dirname, *locname, *catname; >> > - size_t dirlen, loclen, catlen, domlen; >> > + size_t domlen; >> > + struct binding *q; >> > >> > if ((unsigned)category >= LC_ALL) goto notrans; >> > >> > if (!domainname) domainname = __gettextdomain(); >> > >> > domlen = strnlen(domainname, NAME_MAX+1); >> > if (domlen > NAME_MAX) goto notrans; >> > >> > - dirname = gettextdir(domainname, &dirlen); >> > - if (!dirname) goto notrans; >> > + for (q=bindings; q; q=q->next) >> > + if (!strcmp(q->domainname, domainname) && q->active) >> > + break; >> > + if (!q) goto notrans; >> >> Looks ok. I had said this should be a function but it really doesn't >> need to be; it's plenty simple inline. >> >> > lm = loc->cat[category]; >> > if (!lm) { >> > notrans: >> > return (char *) ((n == 1) ? msgid1 : msgid2); >> > } >> > - locname = lm->name; >> > - >> > - catname = catnames[category]; >> > - catlen = catlens[category]; >> > - loclen = strlen(locname); >> > - >> > - size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3; >> > - char name[namelen+1], *s = name; >> > - >> > - memcpy(s, dirname, dirlen); >> > - s[dirlen] = '/'; >> > - s += dirlen + 1; >> > - memcpy(s, locname, loclen); >> > - s[loclen] = '/'; >> > - s += loclen + 1; >> > - memcpy(s, catname, catlen); >> > - s[catlen] = '/'; >> > - s += catlen + 1; >> > - memcpy(s, domainname, domlen); >> > - s[domlen] = '.'; >> > - s[domlen+1] = 'm'; >> > - s[domlen+2] = 'o'; >> > - s[domlen+3] = 0; >> > >> > for (p=cats; p; p=p->next) >> > - if (!strcmp(p->name, name)) >> > + if (p->binding == q && p->lm == lm) >> > break; >> >> && p->cat == category >> >> > if (!p) { >> > + const char *dirname, *locname, *catname; >> > + size_t dirlen, loclen, catlen; >> > void *old_cats; >> > size_t map_size; >> > + >> > + dirname = q->dirname; >> > + locname = lm->name; >> > + catname = catnames[category]; >> > + >> > + dirlen = q->dirlen; >> > + loclen = strlen(locname); >> > + catlen = catlens[category]; >> >> Now that these are only computed once rather than per-call, optimizing >> out strlen is probably not worthwhile anymore, but it doesn't really >> hurt either. Not something you need to change, just a comment. >> >> > + >> > + size_t namelen = dirlen+1 + loclen+1 + catlen+1 + >> domlen+3; >> > + char name[namelen+1], *s = name; >> > + char *str = name; >> > + >> > + memcpy(s, dirname, dirlen); >> > + s[dirlen] = '/'; >> > + s += dirlen + 1; >> > + memcpy(s, locname, loclen); >> > + s[loclen] = '/'; >> > + s += loclen + 1; >> > +skip_loc: >> > + memcpy(s, catname, catlen); >> > + s[catlen] = '/'; >> > + s += catlen + 1; >> > + memcpy(s, domainname, domlen); >> > + s[domlen] = '.'; >> > + s[domlen+1] = 'm'; >> > + s[domlen+2] = 'o'; >> > + s[domlen+3] = 0; >> >> Actually, now that this code is not a hot path, it should just be >> using snprintf to construct the pathname, I think. It would be a lot >> simpler and easier to ensure correctness. >> >> > + >> > const void *map = __map_file(name, &map_size); >> > - if (!map) goto notrans; >> > + if (!map) { >> > + if (s = strchr(name+dirlen+1, '@')) { >> > + *s++ = '/'; >> > + goto skip_loc;; >> > + } >> > + if ( str && (s = strchr(name+dirlen+1, '_')) && >> (s < strchr(name+dirlen+1, '/')) ) { >> > + if (str = strchr(locname, '@')) { >> > + loclen += locname - str; >> > + memcpy(s, str, loclen); >> > + s[loclen] = '/'; >> > + s += loclen + 1; >> > + str = 0; >> > + goto skip_loc; >> > + } else { >> > + *s++ = '/'; >> > + goto skip_loc; >> > + } >> > + } >> > + goto notrans; >> > + } >> >> Using snprintf should also make it easy to get rid of the goto/retry >> logic here, perhaps even with a 4-iteration loop and array of which >> format modifications happen on each iteration. >> >> > p = calloc(sizeof *p + namelen + 1, 1); >> > if (!p) { >> > __munmap((void *)map, map_size); >> > goto notrans; >> > @@ -209,7 +209,6 @@ >> > } >> > p->map = map; >> > p->map_size = map_size; >> > - memcpy(p->name, name, namelen+1); >> > do { >> > old_cats = cats; >> > p->next = old_cats; >> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000 >> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000 >> > @@ -49,8 +49,8 @@ >> > } >> > >> > /* Limit name length and forbid leading dot or any slashes. */ >> > - for (n=0; n> > - if (val[0]=='.' || val[n]) val = "C.UTF-8"; >> > + for (n=0; n> val[n]!='.'; n++); >> > + if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8"; >> > int builtin = (val[0]=='C' && !val[1]) >> > || !strcmp(val, "C.UTF-8") >> > || !strcmp(val, "POSIX"); >> >> This looks ok but might still need some tweaks. Should an input like >> "zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF-8 that might >> appear as junk on the user's terminal) or as "C" (no localization) >> with only ASCII characters (safe for the user's terminal), or even >> cause setlocale to fail and return an error so that the application >> can decide what to do? These are not technical comments on your patch >> but policy matters the community should weigh in on. >> >> Rich >> > > --001a114324fe34466c0548500a7b Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
sry, i uploaded the wront patch, it should be int cat, rat= her than struct msgcat :/

2017-02-12 14:56 GMT+08:00 He X <xw897002528@gmail.com>:
1. cat is= added to the keys, also do a validate
2. so we what do we deal with th= e gettextdir() exactly? inline it or construct a gettextpointer()?
3. i added a extra locbuf array, and goto is replaced by a loop, memcpy i= s replaced by snprintf, compiled, and working well with fcitx
4. = i just found that i forgot to store the keys to new buffer, it's ok to = just use normal expression? or we need atomic operations?
```
+ p->cat =3D category;
+ p->binding =3D q;
+ p->lm =3D lm;
```
5.=C2=A0 I do want to rewr= ite all to .UTF8, but it's a bit annoying as your words, then i changed= the code to simply strip.=C2=A0

= >=C2=A0=C2=A0(safe for the user's terminal)
LANG i= s set by users who are using musl and it's modified to zh_CN at setloca= le(), app will use UTF8 directly, there's no such situation where chars= et will cause troubles to users' terminal, except apps which get the LA= NG manually by getenv(). I have not seen such strange applications so far, = and most apps only have the UTF8 translation files.

For moving from glibc to musl, i think doing this way is good for now, we= could delete it later, or just keep it forever. And most people won't = use non-UTF8 at all, if they do use GBK, their app will even fallback to UT= F8, because no translation files for GBK. So, it's n= ot so dagerous, i think :).

And for developers, = =C2=A0they should not use setlocale to detect the charset, this is wrong, n= l_langinfo is the correct way. If they use, stripping will let their app kn= ow something went wrong.

Strip .GBK or .UTF-8= , so users would be happy that their old settings are working, developers w= ill notice their mistakes that using setlocale() to validate charset is wro= ng. We get a lot more than failing the setlocale() and return C, the only b= ad thing is we need to care about a almost impossible event: an app directl= y getenv().

2017-02-12 10:34 GMT+08= :00 Rich Felker <dalias@libc.org>:
On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote:
> --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.8604826= 24 +0000
> +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> @@ -100,7 +100,8 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t map_size;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0void *volatile plural_rule;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0volatile int nplurals;
> -=C2=A0 =C2=A0 =C2=A0char name[];
> +=C2=A0 =C2=A0 =C2=A0struct binding *binding;
> +=C2=A0 =C2=A0 =C2=A0struct __locale_map *lm;
>=C2=A0 };

As stated in the reply to message body, I think you need the category
in the keying too, because there can be different .mo files loaded
depending on which category was requested.

>=C2=A0 static char *dummy_gettextdomain()
> @@ -120,58 +122,87 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0struct msgcat *p;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0struct __locale_struct *loc =3D CURRENT_LOCA= LE;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0const struct __locale_map *lm;
> -=C2=A0 =C2=A0 =C2=A0const char *dirname, *locname, *catname;
> -=C2=A0 =C2=A0 =C2=A0size_t dirlen, loclen, catlen, domlen;
> +=C2=A0 =C2=A0 =C2=A0size_t domlen;
> +=C2=A0 =C2=A0 =C2=A0struct binding *q;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0if ((unsigned)category >=3D LC_ALL) goto = notrans;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!domainname) domainname =3D __gettextdom= ain();
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0domlen =3D strnlen(domainname, NAME_MAX+1);<= br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0if (domlen > NAME_MAX) goto notrans;
>
> -=C2=A0 =C2=A0 =C2=A0dirname =3D gettextdir(domainname, &dirlen);<= br> > -=C2=A0 =C2=A0 =C2=A0if (!dirname) goto notrans;
> +=C2=A0 =C2=A0 =C2=A0for (q=3Dbindings; q; q=3Dq->next)
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!strcmp(q->dom= ainname, domainname) && q->active)
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0break;
> +=C2=A0 =C2=A0 =C2=A0if (!q) goto notrans;

Looks ok. I had said this should be a function but it really doesn't need to be; it's plenty simple inline.

>=C2=A0 =C2=A0 =C2=A0 =C2=A0lm =3D loc->cat[category];
>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!lm) {
>=C2=A0 notrans:
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (char *) = ((n =3D=3D 1) ? msgid1 : msgid2);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0}
> -=C2=A0 =C2=A0 =C2=A0locname =3D lm->name;
> -
> -=C2=A0 =C2=A0 =C2=A0catname =3D catnames[category];
> -=C2=A0 =C2=A0 =C2=A0catlen =3D catlens[category];
> -=C2=A0 =C2=A0 =C2=A0loclen =3D strlen(locname);
> -
> -=C2=A0 =C2=A0 =C2=A0size_t namelen =3D dirlen+1 + loclen+1 + catlen+1= + domlen+3;
> -=C2=A0 =C2=A0 =C2=A0char name[namelen+1], *s =3D name;
> -
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, dirname, dirlen);
> -=C2=A0 =C2=A0 =C2=A0s[dirlen] =3D '/';
> -=C2=A0 =C2=A0 =C2=A0s +=3D dirlen + 1;
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, locname, loclen);
> -=C2=A0 =C2=A0 =C2=A0s[loclen] =3D '/';
> -=C2=A0 =C2=A0 =C2=A0s +=3D loclen + 1;
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, catname, catlen);
> -=C2=A0 =C2=A0 =C2=A0s[catlen] =3D '/';
> -=C2=A0 =C2=A0 =C2=A0s +=3D catlen + 1;
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, domainname, domlen);
> -=C2=A0 =C2=A0 =C2=A0s[domlen] =3D '.';
> -=C2=A0 =C2=A0 =C2=A0s[domlen+1] =3D 'm';
> -=C2=A0 =C2=A0 =C2=A0s[domlen+2] =3D 'o';
> -=C2=A0 =C2=A0 =C2=A0s[domlen+3] =3D 0;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0for (p=3Dcats; p; p=3Dp->next)
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!strcmp(p->nam= e, name))
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (p->binding =3D= =3D q && p->lm =3D=3D lm)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0break;

&& p->cat =3D=3D category

>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!p) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0const char *dirname, = *locname, *catname;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t dirlen, loclen= , catlen;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0void *old_cats;<= br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t map_size;=
> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dirname =3D q->dir= name;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0locname =3D lm->na= me;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0catname =3D catnames[= category];
> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dirlen =3D q->dirl= en;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0loclen =3D strlen(loc= name);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0catlen =3D catlens[ca= tegory];

Now that these are only computed once rather than per-call, optimizing
out strlen is probably not worthwhile anymore, but it doesn't really hurt either. Not something you need to change, just a comment.

> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t namelen =3D di= rlen+1 + loclen+1 + catlen+1 + domlen+3;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0char name[namelen+1],= *s =3D name;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0char *str =3D name; > +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, dirname, di= rlen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[dirlen] =3D '/&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D dirlen + 1; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, locname, lo= clen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[loclen] =3D '/&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D loclen + 1; > +skip_loc:
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, catname, ca= tlen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[catlen] =3D '/&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D catlen + 1; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, domainname,= domlen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen] =3D '.&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen+1] =3D '= m';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen+2] =3D '= o';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen+3] =3D 0;
Actually, now that this code is not a hot path, it should just be
using snprintf to construct the pathname, I think. It would be a lot
simpler and easier to ensure correctness.

> +
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0const void= *map =3D __map_file(name, &map_size);
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!map) goto notran= s;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!map) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if (s =3D strchr(name+dirlen+1, '@')) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*s++ =3D '/';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto skip_loc;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0}
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if ( str && (s =3D strchr(name+dirlen+1, '_')= ) && (s < strchr(name+dirlen+1, '/')) ) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (str =3D strchr(locname, '@')= ) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0loclen +=3D = locname - str;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, st= r, loclen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[loclen] = =3D '/';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D locle= n + 1;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0str =3D 0; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto skip_lo= c;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} else {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*s++ =3D = 9;/';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto skip_lo= c;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0}
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0goto notrans;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}

Using snprintf should also make it easy to get rid of the goto/retry=
logic here, perhaps even with a 4-iteration loop and array of which
format modifications happen on each iteration.

>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0p =3D calloc(siz= eof *p + namelen + 1, 1);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!p) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0__munmap((void *)map, map_size);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0goto notrans;
> @@ -209,7 +209,6 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0p->map =3D ma= p;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0p->map_size = =3D map_size;
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(p->n= ame, name, namelen+1);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0do {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0old_cats =3D cats;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0p->next =3D old_cats;
> --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000=
> +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> @@ -49,8 +49,8 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0}
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0/* Limit name length and forbid leading dot = or any slashes. */
> -=C2=A0 =C2=A0 =C2=A0for (n=3D0; n<LOCALE_NAME_MAX &&= ; val[n] && val[n]!=3D'/'; n++);
> -=C2=A0 =C2=A0 =C2=A0if (val[0]=3D=3D'.' || val[n]) val =3D &q= uot;C.UTF-8";
> +=C2=A0 =C2=A0 =C2=A0for (n=3D0; n<LOCALE_NAME_MAX && val[n= ] && val[n]!=3D'/' && val[n]!=3D'.'; n++);<= br> > +=C2=A0 =C2=A0 =C2=A0if (val[0]=3D=3D'.' || (val[n] &&= val[n]!=3D'.')) val =3D "C.UTF-8";
>=C2=A0 =C2=A0 =C2=A0 =C2=A0int builtin =3D (val[0]=3D=3D'C' &am= p;& !val[1])
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|| !strcmp(val, = "C.UTF-8")
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|| !strcmp(val, = "POSIX");

This looks ok but might still need some tweaks. Should an input like
"zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF= -8 that might
appear as junk on the user's terminal) or as "C" (no localiza= tion)
with only ASCII characters (safe for the user's terminal), or even
cause setlocale to fail and return an error so that the application
can decide what to do? These are not technical comments on your patch
but policy matters the community should weigh in on.

Rich


--001a114324fe34466c0548500a7b--