From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11029 Path: news.gmane.org!.POSTED!not-for-mail From: He X Newsgroups: gmane.linux.lib.musl.general Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8' Date: Sun, 12 Feb 2017 14:56:53 +0800 Message-ID: References: <20170129140747.GJ1533@brightrain.aerifal.cx> <20170129155507.GK1533@brightrain.aerifal.cx> <20170129163329.GL1533@brightrain.aerifal.cx> <20170208143147.GY1533@brightrain.aerifal.cx> <20170211023610.GA1520@brightrain.aerifal.cx> <20170212023422.GE1520@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary=94eb2c14c26841ef4d05484fd7db X-Trace: blaine.gmane.org 1486882650 31942 195.159.176.226 (12 Feb 2017 06:57:30 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 12 Feb 2017 06:57:30 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-11044-gllmg-musl=m.gmane.org@lists.openwall.com Sun Feb 12 07:57:25 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1cco68-0007td-7J for gllmg-musl@m.gmane.org; Sun, 12 Feb 2017 07:57:24 +0100 Original-Received: (qmail 32670 invoked by uid 550); 12 Feb 2017 06:57:28 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 32649 invoked from network); 12 Feb 2017 06:57:26 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=RmE0TTCpWjwC2N9CrzJU0qP4HvV3ZMPenuv2U8ZRC0E=; b=oJ8qUYCGjiUpg1awMpDup27SHta2YYL3RuvjUKieyOeBYyVbpor/ImvnWd45Y0Taij +TfTPuGWkRIprvV1w69B3dSHqqCCZvAidtHTAvsHwZ1Oew3qXa3Xy2Iu4/WjzvgaY9UQ UaOxRWtI/vzqECe98x+8Ty0BP6yGZDnE2v9JI+QsDWzHyVO126AqXRECZSKd5rVRmuEU NhzTgm2fZ7SEz1oFJp6ji8IYpi/8ljRiKE/1OSDn6VIzWJueIs3769mDtxFrSfDR1XvD DS9gMzIW8MbN5yo5n71PLHjbSxI3MT3dUVEOCS2tKAiDt9rflnSz8V+WylatfDHUxVuz Mvfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=RmE0TTCpWjwC2N9CrzJU0qP4HvV3ZMPenuv2U8ZRC0E=; b=WeSUL7eqcdI2MqrlEU73fgC+Iu1krxS4zLy644F6n3H/k+wJR5fV2sC6o2AYi1VlvP qwekuSe0MebxhO2sFnHV+GLg/ouCXaoYox5iVPdjvD7BYUX8A5bCqbhJakf78ylfXdgB r3o6VaRmXS88XC2M7RqkUgmBNm0/zgbqUHd8LSfnmYPlpDzD/RIQu5/EFTPwqLz18SF3 uwY/07aAiZxY6Z2Ot0AKCiBeYgv5HnA28tRIp/VjkbBhInl0D4y8AyYM22d6/d+tqBxN GCIteh3R1UgWLSF0GjHAzbi24P4X+SR5gTiTdVArh8W9mXhelk163MeX8ic8cjFP5WQm cyFQ== X-Gm-Message-State: AMke39k+G1jJEzmbNJeif1VCXFxgjaXMTV5IilvFwluafuHGHUr2Tp+feeAj3p+IsD76EQLZ9W1YrZCkmJJoNw== X-Received: by 10.31.242.11 with SMTP id q11mr8800002vkh.54.1486882634569; Sat, 11 Feb 2017 22:57:14 -0800 (PST) In-Reply-To: <20170212023422.GE1520@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:11029 Archived-At: --94eb2c14c26841ef4d05484fd7db Content-Type: multipart/alternative; boundary=94eb2c14c26841ef4805484fd7d9 --94eb2c14c26841ef4805484fd7d9 Content-Type: text/plain; charset=UTF-8 1. cat is added to the keys, also do a validate 2. so we what do we deal with the gettextdir() exactly? inline it or construct a gettextpointer()? 3. i added a extra locbuf array, and goto is replaced by a loop, memcpy is replaced by snprintf, compiled, and working well with fcitx 4. i just found that i forgot to store the keys to new buffer, it's ok to just use normal expression? or we need atomic operations? ``` + p->cat = category; + p->binding = q; + p->lm = lm; ``` 5. I do want to rewrite all to .UTF8, but it's a bit annoying as your words, then i changed the code to simply strip. > (safe for the user's terminal) LANG is set by users who are using musl and it's modified to zh_CN at setlocale(), app will use UTF8 directly, there's no such situation where charset will cause troubles to users' terminal, except apps which get the LANG manually by getenv(). I have not seen such strange applications so far, and most apps only have the UTF8 translation files. For moving from glibc to musl, i think doing this way is good for now, we could delete it later, or just keep it forever. And most people won't use non-UTF8 at all, if they do use GBK, their app will even fallback to UTF8, because no translation files for GBK. So, it's not so dagerous, i think :). And for developers, they should not use setlocale to detect the charset, this is wrong, nl_langinfo is the correct way. If they use, stripping will let their app know something went wrong. Strip .GBK or .UTF-8, so users would be happy that their old settings are working, developers will notice their mistakes that using setlocale() to validate charset is wrong. We get a lot more than failing the setlocale() and return C, the only bad thing is we need to care about a almost impossible event: an app directly getenv(). 2017-02-12 10:34 GMT+08:00 Rich Felker : > On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote: > > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000 > > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000 > > @@ -100,7 +100,8 @@ > > size_t map_size; > > void *volatile plural_rule; > > volatile int nplurals; > > - char name[]; > > + struct binding *binding; > > + struct __locale_map *lm; > > }; > > As stated in the reply to message body, I think you need the category > in the keying too, because there can be different .mo files loaded > depending on which category was requested. > > > static char *dummy_gettextdomain() > > @@ -120,58 +122,87 @@ > > struct msgcat *p; > > struct __locale_struct *loc = CURRENT_LOCALE; > > const struct __locale_map *lm; > > - const char *dirname, *locname, *catname; > > - size_t dirlen, loclen, catlen, domlen; > > + size_t domlen; > > + struct binding *q; > > > > if ((unsigned)category >= LC_ALL) goto notrans; > > > > if (!domainname) domainname = __gettextdomain(); > > > > domlen = strnlen(domainname, NAME_MAX+1); > > if (domlen > NAME_MAX) goto notrans; > > > > - dirname = gettextdir(domainname, &dirlen); > > - if (!dirname) goto notrans; > > + for (q=bindings; q; q=q->next) > > + if (!strcmp(q->domainname, domainname) && q->active) > > + break; > > + if (!q) goto notrans; > > Looks ok. I had said this should be a function but it really doesn't > need to be; it's plenty simple inline. > > > lm = loc->cat[category]; > > if (!lm) { > > notrans: > > return (char *) ((n == 1) ? msgid1 : msgid2); > > } > > - locname = lm->name; > > - > > - catname = catnames[category]; > > - catlen = catlens[category]; > > - loclen = strlen(locname); > > - > > - size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3; > > - char name[namelen+1], *s = name; > > - > > - memcpy(s, dirname, dirlen); > > - s[dirlen] = '/'; > > - s += dirlen + 1; > > - memcpy(s, locname, loclen); > > - s[loclen] = '/'; > > - s += loclen + 1; > > - memcpy(s, catname, catlen); > > - s[catlen] = '/'; > > - s += catlen + 1; > > - memcpy(s, domainname, domlen); > > - s[domlen] = '.'; > > - s[domlen+1] = 'm'; > > - s[domlen+2] = 'o'; > > - s[domlen+3] = 0; > > > > for (p=cats; p; p=p->next) > > - if (!strcmp(p->name, name)) > > + if (p->binding == q && p->lm == lm) > > break; > > && p->cat == category > > > if (!p) { > > + const char *dirname, *locname, *catname; > > + size_t dirlen, loclen, catlen; > > void *old_cats; > > size_t map_size; > > + > > + dirname = q->dirname; > > + locname = lm->name; > > + catname = catnames[category]; > > + > > + dirlen = q->dirlen; > > + loclen = strlen(locname); > > + catlen = catlens[category]; > > Now that these are only computed once rather than per-call, optimizing > out strlen is probably not worthwhile anymore, but it doesn't really > hurt either. Not something you need to change, just a comment. > > > + > > + size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3; > > + char name[namelen+1], *s = name; > > + char *str = name; > > + > > + memcpy(s, dirname, dirlen); > > + s[dirlen] = '/'; > > + s += dirlen + 1; > > + memcpy(s, locname, loclen); > > + s[loclen] = '/'; > > + s += loclen + 1; > > +skip_loc: > > + memcpy(s, catname, catlen); > > + s[catlen] = '/'; > > + s += catlen + 1; > > + memcpy(s, domainname, domlen); > > + s[domlen] = '.'; > > + s[domlen+1] = 'm'; > > + s[domlen+2] = 'o'; > > + s[domlen+3] = 0; > > Actually, now that this code is not a hot path, it should just be > using snprintf to construct the pathname, I think. It would be a lot > simpler and easier to ensure correctness. > > > + > > const void *map = __map_file(name, &map_size); > > - if (!map) goto notrans; > > + if (!map) { > > + if (s = strchr(name+dirlen+1, '@')) { > > + *s++ = '/'; > > + goto skip_loc;; > > + } > > + if ( str && (s = strchr(name+dirlen+1, '_')) && (s > < strchr(name+dirlen+1, '/')) ) { > > + if (str = strchr(locname, '@')) { > > + loclen += locname - str; > > + memcpy(s, str, loclen); > > + s[loclen] = '/'; > > + s += loclen + 1; > > + str = 0; > > + goto skip_loc; > > + } else { > > + *s++ = '/'; > > + goto skip_loc; > > + } > > + } > > + goto notrans; > > + } > > Using snprintf should also make it easy to get rid of the goto/retry > logic here, perhaps even with a 4-iteration loop and array of which > format modifications happen on each iteration. > > > p = calloc(sizeof *p + namelen + 1, 1); > > if (!p) { > > __munmap((void *)map, map_size); > > goto notrans; > > @@ -209,7 +209,6 @@ > > } > > p->map = map; > > p->map_size = map_size; > > - memcpy(p->name, name, namelen+1); > > do { > > old_cats = cats; > > p->next = old_cats; > > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000 > > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000 > > @@ -49,8 +49,8 @@ > > } > > > > /* Limit name length and forbid leading dot or any slashes. */ > > - for (n=0; n > - if (val[0]=='.' || val[n]) val = "C.UTF-8"; > > + for (n=0; n val[n]!='.'; n++); > > + if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8"; > > int builtin = (val[0]=='C' && !val[1]) > > || !strcmp(val, "C.UTF-8") > > || !strcmp(val, "POSIX"); > > This looks ok but might still need some tweaks. Should an input like > "zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF-8 that might > appear as junk on the user's terminal) or as "C" (no localization) > with only ASCII characters (safe for the user's terminal), or even > cause setlocale to fail and return an error so that the application > can decide what to do? These are not technical comments on your patch > but policy matters the community should weigh in on. > > Rich > --94eb2c14c26841ef4805484fd7d9 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
1. cat is added to the keys, also do a validate
2. so = we what do we deal with the gettextdir() exactly? inline it or construct a = gettextpointer()?
3. i added a extra locbuf array, and goto is re= placed by a loop, memcpy is replaced by snprintf, compiled, and working wel= l with fcitx
4. i just found that i forgot to store the keys to n= ew buffer, it's ok to just use normal expression? or we need atomic ope= rations?
```
+ p->cat =3D category;
+ p->b= inding =3D q;
+ p->lm =3D lm;
```
5.=C2= =A0 I do want to rewrite all to .UTF8, but it's a bit annoying as your = words, then i changed the code to simply strip.=C2=A0

<= div>>=C2=A0=C2=A0(safe for the user's terminal)
LANG is s= et by users who are using musl and it's modified to zh_CN at setlocale(= ), app will use UTF8 directly, there's no such situation where charset = will cause troubles to users' terminal, except apps which get the LANG = manually by getenv(). I have not seen such strange applications so far, and= most apps only have the UTF8 translation files.

F= or moving from glibc to musl, i think doing this way is good for now, we co= uld delete it later, or just keep it forever. And most people won't use= non-UTF8 at all, if they do use GBK, their app will even fallback to UTF8,= because no translation files for GBK. So, it's not so dagerous, i think :).

And for developers, =C2=A0they should not use setlocale to detect the= charset, this is wrong, nl_langinfo is the correct way. If they use, strip= ping will let their app know something went wrong.

Strip .GBK or .UTF-8, so users would be happy that their old settings= are working, developers will notice their mistakes that using setlocale() = to validate charset is wrong. We get a lot more than failing the setlocale(= ) and return C, the only bad thing is we need to care about a almost imposs= ible event: an app directly getenv().

2017-02-12 10:34 GMT+08:00 Rich Felke= r <dalias@libc.org>:
On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote:
> --- a/src/locale/dcngettext.c 2017-02-06 14:39= :17.860482624 +0000
> +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> @@ -100,7 +100,8 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0size_t map_size;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0void *volatile plural_rule;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0volatile int nplurals;
> -=C2=A0 =C2=A0 =C2=A0char name[];
> +=C2=A0 =C2=A0 =C2=A0struct binding *binding;
> +=C2=A0 =C2=A0 =C2=A0struct __locale_map *lm;
>=C2=A0 };

As stated in the reply to message body, I think you need the category
in the keying too, because there can be different .mo files loaded
depending on which category was requested.

>=C2=A0 static char *dummy_gettextdomain()
> @@ -120,58 +122,87 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0struct msgcat *p;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0struct __locale_struct *loc =3D CURRENT_LOCA= LE;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0const struct __locale_map *lm;
> -=C2=A0 =C2=A0 =C2=A0const char *dirname, *locname, *catname;
> -=C2=A0 =C2=A0 =C2=A0size_t dirlen, loclen, catlen, domlen;
> +=C2=A0 =C2=A0 =C2=A0size_t domlen;
> +=C2=A0 =C2=A0 =C2=A0struct binding *q;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0if ((unsigned)category >=3D LC_ALL) goto = notrans;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!domainname) domainname =3D __gettextdom= ain();
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0domlen =3D strnlen(domainname, NAME_MAX+1);<= br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0if (domlen > NAME_MAX) goto notrans;
>
> -=C2=A0 =C2=A0 =C2=A0dirname =3D gettextdir(domainname, &dirlen);<= br> > -=C2=A0 =C2=A0 =C2=A0if (!dirname) goto notrans;
> +=C2=A0 =C2=A0 =C2=A0for (q=3Dbindings; q; q=3Dq->next)
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!strcmp(q->dom= ainname, domainname) && q->active)
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0break;
> +=C2=A0 =C2=A0 =C2=A0if (!q) goto notrans;

Looks ok. I had said this should be a function but it really doesn't need to be; it's plenty simple inline.

>=C2=A0 =C2=A0 =C2=A0 =C2=A0lm =3D loc->cat[category];
>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!lm) {
>=C2=A0 notrans:
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0return (char *) = ((n =3D=3D 1) ? msgid1 : msgid2);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0}
> -=C2=A0 =C2=A0 =C2=A0locname =3D lm->name;
> -
> -=C2=A0 =C2=A0 =C2=A0catname =3D catnames[category];
> -=C2=A0 =C2=A0 =C2=A0catlen =3D catlens[category];
> -=C2=A0 =C2=A0 =C2=A0loclen =3D strlen(locname);
> -
> -=C2=A0 =C2=A0 =C2=A0size_t namelen =3D dirlen+1 + loclen+1 + catlen+1= + domlen+3;
> -=C2=A0 =C2=A0 =C2=A0char name[namelen+1], *s =3D name;
> -
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, dirname, dirlen);
> -=C2=A0 =C2=A0 =C2=A0s[dirlen] =3D '/';
> -=C2=A0 =C2=A0 =C2=A0s +=3D dirlen + 1;
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, locname, loclen);
> -=C2=A0 =C2=A0 =C2=A0s[loclen] =3D '/';
> -=C2=A0 =C2=A0 =C2=A0s +=3D loclen + 1;
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, catname, catlen);
> -=C2=A0 =C2=A0 =C2=A0s[catlen] =3D '/';
> -=C2=A0 =C2=A0 =C2=A0s +=3D catlen + 1;
> -=C2=A0 =C2=A0 =C2=A0memcpy(s, domainname, domlen);
> -=C2=A0 =C2=A0 =C2=A0s[domlen] =3D '.';
> -=C2=A0 =C2=A0 =C2=A0s[domlen+1] =3D 'm';
> -=C2=A0 =C2=A0 =C2=A0s[domlen+2] =3D 'o';
> -=C2=A0 =C2=A0 =C2=A0s[domlen+3] =3D 0;
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0for (p=3Dcats; p; p=3Dp->next)
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!strcmp(p->nam= e, name))
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (p->binding =3D= =3D q && p->lm =3D=3D lm)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0break;

&& p->cat =3D=3D category

>=C2=A0 =C2=A0 =C2=A0 =C2=A0if (!p) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0const char *dirname, = *locname, *catname;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t dirlen, loclen= , catlen;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0void *old_cats;<= br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t map_size;=
> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dirname =3D q->dir= name;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0locname =3D lm->na= me;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0catname =3D catnames[= category];
> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0dirlen =3D q->dirl= en;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0loclen =3D strlen(loc= name);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0catlen =3D catlens[ca= tegory];

Now that these are only computed once rather than per-call, optimizing
out strlen is probably not worthwhile anymore, but it doesn't really hurt either. Not something you need to change, just a comment.

> +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0size_t namelen =3D di= rlen+1 + loclen+1 + catlen+1 + domlen+3;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0char name[namelen+1],= *s =3D name;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0char *str =3D name; > +
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, dirname, di= rlen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[dirlen] =3D '/&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D dirlen + 1; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, locname, lo= clen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[loclen] =3D '/&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D loclen + 1; > +skip_loc:
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, catname, ca= tlen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[catlen] =3D '/&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D catlen + 1; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, domainname,= domlen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen] =3D '.&= #39;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen+1] =3D '= m';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen+2] =3D '= o';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[domlen+3] =3D 0;
Actually, now that this code is not a hot path, it should just be
using snprintf to construct the pathname, I think. It would be a lot
simpler and easier to ensure correctness.

> +
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0const void *map =3D __map_file(name, &map_size);
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!map) goto notran= s;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!map) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if (s =3D strchr(name+dirlen+1, '@')) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*s++ =3D '/'; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto skip_loc;;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0}
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0if ( str && (s =3D strchr(name+dirlen+1, '_')= ) && (s < strchr(name+dirlen+1, '/')) ) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (str =3D strchr(locname, '@')= ) {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0loclen +=3D = locname - str;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(s, st= r, loclen);
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s[loclen] = =3D '/';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0s +=3D locle= n + 1;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0str =3D 0; > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto skip_lo= c;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0} else {
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0*s++ =3D = 9;/';
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto skip_lo= c;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0}
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0goto notrans;
> +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}

Using snprintf should also make it easy to get rid of the goto/retry=
logic here, perhaps even with a 4-iteration loop and array of which
format modifications happen on each iteration.

>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0p =3D calloc(siz= eof *p + namelen + 1, 1);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!p) {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0__munmap((void *)map, map_size);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0goto notrans;
> @@ -209,7 +209,6 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0}
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0p->map =3D ma= p;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0p->map_size = =3D map_size;
> -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0memcpy(p->n= ame, name, namelen+1);
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0do {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0old_cats =3D cats;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0p->next =3D old_cats;
> --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> @@ -49,8 +49,8 @@
>=C2=A0 =C2=A0 =C2=A0 =C2=A0}
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0/* Limit name length and forbid leading dot = or any slashes. */
> -=C2=A0 =C2=A0 =C2=A0for (n=3D0; n<LOCALE_NAME_MAX &&= ; val[n] && val[n]!=3D'/'; n++);
> -=C2=A0 =C2=A0 =C2=A0if (val[0]=3D=3D'.' || val[n]) val =3D &q= uot;C.UTF-8";
> +=C2=A0 =C2=A0 =C2=A0for (n=3D0; n<LOCALE_NAME_MAX && val[n= ] && val[n]!=3D'/' && val[n]!=3D'.'; n++);<= br> > +=C2=A0 =C2=A0 =C2=A0if (val[0]=3D=3D'.' || (val[n] &&= val[n]!=3D'.')) val =3D "C.UTF-8";
>=C2=A0 =C2=A0 =C2=A0 =C2=A0int builtin =3D (val[0]=3D=3D'C' &am= p;& !val[1])
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|| !strcmp(val, = "C.UTF-8")
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0|| !strcmp(val, = "POSIX");

This looks ok but might still need some tweaks. Should an input like
"zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF= -8 that might
appear as junk on the user's terminal) or as "C" (no localiza= tion)
with only ASCII characters (safe for the user's terminal), or even
cause setlocale to fail and return an error so that the application
can decide what to do? These are not technical comments on your patch
but policy matters the community should weigh in on.

Rich

--94eb2c14c26841ef4805484fd7d9-- --94eb2c14c26841ef4d05484fd7db Content-Type: text/plain; charset=US-ASCII; name="locale.diff" Content-Disposition: attachment; filename="locale.diff" Content-Transfer-Encoding: base64 X-Attachment-Id: f_iz2bijgm0 LS0tIGEvc3JjL2xvY2FsZS9kY25nZXR0ZXh0LmMJMjAxNy0wMi0wNiAxNDozOToxNy44NjA0ODI2 MjQgKzAwMDAgCisrKyBiL3NyYy9sb2NhbGUvZGNuZ2V0dGV4dC5jCTIwMTctMDItMDYgMTQ6Mzk6 MTcuODYwNDgyNjI0ICswMDAwCkBAIC0xMDAsNyArMTAwLDkgQEAKIAlzaXplX3QgbWFwX3NpemU7 CiAJdm9pZCAqdm9sYXRpbGUgcGx1cmFsX3J1bGU7CiAJdm9sYXRpbGUgaW50IG5wbHVyYWxzOwot CWNoYXIgbmFtZVtdOworCXN0cnVjdCBiaW5kaW5nICpiaW5kaW5nOworCXN0cnVjdCBfX2xvY2Fs ZV9tYXAgKmxtOworCXN0cnVjdCBtc2djYXQgY2F0OwogfTsKIAogc3RhdGljIGNoYXIgKmR1bW15 X2dldHRleHRkb21haW4oKQpAQCAtMTIwLDggKzEyMiw4IEBACiAJc3RydWN0IG1zZ2NhdCAqcDsK IAlzdHJ1Y3QgX19sb2NhbGVfc3RydWN0ICpsb2MgPSBDVVJSRU5UX0xPQ0FMRTsKIAljb25zdCBz dHJ1Y3QgX19sb2NhbGVfbWFwICpsbTsKLQljb25zdCBjaGFyICpkaXJuYW1lLCAqbG9jbmFtZSwg KmNhdG5hbWU7Ci0Jc2l6ZV90IGRpcmxlbiwgbG9jbGVuLCBjYXRsZW4sIGRvbWxlbjsKKwlzaXpl X3QgZG9tbGVuOworCXN0cnVjdCBiaW5kaW5nICpxOwogCiAJaWYgKCh1bnNpZ25lZCljYXRlZ29y eSA+PSBMQ19BTEwpIGdvdG8gbm90cmFuczsKIApAQCAtMTMwLDQ3ICsxMzIsNjIgQEAKIAlkb21s ZW4gPSBzdHJubGVuKGRvbWFpbm5hbWUsIE5BTUVfTUFYKzEpOwogCWlmIChkb21sZW4gPiBOQU1F X01BWCkgZ290byBub3RyYW5zOwogCi0JZGlybmFtZSA9IGdldHRleHRkaXIoZG9tYWlubmFtZSwg JmRpcmxlbik7Ci0JaWYgKCFkaXJuYW1lKSBnb3RvIG5vdHJhbnM7CisJZm9yIChxPWJpbmRpbmdz OyBxOyBxPXEtPm5leHQpCisJCWlmICghc3RyY21wKHEtPmRvbWFpbm5hbWUsIGRvbWFpbm5hbWUp ICYmIHEtPmFjdGl2ZSkKKwkJCWJyZWFrOworCWlmICghcSkgZ290byBub3RyYW5zOwogCiAJbG0g PSBsb2MtPmNhdFtjYXRlZ29yeV07CiAJaWYgKCFsbSkgewogbm90cmFuczoKIAkJcmV0dXJuIChj aGFyICopICgobiA9PSAxKSA/IG1zZ2lkMSA6IG1zZ2lkMik7CiAJfQotCWxvY25hbWUgPSBsbS0+ bmFtZTsKLQotCWNhdG5hbWUgPSBjYXRuYW1lc1tjYXRlZ29yeV07Ci0JY2F0bGVuID0gY2F0bGVu c1tjYXRlZ29yeV07Ci0JbG9jbGVuID0gc3RybGVuKGxvY25hbWUpOwotCi0Jc2l6ZV90IG5hbWVs ZW4gPSBkaXJsZW4rMSArIGxvY2xlbisxICsgY2F0bGVuKzEgKyBkb21sZW4rMzsKLQljaGFyIG5h bWVbbmFtZWxlbisxXSwgKnMgPSBuYW1lOwotCi0JbWVtY3B5KHMsIGRpcm5hbWUsIGRpcmxlbik7 Ci0Jc1tkaXJsZW5dID0gJy8nOwotCXMgKz0gZGlybGVuICsgMTsKLQltZW1jcHkocywgbG9jbmFt ZSwgbG9jbGVuKTsKLQlzW2xvY2xlbl0gPSAnLyc7Ci0JcyArPSBsb2NsZW4gKyAxOwotCW1lbWNw eShzLCBjYXRuYW1lLCBjYXRsZW4pOwotCXNbY2F0bGVuXSA9ICcvJzsKLQlzICs9IGNhdGxlbiAr IDE7Ci0JbWVtY3B5KHMsIGRvbWFpbm5hbWUsIGRvbWxlbik7Ci0Jc1tkb21sZW5dID0gJy4nOwot CXNbZG9tbGVuKzFdID0gJ20nOwotCXNbZG9tbGVuKzJdID0gJ28nOwotCXNbZG9tbGVuKzNdID0g MDsKIAogCWZvciAocD1jYXRzOyBwOyBwPXAtPm5leHQpCi0JCWlmICghc3RyY21wKHAtPm5hbWUs IG5hbWUpKQorCQlpZiAocC0+YmluZGluZyA9PSBxICYmIHAtPmxtID09IGxtICYmIHAtPmNhdCA9 PSBjYXRlZ29yeSkKIAkJCWJyZWFrOwogCiAJaWYgKCFwKSB7CisJCWNvbnN0IGNoYXIgKmRpcm5h bWUsICpsb2NuYW1lLCAqY2F0bmFtZTsKKwkJc2l6ZV90IGRpcmxlbiwgbG9jbGVuLCBjYXRsZW47 CiAJCXZvaWQgKm9sZF9jYXRzOwogCQlzaXplX3QgbWFwX3NpemU7Ci0JCWNvbnN0IHZvaWQgKm1h cCA9IF9fbWFwX2ZpbGUobmFtZSwgJm1hcF9zaXplKTsKKworCQlkaXJuYW1lID0gcS0+ZGlybmFt ZTsKKwkJbG9jbmFtZSA9IGxtLT5uYW1lOworCQljYXRuYW1lID0gY2F0bmFtZXNbY2F0ZWdvcnld OworCisJCWRpcmxlbiA9IHEtPmRpcmxlbjsKKwkJbG9jbGVuID0gc3RybGVuKGxvY25hbWUpOwor CQljYXRsZW4gPSBjYXRsZW5zW2NhdGVnb3J5XTsKKworCQlzaXplX3QgbmFtZWxlbiA9IGRpcmxl bisxICsgbG9jbGVuKzEgKyBjYXRsZW4rMSArIGRvbWxlbiszOworCQljaGFyIG5hbWVbbmFtZWxl bisxXTsKKwkJY2hhciBsb2NidWZbbG9jbGVuKzFdLCAqbG9jcCA9IGxvY2J1ZjsKKwkJY29uc3Qg dm9pZCAqbWFwOworCisJCW1lbWNweShsb2NidWYsIGxvY25hbWUsIGxvY2xlbik7CisJCWxvY2J1 Zltsb2NsZW5dID0gMDsKKworCQlmb3IgKDs7KSB7CisJCQlzbnByaW50ZihuYW1lLCBuYW1lbGVu KzEsICIlcy8lcy8lcy8lcy5tb1wwIiwgZGlybmFtZSwgbG9jYnVmLCBjYXRuYW1lLCBkb21haW5u YW1lKTsKKwkJCWlmIChtYXAgPSBfX21hcF9maWxlKG5hbWUsICZtYXBfc2l6ZSkpIGJyZWFrOwor CisJCQlpZiAobG9jcCA9IHN0cmNocihsb2NidWYsICdAJykpIHsKKwkJCQkqbG9jcCA9IDA7CisJ CQkJbG9jYnVmW2xvY2xlbl0gPSAnQCc7CisJCQl9IGVsc2UgaWYgKGxvY3AgPSBzdHJjaHIobG9j YnVmLCAnXycpKSB7CisJCQkJaWYgKGxvY2J1Zltsb2NsZW5dID09ICdAJykgeworCQkJCQlsb2Ni dWZbbG9jbGVuXSA9IDA7CisJCQkJCSpsb2NwID0gJ0AnOworCQkJCQlzdHJjYXQobG9jcCsxLCBs b2NidWYgKyBzdHJsZW4obG9jYnVmKSArIDEpOworCQkJCX0gZWxzZSAqbG9jcCA9IDA7CisJCQl9 IGVsc2UgeworCQkJCWJyZWFrOworCQkJfQorCQl9CiAJCWlmICghbWFwKSBnb3RvIG5vdHJhbnM7 CisKIAkJcCA9IGNhbGxvYyhzaXplb2YgKnAgKyBuYW1lbGVuICsgMSwgMSk7CiAJCWlmICghcCkg ewogCQkJX19tdW5tYXAoKHZvaWQgKiltYXAsIG1hcF9zaXplKTsKQEAgLTE3OCw3ICsxOTUsNiBA QAogCQl9CiAJCXAtPm1hcCA9IG1hcDsKIAkJcC0+bWFwX3NpemUgPSBtYXBfc2l6ZTsKLQkJbWVt Y3B5KHAtPm5hbWUsIG5hbWUsIG5hbWVsZW4rMSk7CiAJCWRvIHsKIAkJCW9sZF9jYXRzID0gY2F0 czsKIAkJCXAtPm5leHQgPSBvbGRfY2F0czsKQEAgLTE5Myw2ICsxOTMsOSBAQAogCQkJX19tdW5t YXAoKHZvaWQgKiltYXAsIG1hcF9zaXplKTsKIAkJCWdvdG8gbm90cmFuczsKIAkJfQorCQlwLT5j YXQgPSBjYXRlZ29yeTsKKwkJcC0+YmluZGluZyA9IHE7CisJCXAtPmxtID0gbG07CiAJCXAtPm1h cCA9IG1hcDsKIAkJcC0+bWFwX3NpemUgPSBtYXBfc2l6ZTsKIAkJZG8gewo= --94eb2c14c26841ef4d05484fd7db--