From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11030
Path: news.gmane.org!.POSTED!not-for-mail
From: He X <xw897002528@gmail.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Sun, 12 Feb 2017 15:11:08 +0800
Message-ID: <CAPG2z0-VUJ96WvvR3Jn4q20Gd3CUKGqtUOMBZ-_ENWfvbt47aw@mail.gmail.com>
References: <20170129140747.GJ1533@brightrain.aerifal.cx> <CAPG2z0_EHU=U0=pkd31b5fPN__-Ly_qd9W9ftK=1e40SDJHX1w@mail.gmail.com>
 <20170129155507.GK1533@brightrain.aerifal.cx> <CAPG2z09NR77cEgUVZD4JkOt0yUyNFrbeoKDvCpD6UQiUi8UBkA@mail.gmail.com>
 <20170129163329.GL1533@brightrain.aerifal.cx> <CAPG2z08tv19E8VzDUYSG6uJqBBiHHuEUnK84WHudjSdQC-6vLg@mail.gmail.com>
 <20170208143147.GY1533@brightrain.aerifal.cx> <CAPG2z0860oCsim2uvw_6je=vPXk_vPBkYqYV2qSMmcVCnoqSOQ@mail.gmail.com>
 <20170211023610.GA1520@brightrain.aerifal.cx> <CAPG2z08yePs-6pqHcoBbMfWRPyXunuT-2Ge_JDWH8E5Y+_0wtw@mail.gmail.com>
 <20170212023422.GE1520@brightrain.aerifal.cx> <CAPG2z08MUcj-0i=_kOO=WTP4bAufc2Dx8v6kGvmzHnoVu4c-nA@mail.gmail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a114324fe34466c0548500a7b
X-Trace: blaine.gmane.org 1486883504 32230 195.159.176.226 (12 Feb 2017 07:11:44 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sun, 12 Feb 2017 07:11:44 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-11045-gllmg-musl=m.gmane.org@lists.openwall.com Sun Feb 12 08:11:39 2017
Return-path: <musl-return-11045-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-11045-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1ccoJu-000844-JL
	for gllmg-musl@m.gmane.org; Sun, 12 Feb 2017 08:11:38 +0100
Original-Received: (qmail 8050 invoked by uid 550); 12 Feb 2017 07:11:41 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 8032 invoked from network); 12 Feb 2017 07:11:41 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to;
        bh=czmwJ9Tmm0PC7rvdLVMLO7RUcGjrkidpC5CbK4X86r8=;
        b=ii/Lrip2n9wbzcGNWYOCKHewVP/EdxlzxubxEU2OMRY3XWKL42O95pjUygAkFVzKvq
         yt3vD1NnjYxUQS/x/+NHk+ioZu097prGzlAd3qHJQzhDAdY6TtAUpVsuGWCCIwccaoLK
         E+G6xAxXcJJaxucn35BD4pDFBrsqs0zkUfk3sLWWqotc0rjwATUj8fBxMjqLsui8oMFv
         tCpW9asAiJukDPLXtUUp23nQb5cECMYCPkvbrbzLfix57mDVMIvbbir81y/bHtwhOare
         YRUTFVxW90PtJ+xitLdk6fjSZYNOUk9smnMPyt9ZpKlkOr15Xn6E70ApxdqO8o2GujaF
         Nkxg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:in-reply-to:references:from:date
         :message-id:subject:to;
        bh=czmwJ9Tmm0PC7rvdLVMLO7RUcGjrkidpC5CbK4X86r8=;
        b=UT72JFQ5+KONcqA2GOWtCdl81Ip1GJRedcY6okwKHL40AyYzzaRBHKG0O5oD7pStL9
         9ql9KJaNgpNDPWYUJ5bjK8aHTp0m9cGtahVRzNUqZF6PLApSkZRId+oAADhubUie0CGp
         1uZYPUQaL94rOMZfu4srT6AMovmY3CvyX1/PcQABvd+btFYNNvjeA8FojjIrZ8fTEExP
         ZB8hYQV3Cr3mrmVEk/p/6LexWF/NxRLMLP/AWtNbMHSWmUlxYl/nkIe0tCuWQl8UBLpa
         5iz/fOccW2pjpmvC68PlrMr7JLQ9LixQK7e+sIY8EvbYZwx1BpiNgc3zK8CqP+BtImE1
         2E1A==
X-Gm-Message-State: AMke39lJAILmqIilulblgi3STLQI9mzTGVmsv4iBiQN2MrbYVHi9dpi6ZeNkigQKUvTZdAdwI8hqI67WFCEUOg==
X-Received: by 10.31.160.3 with SMTP id j3mr8392166vke.92.1486883489331; Sat,
 11 Feb 2017 23:11:29 -0800 (PST)
In-Reply-To: <CAPG2z08MUcj-0i=_kOO=WTP4bAufc2Dx8v6kGvmzHnoVu4c-nA@mail.gmail.com>
Xref: news.gmane.org gmane.linux.lib.musl.general:11030
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/11030>

--001a114324fe34466c0548500a7b
Content-Type: text/plain; charset=UTF-8

Sorry, I uploaded the wrong patch; it should be int cat, rather than struct
msgcat :/

2017-02-12 14:56 GMT+08:00 He X <xw897002528@gmail.com>:

> 1. cat is added to the keys, and it is validated as well.
> 2. So how exactly do we deal with gettextdir()? Inline it, or
> construct a gettextpointer()?
> 3. I added an extra locbuf array, replaced the goto with a loop, and
> replaced memcpy with snprintf; it compiles and works well with fcitx.
> 4. I just found that I forgot to store the keys in the new buffer. Is it
> OK to just use plain assignments, or do we need atomic operations?
> ```
> + p->cat = category;
> + p->binding = q;
> + p->lm = lm;
> ```
> 5. I did want to rewrite everything to .UTF8, but, as you said, that is a
> bit annoying, so I changed the code to simply strip the charset suffix.
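On the atomics question in point 4, a minimal sketch, assuming the new node only becomes reachable through the compare-and-swap that links it into the list (as the existing do/while loop in dcngettext.c appears to do). Under that assumption the key fields can be filled with ordinary assignments before the CAS. This uses C11 <stdatomic.h> rather than musl's internal atomics, and struct node and publish() are made-up names for illustration only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct msgcat: only the key fields are shown. */
struct node {
	struct node *next;
	const void *binding;  /* key: which bindtextdomain() entry */
	const void *lm;       /* key: which locale map */
	int cat;              /* key: which LC_* category */
};

static _Atomic(struct node *) head;

/* Fill in the keys with ordinary stores, then publish with a CAS.
 * Readers can only reach p after the CAS succeeds, so they never
 * observe a half-initialized node. */
static void publish(struct node *p, const void *binding, const void *lm, int cat)
{
	assert(p);
	p->binding = binding;
	p->lm = lm;
	p->cat = cat;

	struct node *old = atomic_load(&head);
	do {
		p->next = old;  /* on CAS failure, old is refreshed and we retry */
	} while (!atomic_compare_exchange_weak(&head, &old, p));
}
```

In this sketch the keys are never written again after publication, so readers walking the list need no extra synchronization for them.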
>
> >  (safe for the user's terminal)
> LANG is set by users who are using musl, and it is changed to zh_CN in
> setlocale(); the app will use UTF-8 directly. There is no situation where
> the charset will cause trouble for the user's terminal, except for apps
> that read LANG manually with getenv(). I have not seen such strange
> applications so far, and most apps only ship UTF-8 translation files.
>
> For moving from glibc to musl, I think doing it this way is good for now;
> we could delete it later, or just keep it forever. Most people won't use
> non-UTF-8 at all, and if they do use GBK, their app will fall back to UTF-8
> anyway, because there are no translation files for GBK. So it's not so
> dangerous, I think :).
>
> As for developers, they should not use setlocale() to detect the charset;
> that is wrong, and nl_langinfo() is the correct way. If they do use it,
> stripping will let their app know something went wrong.
>
> Strip .GBK or .UTF-8, so users will be happy that their old settings still
> work, and developers will notice that using setlocale() to validate the
> charset is a mistake. We gain a lot more than by failing setlocale() and
> returning C; the only bad thing is that we need to care about an almost
> impossible event: an app calling getenv() directly.
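To illustrate the nl_langinfo() point, here is a small sketch of how an application could ask for the charset instead of parsing the string returned by setlocale(). charset_is_utf8() is a hypothetical helper name; nl_langinfo(CODESET) is the standard POSIX interface:

```c
#include <assert.h>
#include <langinfo.h>
#include <string.h>

/* Returns nonzero if the current LC_CTYPE locale's charset is UTF-8.
 * The charset comes from nl_langinfo(CODESET), not from the locale
 * name, so a stripped name like "zh_CN" does not confuse it. */
static int charset_is_utf8(void)
{
	const char *cs = nl_langinfo(CODESET);
	assert(cs);
	return !strcmp(cs, "UTF-8");
}
```

An application would typically call setlocale(LC_ALL, "") once at startup and then use a helper like this; the locale name itself never needs to be inspected.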
>
> 2017-02-12 10:34 GMT+08:00 Rich Felker <dalias@libc.org>:
>
>> On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote:
>> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
>> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
>> > @@ -100,7 +100,8 @@
>> >       size_t map_size;
>> >       void *volatile plural_rule;
>> >       volatile int nplurals;
>> > -     char name[];
>> > +     struct binding *binding;
>> > +     struct __locale_map *lm;
>> >  };
>>
>> As stated in the reply to message body, I think you need the category
>> in the keying too, because there can be different .mo files loaded
>> depending on which category was requested.
>>
>> >  static char *dummy_gettextdomain()
>> > @@ -120,58 +122,87 @@
>> >       struct msgcat *p;
>> >       struct __locale_struct *loc = CURRENT_LOCALE;
>> >       const struct __locale_map *lm;
>> > -     const char *dirname, *locname, *catname;
>> > -     size_t dirlen, loclen, catlen, domlen;
>> > +     size_t domlen;
>> > +     struct binding *q;
>> >
>> >       if ((unsigned)category >= LC_ALL) goto notrans;
>> >
>> >       if (!domainname) domainname = __gettextdomain();
>> >
>> >       domlen = strnlen(domainname, NAME_MAX+1);
>> >       if (domlen > NAME_MAX) goto notrans;
>> >
>> > -     dirname = gettextdir(domainname, &dirlen);
>> > -     if (!dirname) goto notrans;
>> > +     for (q=bindings; q; q=q->next)
>> > +             if (!strcmp(q->domainname, domainname) && q->active)
>> > +                     break;
>> > +     if (!q) goto notrans;
>>
>> Looks ok. I had said this should be a function but it really doesn't
>> need to be; it's plenty simple inline.
>>
>> >       lm = loc->cat[category];
>> >       if (!lm) {
>> >  notrans:
>> >               return (char *) ((n == 1) ? msgid1 : msgid2);
>> >       }
>> > -     locname = lm->name;
>> > -
>> > -     catname = catnames[category];
>> > -     catlen = catlens[category];
>> > -     loclen = strlen(locname);
>> > -
>> > -     size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
>> > -     char name[namelen+1], *s = name;
>> > -
>> > -     memcpy(s, dirname, dirlen);
>> > -     s[dirlen] = '/';
>> > -     s += dirlen + 1;
>> > -     memcpy(s, locname, loclen);
>> > -     s[loclen] = '/';
>> > -     s += loclen + 1;
>> > -     memcpy(s, catname, catlen);
>> > -     s[catlen] = '/';
>> > -     s += catlen + 1;
>> > -     memcpy(s, domainname, domlen);
>> > -     s[domlen] = '.';
>> > -     s[domlen+1] = 'm';
>> > -     s[domlen+2] = 'o';
>> > -     s[domlen+3] = 0;
>> >
>> >       for (p=cats; p; p=p->next)
>> > -             if (!strcmp(p->name, name))
>> > +             if (p->binding == q && p->lm == lm)
>> >                       break;
>>
>> && p->cat == category
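A sketch of what the lookup looks like with all three keys, using trimmed-down hypothetical versions of the structures (the real struct msgcat has more fields, and find_cat() is not a function in the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down versions of the structures in the patch. */
struct binding { int dummy; };
struct locale_map { int dummy; };

struct msgcat {
	struct msgcat *next;
	const struct binding *binding;
	const struct locale_map *lm;
	int cat;
};

/* Find the cached catalog for this (binding, locale map, category) triple.
 * All three must match: the same domain and locale can resolve to
 * different .mo files depending on the category directory. */
static struct msgcat *find_cat(struct msgcat *cats, const struct binding *q,
                               const struct locale_map *lm, int category)
{
	assert(category >= 0);
	for (struct msgcat *p = cats; p; p = p->next)
		if (p->binding == q && p->lm == lm && p->cat == category)
			return p;
	return NULL;
}
```

Comparing pointers and an int replaces the old strcmp() on the full pathname, which is what makes the per-call path cheap.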
>>
>> >       if (!p) {
>> > +             const char *dirname, *locname, *catname;
>> > +             size_t dirlen, loclen, catlen;
>> >               void *old_cats;
>> >               size_t map_size;
>> > +
>> > +             dirname = q->dirname;
>> > +             locname = lm->name;
>> > +             catname = catnames[category];
>> > +
>> > +             dirlen = q->dirlen;
>> > +             loclen = strlen(locname);
>> > +             catlen = catlens[category];
>>
>> Now that these are only computed once rather than per-call, optimizing
>> out strlen is probably not worthwhile anymore, but it doesn't really
>> hurt either. Not something you need to change, just a comment.
>>
>> > +
>> > +             size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
>> > +             char name[namelen+1], *s = name;
>> > +             char *str = name;
>> > +
>> > +             memcpy(s, dirname, dirlen);
>> > +             s[dirlen] = '/';
>> > +             s += dirlen + 1;
>> > +             memcpy(s, locname, loclen);
>> > +             s[loclen] = '/';
>> > +             s += loclen + 1;
>> > +skip_loc:
>> > +             memcpy(s, catname, catlen);
>> > +             s[catlen] = '/';
>> > +             s += catlen + 1;
>> > +             memcpy(s, domainname, domlen);
>> > +             s[domlen] = '.';
>> > +             s[domlen+1] = 'm';
>> > +             s[domlen+2] = 'o';
>> > +             s[domlen+3] = 0;
>>
>> Actually, now that this code is not a hot path, it should just be
>> using snprintf to construct the pathname, I think. It would be a lot
>> simpler and easier to ensure correctness.
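For reference, a minimal sketch of the snprintf approach, assuming the same dirname/locname/catname/domainname layout as the memcpy code above. build_mo_path() is a hypothetical helper, not something in the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Build "<dirname>/<locname>/<catname>/<domainname>.mo" into buf.
 * Returns 0 on success, -1 if the result would not fit in size bytes. */
static int build_mo_path(char *buf, size_t size, const char *dirname,
                         const char *locname, const char *catname,
                         const char *domainname)
{
	assert(buf && size > 0);
	int n = snprintf(buf, size, "%s/%s/%s/%s.mo",
	                 dirname, locname, catname, domainname);
	/* snprintf returns the length it wanted to write; >= size means truncation. */
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The truncation check takes the place of the hand-computed namelen, so there is no separate length arithmetic to keep in sync with the format.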
>>
>> > +
>> >               const void *map = __map_file(name, &map_size);
>> > -             if (!map) goto notrans;
>> > +             if (!map) {
>> > +                     if (s = strchr(name+dirlen+1, '@')) {
>> > +                             *s++ = '/';
>> > +                             goto skip_loc;;
>> > +                     }
>> > +                     if ( str && (s = strchr(name+dirlen+1, '_')) && (s < strchr(name+dirlen+1, '/')) ) {
>> > +                             if (str = strchr(locname, '@')) {
>> > +                                     loclen += locname - str;
>> > +                                     memcpy(s, str, loclen);
>> > +                                     s[loclen] = '/';
>> > +                                     s += loclen + 1;
>> > +                                     str = 0;
>> > +                                     goto skip_loc;
>> > +                             } else {
>> > +                                     *s++ = '/';
>> > +                                     goto skip_loc;
>> > +                             }
>> > +                     }
>> > +                     goto notrans;
>> > +             }
>>
>> Using snprintf should also make it easy to get rid of the goto/retry
>> logic here, perhaps even with a 4-iteration loop and array of which
>> format modifications happen on each iteration.
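A sketch of how such a table-driven loop could generate the locale name to try on each iteration. The order (full name, without @modifier, without _territory, without both) is an assumption read off the goto logic above, and locale_candidate() and steps[] are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Which parts of "ll_CC@mod" to keep on each of the 4 iterations. */
static const struct { int territory, modifier; } steps[4] = {
	{1, 1}, {1, 0}, {0, 1}, {0, 0}
};

/* Write the i-th candidate for locname into buf. Returns 0 on success,
 * -1 if this step would only repeat an earlier candidate (or not fit). */
static int locale_candidate(const char *locname, int i, char *buf, size_t size)
{
	assert(i >= 0 && i < 4);
	const char *mod = strchr(locname, '@');
	size_t baselen = mod ? (size_t)(mod - locname) : strlen(locname);
	const char *ter = memchr(locname, '_', baselen);
	size_t langlen = ter ? (size_t)(ter - locname) : baselen;

	/* Dropping a part that is not there would duplicate an earlier try. */
	if (!steps[i].territory && !ter) return -1;
	if (!steps[i].modifier && !mod) return -1;

	size_t len = steps[i].territory ? baselen : langlen;
	int n = snprintf(buf, size, "%.*s%s", (int)len, locname,
	                 (steps[i].modifier && mod) ? mod : "");
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The caller would loop i from 0 to 3, skip steps that return -1, build the pathname from each candidate, and stop at the first file that maps successfully. For "sr_RS@latin" this tries sr_RS@latin, sr_RS, sr@latin, sr in that order.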
>>
>> >               p = calloc(sizeof *p + namelen + 1, 1);
>> >               if (!p) {
>> >                       __munmap((void *)map, map_size);
>> >                       goto notrans;
>> > @@ -209,7 +209,6 @@
>> >               }
>> >               p->map = map;
>> >               p->map_size = map_size;
>> > -             memcpy(p->name, name, namelen+1);
>> >               do {
>> >                       old_cats = cats;
>> >                       p->next = old_cats;
>> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
>> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
>> > @@ -49,8 +49,8 @@
>> >       }
>> >
>> >       /* Limit name length and forbid leading dot or any slashes. */
>> > -     for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
>> > -     if (val[0]=='.' || val[n]) val = "C.UTF-8";
>> > +     for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/' && val[n]!='.'; n++);
>> > +     if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8";
>> >       int builtin = (val[0]=='C' && !val[1])
>> >               || !strcmp(val, "C.UTF-8")
>> >               || !strcmp(val, "POSIX");
>>
>> This looks ok but might still need some tweaks. Should an input like
>> "zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF-8 that might
>> appear as junk on the user's terminal) or as "C" (no localization)
>> with only ASCII characters (safe for the user's terminal), or even
>> cause setlocale to fail and return an error so that the application
>> can decide what to do? These are not technical comments on your patch
>> but policy matters the community should weigh in on.
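To make the "strip" option concrete, a standalone sketch of the check from the patch, pulled out into a hypothetical strip_charset() helper. NAME_LIMIT is an assumed stand-in for LOCALE_NAME_MAX, and this shows only one of the three policies under discussion, not a decision:

```c
#include <assert.h>
#include <string.h>

#define NAME_LIMIT 23  /* hypothetical stand-in for LOCALE_NAME_MAX */

/* Apply the "strip" policy: copy val up to the first '.', so that
 * "zh_CN.GBK" and "zh_CN.UTF-8" both become "zh_CN". A leading dot,
 * a slash before any dot, or an overlong name falls back to "C.UTF-8",
 * as in the patch. out must have room for NAME_LIMIT+1 bytes. */
static void strip_charset(const char *val, char *out)
{
	assert(val && out);
	size_t n;
	for (n = 0; n < NAME_LIMIT && val[n] && val[n] != '/' && val[n] != '.'; n++)
		;
	if (val[0] == '.' || (val[n] && val[n] != '.')) {
		strcpy(out, "C.UTF-8");
		return;
	}
	memcpy(out, val, n);
	out[n] = 0;
}
```

Note that everything after the first dot is discarded, so the charset the user asked for is silently ignored; that is exactly the behavior the policy question is about.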
>>
>> Rich
>>
>
>

--001a114324fe34466c0548500a7b--