From: He X <xw897002528@gmail.com>
To: musl@lists.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Sun, 12 Feb 2017 14:56:53 +0800
Message-ID: <CAPG2z08MUcj-0i=_kOO=WTP4bAufc2Dx8v6kGvmzHnoVu4c-nA@mail.gmail.com>
In-Reply-To: <20170212023422.GE1520@brightrain.aerifal.cx>
1. cat is now part of the keys, and it is checked as well.
2. So what exactly do we do with gettextdir()? Inline it, or
construct a gettextpointer()?
3. I added an extra locbuf array, replaced the goto with a loop, and replaced
memcpy with snprintf. It compiles and works well with fcitx.
4. I just found that I forgot to store the keys in the new buffer. Is it OK to
use plain assignments, or do we need atomic operations?
```
+ p->cat = category;
+ p->binding = q;
+ p->lm = lm;
```
5. I do want to rewrite everything to .UTF8, but as you said that is a bit
annoying, so I changed the code to simply strip the suffix.
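A minimal sketch of the loop from point 3, not the patch itself: it assumes the fallback order ll_CC@mod, ll_CC, ll@mod, ll and builds every name with snprintf instead of editing a buffer in place. The helpers locale_candidate() and mo_path() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not musl code): write fallback form number i of the
 * locale name loc into buf. Assumed order: ll_CC@mod, ll_CC, ll@mod, ll.
 * Returns 1 if form i exists and fits in buf, 0 otherwise. */
int locale_candidate(const char *loc, int i, char *buf, size_t size)
{
	assert(loc && buf && size > 0);
	const char *mod = strchr(loc, '@');           /* "@mod" or NULL */
	size_t baselen = mod ? (size_t)(mod - loc) : strlen(loc);
	const char *us = memchr(loc, '_', baselen);   /* "_CC" or NULL */
	size_t langlen = us ? (size_t)(us - loc) : baselen;
	int n;

	switch (i) {
	case 0: /* full name */
		n = snprintf(buf, size, "%s", loc);
		break;
	case 1: /* drop the modifier */
		if (!mod) return 0;
		n = snprintf(buf, size, "%.*s", (int)baselen, loc);
		break;
	case 2: /* drop the territory, keep the modifier */
		if (!mod || !us) return 0;
		n = snprintf(buf, size, "%.*s%s", (int)langlen, loc, mod);
		break;
	case 3: /* language only */
		if (!us) return 0;
		n = snprintf(buf, size, "%.*s", (int)langlen, loc);
		break;
	default:
		return 0;
	}
	return n >= 0 && (size_t)n < size;
}

/* Hypothetical helper: build "dir/loc/cat/domain.mo"; 1 on success. */
int mo_path(char *buf, size_t size, const char *dir, const char *loc,
            const char *cat, const char *domain)
{
	int n = snprintf(buf, size, "%s/%s/%s/%s.mo", dir, loc, cat, domain);
	return n >= 0 && (size_t)n < size;
}
```

A caller would loop i from 0 to 3, skip forms for which locale_candidate() returns 0, and stop at the first path that can be mapped.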
> (safe for the user's terminal)
LANG is set by users who are using musl, and it is changed to zh_CN in
setlocale(); the app then uses UTF-8 directly. There is no situation where the
charset causes trouble for the user's terminal, except for apps that read
LANG manually with getenv(). I have not seen such strange applications so
far, and most apps only ship UTF-8 translation files.
For moving from glibc to musl, I think doing it this way is good for now; we
could delete it later, or just keep it forever. Most people won't use
non-UTF-8 at all, and if they do use GBK, their apps will fall back to UTF-8
anyway, because there are no translation files for GBK. So it's not so
dangerous, I think :).
As for developers, they should not use setlocale() to detect the charset;
that is wrong, and nl_langinfo() is the correct way. If they do use
setlocale() for that, stripping will let their app know something went wrong.
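A small sketch of the nl_langinfo() approach, for illustration only. locale_is_utf8() is a hypothetical helper name, and the exact spelling of codeset names other than what the running libc returns is implementation-defined, so the comparison against "UTF-8" is an assumption that holds on musl and glibc.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: nonzero if the current LC_CTYPE locale reports UTF-8.
 * It asks the C library for the codeset instead of parsing the string
 * returned by setlocale() or the LANG environment variable. */
int locale_is_utf8(void)
{
	const char *cs = nl_langinfo(CODESET);
	assert(cs);
	/* Assumption: the libc spells the codeset "UTF-8". */
	return strcmp(cs, "UTF-8") == 0;
}
```

An application would call setlocale(LC_ALL, "") once at startup and then use this check, so a stripped or rewritten locale name never affects the result.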
If we strip .GBK or .UTF-8, users will be happy that their old settings still
work, and developers will notice that using setlocale() to validate the
charset is a mistake. We gain a lot more than by failing setlocale() and
returning C. The only downside is that we need to care about an almost
impossible event: an app calling getenv() directly.
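To make the stripping concrete, here is a sketch under stated assumptions. It is not the locale_map.c change from the patch: strip_codeset() is a hypothetical helper, and unlike a plain truncation at the first '.', it keeps any @modifier that follows the codeset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the locale name into out with the ".codeset"
 * part removed, e.g. "zh_CN.GBK" -> "zh_CN". A trailing "@modifier" is kept
 * (an assumption of this sketch). Returns 1 if the result fits, else 0. */
int strip_codeset(const char *name, char *out, size_t size)
{
	assert(name && out && size > 0);
	const char *dot = strchr(name, '.');
	const char *mod = strchr(name, '@');
	int n;

	if (!dot || (mod && mod < dot)) {
		/* No codeset part to strip: copy the name unchanged. */
		n = snprintf(out, size, "%s", name);
	} else {
		/* Text before the dot, then the modifier if there is one. */
		n = snprintf(out, size, "%.*s%s", (int)(dot - name), name,
		             mod ? mod : "");
	}
	return n >= 0 && (size_t)n < size;
}
```

With this behaviour both "zh_CN.UTF-8" and "zh_CN.GBK" select the same zh_CN translation files.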
2017-02-12 10:34 GMT+08:00 Rich Felker <dalias@libc.org>:
> On Sat, Feb 11, 2017 at 02:00:56PM +0800, He X wrote:
> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > @@ -100,7 +100,8 @@
> > size_t map_size;
> > void *volatile plural_rule;
> > volatile int nplurals;
> > - char name[];
> > + struct binding *binding;
> > + struct __locale_map *lm;
> > };
>
> As stated in the reply to message body, I think you need the category
> in the keying too, because there can be different .mo files loaded
> depending on which category was requested.
>
> > static char *dummy_gettextdomain()
> > @@ -120,58 +122,87 @@
> > struct msgcat *p;
> > struct __locale_struct *loc = CURRENT_LOCALE;
> > const struct __locale_map *lm;
> > - const char *dirname, *locname, *catname;
> > - size_t dirlen, loclen, catlen, domlen;
> > + size_t domlen;
> > + struct binding *q;
> >
> > if ((unsigned)category >= LC_ALL) goto notrans;
> >
> > if (!domainname) domainname = __gettextdomain();
> >
> > domlen = strnlen(domainname, NAME_MAX+1);
> > if (domlen > NAME_MAX) goto notrans;
> >
> > - dirname = gettextdir(domainname, &dirlen);
> > - if (!dirname) goto notrans;
> > + for (q=bindings; q; q=q->next)
> > + if (!strcmp(q->domainname, domainname) && q->active)
> > + break;
> > + if (!q) goto notrans;
>
> Looks ok. I had said this should be a function but it really doesn't
> need to be; it's plenty simple inline.
>
> > lm = loc->cat[category];
> > if (!lm) {
> > notrans:
> > return (char *) ((n == 1) ? msgid1 : msgid2);
> > }
> > - locname = lm->name;
> > -
> > - catname = catnames[category];
> > - catlen = catlens[category];
> > - loclen = strlen(locname);
> > -
> > - size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> > - char name[namelen+1], *s = name;
> > -
> > - memcpy(s, dirname, dirlen);
> > - s[dirlen] = '/';
> > - s += dirlen + 1;
> > - memcpy(s, locname, loclen);
> > - s[loclen] = '/';
> > - s += loclen + 1;
> > - memcpy(s, catname, catlen);
> > - s[catlen] = '/';
> > - s += catlen + 1;
> > - memcpy(s, domainname, domlen);
> > - s[domlen] = '.';
> > - s[domlen+1] = 'm';
> > - s[domlen+2] = 'o';
> > - s[domlen+3] = 0;
> >
> > for (p=cats; p; p=p->next)
> > - if (!strcmp(p->name, name))
> > + if (p->binding == q && p->lm == lm)
> > break;
>
> && p->cat == category
>
> > if (!p) {
> > + const char *dirname, *locname, *catname;
> > + size_t dirlen, loclen, catlen;
> > void *old_cats;
> > size_t map_size;
> > +
> > + dirname = q->dirname;
> > + locname = lm->name;
> > + catname = catnames[category];
> > +
> > + dirlen = q->dirlen;
> > + loclen = strlen(locname);
> > + catlen = catlens[category];
>
> Now that these are only computed once rather than per-call, optimizing
> out strlen is probably not worthwhile anymore, but it doesn't really
> hurt either. Not something you need to change, just a comment.
>
> > +
> > + size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> > + char name[namelen+1], *s = name;
> > + char *str = name;
> > +
> > + memcpy(s, dirname, dirlen);
> > + s[dirlen] = '/';
> > + s += dirlen + 1;
> > + memcpy(s, locname, loclen);
> > + s[loclen] = '/';
> > + s += loclen + 1;
> > +skip_loc:
> > + memcpy(s, catname, catlen);
> > + s[catlen] = '/';
> > + s += catlen + 1;
> > + memcpy(s, domainname, domlen);
> > + s[domlen] = '.';
> > + s[domlen+1] = 'm';
> > + s[domlen+2] = 'o';
> > + s[domlen+3] = 0;
>
> Actually, now that this code is not a hot path, it should just be
> using snprintf to construct the pathname, I think. It would be a lot
> simpler and easier to ensure correctness.
>
> > +
> > const void *map = __map_file(name, &map_size);
> > - if (!map) goto notrans;
> > + if (!map) {
> > + if (s = strchr(name+dirlen+1, '@')) {
> > + *s++ = '/';
> > + goto skip_loc;;
> > + }
> > + if ( str && (s = strchr(name+dirlen+1, '_')) && (s < strchr(name+dirlen+1, '/')) ) {
> > + if (str = strchr(locname, '@')) {
> > + loclen += locname - str;
> > + memcpy(s, str, loclen);
> > + s[loclen] = '/';
> > + s += loclen + 1;
> > + str = 0;
> > + goto skip_loc;
> > + } else {
> > + *s++ = '/';
> > + goto skip_loc;
> > + }
> > + }
> > + goto notrans;
> > + }
>
> Using snprintf should also make it easy to get rid of the goto/retry
> logic here, perhaps even with a 4-iteration loop and array of which
> format modifications happen on each iteration.
>
> > p = calloc(sizeof *p + namelen + 1, 1);
> > if (!p) {
> > __munmap((void *)map, map_size);
> > goto notrans;
> > @@ -209,7 +209,6 @@
> > }
> > p->map = map;
> > p->map_size = map_size;
> > - memcpy(p->name, name, namelen+1);
> > do {
> > old_cats = cats;
> > p->next = old_cats;
> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > @@ -49,8 +49,8 @@
> > }
> >
> > /* Limit name length and forbid leading dot or any slashes. */
> > - for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
> > - if (val[0]=='.' || val[n]) val = "C.UTF-8";
> > + for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/' && val[n]!='.'; n++);
> > + if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8";
> > int builtin = (val[0]=='C' && !val[1])
> > || !strcmp(val, "C.UTF-8")
> > || !strcmp(val, "POSIX");
>
> This looks ok but might still need some tweaks. Should an input like
> "zh_CN.GBK" get treated as "zh_CN" (thus outputting UTF-8 that might
> appear as junk on the user's terminal) or as "C" (no localization)
> with only ASCII characters (safe for the user's terminal), or even
> cause setlocale to fail and return an error so that the application
> can decide what to do? These are not technical comments on your patch
> but policy matters the community should weigh in on.
>
> Rich
>
[-- Attachment #2: locale.diff --]
[-- Type: text/plain, Size: 3236 bytes --]
--- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
+++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
@@ -100,7 +100,9 @@
size_t map_size;
void *volatile plural_rule;
volatile int nplurals;
- char name[];
+ struct binding *binding;
+ struct __locale_map *lm;
+ int cat;
};
static char *dummy_gettextdomain()
@@ -120,8 +122,8 @@
struct msgcat *p;
struct __locale_struct *loc = CURRENT_LOCALE;
const struct __locale_map *lm;
- const char *dirname, *locname, *catname;
- size_t dirlen, loclen, catlen, domlen;
+ size_t domlen;
+ struct binding *q;
if ((unsigned)category >= LC_ALL) goto notrans;
@@ -130,47 +132,62 @@
domlen = strnlen(domainname, NAME_MAX+1);
if (domlen > NAME_MAX) goto notrans;
- dirname = gettextdir(domainname, &dirlen);
- if (!dirname) goto notrans;
+ for (q=bindings; q; q=q->next)
+ if (!strcmp(q->domainname, domainname) && q->active)
+ break;
+ if (!q) goto notrans;
lm = loc->cat[category];
if (!lm) {
notrans:
return (char *) ((n == 1) ? msgid1 : msgid2);
}
- locname = lm->name;
-
- catname = catnames[category];
- catlen = catlens[category];
- loclen = strlen(locname);
-
- size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
- char name[namelen+1], *s = name;
-
- memcpy(s, dirname, dirlen);
- s[dirlen] = '/';
- s += dirlen + 1;
- memcpy(s, locname, loclen);
- s[loclen] = '/';
- s += loclen + 1;
- memcpy(s, catname, catlen);
- s[catlen] = '/';
- s += catlen + 1;
- memcpy(s, domainname, domlen);
- s[domlen] = '.';
- s[domlen+1] = 'm';
- s[domlen+2] = 'o';
- s[domlen+3] = 0;
for (p=cats; p; p=p->next)
- if (!strcmp(p->name, name))
+ if (p->binding == q && p->lm == lm && p->cat == category)
break;
if (!p) {
+ const char *dirname, *locname, *catname;
+ size_t dirlen, loclen, catlen;
void *old_cats;
size_t map_size;
- const void *map = __map_file(name, &map_size);
+
+ dirname = q->dirname;
+ locname = lm->name;
+ catname = catnames[category];
+
+ dirlen = q->dirlen;
+ loclen = strlen(locname);
+ catlen = catlens[category];
+
+ size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
+ char name[namelen+1];
+ char locbuf[loclen+1], *locp = locbuf;
+ const void *map;
+
+ memcpy(locbuf, locname, loclen);
+ locbuf[loclen] = 0;
+
+ for (;;) {
+ snprintf(name, namelen+1, "%s/%s/%s/%s.mo\0", dirname, locbuf, catname, domainname);
+ if (map = __map_file(name, &map_size)) break;
+
+ if (locp = strchr(locbuf, '@')) {
+ *locp = 0;
+ locbuf[loclen] = '@';
+ } else if (locp = strchr(locbuf, '_')) {
+ if (locbuf[loclen] == '@') {
+ locbuf[loclen] = 0;
+ *locp = '@';
+ strcat(locp+1, locbuf + strlen(locbuf) + 1);
+ } else *locp = 0;
+ } else {
+ break;
+ }
+ }
if (!map) goto notrans;
+
p = calloc(sizeof *p + namelen + 1, 1);
if (!p) {
__munmap((void *)map, map_size);
@@ -178,7 +195,6 @@
}
p->map = map;
p->map_size = map_size;
- memcpy(p->name, name, namelen+1);
do {
old_cats = cats;
p->next = old_cats;
@@ -193,6 +193,9 @@
__munmap((void *)map, map_size);
goto notrans;
}
+ p->cat = category;
+ p->binding = q;
+ p->lm = lm;
p->map = map;
p->map_size = map_size;
do {