mailing list of musl libc
From: He X <xw897002528@gmail.com>
To: musl@lists.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Sat, 11 Feb 2017 14:00:56 +0800
Message-ID: <CAPG2z08yePs-6pqHcoBbMfWRPyXunuT-2Ge_JDWH8E5Y+_0wtw@mail.gmail.com>
In-Reply-To: <20170211023610.GA1520@brightrain.aerifal.cx>


[-- Attachment #1.1: Type: text/plain, Size: 6301 bytes --]

Fresh patch attached :)
1. It's easier to just stop at the dot, and I think this behavior should be
documented in the wiki or somewhere.
2. I spent 20 minutes reading the first part of your reply, but I'm still
not sure I follow. If I understand correctly, you mean that the
struct __locale_map * and struct binding * should identify entries in the
msgcat list instead of the long name string, which is not only faster but
also makes the pathname string easier to construct. I have some
questions, though:
+ I removed name from msgcat because I can't find any use of it there. Is
that safe?
+ gettextdir() is replaced by a new loop, since I need the pointer to the
struct binding and not only the dirname. But then gettextdir() is only
called by bindtextdomain(). Is there a need to keep it? Or is there a
better way to get the pointer to the struct binding?
+ You said msgcat is indexed by ( struct __locale_map *, struct binding *,
category ), but I found that lm (the locale map) is looked up by category,
so if the category is different we can't get the same lm. So we can just
compare lm, right?
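
To make the third question concrete, here is a minimal sketch of a catalog list keyed by the tuple from the quoted reply. The names lm_stub, binding_stub, cat_entry, cat_find and cat_insert are hypothetical stand-ins for illustration, not the actual musl types or functions. Two points it is meant to show: a lookup can only hit if the key fields are stored when the entry is created, and keeping category in the key stays correct even if the same locale map object ends up installed in more than one category.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the internal types; only pointer identity matters. */
struct lm_stub { const char *name; };
struct binding_stub { const char *domainname, *dirname; };

/* One cached catalog, keyed by (lm, binding, category) instead of a pathname. */
struct cat_entry {
	struct cat_entry *next;
	const struct lm_stub *lm;
	const struct binding_stub *binding;
	int category;
	const void *map; /* mapped .mo file; unused in this sketch */
};

/* Lookup compares two pointers and an int; no pathname is built or compared. */
static struct cat_entry *cat_find(struct cat_entry *head,
	const struct lm_stub *lm, const struct binding_stub *binding, int category)
{
	for (struct cat_entry *p = head; p; p = p->next)
		if (p->lm == lm && p->binding == binding && p->category == category)
			return p;
	return NULL;
}

/* Store the key in the entry, then push it on the front of the list. */
static void cat_insert(struct cat_entry **head, struct cat_entry *e,
	const struct lm_stub *lm, const struct binding_stub *binding, int category)
{
	assert(head && e);
	e->lm = lm;
	e->binding = binding;
	e->category = category;
	e->next = *head;
	*head = e;
}
```

If lm really were unique per category, the category comparison would be redundant but harmless, so including it seems like the safer default.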

2017-02-11 10:36 GMT+08:00 Rich Felker <dalias@libc.org>:

> On Thu, Feb 09, 2017 at 05:49:13PM +0800, He X wrote:
> > sry!
> >
> > 2017-02-08 22:31 GMT+08:00 Rich Felker <dalias@libc.org>:
> >
> > > On Wed, Feb 08, 2017 at 06:13:30PM +0800, He X wrote:
> > > > here the patch is: http://paste.ubuntu.com/23953329/
> > > > The code tested, but maybe it sucks.
> > >
> > > Patches need to be attached and sent to the list, not pastebins that
> > > might disappear. The latter don't work for discussing and preserving
> > > discussion of the patch.
>
> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > @@ -19,6 +19,7 @@
> >  };
> >
> >  static void *volatile bindings;
> > +char *__strchrnul(const char *, int);
> >
> >  static char *gettextdir(const char *domainname, size_t *dirlen)
> >  {
> > @@ -143,7 +143,7 @@
> >
> >       catname = catnames[category];
> >       catlen = catlens[category];
> > -     loclen = strlen(locname);
> > +     loclen = __strchrnul(locname, '.') - locname;
> >
> >       size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> >       char name[namelen+1], *s = name;
> > @@ -157,6 +157,8 @@
> > +rewrite_loc:
> >       memcpy(s, locname, loclen);
> >       s[loclen] = '/';
> >       s += loclen + 1;
> > +skip_loc:
> >       memcpy(s, catname, catlen);
> >       s[catlen] = '/';
> >       s += catlen + 1;
> > @@ -174,7 +175,22 @@
> >               void *old_cats;
> >               size_t map_size;
> >               const void *map = __map_file(name, &map_size);
> > -             if (!map) goto notrans;
> > +             if (!map) {
> > +                     if (s = strchr(name + dirlen + 1, '@')) {
> > +                             *s++ = '/';
> > +                             goto skip_loc;
> > +                     }
> > +                     if (locname && (s = strchr(name + dirlen + 1, '_')) && (strchr(name + dirlen +1, '/') > s) ) {
> > +                             if (locname = strchr(locname, '@')) {
> > +                                     loclen = __strchrnul(lm->name, '.') - locname;
> > +                                     goto rewrite_loc;
> > +                             } else {
> > +                                     *s++ = '/';
> > +                                     goto skip_loc;
> > +                             }
> > +                     }
> > +                     goto notrans;
> > +             }
>
> This doesn't work because it changes both the key used for the lookup
> and the filename mapped. If you try this code with a translation that
> requires a fallback, and run it under strace, you'll see that _every_
> call to gettext will try again to find the nonexistent files.
>
> It could be fixed, but I think the code should be refactored so that,
> rather than the msgcat list being indexed by pathname strings, it's
> indexed by tuples of:
>
>         ( struct __locale_map *, struct binding *, category )
>
> These are all integers/pointers and thus compare very fast versus the
> current strcmp operation, and it's very quick to look them up. Then we
> only have to construct the pathname string when a new file needs to be
> loaded, not on every call, and you're free to clobber the pathname
> string while doing fallbacks.
>
> >               p = calloc(sizeof *p + namelen + 1, 1);
> >               if (!p) {
> >                       __munmap((void *)map, map_size);
> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > @@ -32,6 +32,7 @@
> >       struct __locale_map *new = 0;
> >       const char *path = 0, *z;
> >       char buf[256];
> > +     char *dotp;
> >       size_t l, n;
> >
> >       if (!*val) {
> > @@ -40,6 +41,12 @@
> >               (val = getenv("LANG")) && *val ||
> >               (val = "C.UTF-8");
> >       }
> > +     if (dotp = strchr(val, '.')) {
> > +             char part[256];
> > +             memcpy(part, val, dotp - val);
> > +             memcpy(&part[dotp - val], ".UTF-8\0", 7);
> > +             val = part;
> > +     }
> >
> >       /* Limit name length and forbid leading dot or any slashes. */
> >       for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
>
> I don't think this part is desirable, but if it were, it would need to
> be done differently. As-is, it has serious UB, use of part[] after the
> end of its lifetime. It also seems to have no check to see that
> dotp-val is less than 256-7 or even that it's bounded, whereas the
> code that immediately follows checks the length of the string pointed
> to by val.
>
> I think what it should be doing is the opposite, stopping when hitting
> a dot in the name and only using the part up to the dot, except in the
> one special case "C.UTF-8". The subsequent path search for the locale
> file should probably then be repeated with combinations of dropping
> @mod and _CC suffixes, but this dropping should _not_ affect the name
> that's saved and reported back. (That is, if LC_TIME=fr_CA but only a
> "fr" locale file exists, the "fr" file should get mapped but the name
> returned by setlocale, and saved for use by gettext, should still be
> the full "fr_CA" in case applications have "fr_CA" translations.)
>
> Rich
>
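
The fallback search described in the last quoted paragraph can be sketched as a function that derives the names to try from the requested name, while the caller keeps the original name for reporting. This is only an illustration under assumptions: locale_candidates and NAME_BUF are hypothetical, the name is assumed to follow the language[_CC][.codeset][@mod] layout, and the order (drop @mod first, then _CC, then both) is one possible choice rather than something the thread has settled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_BUF 64 /* arbitrary buffer size for this sketch */

/* Fill out[] with the names to try, most specific first; return the count.
 * The codeset (".UTF-8" etc.) is dropped from every candidate. */
static int locale_candidates(const char *name, char out[4][NAME_BUF])
{
	assert(name && out);
	size_t lang_len = strcspn(name, "_.@");
	const char *cc = name[lang_len] == '_' ? name + lang_len : "";
	size_t cc_len = strcspn(cc, ".@");   /* "_CC" including the underscore */
	const char *mod = strchr(name, '@'); /* "@mod" including the at sign */
	if (!mod) mod = "";
	size_t mod_len = strlen(mod);
	int n = 0;

	if (lang_len + cc_len + mod_len >= NAME_BUF) return 0;

	/* Full name without codeset, e.g. "fr_CA@euro". */
	snprintf(out[n++], NAME_BUF, "%.*s%.*s%s",
		(int)lang_len, name, (int)cc_len, cc, mod);
	/* Without the modifier, e.g. "fr_CA". */
	if (mod_len)
		snprintf(out[n++], NAME_BUF, "%.*s%.*s",
			(int)lang_len, name, (int)cc_len, cc);
	/* Without the territory, e.g. "fr@euro". */
	if (cc_len && mod_len)
		snprintf(out[n++], NAME_BUF, "%.*s%s", (int)lang_len, name, mod);
	/* Language only, e.g. "fr". */
	if (cc_len)
		snprintf(out[n++], NAME_BUF, "%.*s", (int)lang_len, name);
	return n;
}
```

A caller would try each candidate as a file name in turn and stop at the first one that maps, but would still save and report the name as originally requested (for example "fr_CA" even when only an "fr" file was found).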

[-- Attachment #2: locale.diff --]
[-- Type: text/plain, Size: 3878 bytes --]

--- a/src/locale/dcngettext.c	2017-02-06 14:39:17.860482624 +0000 
+++ b/src/locale/dcngettext.c	2017-02-06 14:39:17.860482624 +0000
@@ -100,7 +100,8 @@
 	size_t map_size;
 	void *volatile plural_rule;
 	volatile int nplurals;
-	char name[];
+	struct binding *binding;
+	struct __locale_map *lm;
 };
 
 static char *dummy_gettextdomain()
@@ -120,58 +122,87 @@
 	struct msgcat *p;
 	struct __locale_struct *loc = CURRENT_LOCALE;
 	const struct __locale_map *lm;
-	const char *dirname, *locname, *catname;
-	size_t dirlen, loclen, catlen, domlen;
+	size_t domlen;
+	struct binding *q;
 
 	if ((unsigned)category >= LC_ALL) goto notrans;
 
 	if (!domainname) domainname = __gettextdomain();
 
 	domlen = strnlen(domainname, NAME_MAX+1);
 	if (domlen > NAME_MAX) goto notrans;
 
-	dirname = gettextdir(domainname, &dirlen);
-	if (!dirname) goto notrans;
+	for (q=bindings; q; q=q->next)
+		if (!strcmp(q->domainname, domainname) && q->active)
+			break;
+	if (!q) goto notrans;
 
 	lm = loc->cat[category];
 	if (!lm) {
 notrans:
 		return (char *) ((n == 1) ? msgid1 : msgid2);
 	}
-	locname = lm->name;
-
-	catname = catnames[category];
-	catlen = catlens[category];
-	loclen = strlen(locname);
-
-	size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
-	char name[namelen+1], *s = name;
-
-	memcpy(s, dirname, dirlen);
-	s[dirlen] = '/';
-	s += dirlen + 1;
-	memcpy(s, locname, loclen);
-	s[loclen] = '/';
-	s += loclen + 1;
-	memcpy(s, catname, catlen);
-	s[catlen] = '/';
-	s += catlen + 1;
-	memcpy(s, domainname, domlen);
-	s[domlen] = '.';
-	s[domlen+1] = 'm';
-	s[domlen+2] = 'o';
-	s[domlen+3] = 0;
 
 	for (p=cats; p; p=p->next)
-		if (!strcmp(p->name, name))
+		if (p->binding == q && p->lm == lm)
 			break;
 
 	if (!p) {
+		const char *dirname, *locname, *catname;
+		size_t dirlen, loclen, catlen;
 		void *old_cats;
 		size_t map_size;
+
+		dirname = q->dirname;
+		locname = lm->name;
+		catname = catnames[category];
+
+		dirlen = q->dirlen;
+		loclen = strlen(locname);
+		catlen = catlens[category];
+
+		size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
+		char name[namelen+1], *s = name;
+		char *str = name;
+
+		memcpy(s, dirname, dirlen);
+		s[dirlen] = '/';
+		s += dirlen + 1;
+		memcpy(s, locname, loclen);
+		s[loclen] = '/';
+		s += loclen + 1;
+skip_loc:
+		memcpy(s, catname, catlen);
+		s[catlen] = '/';
+		s += catlen + 1;
+		memcpy(s, domainname, domlen);
+		s[domlen] = '.';
+		s[domlen+1] = 'm';
+		s[domlen+2] = 'o';
+		s[domlen+3] = 0;
+
 		const void *map = __map_file(name, &map_size);
-		if (!map) goto notrans;
+		if (!map) {
+			if (s = strchr(name+dirlen+1, '@')) {
+				*s++ = '/';
+				goto skip_loc;
+			}
+			if (str && (s = strchr(name+dirlen+1, '_')) && (s < strchr(name+dirlen+1, '/'))) {
+				if (str = strchr(locname, '@')) {
+					loclen += locname - str;
+					memcpy(s, str, loclen);
+					s[loclen] = '/';
+					s += loclen + 1;
+					str = 0;
+					goto skip_loc;
+				} else {
+					*s++ = '/';
+					goto skip_loc;
+				}
+			}
+			goto notrans;
+		}
 		p = calloc(sizeof *p + namelen + 1, 1);
 		if (!p) {
 			__munmap((void *)map, map_size);
 			goto notrans;
@@ -209,7 +209,6 @@
 		}
 		p->map = map;
 		p->map_size = map_size;
-		memcpy(p->name, name, namelen+1);
 		do {
 			old_cats = cats;
 			p->next = old_cats;
--- a/src/locale/locale_map.c	2017-02-06 14:39:17.797148750 +0000
+++ b/src/locale/locale_map.c	2017-02-06 14:39:17.797148750 +0000
@@ -49,8 +49,8 @@
 	}
 
 	/* Limit name length and forbid leading dot or any slashes. */
-	for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
-	if (val[0]=='.' || val[n]) val = "C.UTF-8";
+	for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/' && val[n]!='.'; n++);
+	if (val[0]=='.' || (val[n] && val[n]!='.')) val = "C.UTF-8";
 	int builtin = (val[0]=='C' && !val[1])
 		|| !strcmp(val, "C.UTF-8")
 		|| !strcmp(val, "POSIX");
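
The length scan in the locale_map.c hunk above can be tried in isolation with a small standalone sketch. usable_name_len and NAME_MAX_SKETCH are hypothetical names, and the limit of 23 is an assumption made for this example only. The function mirrors the two patched lines: scan until a slash, a dot, the end of the string or the length limit, then reject the name if it begins with a dot or if the scan stopped on anything other than the end of the string or a dot.

```c
#include <assert.h>
#include <stddef.h>

#define NAME_MAX_SKETCH 23 /* assumed length limit, for this sketch only */

/* Return the length of the part of val before any ".codeset" suffix,
 * or 0 if the name would be rejected (leading dot, slash, or too long).
 * An empty string also yields 0; a caller would handle that case earlier. */
static size_t usable_name_len(const char *val)
{
	size_t n;
	assert(val);
	for (n = 0; n < NAME_MAX_SKETCH && val[n] && val[n] != '/' && val[n] != '.'; n++);
	if (val[0] == '.' || (val[n] && val[n] != '.')) return 0;
	return n;
}
```

With this scan, "en_US.UTF-8" and "en_US" both yield a length of 5, so they would select the same locale file, while the special name "C.UTF-8" still has to be matched as a whole string before the length is used.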
