From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Fri, 10 Feb 2017 21:36:10 -0500
Message-ID: <20170211023610.GA1520@brightrain.aerifal.cx>

On Thu, Feb 09, 2017 at 05:49:13PM +0800, He X wrote:
> sry!
>
> 2017-02-08 22:31 GMT+08:00 Rich Felker:
>
> > On Wed, Feb 08, 2017 at 06:13:30PM +0800, He X wrote:
> > > here the patch is: http://paste.ubuntu.com/23953329/
> > > The code tested, but maybe it sucks.
> >
> > Patches need to be attached and sent to the list, not pastebins that
> > might disappear. The latter don't work for discussing and preserving
> > discussion of the patch.
> --- a/src/locale/dcngettext.c	2017-02-06 14:39:17.860482624 +0000
> +++ b/src/locale/dcngettext.c	2017-02-06 14:39:17.860482624 +0000
> @@ -19,6 +19,7 @@
>  };
>
>  static void *volatile bindings;
> +char *__strchrnul(const char *, int);
>
>  static char *gettextdir(const char *domainname, size_t *dirlen)
>  {
> @@ -143,7 +143,7 @@
>
>  	catname = catnames[category];
>  	catlen = catlens[category];
> -	loclen = strlen(locname);
> +	loclen = __strchrnul(locname, '.') - locname;
>
>  	size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
>  	char name[namelen+1], *s = name;
> @@ -157,6 +157,8 @@
> +rewrite_loc:
>  	memcpy(s, locname, loclen);
>  	s[loclen] = '/';
>  	s += loclen + 1;
> +skip_loc:
>  	memcpy(s, catname, catlen);
>  	s[catlen] = '/';
>  	s += catlen + 1;
> @@ -174,7 +175,22 @@
>  	void *old_cats;
>  	size_t map_size;
>  	const void *map = __map_file(name, &map_size);
> -	if (!map) goto notrans;
> +	if (!map) {
> +		if (s = strchr(name + dirlen + 1, '@')) {
> +			*s++ = '/';
> +			goto skip_loc;
> +		}
> +		if (locname && (s = strchr(name + dirlen + 1, '_')) && (strchr(name + dirlen +1, '/') > s) ) {
> +			if (locname = strchr(locname, '@')) {
> +				loclen = __strchrnul(lm->name, '.') - locname;
> +				goto rewrite_loc;
> +			} else {
> +				*s++ = '/';
> +				goto skip_loc;
> +			}
> +		}
> +		goto notrans;
> +	}

This doesn't work because it changes both the key used for the lookup
and the filename mapped. If you try this code with a translation that
requires a fallback, and run it under strace, you'll see that _every_
call to gettext will try again to find the nonexistent files.
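To make that failure mode concrete, here is a minimal, self-contained sketch. It is not musl's code: `get_cat`, `catalog_exists`, `struct cat` and the 32-byte buffers are hypothetical simplifications. The fallback from "ll_CC" to "ll" truncates the same buffer that is then stored as the cache key, so the entry is saved as "en" while each later call searches for "en_US", misses, and probes the nonexistent file again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int probes; /* counts simulated file-open attempts */

/* Hypothetical stand-in for __map_file(): only the "en" catalog exists. */
static int catalog_exists(const char *name)
{
	probes++;
	return !strcmp(name, "en");
}

struct cat { struct cat *next; char name[32]; };
static struct cat *cats;

/* String-keyed cache in which the fallback rewrites the very buffer
 * that later becomes the cache key (the pattern criticized above). */
static struct cat *get_cat(const char *locname)
{
	char name[32];
	struct cat *p;

	assert(locname);
	if (snprintf(name, sizeof name, "%s", locname) >= (int)sizeof name)
		return 0;
	for (p = cats; p; p = p->next)      /* cache lookup by string */
		if (!strcmp(p->name, name)) return p;
	while (!catalog_exists(name)) {
		char *u = strchr(name, '_');
		if (!u) return 0;
		*u = 0;                     /* "en_US" -> "en": the key changes */
	}
	p = calloc(1, sizeof *p);
	if (!p) return 0;
	strcpy(p->name, name);              /* stored as "en", looked up as "en_US" */
	p->next = cats;
	cats = p;
	return p;
}
```

Calling `get_cat("en_US")` three times performs six probes and leaves three duplicate "en" entries in the list; only a lookup that already uses the rewritten name, `get_cat("en")`, hits the cache.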

It could be fixed, but I think the code should be refactored so that,
rather than the msgcat list being indexed by pathname strings, it's
indexed by tuples of:

	( struct __locale_map *, struct binding *, category )

These are all integers/pointers and thus compare very fast versus the
current strcmp operation, and it's very quick to look them up. Then we
only have to construct the pathname string when a new file needs to be
loaded, not on every call, and you're free to clobber the pathname
string while doing fallbacks.

>  	p = calloc(sizeof *p + namelen + 1, 1);
>  	if (!p) {
>  		__munmap((void *)map, map_size);
> --- a/src/locale/locale_map.c	2017-02-06 14:39:17.797148750 +0000
> +++ b/src/locale/locale_map.c	2017-02-06 14:39:17.797148750 +0000
> @@ -32,6 +32,7 @@
>  	struct __locale_map *new = 0;
>  	const char *path = 0, *z;
>  	char buf[256];
> +	char *dotp;
>  	size_t l, n;
>
>  	if (!*val) {
> @@ -40,6 +41,12 @@
>  		(val = getenv("LANG")) && *val ||
>  		(val = "C.UTF-8");
>  	}
> +	if (dotp = strchr(val, '.')) {
> +		char part[256];
> +		memcpy(part, val, dotp - val);
> +		memcpy(&part[dotp - val], ".UTF-8\0", 7);
> +		val = part;
> +	}
>
>  	/* Limit name length and forbid leading dot or any slashes. */
>  	for (n=0; n
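The tuple-keyed design suggested above might look roughly like the following sketch. The types are hypothetical stand-ins (musl's real `struct __locale_map` and `struct binding` have more and differently named fields), `file_exists` stands in for `__map_file()`, and the fallback order (strip "@modifier", then "_territory") is a simplification. What it shows is that the lookup compares only two pointers and an integer, and the pathname is built only on a cache miss, so it can be clobbered during fallbacks without affecting the key.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for musl's internal types. */
struct locale_map { const char *name; };            /* e.g. "en_US.UTF-8" */
struct binding { const char *domain, *dirname; };

struct msgcat {
	struct msgcat *next;
	const struct locale_map *lm;    /* key part 1 */
	const struct binding *q;        /* key part 2 */
	int cat;                        /* key part 3 */
	char path[256];                 /* file actually loaded */
};

static struct msgcat *cats;
static int probes;

/* Hypothetical stand-in for __map_file(): pretend only this file exists. */
static int file_exists(const char *path)
{
	probes++;
	return !strcmp(path, "/usr/share/locale/en/LC_MESSAGES/app.mo");
}

static struct msgcat *find_cat(const struct locale_map *lm,
	const struct binding *q, int cat, const char *catname)
{
	struct msgcat *p;
	char loc[64], path[256];
	size_t loclen;

	assert(lm && q && catname);

	/* Fast path: pointer/integer comparisons only, no pathname is built. */
	for (p = cats; p; p = p->next)
		if (p->lm == lm && p->q == q && p->cat == cat) return p;

	/* Slow path: copy the locale name minus any ".codeset" suffix. */
	loclen = strcspn(lm->name, ".");
	if (loclen >= sizeof loc) return 0;
	memcpy(loc, lm->name, loclen);
	loc[loclen] = 0;

	/* Try "ll_CC@mod", then "ll_CC", then "ll" (simplified order).
	 * Clobbering loc/path here is harmless: they are not the key. */
	for (;;) {
		int n = snprintf(path, sizeof path, "%s/%s/%s/%s.mo",
			q->dirname, loc, catname, q->domain);
		if (n < 0 || (size_t)n >= sizeof path) return 0;
		if (file_exists(path)) break;
		char *cut = strchr(loc, '@');
		if (!cut) cut = strchr(loc, '_');
		if (!cut) return 0;
		*cut = 0;
	}

	/* Cache under the original tuple, whichever file was found. */
	p = calloc(1, sizeof *p);
	if (!p) return 0;
	p->lm = lm;
	p->q = q;
	p->cat = cat;
	strcpy(p->path, path);
	p->next = cats;
	cats = p;
	return p;
}
```

With a locale map named "en_US.UTF-8" and only an "en" catalog present, the first call probes twice and caches the "en" file under the original (lm, binding, category) tuple; later calls return the cached entry without touching the filesystem. Failed lookups are not cached in this sketch, so a domain with no catalog at all would still be probed on every call; storing a negative entry under the same tuple is one way to avoid that.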