mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Fri, 10 Feb 2017 21:36:10 -0500
Message-ID: <20170211023610.GA1520@brightrain.aerifal.cx>
In-Reply-To: <CAPG2z0860oCsim2uvw_6je=vPXk_vPBkYqYV2qSMmcVCnoqSOQ@mail.gmail.com>

On Thu, Feb 09, 2017 at 05:49:13PM +0800, He X wrote:
> sry!
> 
> 2017-02-08 22:31 GMT+08:00 Rich Felker <dalias@libc.org>:
> 
> > On Wed, Feb 08, 2017 at 06:13:30PM +0800, He X wrote:
> > > here the patch is: http://paste.ubuntu.com/23953329/
> > > The code tested, but maybe it sucks.
> >
> > Patches need to be attached and sent to the list, not pastebins that
> > might disappear. The latter don't work for discussing and preserving
> > discussion of the patch.

> --- a/src/locale/dcngettext.c	2017-02-06 14:39:17.860482624 +0000 
> +++ b/src/locale/dcngettext.c	2017-02-06 14:39:17.860482624 +0000
> @@ -19,6 +19,7 @@
>  };
>  
>  static void *volatile bindings;
> +char *__strchrnul(const char *, int);
>  
>  static char *gettextdir(const char *domainname, size_t *dirlen)
>  {
> @@ -143,7 +143,7 @@
>  
>  	catname = catnames[category];
>  	catlen = catlens[category];
> -	loclen = strlen(locname);
> +	loclen = __strchrnul(locname, '.') - locname;
>  
>  	size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
>  	char name[namelen+1], *s = name;
> @@ -157,6 +157,8 @@
> +rewrite_loc:
>  	memcpy(s, locname, loclen);
>  	s[loclen] = '/';
>  	s += loclen + 1;
> +skip_loc:
>  	memcpy(s, catname, catlen);
>  	s[catlen] = '/';
>  	s += catlen + 1;
> @@ -174,7 +175,22 @@
>  		void *old_cats;
>  		size_t map_size;
>  		const void *map = __map_file(name, &map_size);
> -		if (!map) goto notrans;
> +		if (!map) {
> +			if (s = strchr(name + dirlen + 1, '@')) {
> +				*s++ = '/';
> +				goto skip_loc;
> +			}
> +			if (locname && (s = strchr(name + dirlen + 1, '_')) && (strchr(name + dirlen +1, '/') > s) ) {
> +				if (locname = strchr(locname, '@')) {
> +					loclen = __strchrnul(lm->name, '.') - locname;
> +					goto rewrite_loc;
> +				} else {
> +					*s++ = '/';
> +					goto skip_loc;
> +				}
> +			}
> +			goto notrans;
> +		}

This doesn't work because it changes both the key used for the lookup
and the filename mapped. If you try this code with a translation that
requires a fallback, and run it under strace, you'll see that _every_
call to gettext will try again to find the nonexistent files.

It could be fixed, but I think the code should be refactored so that,
rather than the msgcat list being indexed by pathname strings, it's
indexed by tuples of:

	( struct __locale_map *, struct binding *, category )

These are all integers/pointers and thus compare very fast versus the
current strcmp operation, and it's very quick to look them up. Then we
only have to construct the pathname string when a new file needs to be
loaded, not on every call, and you're free to clobber the pathname
string while doing fallbacks.
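Below is a minimal sketch of a tuple-keyed catalog list. It uses opaque stand-ins for musl's internal struct __locale_map and struct binding, on the assumption that only their addresses matter for the lookup. The stub types and the find_cat/add_cat helpers are hypothetical. Each comparison is two pointer compares and one integer compare. On a miss, the pathname would be built only then, before mapping the file. A NULL map records that no file was found.

```c
#include <stdlib.h>

/* Opaque stand-ins for the real internal types (assumption: identity
 * by address is all the lookup needs). */
struct locale_map_stub { int unused; };
struct binding_stub { int unused; };

struct msgcat_entry {
	struct msgcat_entry *next;
	const struct locale_map_stub *lm;
	const struct binding_stub *binding;
	int category;
	const void *map;      /* mapped catalog, or NULL if none was found */
	size_t map_size;
};

static struct msgcat_entry *msgcats;  /* not thread-safe: sketch only */

/* Lookup compares integers/pointers only; no strcmp on pathnames. */
static struct msgcat_entry *find_cat(const struct locale_map_stub *lm,
                                     const struct binding_stub *b, int category)
{
	for (struct msgcat_entry *p = msgcats; p; p = p->next)
		if (p->lm == lm && p->binding == b && p->category == category)
			return p;
	return NULL;
}

/* Called after a miss, once the path has been built and the file
 * (or its fallbacks) searched for; map may be NULL. */
static struct msgcat_entry *add_cat(const struct locale_map_stub *lm,
                                    const struct binding_stub *b, int category,
                                    const void *map, size_t map_size)
{
	struct msgcat_entry *p = calloc(1, sizeof *p);
	if (!p) return NULL;
	p->lm = lm;
	p->binding = b;
	p->category = category;
	p->map = map;
	p->map_size = map_size;
	p->next = msgcats;
	msgcats = p;
	return p;
}
```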

>  		p = calloc(sizeof *p + namelen + 1, 1);
>  		if (!p) {
>  			__munmap((void *)map, map_size);
> --- a/src/locale/locale_map.c	2017-02-06 14:39:17.797148750 +0000
> +++ b/src/locale/locale_map.c	2017-02-06 14:39:17.797148750 +0000
> @@ -32,6 +32,7 @@
>  	struct __locale_map *new = 0;
>  	const char *path = 0, *z;
>  	char buf[256];
> +	char *dotp;
>  	size_t l, n;
>  
>  	if (!*val) {
> @@ -40,6 +41,12 @@
>  		(val = getenv("LANG")) && *val ||
>  		(val = "C.UTF-8");
>  	}
> +	if (dotp = strchr(val, '.')) {
> +		char part[256];
> +		memcpy(part, val, dotp - val);
> +		memcpy(&part[dotp - val], ".UTF-8\0", 7);
> +		val = part;
> +	}
>  
>  	/* Limit name length and forbid leading dot or any slashes. */
>  	for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);

I don't think this part is desirable, but if it were, it would need to
be done differently. As-is, it has serious UB: part[] is declared
inside the if block, so val points to a dead object once that block
ends, and the code below still reads through it. It also seems to have
no check that dotp-val is less than 256-7, or even that it's bounded,
whereas the code that immediately follows checks the length of the
string pointed to by val.
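If the codeset rewrite were wanted anyway, a version without the lifetime and bounds problems might look like the sketch below. The helper name normalize_codeset and its interface are hypothetical. The caller owns the buffer, so the buffer outlives every use of the returned pointer. The stem length is checked against the buffer size before anything is copied.

```c
#include <string.h>

/* Hypothetical helper: if val contains a '.', copy the part before it
 * into buf and append ".UTF-8". buf belongs to the caller, so the result
 * stays valid as long as that buffer does. Returns val unchanged when
 * there is no dot, or NULL if the result would not fit in size bytes. */
static const char *normalize_codeset(const char *val, char *buf, size_t size)
{
	static const char suffix[] = ".UTF-8";  /* sizeof suffix == 7, incl. NUL */
	const char *dot = strchr(val, '.');
	if (!dot) return val;

	size_t stem = (size_t)(dot - val);
	if (size < sizeof suffix || stem > size - sizeof suffix)
		return NULL;                    /* bounds check before any copy */

	memcpy(buf, val, stem);
	memcpy(buf + stem, suffix, sizeof suffix);
	return buf;
}
```

The buffer passed in would be declared at function scope in the caller, not inside the if block. That is what removes the use-after-lifetime.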

I think what it should be doing is the opposite, stopping when hitting
a dot in the name and only using the part up to the dot, except in the
one special case "C.UTF-8". The subsequent path search for the locale
file should probably then be repeated with combinations of dropping
@mod and _CC suffixes, but this dropping should _not_ affect the name
that's saved and reported back. (That is, if LC_TIME=fr_CA but only a
"fr" locale file exists, the "fr" file should get mapped but the name
returned by setlocale, and saved for use by gettext, should still be
the full "fr_CA" in case applications have "fr_CA" translations.)
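Below is a sketch of the lookup rule described above. The helper names lookup_len and fallback_candidate are hypothetical. The file search uses only the part of the name before the first dot, except for "C.UTF-8". The fallback candidates are built in a separate buffer, so the full name stays untouched for setlocale and gettext. The candidate order (full, drop _CC, drop @mod, drop both) is an assumption. It is not something settled in this message.

```c
#include <string.h>

/* Length of the name used for the locale file search: everything before
 * the first '.', except that "C.UTF-8" is kept whole. */
static size_t lookup_len(const char *name)
{
	if (!strcmp(name, "C.UTF-8")) return strlen(name);
	return strcspn(name, ".");
}

/* Hypothetical helper: build fallback candidate i (0..3) from the first
 * len bytes of name into buf. Bit 0 of i drops the _CC part, bit 1 drops
 * the @mod part. name itself is never modified. Returns 0 on success and
 * -1 if i is out of range, the candidate would duplicate an earlier one,
 * or it does not fit in size bytes. */
static int fallback_candidate(const char *name, size_t len, int i,
                              char *buf, size_t size)
{
	if (i < 0 || i > 3) return -1;

	const char *at = memchr(name, '@', len);
	size_t mod_start = at ? (size_t)(at - name) : len;
	const char *us = memchr(name, '_', mod_start);
	size_t lang_end = us ? (size_t)(us - name) : mod_start;

	int drop_cc = i & 1, drop_mod = i & 2;
	/* Dropping a part that is absent would repeat an earlier candidate. */
	if ((drop_cc && !us) || (drop_mod && !at)) return -1;

	size_t cc_len = drop_cc ? 0 : mod_start - lang_end;
	size_t mod_len = drop_mod ? 0 : len - mod_start;
	size_t total = lang_end + cc_len + mod_len;
	if (total >= size) return -1;

	memcpy(buf, name, lang_end);
	memcpy(buf + lang_end, name + lang_end, cc_len);
	memcpy(buf + lang_end + cc_len, name + mod_start, mod_len);
	buf[total] = '\0';
	return 0;
}
```

For LC_TIME=fr_CA, candidate 0 is "fr_CA" and candidate 1 is "fr". Whichever candidate file maps first is used, while the stored name remains "fr_CA".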

Rich



Thread overview: 38+ messages
2017-01-20 11:25 He X
2017-01-29  4:52 ` He X
2017-01-29 13:39   ` Szabolcs Nagy
2017-01-29 14:07     ` Rich Felker
2017-01-29 14:48       ` He X
2017-01-29 15:55         ` Rich Felker
2017-01-29 16:14           ` He X
2017-01-29 16:33             ` Rich Felker
2017-02-08 10:13               ` He X
2017-02-08 14:31                 ` Rich Felker
2017-02-09  9:49                   ` He X
2017-02-11  2:36                     ` Rich Felker [this message]
2017-02-11  6:00                       ` He X
2017-02-11 23:59                         ` Rich Felker
2017-02-12  2:34                         ` Rich Felker
2017-02-12  6:56                           ` He X
2017-02-12  7:11                             ` He X
2017-02-13 17:08                             ` Rich Felker
2017-02-13  8:01                           ` He X
2017-02-13 13:28                             ` Rich Felker
2017-02-13 14:06                               ` He X
2017-02-13 17:12                                 ` Rich Felker
2017-03-04  8:02                                   ` He X
2017-03-17 19:27                                     ` Rich Felker
2017-03-17 19:37                                       ` Rich Felker
2017-03-18  7:34                                         ` He X
2017-03-18 12:28                                           ` Rich Felker
2017-03-18 13:50                                             ` He X
2017-02-13 14:12                               ` He X
2017-02-13 17:13                                 ` Rich Felker
2017-01-29 16:37         ` Rich Felker
2017-01-30  0:37           ` He X
2017-01-30 14:17           ` He X
2017-01-29 16:40         ` Szabolcs Nagy
2017-01-29 16:49           ` Rich Felker
2017-01-30 12:36             ` He X
2017-01-30 13:05               ` Szabolcs Nagy
2017-01-30  1:32           ` He X

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
