From: He X <xw897002528@gmail.com>
To: musl@lists.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Sat, 18 Mar 2017 21:50:28 +0800
Message-ID: <CAPG2z09edp7keF3ZfHxKR8_2LPu=nSj4aL-XE8VXtiL_+3LuHA@mail.gmail.com>
In-Reply-To: <20170318122833.GN1693@brightrain.aerifal.cx>
[-- Attachment #1.1: Type: text/plain, Size: 3467 bytes --]
OK, I think there is no need for further discussion. I understand your
position, if this is what musl wants to be. I will try to make patches
for vim later!
I can't help with the `charset=` check, though; I did not understand
what is going on in __mo_lookup(). I hope you can make that patch. The
attached patch has everything related to dropping .charset removed.
2017-03-18 20:28 GMT+08:00 Rich Felker <dalias@libc.org>:
> On Sat, Mar 18, 2017 at 07:34:58AM +0000, He X wrote:
> > > As discussed on irc, .charset suffixes should be dropped before the
> > > loop even begins (never used in pathnames), and they occur before the
> > > @mod, not after it, so the logic for dropping them is different.
> >
> > 1. drop .charset: Sorry for proposing it again, i forget this case after
> > around three weeks, as i said before, vim will generate three different .mo
> > files with different charset -> zh_CN.UTF-8.po, zh_CN.cp936.po, zh_CN.po.
> > In that case, dropping is to generate a lots of junk.
> >
> > I now found it's not a bug of msgfmt. That is charset is converted by:
> > iconv -f UTF-8 -t cp936 zh_CN.UTF-8.po | sed -e
> > 's/charset=utf-8/charset=gbk/ > ... So that means, charset and pathname is
> > decided by softwares, msgfmt does not do charset converting at all, just a
> > format-translator. (btw, iconv.c is from alpine)
>
> There are two things you seem to be missing:
>
> 1. musl does not, and won't, support non-UTF-8 locales, so there is no
> point in trying to load translations for them. Moreover, with the
> proposed changes to setlocale/locale_map.c, it will never be possible
> for the locale name to contain a . with anything other than UTF-8 (or,
> for compatibility, some variant like utf8) after it. So I don't see
> how there's any point in iterating and trying with/without .charset
> when the only possibilities are that .charset is blank, .UTF-8, or
> some misspelling of .UTF-8. In the latter case, we'd even have to do
> remapping of the misspellings to avoid having to have multiple
> dirs/symlinks.
>
> 2. From my perspective, msgfmt's production of non-UTF-8 .mo files is
> a bug. Yes the .po file can be something else, but msgfmt should be
> transcoding it at 'compile' time. There's at least one other change
> msgfmt needs for all features to work with musl's gettext -- expansion
> of SYSDEP strings to all their possible format patterns -- so I don't
> think it's a significant additional burden to ensure that the msgfmt
> used on musl-based systems outputs UTF-8.
>
> Of course software trying to do multiple encodings like you described
> will still install duplicate files unless patched, but any of them
> should work as long as msgfmt recoded them. In the mean time, distros
> can just patch the build process for software that's still installing
> non-UTF-8 locale files. AFAIK doing that is not a recommended practice
> even by the GNU gettext project, so the patches might even make it
> upstream.
>
> One thing we could do for robustness is check the .mo header at load
> time and, if it has a charset= specification with something other than
> UTF-8, reject it. I mainly suggest this in case the program is running
> on a non-musl system where a glibc-built version of the same program
> (e.g. vi) with non-UTF-8 .mo files is present and they're using the
> same textdomain dir (actually unlikely since prefix should be
> different). But if we do this it should be a separate patch because
> it's a separate functional change.
>
> Rich
>
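
For reference, the load-time robustness check suggested in the quoted
message could look roughly like the sketch below. This is only an
illustration under stated assumptions, not musl code: header_charset_ok()
is a hypothetical helper, it expects the .mo header (the translation of
the empty msgid) as a NUL-terminated string, and treating a header with
no charset= field as acceptable is an assumption made here.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helper, not part of musl.
 * Returns 1 if the header text either carries no charset= field
 * (assumption: treated as acceptable) or names UTF-8, spelled
 * "UTF-8" or "UTF8" in any case. Returns 0 for any other charset. */
static int header_charset_ok(const char *hdr)
{
	const char *p = strstr(hdr, "charset=");
	if (!p) return 1;

	p += strlen("charset=");
	/* The charset value ends at whitespace, ';' or end of string. */
	size_t n = strcspn(p, " \t\r\n;");

	if (n == 5 && !strncasecmp(p, "UTF-8", 5)) return 1;
	if (n == 4 && !strncasecmp(p, "UTF8", 4)) return 1;
	return 0;
}
```

A caller would reject the mapped file when header_charset_ok() returns 0,
for example for a header containing "charset=gbk".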
[-- Attachment #2: locale.diff --]
[-- Type: text/plain, Size: 3824 bytes --]
diff --git a/src/locale/dcngettext.c b/src/locale/dcngettext.c
index b68e24b..abaa414 100644
--- a/src/locale/dcngettext.c
+++ b/src/locale/dcngettext.c
@@ -100,7 +100,9 @@ struct msgcat {
size_t map_size;
void *volatile plural_rule;
volatile int nplurals;
- char name[];
+ struct binding *binding;
+ const struct __locale_map *lm;
+ int cat;
};
static char *dummy_gettextdomain()
@@ -120,8 +122,8 @@ char *dcngettext(const char *domainname, const char *msgid1, const char *msgid2,
struct msgcat *p;
struct __locale_struct *loc = CURRENT_LOCALE;
const struct __locale_map *lm;
- const char *dirname, *locname, *catname;
- size_t dirlen, loclen, catlen, domlen;
+ size_t domlen;
+ struct binding *q;
if ((unsigned)category >= LC_ALL) goto notrans;
@@ -130,55 +132,76 @@ char *dcngettext(const char *domainname, const char *msgid1, const char *msgid2,
domlen = strnlen(domainname, NAME_MAX+1);
if (domlen > NAME_MAX) goto notrans;
- dirname = gettextdir(domainname, &dirlen);
- if (!dirname) goto notrans;
+ for (q=bindings; q; q=q->next)
+ if (!strcmp(q->domainname, domainname) && q->active)
+ break;
+ if (!q) goto notrans;
lm = loc->cat[category];
if (!lm) {
notrans:
return (char *) ((n == 1) ? msgid1 : msgid2);
}
- locname = lm->name;
-
- catname = catnames[category];
- catlen = catlens[category];
- loclen = strlen(locname);
-
- size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
- char name[namelen+1], *s = name;
-
- memcpy(s, dirname, dirlen);
- s[dirlen] = '/';
- s += dirlen + 1;
- memcpy(s, locname, loclen);
- s[loclen] = '/';
- s += loclen + 1;
- memcpy(s, catname, catlen);
- s[catlen] = '/';
- s += catlen + 1;
- memcpy(s, domainname, domlen);
- s[domlen] = '.';
- s[domlen+1] = 'm';
- s[domlen+2] = 'o';
- s[domlen+3] = 0;
for (p=cats; p; p=p->next)
- if (!strcmp(p->name, name))
+ if (p->binding == q && p->lm == lm && p->cat == category)
break;
if (!p) {
+ const char *dirname, *locname, *catname, *modname, *locp;
+ size_t dirlen, loclen, catlen, modlen, alt_modlen;
void *old_cats;
size_t map_size;
- const void *map = __map_file(name, &map_size);
+
+ dirname = q->dirname;
+ locname = lm->name;
+ catname = catnames[category];
+
+ dirlen = q->dirlen;
+ loclen = strlen(locname);
+ catlen = catlens[category];
+
+ /* Logically split @mod suffix from locale name. */
+ modname = memchr(locname, '@', loclen);
+ if (!modname) modname = locname + loclen;
+ alt_modlen = modlen = loclen - (modname-locname);
+ loclen = modname-locname;
+
+ /* Drop .charset identifier; it is not used. */
+ const char *csp = memchr(locname, '.', loclen);
+ if (csp) loclen = csp-locname;
+
+ char name[dirlen+1 + loclen+modlen+1 + catlen+1 + domlen+3 + 1];
+ const void *map;
+
+ for (;;) {
+ snprintf(name, sizeof name, "%s/%.*s%.*s/%s/%s.mo\0",
+ dirname, (int)loclen, locname,
+ (int)alt_modlen, modname, catname, domainname);
+ if (map = __map_file(name, &map_size)) break;
+
+ /* Try dropping @mod, _YY, then both. */
+ if (alt_modlen) {
+ alt_modlen = 0;
+ } else if ((locp = memchr(locname, '_', loclen))) {
+ loclen = locp-locname;
+ alt_modlen = modlen;
+ } else {
+ break;
+ }
+ }
if (!map) goto notrans;
- p = calloc(sizeof *p + namelen + 1, 1);
+
+ p = calloc(sizeof *p, 1);
if (!p) {
__munmap((void *)map, map_size);
goto notrans;
}
+ p->cat = category;
+ p->binding = q;
+ p->lm = lm;
p->map = map;
p->map_size = map_size;
- memcpy(p->name, name, namelen+1);
do {
old_cats = cats;
p->next = old_cats;
--- musl-1.1.16/src/internal/locale_impl.h
+++ musl-1.1.16/src/internal/locale_impl.h
@@ -6,7 +6,7 @@
#include "libc.h"
#include "pthread_impl.h"
-#define LOCALE_NAME_MAX 15
+#define LOCALE_NAME_MAX 23
struct __locale_map {
const void *map;
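
The fallback order implemented by the loop in the dcngettext.c hunk can
be seen in isolation with the sketch below. It mirrors the patch's
splitting of the @mod suffix, the dropping of .charset, and the
"drop @mod, _YY, then both" sequence, but only produces the locale
directory component of each candidate path. locale_candidates() and the
fixed 64-byte slots are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of musl.
 * Writes the locale directory names that would be tried, in order,
 * into out[] (at most max entries) and returns how many were written. */
static int locale_candidates(const char *locname, char out[][64], int max)
{
	size_t loclen = strlen(locname);

	/* Logically split the @mod suffix from the locale name. */
	const char *modname = memchr(locname, '@', loclen);
	if (!modname) modname = locname + loclen;
	size_t modlen = loclen - (size_t)(modname - locname);
	size_t alt_modlen = modlen;
	loclen = (size_t)(modname - locname);

	/* Drop the .charset part; it comes before @mod and is never used. */
	const char *csp = memchr(locname, '.', loclen);
	if (csp) loclen = (size_t)(csp - locname);

	int n = 0;
	while (n < max) {
		snprintf(out[n++], 64, "%.*s%.*s",
			(int)loclen, locname, (int)alt_modlen, modname);

		/* Next candidate: drop @mod, then _YY, then both. */
		const char *locp;
		if (alt_modlen) {
			alt_modlen = 0;
		} else if ((locp = memchr(locname, '_', loclen))) {
			loclen = (size_t)(locp - locname);
			alt_modlen = modlen;
		} else {
			break;
		}
	}
	return n;
}
```

For the name "sr_RS.UTF-8@latin" this yields sr_RS@latin, sr_RS,
sr@latin and sr, in that order. In the patch the loop stops at the first
candidate for which __map_file() succeeds.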