From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'
Date: Mon, 13 Feb 2017 12:08:25 -0500
Message-ID: <20170213170825.GH1520@brightrain.aerifal.cx>
References: <20170129155507.GK1533@brightrain.aerifal.cx>
 <20170129163329.GL1533@brightrain.aerifal.cx>
 <20170208143147.GY1533@brightrain.aerifal.cx>
 <20170211023610.GA1520@brightrain.aerifal.cx>
 <20170212023422.GE1520@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
User-Agent: Mutt/1.5.21 (2010-09-15)

On Sun, Feb 12, 2017 at 02:56:53PM +0800, He X wrote:
> 1. cat is added to the keys, also do a validate
> 2. so we what do we deal with the gettextdir() exactly? inline it or
> construct a gettextpointer()?
> 3. i added a extra locbuf array, and goto is replaced by a loop, memcpy is
> replaced by snprintf, compiled, and working well with fcitx

I haven't verified the loop logic yet, but at a high level it looks
correct.

> 4. i just found that i forgot to store the keys to new buffer, it's ok to
> just use normal expression? or we need atomic operations?
> ```
> + p->cat = category;
> + p->binding = q;
> + p->lm = lm;
> ```

This is fine, since the new msgcat is not visible to other threads
until it's installed with an atomic, which makes all previous writes
visible. I do want to rework all of this with a lock structure rather
than atomics, but that's a separate project.

> 5. I do want to rewrite all to .UTF8, but it's a bit annoying as your
> words, then i changed the code to simply strip.

Since this part is separate and there seems to be disagreement about
what it should do, let's separate it from the issue at hand; it's
really a separate change from making gettext do proper fallbacks
anyway.

> > (safe for the user's terminal)
> LANG is set by users who are using musl and it's modified to zh_CN at
> setlocale(), app will use UTF8 directly, there's no such situation where
> charset will cause troubles to users' terminal, except apps which get the
> LANG manually by getenv(). I have not seen such strange applications so
> far, and most apps only have the UTF8 translation files.
>
> For moving from glibc to musl, i think doing this way is good for now, we
> could delete it later, or just keep it forever. And most people won't use
> non-UTF8 at all, if they do use GBK, their app will even fallback to UTF8,
> because no translation files for GBK. So, it's not so dangerous, i think :).

The main considerations are:

1. what happens when a glibc user ssh's into a musl-based system
2. what happens when a musl user ssh's into a glibc-based system
3. what happens when running musl binaries on a glibc-based system

For #1 and #3, it's desirable for musl to accept ".UTF-8" in the
locale name, and for #2, users may desire to have ".UTF-8" in their
LC_* env vars so that remote glibc programs behave correctly.

For #1 and #3, if a glibc user is using a legacy non-UTF-8 locale and
runs a musl program, they're either going to get messed-up output or
ASCII-only output, depending on decisions we make and/or what their
locale value is. These cases are not really important, since legacy
encodings are not supported, but it might be nice to make the outcome
least-bad.

If the user has a locale name like "fr_FR" or "zh_CN", that's going to
be interpreted differently by musl vs glibc; that was already decided
a long time ago in the interest of designing around the future rather
than broken legacy stuff. But if the locale name is explicitly
non-UTF-8, like "zh_CN.GBK", we could opt to reject it without
breaking anything, and this may give users better feedback about
what's going wrong if they have such settings when ssh'ing into a
musl-based system.

Rich
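
For illustration only, the distinction drawn in the last paragraph
(accept a name with no codeset or with ".UTF-8", optionally reject an
explicitly non-UTF-8 one such as "zh_CN.GBK") could be sketched roughly
as below. This is not musl code and not a proposed patch: the names
classify_codeset() and enum codeset_kind are hypothetical, the name
layout lang[_TERRITORY][.codeset][@modifier] is assumed, and treating
"utf8" and "UTF-8" case-insensitively as the same codeset is likewise
an assumption.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical classification of the codeset part of a locale name. */
enum codeset_kind {
    CODESET_NONE,   /* no ".codeset" part, e.g. "fr_FR" or "zh_CN" */
    CODESET_UTF8,   /* ".UTF-8" or ".utf8", compared case-insensitively */
    CODESET_OTHER   /* explicitly something else, e.g. ".GBK" */
};

/* Assumes names of the form lang[_TERRITORY][.codeset][@modifier]. */
enum codeset_kind classify_codeset(const char *name)
{
    /* Only look for the '.' before any "@modifier" suffix. */
    size_t base = strcspn(name, "@");
    const char *dot = memchr(name, '.', base);
    if (!dot)
        return CODESET_NONE;

    const char *cs = dot + 1;
    size_t len = (size_t)((name + base) - cs);

    if ((len == 5 && !strncasecmp(cs, "UTF-8", 5)) ||
        (len == 4 && !strncasecmp(cs, "UTF8", 4)))
        return CODESET_UTF8;

    /* Anything else, including an empty codeset, is not UTF-8. */
    return CODESET_OTHER;
}
```

With something like this, a caller could keep accepting CODESET_NONE
and CODESET_UTF8 names and treat CODESET_OTHER as a failure, so that a
user arriving with "zh_CN.GBK" gets an error instead of silently
mis-encoded output. Whether rejection is actually the right policy is
the open question in the message above.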