From: Laine Gholson
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
Date: Fri, 30 Dec 2016 16:13:44 -0600
Message-ID: <3446e663-1252-bb02-4248-2132cfc4d086@gmail.com>
References: <0b0335bc-f4da-b345-bf19-aabce9a0be93@gmail.com> <20161217035954.GE1555@brightrain.aerifal.cx> <20161230031450.GQ1555@brightrain.aerifal.cx>
In-Reply-To: <20161230031450.GQ1555@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Option 1 is the only sane choice. I don't see how anything could break
unless a program explicitly checks for the GNU behavior and fails when
it doesn't get it, in which case it is the program's fault anyway.
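
A minimal sketch of option 1 as it is described in the quoted message
below, for concreteness. This is an illustration, not a tested patch
against musl: the name sketch_bind_textdomain_codeset is hypothetical
(chosen so it does not clash with the libc symbol), and the assumption
is that every textdomain is treated as permanently bound to "UTF-8".

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for bind_textdomain_codeset under option 1. */
char *sketch_bind_textdomain_codeset(const char *domainname, const char *codeset)
{
	/* Assumption: all domains are always bound to UTF-8, so the
	 * domain name does not influence the result. */
	(void)domainname;

	/* A null codeset is a query; "UTF-8" or "UTF8" (any case) is a
	 * no-op rebind. Both report the fixed binding. */
	if (!codeset || !strcasecmp(codeset, "UTF-8") || !strcasecmp(codeset, "UTF8"))
		return (char *)"UTF-8";

	/* Any other codeset cannot be honored. */
	errno = EINVAL;
	return NULL;
}
```

With this shape the return value no longer depends on whether the
function was called before, so a caller that only checks for a non-null
result gets one for any valid or null argument.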

On 12/29/16 21:14, Rich Felker wrote:
> On Fri, Dec 16, 2016 at 10:59:54PM -0500, Rich Felker wrote:
>> On Sat, Dec 03, 2016 at 09:04:42PM -0600, Laine Gholson wrote:
>>> returning null broke a vlc media player built with gettext support
>>
>>> >From 2f79aa294db5d9230ad71298e3de4b5561b441be Mon Sep 17 00:00:00 2001
>>> From: Laine Gholson
>>> Date: Wed, 9 Nov 2016 20:19:00 -0600
>>> Subject: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
>>>
>>> VLC isn't happy when bind_textdomain_codeset returns NULL
>>> ---
>>>  src/locale/bind_textdomain_codeset.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
>>> index 5ebfd5e..e5f3f52 100644
>>> --- a/src/locale/bind_textdomain_codeset.c
>>> +++ b/src/locale/bind_textdomain_codeset.c
>>> @@ -5,7 +5,9 @@
>>> III
>>>  char *bind_textdomain_codeset(const char *domainname, const char *codeset)
>>>  {
>>> -	if (codeset && strcasecmp(codeset, "UTF-8"))
>>> +	if (codeset && ((strcasecmp(codeset, "UTF-8") == 0) || (strcasecmp(codeset, "UTF8") == 0))) {
>>> +		return "UTF-8";
>>> +	} else if (codeset)
>>>  		errno = EINVAL;
>>>  	return NULL;
>>>  }
>>> --
>>> 2.10.2
>>
>> I think this needs some more thought. The documentation of the API is
>> that a null pointer argument/result means "the locale's character
>> encoding", and that the default is null; presumably even when the
>> locale's codeset is "foo", null (default) and "foo" are still
>> different states.
>>
>> I don't actually like that, and don't think we should copy it --
>> especially since, now that we also have a C locale with "ASCII" as the
>> codeset, we _can't_ provide a codeset matching the locale in all cases
>> -- but I also don't think it's right for the return value (null or
>> "UTF-8") to depend on the argument rather than on the "previous state"
>> like it's documented to.
>>
>> There seem to be two possible reasonable behaviors:
>>
>> 1. Diverge from the GNU behavior and treat textdomains as always-bound
>>    to "UTF-8", regardless of whether bind_textdomain_codeset has been
>>    called. The function would then return a null pointer with EINVAL
>>    set for strings other than "UTF-8"/"UTF8", and would return "UTF-8"
>>    for a valid or null-pointer argument.
>>
>> 2. Keep a 1-bit state for each textdomain reflecting whether its
>>    nominally in "default" mode or "UTF-8" mode. Either way the
>>    original UTF-8 string would be returned; the only point of the
>>    state would be providing a return value for bind_textdomain_codeset
>>    that reflects how it was previously called.
>>
>> Being that 2 is gratuitous complexity to do something stupid and
>> meaningless, I'd lean towards 1, but I don't want to break anything
>> that works. Does this seem safe to do?
>
> Ping. Anyone else have thoughts on this?
>
> Rich
>