mailing list of musl libc
* [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Laine Gholson @ 2016-12-04  3:04 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 67 bytes --]

Returning a null pointer broke VLC media player when built with gettext support.

[-- Attachment #2: bind_textdomain_codeset.patch --]
[-- Type: text/x-patch, Size: 939 bytes --]

From 2f79aa294db5d9230ad71298e3de4b5561b441be Mon Sep 17 00:00:00 2001
From: Laine Gholson <laine.gholson@gmail.com>
Date: Wed, 9 Nov 2016 20:19:00 -0600
Subject: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8

VLC isn't happy when bind_textdomain_codeset returns NULL
---
 src/locale/bind_textdomain_codeset.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
index 5ebfd5e..e5f3f52 100644
--- a/src/locale/bind_textdomain_codeset.c
+++ b/src/locale/bind_textdomain_codeset.c
@@ -5,7 +5,9 @@
 
 char *bind_textdomain_codeset(const char *domainname, const char *codeset)
 {
-	if (codeset && strcasecmp(codeset, "UTF-8"))
+	if (codeset && ((strcasecmp(codeset, "UTF-8") == 0) || (strcasecmp(codeset, "UTF8") == 0))) {
+		return "UTF-8";
+	} else if (codeset)
 		errno = EINVAL;
 	return NULL;
 }
-- 
2.10.2



* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Rich Felker @ 2016-12-17  3:59 UTC
  To: musl

On Sat, Dec 03, 2016 at 09:04:42PM -0600, Laine Gholson wrote:
> returning null broke a vlc media player built with gettext support

> From 2f79aa294db5d9230ad71298e3de4b5561b441be Mon Sep 17 00:00:00 2001
> From: Laine Gholson <laine.gholson@gmail.com>
> Date: Wed, 9 Nov 2016 20:19:00 -0600
> Subject: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
> 
> VLC isn't happy when bind_textdomain_codeset returns NULL
> ---
>  src/locale/bind_textdomain_codeset.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
> index 5ebfd5e..e5f3f52 100644
> --- a/src/locale/bind_textdomain_codeset.c
> +++ b/src/locale/bind_textdomain_codeset.c
> @@ -5,7 +5,9 @@
>  
>  char *bind_textdomain_codeset(const char *domainname, const char *codeset)
>  {
> -	if (codeset && strcasecmp(codeset, "UTF-8"))
> +	if (codeset && ((strcasecmp(codeset, "UTF-8") == 0) || (strcasecmp(codeset, "UTF8") == 0))) {
> +		return "UTF-8";
> +	} else if (codeset)
>  		errno = EINVAL;
>  	return NULL;
>  }
> -- 
> 2.10.2

I think this needs some more thought. The documentation of the API is
that a null pointer argument/result means "the locale's character
encoding", and that the default is null; presumably even when the
locale's codeset is "foo", null (default) and "foo" are still
different states.

I don't actually like that, and don't think we should copy it --
especially since, now that we also have a C locale with "ASCII" as the
codeset, we _can't_ provide a codeset matching the locale in all cases
-- but I also don't think it's right for the return value (null or
"UTF-8") to depend on the argument rather than on the "previous state"
like it's documented to.

There seem to be two possible reasonable behaviors:

1. Diverge from the GNU behavior and treat textdomains as always-bound
   to "UTF-8", regardless of whether bind_textdomain_codeset has been
   called. The function would then return a null pointer with EINVAL
   set for strings other than "UTF-8"/"UTF8", and would return "UTF-8"
   for a valid or null-pointer argument.

2. Keep a 1-bit state for each textdomain reflecting whether it's
   nominally in "default" mode or "UTF-8" mode. Either way the
   original UTF-8 string would be returned; the only point of the
   state would be providing a return value for bind_textdomain_codeset
   that reflects how it was previously called.

Being that 2 is gratuitous complexity to do something stupid and
meaningless, I'd lean towards 1, but I don't want to break anything
that works. Does this seem safe to do?
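
A minimal sketch of what option 1 could look like, for illustration
only. The names option1_bind_codeset and option1_demo are hypothetical
(chosen to avoid clashing with the libc symbol), and accepting the
"UTF8" spelling follows the description above rather than any
committed code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical stand-in for bind_textdomain_codeset under option 1:
 * every textdomain is treated as permanently bound to UTF-8. */
const char *option1_bind_codeset(const char *domainname, const char *codeset)
{
	(void)domainname; /* no per-domain state is kept */

	/* A non-null argument must be a spelling of UTF-8. */
	if (codeset && strcasecmp(codeset, "UTF-8") && strcasecmp(codeset, "UTF8")) {
		errno = EINVAL;
		return NULL;
	}
	/* Null (query) or valid argument: report the fixed binding. */
	return "UTF-8";
}

/* Usage: a query made before any explicit bind already reports UTF-8. */
void option1_demo(void)
{
	assert(option1_bind_codeset("vlc", NULL) != NULL);
	assert(option1_bind_codeset("vlc", "utf-8") != NULL);
	errno = 0;
	assert(option1_bind_codeset("vlc", "ISO-8859-1") == NULL);
	assert(errno == EINVAL);
}
```

Because no state is stored, the return value depends only on whether
the argument is acceptable, which is the divergence from the GNU
"previous state" semantics described above.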

Rich



* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Rich Felker @ 2016-12-30  3:14 UTC
  To: musl

On Fri, Dec 16, 2016 at 10:59:54PM -0500, Rich Felker wrote:
> I think this needs some more thought. The documentation of the API is
> that a null pointer argument/result means "the locale's character
> encoding", and that the default is null; presumably even when the
> locale's codeset is "foo", null (default) and "foo" are still
> different states.
> 
> I don't actually like that, and don't think we should copy it --
> especially since, now that we also have a C locale with "ASCII" as the
> codeset, we _can't_ provide a codeset matching the locale in all cases
> -- but I also don't think it's right for the return value (null or
> "UTF-8") to depend on the argument rather than on the "previous state"
> like it's documented to.
> 
> There seem to be two possible reasonable behaviors:
> 
> 1. Diverge from the GNU behavior and treat textdomains as always-bound
>    to "UTF-8", regardless of whether bind_textdomain_codeset has been
>    called. The function would then return a null pointer with EINVAL
>    set for strings other than "UTF-8"/"UTF8", and would return "UTF-8"
>    for a valid or null-pointer argument.
> 
> 2. Keep a 1-bit state for each textdomain reflecting whether it's
>    nominally in "default" mode or "UTF-8" mode. Either way the
>    original UTF-8 string would be returned; the only point of the
>    state would be providing a return value for bind_textdomain_codeset
>    that reflects how it was previously called.
> 
> Being that 2 is gratuitous complexity to do something stupid and
> meaningless, I'd lean towards 1, but I don't want to break anything
> that works. Does this seem safe to do?

Ping. Anyone else have thoughts on this?

Rich



* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Laine Gholson @ 2016-12-30 22:13 UTC
  To: musl

Option 1 is the only sane choice. I don't see how anything could break unless a program explicitly checks for the GNU behavior and fails when it doesn't get it, in which case it is the program's fault anyway.


* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Rich Felker @ 2016-12-30 22:22 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3621 bytes --]

On Fri, Dec 30, 2016 at 04:13:44PM -0600, Laine Gholson wrote:
> option 1 is the only sane choice, and I don't see how something
> could break unless they constantly check for the GNU behavior and
> break if it isn't the GNU behavior, in which case it is the
> program's fault anyways.

Does the attached patch look reasonable? The "UTF8" alternative could
be added separately if needed; did you find software that's passing
the string without the '-'?

I think the main functional difference from your patch is that "UTF-8"
is returned in the case where the codeset argument is null.
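
To make that difference concrete, here is a rough side-by-side sketch
of the two patches' logic. The function names first_patch_variant,
attached_patch_variant and compare_variants are hypothetical, used only
so both versions can coexist in one file:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Logic of the originally posted patch (hypothetical name). */
const char *first_patch_variant(const char *domainname, const char *codeset)
{
	(void)domainname;
	if (codeset && (!strcasecmp(codeset, "UTF-8") || !strcasecmp(codeset, "UTF8")))
		return "UTF-8";
	if (codeset)
		errno = EINVAL;
	return NULL; /* also reached for a null codeset, errno untouched */
}

/* Logic of the attached btdc.diff (hypothetical name). */
const char *attached_patch_variant(const char *domainname, const char *codeset)
{
	(void)domainname;
	if (codeset && strcasecmp(codeset, "UTF-8")) {
		errno = EINVAL;
		return NULL;
	}
	return "UTF-8"; /* reached for "UTF-8" in any case, and for null */
}

void compare_variants(void)
{
	/* Null argument (a query): the two variants disagree. */
	assert(first_patch_variant("vlc", NULL) == NULL);
	assert(attached_patch_variant("vlc", NULL) != NULL);

	/* "UTF8" without the '-': only the first patch accepts it. */
	assert(first_patch_variant("vlc", "UTF8") != NULL);
	assert(attached_patch_variant("vlc", "UTF8") == NULL);
}
```

A caller that only queries the current binding with a null codeset
gets a non-null result from the attached version alone.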

Rich


[-- Attachment #2: btdc.diff --]
[-- Type: text/plain, Size: 470 bytes --]

diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
index 5ebfd5e..240e83e 100644
--- a/src/locale/bind_textdomain_codeset.c
+++ b/src/locale/bind_textdomain_codeset.c
@@ -5,7 +5,9 @@
 
 char *bind_textdomain_codeset(const char *domainname, const char *codeset)
 {
-	if (codeset && strcasecmp(codeset, "UTF-8"))
+	if (codeset && strcasecmp(codeset, "UTF-8")) {
 		errno = EINVAL;
-	return NULL;
+		return 0;
+	}
+	return "UTF-8";
 }
