* [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Laine Gholson @ 2016-12-04  3:04 UTC
To: musl

returning null broke a vlc media player built with gettext support

[-- Attachment #2: bind_textdomain_codeset.patch --]
[-- Type: text/x-patch, Size: 939 bytes --]

From 2f79aa294db5d9230ad71298e3de4b5561b441be Mon Sep 17 00:00:00 2001
From: Laine Gholson <laine.gholson@gmail.com>
Date: Wed, 9 Nov 2016 20:19:00 -0600
Subject: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8

VLC isn't happy when bind_textdomain_codeset returns NULL
---
 src/locale/bind_textdomain_codeset.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
index 5ebfd5e..e5f3f52 100644
--- a/src/locale/bind_textdomain_codeset.c
+++ b/src/locale/bind_textdomain_codeset.c
@@ -5,7 +5,9 @@
 
 char *bind_textdomain_codeset(const char *domainname, const char *codeset)
 {
-	if (codeset && strcasecmp(codeset, "UTF-8"))
+	if (codeset && ((strcasecmp(codeset, "UTF-8") == 0) || (strcasecmp(codeset, "UTF8") == 0))) {
+		return "UTF-8";
+	} else if (codeset)
 		errno = EINVAL;
 	return NULL;
 }
-- 
2.10.2

* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Rich Felker @ 2016-12-17  3:59 UTC
To: musl

On Sat, Dec 03, 2016 at 09:04:42PM -0600, Laine Gholson wrote:
> returning null broke a vlc media player built with gettext support
>
> From 2f79aa294db5d9230ad71298e3de4b5561b441be Mon Sep 17 00:00:00 2001
> From: Laine Gholson <laine.gholson@gmail.com>
> Date: Wed, 9 Nov 2016 20:19:00 -0600
> Subject: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
>
> VLC isn't happy when bind_textdomain_codeset returns NULL
> ---
>  src/locale/bind_textdomain_codeset.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
> index 5ebfd5e..e5f3f52 100644
> --- a/src/locale/bind_textdomain_codeset.c
> +++ b/src/locale/bind_textdomain_codeset.c
> @@ -5,7 +5,9 @@
>
>  char *bind_textdomain_codeset(const char *domainname, const char *codeset)
>  {
> -	if (codeset && strcasecmp(codeset, "UTF-8"))
> +	if (codeset && ((strcasecmp(codeset, "UTF-8") == 0) || (strcasecmp(codeset, "UTF8") == 0))) {
> +		return "UTF-8";
> +	} else if (codeset)
>  		errno = EINVAL;
>  	return NULL;
>  }
> --
> 2.10.2

I think this needs some more thought. The documentation of the API is
that a null pointer argument/result means "the locale's character
encoding", and that the default is null; presumably even when the
locale's codeset is "foo", null (default) and "foo" are still
different states.

I don't actually like that, and don't think we should copy it --
especially since, now that we also have a C locale with "ASCII" as the
codeset, we _can't_ provide a codeset matching the locale in all cases
-- but I also don't think it's right for the return value (null or
"UTF-8") to depend on the argument rather than on the "previous state"
like it's documented to.

There seem to be two possible reasonable behaviors:

1. Diverge from the GNU behavior and treat textdomains as always-bound
   to "UTF-8", regardless of whether bind_textdomain_codeset has been
   called. The function would then return a null pointer with EINVAL
   set for strings other than "UTF-8"/"UTF8", and would return "UTF-8"
   for a valid or null-pointer argument.

2. Keep a 1-bit state for each textdomain reflecting whether it's
   nominally in "default" mode or "UTF-8" mode. Either way the
   original UTF-8 string would be returned; the only point of the
   state would be providing a return value for bind_textdomain_codeset
   that reflects how it was previously called.

Being that 2 is gratuitous complexity to do something stupid and
meaningless, I'd lean towards 1, but I don't want to break anything
that works. Does this seem safe to do?

Rich

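For comparison, here is a rough sketch of what option 2 might involve. It is not musl code: sketch_bind_textdomain_codeset, domain_state, the fixed-size domains table and its limits are all hypothetical names made up for illustration, and the handling of a null domainname and of a full table are assumptions. The return value follows the documented idea that the call reports the state after applying the argument, with null meaning "never explicitly bound".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical fixed-size table; a real libc would more likely keep this
 * bit in whatever per-textdomain record it already has. */
#define MAX_DOMAINS 8
#define MAX_NAME 64

static struct {
	char name[MAX_NAME];
	int utf8_selected; /* the 1-bit state: 0 = "default", 1 = "UTF-8" requested */
} domains[MAX_DOMAINS];
static size_t ndomains;

/* Find or create the state slot for a domain.
 * Returns NULL if the table is full or the name does not fit. */
static int *domain_state(const char *name)
{
	for (size_t i = 0; i < ndomains; i++)
		if (!strcmp(domains[i].name, name))
			return &domains[i].utf8_selected;
	if (ndomains == MAX_DOMAINS || strlen(name) >= MAX_NAME)
		return NULL;
	strcpy(domains[ndomains].name, name);
	domains[ndomains].utf8_selected = 0;
	return &domains[ndomains++].utf8_selected;
}

char *sketch_bind_textdomain_codeset(const char *domainname, const char *codeset)
{
	if (!domainname)
		return NULL; /* assumption: nothing to report without a domain */
	if (codeset && strcasecmp(codeset, "UTF-8")) {
		errno = EINVAL; /* only UTF-8 can ever be bound */
		return NULL;
	}
	int *state = domain_state(domainname);
	if (!state) {
		errno = ENOMEM;
		return NULL;
	}
	if (codeset)
		*state = 1; /* remember that the caller asked for UTF-8 */
	/* Message output is UTF-8 either way; the bit only changes
	 * what this function reports back. */
	return *state ? "UTF-8" : NULL;
}
```

All of the bookkeeping exists only to make a later call with a null codeset return null or "UTF-8" depending on history, which is the complexity option 1 avoids.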
* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Rich Felker @ 2016-12-30  3:14 UTC
To: musl

On Fri, Dec 16, 2016 at 10:59:54PM -0500, Rich Felker wrote:
> There seem to be two possible reasonable behaviors:
>
> 1. Diverge from the GNU behavior and treat textdomains as always-bound
>    to "UTF-8", regardless of whether bind_textdomain_codeset has been
>    called. The function would then return a null pointer with EINVAL
>    set for strings other than "UTF-8"/"UTF8", and would return "UTF-8"
>    for a valid or null-pointer argument.
>
> 2. Keep a 1-bit state for each textdomain reflecting whether it's
>    nominally in "default" mode or "UTF-8" mode. Either way the
>    original UTF-8 string would be returned; the only point of the
>    state would be providing a return value for bind_textdomain_codeset
>    that reflects how it was previously called.
>
> Being that 2 is gratuitous complexity to do something stupid and
> meaningless, I'd lean towards 1, but I don't want to break anything
> that works. Does this seem safe to do?

Ping. Anyone else have thoughts on this?

Rich

* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Laine Gholson @ 2016-12-30 22:13 UTC
To: musl

option 1 is the only sane choice, and I don't see how something
could break unless they constantly check for the GNU behavior and
break if it isn't the GNU behavior, in which case it is the
program's fault anyways.

On 12/29/16 21:14, Rich Felker wrote:
> On Fri, Dec 16, 2016 at 10:59:54PM -0500, Rich Felker wrote:
>> Being that 2 is gratuitous complexity to do something stupid and
>> meaningless, I'd lean towards 1, but I don't want to break anything
>> that works. Does this seem safe to do?
>
> Ping. Anyone else have thoughts on this?
>
> Rich

* Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: Rich Felker @ 2016-12-30 22:22 UTC
To: musl

On Fri, Dec 30, 2016 at 04:13:44PM -0600, Laine Gholson wrote:
> option 1 is the only sane choice, and I don't see how something
> could break unless they constantly check for the GNU behavior and
> break if it isn't the GNU behavior, in which case it is the
> program's fault anyways.

Does the attached patch look reasonable? The "UTF8" alternative could
be added separately if needed; did you find software that's passing
the string without the '-'?

I think the main functional difference from your patch is that "UTF-8"
is returned in the case where the codeset argument is null.

Rich

[-- Attachment #2: btdc.diff --]
[-- Type: text/plain, Size: 470 bytes --]

diff --git a/src/locale/bind_textdomain_codeset.c b/src/locale/bind_textdomain_codeset.c
index 5ebfd5e..240e83e 100644
--- a/src/locale/bind_textdomain_codeset.c
+++ b/src/locale/bind_textdomain_codeset.c
@@ -5,7 +5,9 @@
 
 char *bind_textdomain_codeset(const char *domainname, const char *codeset)
 {
-	if (codeset && strcasecmp(codeset, "UTF-8"))
+	if (codeset && strcasecmp(codeset, "UTF-8")) {
 		errno = EINVAL;
-	return NULL;
+		return 0;
+	}
+	return "UTF-8";
 }

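To make the effect of btdc.diff easier to see, here is the function as it would read with that diff applied, copied into a standalone form. The name patched_bind_textdomain_codeset is hypothetical, chosen only so the copy does not clash with the libc symbol; the body is taken from the diff, with a cast added to silence the unused-parameter warning.

```c
#include <errno.h>
#include <strings.h>

/* Standalone copy of the post-patch function body (hypothetical name). */
char *patched_bind_textdomain_codeset(const char *domainname, const char *codeset)
{
	(void)domainname; /* every textdomain is treated as bound to UTF-8 */
	if (codeset && strcasecmp(codeset, "UTF-8")) {
		/* Anything other than "UTF-8" (compared case-insensitively) is rejected. */
		errno = EINVAL;
		return 0;
	}
	/* Reached for a null codeset as well as for "UTF-8". */
	return "UTF-8";
}
```

With this version a null codeset and "utf-8" both yield "UTF-8", while "ISO-8859-1" yields a null pointer with errno set to EINVAL. The spelling "UTF8" without the hyphen is also rejected unless the alternative mentioned in the message is added separately.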
* [musl] Re: [PATCH] bind_textdomain_codeset: don't return failure unless encoding isn't UTF-8
From: alice @ 2024-08-31  0:31 UTC
To: dalias; +Cc: musl

> On Fri, Dec 30, 2016 at 04:13:44PM -0600, Laine Gholson wrote:
> > option 1 is the only sane choice, and I don't see how something
> > could break unless they constantly check for the GNU behavior and
> > break if it isn't the GNU behavior, in which case it is the
> > program's fault anyways.
>
> Does the attached patch look reasonable? The "UTF8" alternative could
> be added separately if needed; did you find software that's passing
> the string without the '-'?
>
> I think the main functional difference from your patch is that "UTF-8"
> is returned in the case where the codeset argument is null.
>
> Rich

ping :)

the patch attached to that old email looks fine, and fixes a runtime crash i
ran into with an application getting confused with the incorrect NULL return in
subsequent logic handling. i guess it might've just been forgotten in 2016.

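The thread does not show the application code that misbehaved, so the following is only a guess at the general shape of the problem: a caller that reads a null return from bind_textdomain_codeset as "binding failed". The names init_translations, old_behaviour and new_behaviour are hypothetical; the two stand-ins mirror the function before and after the 2016 patch.

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

typedef char *(*bind_codeset_fn)(const char *, const char *);

/* Stand-in for the pre-patch behaviour: always returns a null pointer. */
char *old_behaviour(const char *domainname, const char *codeset)
{
	(void)domainname;
	if (codeset && strcasecmp(codeset, "UTF-8"))
		errno = EINVAL;
	return NULL;
}

/* Stand-in for the behaviour with the 2016 patch applied. */
char *new_behaviour(const char *domainname, const char *codeset)
{
	(void)domainname;
	if (codeset && strcasecmp(codeset, "UTF-8")) {
		errno = EINVAL;
		return 0;
	}
	return "UTF-8";
}

/* Hypothetical application start-up code.
 * Returns 0 on success, -1 if it believes the codeset was not bound. */
int init_translations(bind_codeset_fn bind, const char *domain)
{
	if (!bind(domain, "UTF-8"))
		return -1; /* a null return is read as failure */
	return 0;
}
```

Under these assumptions init_translations(old_behaviour, "app") reports failure even though UTF-8 output is what the caller gets anyway, while init_translations(new_behaviour, "app") succeeds.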