Re: First feedback on new C locale problems

mailing list of musl libc

* Re: First feedback on new C locale problems
@ 2015-09-26  4:58 Felix Janda
  2015-09-26 19:35 ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Janda @ 2015-09-26  4:58 UTC (permalink / raw)
  To: musl

On 2015-09-09 05:56:48 GMT, Rich Felker wrote:
> On Tue, Sep 01, 2015 at 02:32:35AM -0400, Rich Felker wrote:
> > What I'd like to do to fix it is just always return "UTF-8" for
> > nl_langinfo(CODESET) regardless of locale (rather than returning
> > "UTF-8-CODE-UNITS" when in C locale). POSIX places no requirements on
> > nl_langinfo that would preclude this, and it seems like it would
> > restore the desired properties and fix all the regressions.
>
> Committed.
>
> Rich

GNU sed seems to care about the output from nl_langinfo:

https://bugs.gentoo.org/show_bug.cgi?id=560728

More specifically, so does lib/localecharset.c, which is used in
the replacement of re_compile_pattern.

Felix



* Re: Re: First feedback on new C locale problems
  2015-09-26  4:58 First feedback on new C locale problems Felix Janda
@ 2015-09-26 19:35 ` Rich Felker
  2015-09-27  6:17   ` Felix Janda
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-09-26 19:35 UTC (permalink / raw)
  To: musl

On Sat, Sep 26, 2015 at 06:58:36AM +0200, Felix Janda wrote:
> On 2015-09-09 05:56:48 GMT, Rich Felker wrote:
> > On Tue, Sep 01, 2015 at 02:32:35AM -0400, Rich Felker wrote:
> > > What I'd like to do to fix it is just always return "UTF-8" for
> > > nl_langinfo(CODESET) regardless of locale (rather than returning
> > > "UTF-8-CODE-UNITS" when in C locale). POSIX places no requirements on
> > > nl_langinfo that would preclude this, and it seems like it would
> > > restore the desired properties and fix all the regressions.
> >
> > Committed.
> >
> > Rich
> 
> GNU sed seems to care about the output from nl_langinfo:
> 
> https://bugs.gentoo.org/show_bug.cgi?id=560728
> 
> More specifically, so does lib/localecharset.c, which is used in
> the replacement of re_compile_pattern.

I was able to reproduce this (with slightly different output, "a© a'")
on Alpine. Clearly this is some sort of bug in the gnulib code or sed
itself, since it's producing corrupt output. I think we should explore
why that's happening and whether it's possible to fix there. But if
there remain other reasons that returning "UTF-8" in the C locale is
not practical then perhaps we could resort to returning "ASCII".

Rich



* Re: Re: First feedback on new C locale problems
  2015-09-26 19:35 ` Rich Felker
@ 2015-09-27  6:17   ` Felix Janda
  2015-09-27 13:47     ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Janda @ 2015-09-27  6:17 UTC (permalink / raw)
  To: musl

Rich Felker wrote:
> I was able to reproduce this (with slightly different output, "a© a'")
> on Alpine. Clearly this is some sort of bug in the gnulib code or sed
> itself, since it's producing corrupt output. I think we should explore
> why that's happening and whether it's possible to fix there. But if
> there remain other reasons that returning "UTF-8" in the C locale is
> not practical then perhaps we could resort to returning "ASCII".

A possible fix is

--- ./a/sed-4.2.1/lib/regcomp.c
+++ ./a/sed-4.2.1/lib/regcomp.c
@@ -824,7 +824,7 @@ re_compile_internal (regex_t *preg, cons
 
 #ifdef RE_ENABLE_I18N
   /* If possible, do searching in single byte encoding to speed things up.  */
-  if (dfa->is_utf8 && dfa->mb_cur_max != 1 && !(syntax & RE_ICASE) && preg->translate == NULL)
+  if (dfa->is_utf8 && !(syntax & RE_ICASE) && preg->translate == NULL)
     optimize_utf8 (dfa);
 #endif
 

In our case is_utf8 is 1 and mb_cur_max is also 1. The function
optimize_utf8() would change "." to match UTF-8 characters instead of
bytes. For some reason that I have not investigated further, "©" (or any
other non-ASCII character) is then not matched, but in the C locale we
want "." to match invalid UTF-8 sequences as well anyway.

glibc seems to be the upstream for the code.

Felix



* Re: Re: First feedback on new C locale problems
  2015-09-27  6:17   ` Felix Janda
@ 2015-09-27 13:47     ` Rich Felker
  2015-09-27 13:49       ` Felix Janda
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-09-27 13:47 UTC (permalink / raw)
  To: musl

On Sun, Sep 27, 2015 at 08:17:38AM +0200, Felix Janda wrote:
> A possible fix is
> 
> --- ./a/sed-4.2.1/lib/regcomp.c
> +++ ./a/sed-4.2.1/lib/regcomp.c
> @@ -824,7 +824,7 @@ re_compile_internal (regex_t *preg, cons
>  
>  #ifdef RE_ENABLE_I18N
>    /* If possible, do searching in single byte encoding to speed things up.  */
> -  if (dfa->is_utf8 && dfa->mb_cur_max != 1 && !(syntax & RE_ICASE) && preg->translate == NULL)
> +  if (dfa->is_utf8 && !(syntax & RE_ICASE) && preg->translate == NULL)
>      optimize_utf8 (dfa);
>  #endif
>  
> 
> In our case is_utf8 is 1 and mb_cur_max is also 1. The function
> optimize_utf8() would change "." to match UTF-8 characters instead of
> bytes. For some reason that I have not investigated further, "©" (or any
> other non-ASCII character) is then not matched, but in the C locale we
> want "." to match invalid UTF-8 sequences as well anyway.

I think this fix is misplaced; it looks like it would make GNU regex
do UTF-8 character matching rather than byte matching in the C locale.
Rather one of the other places that has an is_utf8 check also needs to
have the mb_cur_max!=1 check added, I think.

Rich



* Re: Re: First feedback on new C locale problems
  2015-09-27 13:47     ` Rich Felker
@ 2015-09-27 13:49       ` Felix Janda
  2015-09-27 16:59         ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Felix Janda @ 2015-09-27 13:49 UTC (permalink / raw)
  To: musl

Rich Felker wrote:
> I think this fix is misplaced; it looks like it would make GNU regex
> do UTF-8 character matching rather than byte matching in the C locale.
> Rather one of the other places that has an is_utf8 check also needs to
> have the mb_cur_max!=1 check added, I think.

Oh, sorry for the confusion. The patch is inverted...

Felix



* Re: Re: First feedback on new C locale problems
  2015-09-27 13:49       ` Felix Janda
@ 2015-09-27 16:59         ` Rich Felker
  2015-09-28 18:58           ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-09-27 16:59 UTC (permalink / raw)
  To: musl

On Sun, Sep 27, 2015 at 03:49:02PM +0200, Felix Janda wrote:
> Oh, sorry for the confusion. The patch is inverted...

Ah, ok. But in that case, it's probably best not to detect is_utf8 to
begin with if MB_CUR_MAX==1.

I should probably read the code and try to get a better understanding
of what it's doing.

Rich



* Re: Re: First feedback on new C locale problems
  2015-09-27 16:59         ` Rich Felker
@ 2015-09-28 18:58           ` Rich Felker
  2015-09-29  4:00             ` Felix Janda
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-09-28 18:58 UTC (permalink / raw)
  To: musl

On Sun, Sep 27, 2015 at 12:59:25PM -0400, Rich Felker wrote:
> Ah, ok. But in that case, it's probably best not to detect is_utf8 to
> begin with if MB_CUR_MAX==1.
> 
> I should probably read the code and try to get a better understanding
> of what it's doing.

I think the actual error is here:

http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/regcomp.c#n903

In the _LIBC code path, they check MB_CUR_MAX==6 (the nonstandard value
glibc uses for UTF-8), perhaps just as an optimization of the
non-UTF-8 case, but they don't check it for !_LIBC; there they just rely
on the CODESET name matching.

I'm still somewhat concerned that returning "UTF-8" is problematic
here, but I think gnulib also has a bug; trusting their interpretation
of the string returned by nl_langinfo(CODESET) seems to be leading to
corrupt results.

Rich



* Re: Re: First feedback on new C locale problems
  2015-09-28 18:58           ` Rich Felker
@ 2015-09-29  4:00             ` Felix Janda
  0 siblings, 0 replies; 8+ messages in thread
From: Felix Janda @ 2015-09-29  4:00 UTC (permalink / raw)
  To: musl

Rich Felker wrote:
> I think the actual error is here:
> 
> http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/regcomp.c#n903
> 
> In the _LIBC code path, they check MB_CUR_MAX==6 (the nonstandard value
> glibc uses for UTF-8), perhaps just as an optimization of the
> non-UTF-8 case, but they don't check it for !_LIBC; there they just rely
> on the CODESET name matching.

After your previous mail I had come to the same conclusion. Maybe they
would not be opposed to optimizing the non-UTF-8 case when !_LIBC using
MB_CUR_MAX.

Unfortunately, the GNU regex code seems to be copied into quite a lot
of projects.

Felix

> I'm still somewhat concerned that returning "UTF-8" is problematic
> here, but I think gnulib also has a bug; trusting their interpretation
> of the string returned by nl_langinfo(CODESET) seems to be leading to
> corrupt results.
>
> Rich

