mailing list of musl libc
* printf doesn't respect locale
@ 2019-09-09 16:31 Daniel Schoepe
  2019-09-09 16:39 ` Daniel Schoepe
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Daniel Schoepe @ 2019-09-09 16:31 UTC (permalink / raw)
  To: musl

Hi,

I think I found a discrepancy between musl's behavior and the POSIX standard:

According to the POSIX standard, the decimal separator used when using
printf to print floating point numbers should come from the locale
(https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html):

"The radix character is defined in the current locale (category
LC_NUMERIC). In the POSIX locale, or in a locale where the radix
character is not defined, the radix character shall default to a
<period> ( '.' )."

However, it seems that in musl, a period is always used for printing
floating point numbers. For example, the following program prints
"12.0" instead of "12,0" (which is printed when using GNU libc):

#include <stdio.h>
#include <locale.h>

int main(int argc, char **argv) {
    setlocale(LC_ALL, "DE_de");
    printf("%f\n", 12.0f);
}

This was tested using the latest git checkout of musl
(a882841baf42e6a8b74cc33a239b84a9a79493db), compiled on Ubuntu 18.04
using the musl-gcc script. It looks like the usage of "." as a
separator is hardcoded in `fmt_fp`, for instance here:
https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n392

Best regards,
Daniel



* Re: printf doesn't respect locale
  2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe
@ 2019-09-09 16:39 ` Daniel Schoepe
  2019-09-09 16:51 ` Szabolcs Nagy
  2019-09-09 17:54 ` Rich Felker
  2 siblings, 0 replies; 21+ messages in thread
From: Daniel Schoepe @ 2019-09-09 16:39 UTC (permalink / raw)
  To: musl

Small correction: the example behaves as the standard suggests on OS X,
but with GNU libc it shows the same behavior as with musl.
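
When comparing implementations like this, it is worth ruling out that
the locale name itself was rejected. The sketch below is illustrative
only: try_locale() is a hypothetical helper, not part of the original
report. It checks the return value of setlocale() and prints the radix
string reported by localeconv() before formatting a number. A NULL
return means the name was rejected; a non-NULL return does not
guarantee that the implementation actually has a different radix for
that name.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: tries to select `name` for all categories and
 * reports the radix string the C library says it will use.
 * Returns 1 if setlocale() accepted the name, 0 if it rejected it. */
int try_locale(const char *name)
{
    assert(name != NULL);

    if (setlocale(LC_ALL, name) == NULL) {
        fprintf(stderr, "locale \"%s\" rejected; locale left unchanged\n", name);
        return 0;
    }

    /* localeconv() reflects the LC_NUMERIC category now in effect. */
    printf("radix for \"%s\": \"%s\"\n", name, localeconv()->decimal_point);
    printf("%f\n", 12.0);
    return 1;
}
```

Calling try_locale("de_DE.UTF-8") (the conventional spelling on many
systems) next to try_locale("DE_de") shows whether the two names are
treated the same way on a given platform.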

On Mon, Sep 9, 2019 at 5:31 PM Daniel Schoepe <daniel@schoepe.org> wrote:
>
> Hi,
>
> I think I found a discrepancy between musl's behavior and the POSIX standard:
>
> According to the POSIX standard, the decimal separator used when using
> printf to print floating point numbers should come from the locale
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html):
>
> "The radix character is defined in the current locale (category
> LC_NUMERIC). In the POSIX locale, or in a locale where the radix
> character is not defined, the radix character shall default to a
> <period> ( '.' )."
>
> However, it seems that in musl, a period is always used for printing
> floating point numbers. For example, the following program prints
> "12.0" instead of "12,0" (which is printed when using GNU libc):
>
> #include <stdio.h>
> #include <locale.h>
>
> int main(int argc, char **argv) {
>     setlocale(LC_ALL, "DE_de");
>     printf("%f\n", 12.0f);
> }
>
> This was tested using the latest git checkout of musl
> (a882841baf42e6a8b74cc33a239b84a9a79493db), compiled on Ubuntu 18.04
> using the musl-gcc script. It looks like the usage of "." as a
> separator is hardcoded in `fmt_fp`, for instance here:
> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n392
>
> Best regards,
> Daniel



* Re: printf doesn't respect locale
  2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe
  2019-09-09 16:39 ` Daniel Schoepe
@ 2019-09-09 16:51 ` Szabolcs Nagy
  2019-09-09 17:55   ` Rich Felker
  2019-09-09 17:54 ` Rich Felker
  2 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2019-09-09 16:51 UTC (permalink / raw)
  To: musl

* Daniel Schoepe <daniel@schoepe.org> [2019-09-09 17:31:01 +0100]:
> I think I found a discrepancy between musl's behavior and the POSIX standard:
> 
> According to the POSIX standard, the decimal separator used when using
> printf to print floating point numbers should come from the locale
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html):
> 
> "The radix character is defined in the current locale (category
> LC_NUMERIC). In the POSIX locale, or in a locale where the radix
> character is not defined, the radix character shall default to a
> <period> ( '.' )."
> 
> However, it seems that in musl, a period is always used for printing
> floating point numbers. For example, the following program prints
> "12.0" instead of "12,0" (which is printed when using GNU libc):

musl is posix conformant.

it only supports LC_NUMERIC locales where the radix
character is a period.

if you see a musl based system where LC_NUMERIC is defined
otherwise then report the issue to the integrator or
distributor of that system.


> 
> #include <stdio.h>
> #include <locale.h>
> 
> int main(int argc, char **argv) {
>     setlocale(LC_ALL, "DE_de");
>     printf("%f\n", 12.0f);
> }

the musl DE_de locale must use . as radix, so the output
is expected.

> 
> This was tested using the latest git checkout of musl
> (a882841baf42e6a8b74cc33a239b84a9a79493db), compiled on Ubuntu 18.04
> using the musl-gcc script. It looks like the usage of "." as a
> separator is hardcoded in `fmt_fp`, for instance here:
> https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n392
> 
> Best regards,
> Daniel



* Re: printf doesn't respect locale
  2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe
  2019-09-09 16:39 ` Daniel Schoepe
  2019-09-09 16:51 ` Szabolcs Nagy
@ 2019-09-09 17:54 ` Rich Felker
  2019-09-10 16:00   ` Daniel Schoepe
  2 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-09-09 17:54 UTC (permalink / raw)
  To: musl

On Mon, Sep 09, 2019 at 05:31:01PM +0100, Daniel Schoepe wrote:
> Hi,
> 
> I think I found a discrepancy between musl's behavior and the POSIX standard:
> 
> According to the POSIX standard, the decimal separator used when using
> printf to print floating point numbers should come from the locale
> (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html):
> 
> "The radix character is defined in the current locale (category
> LC_NUMERIC). In the POSIX locale, or in a locale where the radix
> character is not defined, the radix character shall default to a
> <period> ( '.' )."
> 
> However, it seems that in musl, a period is always used for printing
> floating point numbers. For example, the following program prints
> "12.0" instead of "12,0" (which is printed when using GNU libc):

It's not a discrepancy; the set of locales supported by an
implementation, unless it includes the POSIX localedef utility/option,
is implementation-defined. musl's definition does not include locales
where the radix point is not '.'

I really really really don't like the feature of changing the radix
point, and this implementation choice was intentional, but it's come
up several times with people being upset that it's not in line with
musl's mission of being multilingual-friendly. I think it deserves
some consideration again along with upcoming locale improvements.
There's at least one past thread with design sketches on how it would
need to be done (and what needs to be done anyway for LC_MONETARY
stuff), and sadly it got no feedback from people interested in
improved locale functionality which is why I've kinda let it be for
the time being...

Rich



* Re: printf doesn't respect locale
  2019-09-09 16:51 ` Szabolcs Nagy
@ 2019-09-09 17:55   ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2019-09-09 17:55 UTC (permalink / raw)
  To: musl

On Mon, Sep 09, 2019 at 06:51:00PM +0200, Szabolcs Nagy wrote:
> * Daniel Schoepe <daniel@schoepe.org> [2019-09-09 17:31:01 +0100]:
> > I think I found a discrepancy between musl's behavior and the POSIX standard:
> > 
> > According to the POSIX standard, the decimal separator used when using
> > printf to print floating point numbers should come from the locale
> > (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html):
> > 
> > "The radix character is defined in the current locale (category
> > LC_NUMERIC). In the POSIX locale, or in a locale where the radix
> > character is not defined, the radix character shall default to a
> > <period> ( '.' )."
> > 
> > However, it seems that in musl, a period is always used for printing
> > floating point numbers. For example, the following program prints
> > "12.0" instead of "12,0" (which is printed when using GNU libc):
> 
> musl is posix conformant.
> 
> it only supports LC_NUMERIC locales where the radix
> character is a period.
> 
> if you see a musl based system where LC_NUMERIC is defined
> otherwise then report the issue to the integrator or
> distributor of that system.

I don't understand what that would mean. musl's locale definition
system simply has no way to represent different radix point
characters, so there can't be such an integration/distribution.

Rich



* Re: printf doesn't respect locale
  2019-09-09 17:54 ` Rich Felker
@ 2019-09-10 16:00   ` Daniel Schoepe
  2019-09-10 16:31     ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Schoepe @ 2019-09-10 16:00 UTC (permalink / raw)
  To: musl

On Mon, Sep 9, 2019 at 6:55 PM Rich Felker <dalias@libc.org> wrote:
> It's not a discrepancy; the set of locales supported by an
> implementation, unless it includes the POSIX localedef utility/option,
> is implementation-defined. musl's definition does not include locales
> where the radix point is not '.'

Thanks, that makes sense. However, it may make sense to document this
assumption in the FAQ entries related to printf.

> I really really really don't like the feature of changing the radix
> point, and this implementation choice was intentional, but it's come
> up several times with people being upset that it's not in line with
> musl's mission of being multilingual-friendly. I think it deserves
> some consideration again along with upcoming locale improvements.
> There's at least one past thread with design sketches on how it would
> need to be done (and what needs to be done anyway for LC_MONETARY
> stuff), and sadly it got no feedback from people interested in
> improved locale functionality which is why I've kinda let it be for
> the time being...

I'm also not a fan of this behavior; I actually stumbled across this
when tracking down a bug that the differing radix usage caused.

Best,
Daniel

>
> Rich



* Re: printf doesn't respect locale
  2019-09-10 16:00   ` Daniel Schoepe
@ 2019-09-10 16:31     ` Szabolcs Nagy
  2019-09-10 16:44       ` Tim Tassonis
  2019-09-10 17:10       ` Daniel Schoepe
  0 siblings, 2 replies; 21+ messages in thread
From: Szabolcs Nagy @ 2019-09-10 16:31 UTC (permalink / raw)
  To: musl

* Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]:
> I'm also not a fan of this behavior, I actually stumbled across this
> when tracking
> down a bug the different radix usage caused.

i'm interested in how this can cause a bug in correct software.



* Re: printf doesn't respect locale
  2019-09-10 16:31     ` Szabolcs Nagy
@ 2019-09-10 16:44       ` Tim Tassonis
  2019-09-10 17:30         ` Rich Felker
  2019-09-10 17:10       ` Daniel Schoepe
  1 sibling, 1 reply; 21+ messages in thread
From: Tim Tassonis @ 2019-09-10 16:44 UTC (permalink / raw)
  To: musl

On 9/10/19 6:31 PM, Szabolcs Nagy wrote:
> * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]:
>> I'm also not a fan of this behavior, I actually stumbled across this
>> when tracking
>> down a bug the different radix usage caused.
> 
> i'm interested in how this can cause a bug in correct software.

Depends on your definition of "correct software". I'd say correct 
software has no bugs at all...

Anyway, I can think of cases where the usually correct assumption is
made that the floating point delimiter is one byte, while some locales
may need two bytes. This could then of course lead to memory
corruption when using sprintf with a too small buffer.
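
The buffer-size hazard described above can be avoided without knowing
how many bytes the radix takes. The following is a minimal sketch, not
code from the thread: format_fixed() is a hypothetical helper that
uses snprintf() and treats truncation as an error instead of assuming
a fixed output length.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats v with "%f" into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 on an encoding error or if the output did not fit. */
int format_fixed(char *buf, size_t size, double v)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, "%f", v);
    if (n < 0)
        return -1;              /* encoding error */
    if ((size_t)n >= size)
        return -1;              /* would have been truncated */
    return n;
}
```

snprintf() returns the length the full output would have had, so the
check holds whether the radix is one byte or several. A caller that
prefers to size the buffer exactly can first call
snprintf(NULL, 0, "%f", v) and allocate that many bytes plus one.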


Bye
Tim





* Re: printf doesn't respect locale
  2019-09-10 16:31     ` Szabolcs Nagy
  2019-09-10 16:44       ` Tim Tassonis
@ 2019-09-10 17:10       ` Daniel Schoepe
  2019-09-10 17:33         ` Rich Felker
  2019-09-10 18:43         ` Szabolcs Nagy
  1 sibling, 2 replies; 21+ messages in thread
From: Daniel Schoepe @ 2019-09-10 17:10 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: musl

Basically, someone used printf to produce json output and was unaware
that the radix used by printf was locale-dependent. When this was run
on a system with a non-English locale, it no longer produced valid
JSON as output.
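
One workaround for this kind of bug, sketched here under stated
assumptions and not taken from the thread, is to format the number and
then replace whatever radix string localeconv() reports with '.'.
format_json_number() is a hypothetical name. The sketch assumes a
finite value (JSON has no inf/nan), that the radix string does not
collide with digits, signs or 'e', and that no other thread changes
the locale during the call.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes v into buf using '.' as the radix,
 * whatever LC_NUMERIC is in effect.
 * Returns 0 on success, -1 on error or if buf is too small. */
int format_json_number(char *buf, size_t size, double v)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, "%.17g", v);
    if (n < 0 || (size_t)n >= size)
        return -1;

    const char *dp = localeconv()->decimal_point;
    size_t dplen = strlen(dp);
    if (dplen == 0 || strcmp(dp, ".") == 0)
        return 0;               /* nothing to rewrite */

    char *p = strstr(buf, dp);
    if (p != NULL) {
        /* Overwrite the first radix byte and close any gap left by a
         * multibyte radix; the +1 moves the terminating NUL too. */
        *p = '.';
        memmove(p + 1, p + dplen, strlen(p + dplen) + 1);
    }
    return 0;
}
```

Because the replacement only ever shortens the string, the buffer that
held the locale-formatted text is always large enough for the result.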

Best,
Daniel

On Tue, Sep 10, 2019 at 5:31 PM Szabolcs Nagy <nsz@port70.net> wrote:
>
> * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]:
> > I'm also not a fan of this behavior, I actually stumbled across this
> > when tracking
> > down a bug the different radix usage caused.
>
> i'm interested in how this can cause a bug in correct software.



* Re: printf doesn't respect locale
  2019-09-10 16:44       ` Tim Tassonis
@ 2019-09-10 17:30         ` Rich Felker
  0 siblings, 0 replies; 21+ messages in thread
From: Rich Felker @ 2019-09-10 17:30 UTC (permalink / raw)
  To: musl

On Tue, Sep 10, 2019 at 06:44:24PM +0200, Tim Tassonis wrote:
> On 9/10/19 6:31 PM, Szabolcs Nagy wrote:
> >* Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]:
> >>I'm also not a fan of this behavior, I actually stumbled across this
> >>when tracking
> >>down a bug the different radix usage caused.
> >
> >i'm interested in how this can cause a bug in correct software.
> 
> Depends on your definition of "correct software". I'd say correct
> software has no bugs at all...
> 
> Anyway, I can think of cases where the usually correct assumption is
> made that the floating  point delimiter is one byte, while some
> locales maybe need two bytes. This could then of course lead to
> memory corruption when using sprintf with a too small buffer.

FWIW, if musl does adopt support for locale-variant radix point, it
will be a one-bit property switching between '.' and ','

The issue with wrong space reservations for multibyte radix points you
raise is definitely one of the motivations. There are also attacks on
glibc and other localedef-based implementations where you make a
custom locale where the radix point is something else, like a digit or
letter, to cause data to be misinterpreted in dangerous ways.

Normally attackers don't have control to do this, but it can happen
with things like ssh propagating locale environment variables to a
git-only remote account or similar.

Since there are only two values of the radix point character with any
cultural significance, support for anything else is just YAGNI
generality for its own sake, at the expense of safety.

Rich



* Re: printf doesn't respect locale
  2019-09-10 17:10       ` Daniel Schoepe
@ 2019-09-10 17:33         ` Rich Felker
  2019-09-10 18:43         ` Szabolcs Nagy
  1 sibling, 0 replies; 21+ messages in thread
From: Rich Felker @ 2019-09-10 17:33 UTC (permalink / raw)
  To: musl

On Tue, Sep 10, 2019 at 06:10:20PM +0100, Daniel Schoepe wrote:
> Basically, someone used printf to produce json output and was unaware
> that the radix used by printf was locale-dependent. When this was run
> on a system with a non-English locale, it no longer produced valid
> JSON as output.

Yes, like you say it's not really a bug in correct software so much as
a pitfall programmers are unaware of, that's hard to program around.

But it can actually be a bug in correct *application* software due to
incorrect library software. Various library software (I think glib or
gtk, IIRC, among many others) calls setlocale(LC_ALL,"") behind the
application's back, rather than trusting that the application set the
locale the way it wants (incidentally, this is not thread-safe or
library-safe and makes these libraries unsafe to use via dlopen or
anywhere but at the top of main!). If the application only intends to
set other categories, but leave LC_NUMERIC as "C", then it should
rightfully expect a '.' radix point, but this expectation will be
violated if certain third-party libraries are involved.
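
The application-side pattern described above (adopt the user's
preferences for some categories while leaving LC_NUMERIC as "C") can
be sketched as follows. set_user_locale_except_numeric() is a
hypothetical helper; the choice of categories belongs to the caller.
As the paragraph above notes, this cannot protect against a library
that later calls setlocale(LC_ALL, "") itself.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: applies the environment's locale ("") to each
 * category in cats[], refusing LC_NUMERIC and LC_ALL so that the
 * numeric radix stays as it was (the "C" default at program start).
 * Returns the number of categories that could not be set. */
int set_user_locale_except_numeric(const int *cats, size_t ncats)
{
    assert(cats != NULL || ncats == 0);

    int failed = 0;
    for (size_t i = 0; i < ncats; i++) {
        if (cats[i] == LC_NUMERIC || cats[i] == LC_ALL) {
            failed++;           /* would change the radix: refuse */
            continue;
        }
        if (setlocale(cats[i], "") == NULL)
            failed++;           /* requested locale not available */
    }
    return failed;
}
```

A typical call near the top of main would pass LC_CTYPE, LC_COLLATE
and LC_TIME; on POSIX systems LC_MESSAGES could be added as well.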

Rich

> On Tue, Sep 10, 2019 at 5:31 PM Szabolcs Nagy <nsz@port70.net> wrote:
> >
> > * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]:
> > > I'm also not a fan of this behavior, I actually stumbled across this
> > > when tracking
> > > down a bug the different radix usage caused.
> >
> > i'm interested in how this can cause a bug in correct software.



* Re: printf doesn't respect locale
  2019-09-10 17:10       ` Daniel Schoepe
  2019-09-10 17:33         ` Rich Felker
@ 2019-09-10 18:43         ` Szabolcs Nagy
  2019-09-10 21:55           ` A. Wilcox
  1 sibling, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2019-09-10 18:43 UTC (permalink / raw)
  To: Daniel Schoepe; +Cc: musl

* Daniel Schoepe <daniel@schoepe.org> [2019-09-10 18:10:20 +0100]:
> Basically, someone used printf to produce json output and was unaware
> that the radix used by printf was locale-dependent. When this was run
> on a system with a non-English locale, it no longer produced valid
> JSON as output.

ok, i thought using '.' unconditionally caused some problem.

i've seen plenty of issues with locale dependent radix point
when numbers unexpectedly have ',', but the current musl
behaviour exactly prevents those types of bugs and i'd
prefer to keep it that way.

simple scripts parsing some program output will not be tested
across different locales. global state dependence is bad in
general in systems software which often communicates between
machines, not humans, and you can't afford to synchronize that
global state or deal with its combinatorics. in particular
libraries can't use any api with global state dependence if
that state may change asynchronously. thread-local state is
a bit better (and since posix2008 locales can be thread-local),
but it still has issues: e.g. dprintf is implemented to be
async-signal-safe, but in a signal handler you can't change
the locale setting to get reliable dprintf behaviour, and
it's inefficient/inconvenient to save/restore tls state
around every printf call anyway.

i think libc should mainly aim for reliability of systems
software and not for friendliness of ui applications.

> 
> Best,
> Daniel
> 
> On Tue, Sep 10, 2019 at 5:31 PM Szabolcs Nagy <nsz@port70.net> wrote:
> >
> > * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]:
> > > I'm also not a fan of this behavior, I actually stumbled across this
> > > when tracking
> > > down a bug the different radix usage caused.
> >
> > i'm interested in how this can cause a bug in correct software.



* Re: printf doesn't respect locale
  2019-09-10 18:43         ` Szabolcs Nagy
@ 2019-09-10 21:55           ` A. Wilcox
  2019-09-11 10:01             ` Szabolcs Nagy
  0 siblings, 1 reply; 21+ messages in thread
From: A. Wilcox @ 2019-09-10 21:55 UTC (permalink / raw)
  To: musl



On 10/09/2019 13:43, Szabolcs Nagy wrote:
> i think libc should mainly aim for reliability of systems
> software and not for friendliness of ui applications.


While reliability is important, I disagree that reliability should
*exclude* UI.

musl already causes crashes or other unexpected behaviour when system
software isn't written correctly.  That's a feature, not a bug.  If musl
supporting ',' as radix point causes bad software to crash, then that
software needs to be fixed.

We at Adélie field requests nearly every day from users wondering why our system
doesn't support other locales for things like sort, LC_NUMERIC,
LC_MONETARY, etc etc.  The only reason I haven't been more active in
developing musl's locale support is because I'm too busy doing other
important work.

We would be *extremely* disappointed if LC_NUMERIC would never be
supported in upstream musl.  We would have to maintain a patch to add
LC_NUMERIC support when the rest of musl's locale support is developed.

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org



* Re: printf doesn't respect locale
  2019-09-10 21:55           ` A. Wilcox
@ 2019-09-11 10:01             ` Szabolcs Nagy
  2019-09-11 10:07               ` Jens Gustedt
  0 siblings, 1 reply; 21+ messages in thread
From: Szabolcs Nagy @ 2019-09-11 10:01 UTC (permalink / raw)
  To: musl

* A. Wilcox <awilfox@adelielinux.org> [2019-09-10 16:55:52 -0500]:
> On 10/09/2019 13:43, Szabolcs Nagy wrote:
> > i think libc should mainly aim for reliability of systems
> > software and not for friendliness of ui applications.
> 
> 
> While reliability is important, I disagree that reliability should
> *exclude* UI.
> 
> musl already causes crashes or other unexpected behaviour when system
> software isn't written correctly.  That's a feature, not a bug.  If musl
> supporting ',' as radix point causes bad software to crash, then that
> software needs to be fixed.
> 
> We at Adélie field requests nearly every day wondering why our system
> doesn't support other locales for things like sort, LC_NUMERIC,
> LC_MONETARY, etc etc.  The only reason I haven't been more active in
> developing musl's locale support is because I'm too busy doing other
> important work.
> 
> We would be *extremely* disappointed if LC_NUMERIC would never be
> supported in upstream musl.  We would have to maintain a patch to add
> LC_NUMERIC support when the rest of musl's locale support is developed.

i consider this a posix/iso c bug.

there is a need for printf with fixed C.UTF-8 locale in
library code that implements a file format, language or
protocol that cannot be locale dependent.

in iso c there is no way to get this.

in posix 2008 you have to jump through very bizarre hoops
to get it (in a slow and resource wasting way).

so the world is full of printf users that just expect
fixed C.UTF-8 locale and hope nobody calls setlocale.

telling ppl that their code is wrong does not help unless
you provide an alternative, but introducing new api for
this would not be portable.
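
For reference, the posix 2008 "hoops" mentioned above look roughly
like the sketch below. It is an illustration under assumptions, not
code from the thread: c_numeric_snprintf() is a hypothetical wrapper,
and the includes assume a system where newlocale/duplocale/uselocale
are declared in <locale.h> (some platforms need <xlocale.h>). It
copies the thread's current locale, swaps only LC_NUMERIC to "C" so
that encoding and other categories are kept, formats, then restores
and frees. Every call allocates and may fail, which is the cost being
complained about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: snprintf() with LC_NUMERIC forced to "C" for
 * the calling thread only. Returns what vsnprintf() returns, or -1 if
 * the temporary locale could not be created. */
int c_numeric_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    assert(fmt != NULL);

    /* Copy the locale currently in effect for this thread; this may
     * be LC_GLOBAL_LOCALE, which duplocale() is allowed to copy. */
    locale_t copy = duplocale(uselocale((locale_t)0));
    if (copy == (locale_t)0)
        return -1;

    /* Replace only LC_NUMERIC. On success `copy` is consumed and must
     * not be used again; on failure it is still ours to free. */
    locale_t num_c = newlocale(LC_NUMERIC_MASK, "C", copy);
    if (num_c == (locale_t)0) {
        freelocale(copy);
        return -1;
    }

    locale_t old = uselocale(num_c);

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    uselocale(old);             /* restore before freeing */
    freelocale(num_c);
    return n;
}
```

A library that formats many numbers could create the locale object
once and reuse it, at the price of managing its lifetime.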



* Re: printf doesn't respect locale
  2019-09-11 10:01             ` Szabolcs Nagy
@ 2019-09-11 10:07               ` Jens Gustedt
  2019-09-11 11:44                 ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Gustedt @ 2019-09-11 10:07 UTC (permalink / raw)
  Cc: musl


Hello Szabolcs,

On Wed, 11 Sep 2019 12:01:59 +0200 Szabolcs Nagy <nsz@port70.net> wrote:

> > We would be *extremely* disappointed if LC_NUMERIC would never be
> > supported in upstream musl.  We would have to maintain a patch to
> > add LC_NUMERIC support when the rest of musl's locale support is
> > developed.  
> 
> i consider this a posix/iso c bug.

I agree

> there is a need for printf with fixed C.UTF-8 locale in
> library code that implements a file format, language or
> protocol that cannot be locale dependent.
> 
> in iso c there is no way to get this.
> 
> in posix 2008 you have to jump through very bizarre hoops
> to get it (in a slow and resource wasting way).
> 
> so the world is full of printf users that just expect
> fixed C.UTF-8 locale and hope nobody calls setlocale.
> 
> telling ppl that their code is wrong does not help unless
> you provide an alternative, but introducing new api for
> this would not be portable.

I think that WG14 would be happy to hear any suggestions on how we could
get out of this trap; a proposal for C2x would be even better.


Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::


* Re: printf doesn't respect locale
  2019-09-11 10:07               ` Jens Gustedt
@ 2019-09-11 11:44                 ` Rich Felker
  2019-09-11 12:53                   ` Jens Gustedt
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-09-11 11:44 UTC (permalink / raw)
  To: musl

On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote:
> Hello Szabolcs,
> 
> On Wed, 11 Sep 2019 12:01:59 +0200 Szabolcs Nagy <nsz@port70.net> wrote:
> 
> > > We would be *extremely* disappointed if LC_NUMERIC would never be
> > > supported in upstream musl.  We would have to maintain a patch to
> > > add LC_NUMERIC support when the rest of musl's locale support is
> > > developed.  
> > 
> > i consider this a posix/iso c bug.
> 
> I agree
> 
> > there is a need for printf with fixed C.UTF-8 locale in
> > library code that implements a file format, language or
> > protocol that cannot be locale dependent.
> > 
> > in iso c there is no way to get this.
> > 
> > in posix 2008 you have to jump through very bizarre hoops
> > to get it (in a slow and resource wasting way).
> > 
> > so the world is full of printf users that just expect
> > fixed C.UTF-8 locale and hope nobody calls setlocale.
> > 
> > telling ppl that their code is wrong does not help unless
> > you provide an alternative, but introducing new api for
> > this would not be portable.
> 
> I think that WG14 would be happy to hear any suggestions how we could
> get out of this trap, a proposal for C2x would even be better.

The obvious solution is a modifier character to printf/scanf format
strings that applies to numeric conversions and means "always
format/interpret this as if in the C locale". However this is hard to
test for at build time unless there's a macro declaring its
availability, so ideally WG14 would also adopt the sort of
fine-grained feature availability macros some of us have been
proposing for extensions.

An alternative/additional solution, which I actually might like
better, is having a function which sets a thread-local flag to treat
certain locale properties (at least the problematic LC_NUMERIC ones)
as if the current locale were "C". This is weaker than the uselocale
API from POSIX, but doesn't have the problems with the possibility of
failure (likely with no way to make forward progress) like it does,
and more importantly, would avoid *breaking* m17n/i18n functionality
by turning off other unrelated, non-problematic locale features.
Application or library code could then just set/restore this flag
around *printf/*scanf/strto*/etc calls, or could set it and leave it
if they never want to see ',' again.

Rich



* Re: printf doesn't respect locale
  2019-09-11 11:44                 ` Rich Felker
@ 2019-09-11 12:53                   ` Jens Gustedt
  2019-09-11 13:47                     ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Gustedt @ 2019-09-11 12:53 UTC (permalink / raw)
  Cc: musl


Hello Rich,

On Wed, 11 Sep 2019 07:44:37 -0400 Rich Felker <dalias@libc.org> wrote:

> On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote:

> > I think that WG14 would be happy to hear any suggestions how we
> > could get out of this trap, a proposal for C2x would even be
> > better.  
> 
> The obvious solution is a modifier character to printf/scanf format
> strings that applies to numeric conversions and means "always
> format/interpret this as if in the C locale". However this is hard to
> test for at build time unless there's a macro declaring its
> availability, so ideally WG14 would also adopt the sort of
> fine-grained feature availability macros some of us have been
> proposing for extensions.

If such a proposal were made, it would have to be based on a
reference implementation in the field. Would musl be willing to be
such a reference implementation?

In addition, I would think that it should not switch off all locale
features but should leave the encoding properties such as UTF-8
functional.

> An alternative/additional solution, which I actually might like
> better, is having a function which sets a thread-local flag to treat
> certain locale properties (at least the problematic LC_NUMERIC ones)
> as if the current locale were "C". This is weaker than the uselocale
> API from POSIX, but doesn't have the problems with the possibility of
> failure (likely with no way to make forward progress) like it does,
> and more importantly, would avoid *breaking* m17n/i18n functionality
> by turning off other unrelated, non-problematic locale features.
> Application or library code could then just set/restore this flag
> around *printf/*scanf/strto*/etc calls, or could set it and leave it
> if they never want to see ',' again.

Interesting.

Would this be difficult to implement in musl? (I guess not)

Would you be willing to write this up?

Once we have that in musl (even before having it in C2x) it could be
easier to convince ourselves to have full locale support.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::


* Re: printf doesn't respect locale
  2019-09-11 12:53                   ` Jens Gustedt
@ 2019-09-11 13:47                     ` Rich Felker
  2019-09-11 15:15                       ` Jens Gustedt
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-09-11 13:47 UTC (permalink / raw)
  To: musl

On Wed, Sep 11, 2019 at 02:53:36PM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> On Wed, 11 Sep 2019 07:44:37 -0400 Rich Felker <dalias@libc.org> wrote:
> 
> > On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote:
> 
> > > I think that WG14 would be happy to hear any suggestions how we
> > > could get out of this trap, a proposal for C2x would even be
> > > better.  
> > 
> > The obvious solution is a modifier character to printf/scanf format
> > strings that applies to numeric conversions and means "always
> > format/interpret this as if in the C locale". However this is hard to
> > test for at build time unless there's a macro declaring its
> > availability, so ideally WG14 would also adopt the sort of
> > fine-grained feature availability macros some of us have been
> > proposing for extensions.
> 
> If such a proposal would be made, it would have to be based on a
> reference implementation in the field. Would musl be willing to be
> such a reference implementation?

Possibly, contingent on some willingness of other parties to be on
board with it (even if not implementing it at first). I don't want
musl to be in the position of implementing something new that's not
standardized and likely to *conflict* with future standards, which
custom format flags could do.

> In addition, I would think that it should not switch off all locale
> features but should leave the encoding properties such as UTF-8
> functional.

Absolutely, but encoding is not relevant to numeric fields. Everything
else is strictly specified, at least for formatting (printf). For
conversion (scanf), implementation-defined locale-specific forms are
also allowed, but this is probably not wanted when you're processing
data from a serialized form that's intended to be universal.

> > An alternative/additional solution, which I actually might like
> > better, is having a function which sets a thread-local flag to treat
> > certain locale properties (at least the problematic LC_NUMERIC ones)
> > as if the current locale were "C". This is weaker than the uselocale
> > API from POSIX, but doesn't have the problems with the possibility of
> > failure (likely with no way to make forward progress) like it does,
> > and more importantly, would avoid *breaking* m17n/i18n functionality
> > by turning off other unrelated, non-problematic locale features.
> > Application or library code could then just set/restore this flag
> > around *printf/*scanf/strto*/etc calls, or could set it and leave it
> > if they never want to see ',' again.
> 
> Interesting.
> 
> Would this be difficult to implement in musl? (I guess not)

I would think not, but I'd have to look at the details a little more.

One other advantage of this approach is that it has a more graceful
fallback. If an application needs portable LC_NUMERIC behavior, it can
check at build time for the presence of the new interface. If present,
LC_NUMERIC can be set to "" (user's preference) and the new interface
can be used to get the needed behavior. If absent, the application can
refrain from setting LC_NUMERIC, only setting the other categories and
leaving it as "C" (default).
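
The fallback described above could look roughly like the sketch below.
It is only an illustration: HAVE_NUMERIC_C_OVERRIDE is a hypothetical
name for whatever build-time check the new interface would come with
(no such macro exists today), and init_locale is a made-up helper.

```c
#include <locale.h>

/*
 * HAVE_NUMERIC_C_OVERRIDE is hypothetical: it stands for a build-time
 * check that the proposed thread-local interface is available.
 */
void init_locale(void)
{
#ifdef HAVE_NUMERIC_C_OVERRIDE
	/* Interface present: honor all user preferences, including
	 * LC_NUMERIC; numeric-sensitive calls would later be bracketed
	 * with the (hypothetical) new interface. */
	setlocale(LC_ALL, "");
#else
	/* Interface absent: set every category except LC_NUMERIC,
	 * which stays at its default of "C". */
	setlocale(LC_CTYPE, "");
	setlocale(LC_COLLATE, "");
	setlocale(LC_MONETARY, "");
	setlocale(LC_TIME, "");
#ifdef LC_MESSAGES
	setlocale(LC_MESSAGES, "");	/* POSIX category, not ISO C */
#endif
#endif
}
```

With the fallback branch, printf and strtod keep using '.' as the radix
character regardless of the user's environment.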

Note that having it be thread-locally stateful is, in my opinion, much
better than having new variants of the affected functions or new
formats, since a caller using LC_NUMERIC can set/restore the state to
safely call library code that's completely unaware of the new
interfaces.
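
A rough sketch of the set/restore pattern, under loud assumptions: the
name numeric_c_override and its semantics are invented here, and the
flag below is inert because only the libc itself could make printf and
friends consult it. The point is just the calling convention: the
setter cannot fail and returns the previous state so it can be restored.

```c
#include <stdio.h>

/* Hypothetical per-thread flag: nonzero would mean "treat LC_NUMERIC
 * as if it were C". In a real libc, printf/scanf/strtod would read it;
 * in this sketch nothing does. */
static _Thread_local int numeric_as_c;

/* Hypothetical interface: set the flag and return its previous value.
 * There is no failure path. */
int numeric_c_override(int on)
{
	int prev = numeric_as_c;
	numeric_as_c = (on != 0);
	return prev;
}

/* Stand-in for library code that knows nothing about the flag. */
int legacy_format(char *buf, size_t n, double x)
{
	return snprintf(buf, n, "%.2f", x);
}

/* Caller that uses LC_NUMERIC elsewhere but needs a '.' radix here. */
int call_legacy_with_c_numeric(char *buf, size_t n, double x)
{
	int prev = numeric_c_override(1);	/* set */
	int r = legacy_format(buf, n, x);
	numeric_c_override(prev);		/* restore */
	return r;
}
```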

Of course there may be complications I haven't thought of. One that
comes to mind right away is what localeconv() should return under such
conditions.

> Would you be willing to write this up?

What form would it need to be in?

> Once we'd have that in musl (even before having it in C2x) it could be
> easier for us to convince ourselves to have full locale support.

By "full" you mean a variable radix point? I'm not sure it makes a big
difference, in that it won't help code that's not prepared for the
radix point to vary. What it does help is making it so code that is
being careful to avoid the breakage can still use LC_NUMERIC when it
wants to, without depending on POSIX.

Rich



* Re: printf doesn't respect locale
  2019-09-11 13:47                     ` Rich Felker
@ 2019-09-11 15:15                       ` Jens Gustedt
  2019-09-11 15:38                         ` Rich Felker
  0 siblings, 1 reply; 21+ messages in thread
From: Jens Gustedt @ 2019-09-11 15:15 UTC (permalink / raw)
  To: musl

Hello Rich,

On Wed, 11 Sep 2019 09:47:27 -0400 Rich Felker <dalias@libc.org> wrote:

> > > An alternative/additional solution, which I actually might like
> > > better, is having a function which sets a thread-local flag to
> > > treat certain locale properties (at least the problematic
> > > LC_NUMERIC ones) as if the current locale were "C". This is
> > > weaker than the uselocale API from POSIX, but doesn't have the
> > > problems with the possibility of failure (likely with no way to
> > > make forward progress) like it does, and more importantly, would
> > > avoid *breaking* m17n/i18n functionality by turning off other
> > > unrelated, non-problematic locale features. Application or
> > > library code could then just set/restore this flag around
> > > *printf/*scanf/strto*/etc calls, or could set it and leave it if
> > > they never want to see ',' again.  
> > 
> > Interesting.
> > 
> > Would this be difficult to implement in musl? (I guess not)  
> 
> I would think not, but I'd have to look at the details a little more.
> 
> One other advantage of this approach is that it has a more graceful
> fallback. If an application needs portable LC_NUMERIC behavior, it can
> check at build time for the presence of the new interface. If present,
> LC_NUMERIC can be set to "" (user's preference) and the new interface
> can be used to get the needed behavior. If absent, the application can
> refrain from setting LC_NUMERIC, only setting the other categories and
> leaving it as "C" (default).
> 
> Note that having it be thread-locally stateful is, in my opinion, much
> better than having new variants of the affected functions or new
> formats, since a caller using LC_NUMERIC can set/restore the state to
> safely call library code that's completely unaware of the new
> interfaces.
> 
> Of course there may be complications I haven't thought of. One that
> comes to mind right away is what localeconv() should return under such
> conditions.

OK, yes, so this path sounds much more promising than trying to agree
with all the different parties on a free modifier character and on
its semantics.

> > Would you be willing to write this up?  
> 
> What form would it need to be in?

In the end this should be an N-document to submit to WG14, but that is
really the last step. Just one or two pages would be good at first, to
get some discussion going and to make clear what it would imply for
and need from musl.

Do you think that a high-level implementation using _Thread_local (or
tss calls) and setlocale would be doable, such that we could even
provide a reference implementation for all POSIX systems that also
implement some form of thread-local variables?

Jens



* Re: printf doesn't respect locale
  2019-09-11 15:15                       ` Jens Gustedt
@ 2019-09-11 15:38                         ` Rich Felker
  2019-09-11 18:08                           ` Jens Gustedt
  0 siblings, 1 reply; 21+ messages in thread
From: Rich Felker @ 2019-09-11 15:38 UTC (permalink / raw)
  To: musl

On Wed, Sep 11, 2019 at 05:15:45PM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> On Wed, 11 Sep 2019 09:47:27 -0400 Rich Felker <dalias@libc.org> wrote:
> 
> > > > An alternative/additional solution, which I actually might like
> > > > better, is having a function which sets a thread-local flag to
> > > > treat certain locale properties (at least the problematic
> > > > LC_NUMERIC ones) as if the current locale were "C". This is
> > > > weaker than the uselocale API from POSIX, but doesn't have the
> > > > problems with the possibility of failure (likely with no way to
> > > > make forward progress) like it does, and more importantly, would
> > > > avoid *breaking* m17n/i18n functionality by turning off other
> > > > unrelated, non-problematic locale features. Application or
> > > > library code could then just set/restore this flag around
> > > > *printf/*scanf/strto*/etc calls, or could set it and leave it if
> > > > they never want to see ',' again.  
> > > 
> > > Interesting.
> > > 
> > > Would this be difficult to implement in musl? (I guess not)  
> > 
> > I would think not, but I'd have to look at the details a little more.
> > 
> > One other advantage of this approach is that it has a more graceful
> > fallback. If an application needs portable LC_NUMERIC behavior, it can
> > check at build time for the presence of the new interface. If present,
> > LC_NUMERIC can be set to "" (user's preference) and the new interface
> > can be used to get the needed behavior. If absent, the application can
> > refrain from setting LC_NUMERIC, only setting the other categories and
> > leaving it as "C" (default).
> > 
> > Note that having it be thread-locally stateful is, in my opinion, much
> > better than having new variants of the affected functions or new
> > formats, since a caller using LC_NUMERIC can set/restore the state to
> > safely call library code that's completely unaware of the new
> > interfaces.
> > 
> > Of course there may be complications I haven't thought of. One that
> > comes to mind right away is what localeconv() should return under such
> > conditions.
> 
> OK, yes, so this path sounds much more promising than trying to agree
> with all the different parties on a free modifier character and on
> its semantics.
> 
> > > Would you be willing to write this up?  
> > 
> > What form would it need to be in?
> 
> In the end this should be an N-document to submit to WG14, but that is
> really the last step. Just one or two pages would be good at first, to
> get some discussion going and to make clear what it would imply for
> and need from musl.
> 
> Do you think that a high-level implementation using _Thread_local (or
> tss calls) and setlocale would be doable, such that we could even
> provide a reference implementation for all POSIX systems that also
> implement some form of thread-local variables?

It can't be done in terms of setlocale because setlocale is not
thread-safe or thread-local. It could be done in terms of POSIX
uselocale, but such an implementation would not be fail-safe -- it
needs to be able to allocate a locale_t object via duplocale, since
the uselocale API works with locale_t objects that describe the
value of *all* locale categories, rather than the categories being
individually settable on a per-thread basis (this is a design flaw in
the POSIX interfaces, and the historic xlocale ones they were based
on, IMO).

So such an implementation could be a pseudo-code/demo of the
functionality, but I think I'd want the proposed functionality to be
always-succeeds to discourage erroneous code that ignores the result
(resulting in wrong formatting/parsing, which is unsafe) or aborts the
program (eew).
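
For illustration only, a uselocale-based demo might look like the
sketch below. It uses the real POSIX.1-2008 calls duplocale, newlocale,
uselocale and freelocale; the function name format_double_c_numeric is
made up. Note the two failure paths, which are exactly what the
proposed interface is meant to avoid. (On some systems these functions
are declared in <xlocale.h> rather than <locale.h>.)

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdio.h>

/* Format x with "%f" using a '.' radix, leaving all other locale
 * categories of the calling thread untouched.
 * Returns the snprintf result, or -1 if a locale object could not
 * be allocated. */
int format_double_c_numeric(char *buf, size_t n, double x)
{
	/* Copy the locale currently in effect for this thread (this may
	 * be the global locale). Allocates, so it can fail. */
	locale_t copy = duplocale(uselocale((locale_t)0));
	if (copy == (locale_t)0)
		return -1;

	/* Replace only LC_NUMERIC with "C". On success `copy` is
	 * consumed; on failure it is unchanged and must be freed. */
	locale_t c_num = newlocale(LC_NUMERIC_MASK, "C", copy);
	if (c_num == (locale_t)0) {
		freelocale(copy);
		return -1;
	}

	locale_t old = uselocale(c_num);	/* install for this thread */
	int r = snprintf(buf, n, "%f", x);
	uselocale(old);				/* restore previous locale */
	freelocale(c_num);
	return r;
}
```

A caller has to decide what to do when this returns -1, which is the
"no way to make forward progress" problem in practice.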

Rich



* Re: printf doesn't respect locale
  2019-09-11 15:38                         ` Rich Felker
@ 2019-09-11 18:08                           ` Jens Gustedt
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Gustedt @ 2019-09-11 18:08 UTC (permalink / raw)
  Cc: musl

On Wed, 11 Sep 2019 11:38:53 -0400 Rich Felker <dalias@libc.org> wrote:

> On Wed, Sep 11, 2019 at 05:15:45PM +0200, Jens Gustedt wrote:
> > Do you think that a high-level implementation using _Thread_local (or
> > tss calls) and setlocale would be doable, such that we could even
> > provide a reference implementation for all POSIX systems that also
> > implement some form of thread-local variables?
> 
> It can't be done in terms of setlocale because setlocale is not
> thread-safe or thread-local. It could be done in terms of POSIX
> uselocale, but such an implementation would not be fail-safe -- it
> needs to be able to allocate a locale_t object via duplocale, since
> the uselocale API works with locale_t objects that describe the
> value of *all* locale categories, rather than the categories being
> individually settable on a per-thread basis (this is a design flaw in
> the POSIX interfaces, and the historic xlocale ones they were based
> on, IMO).

OK, yes, this sounds too complicated.

> So such an implementation could be a pseudo-code/demo of the
> functionality, but I think I'd want the proposed functionality to be
> always-succeeds to discourage erroneous code that ignores the result
> (resulting in wrong formatting/parsing, which is unsafe) or aborts the
> program (eew).

Yes, "can't fail" is an important property for such a function. This
should be part of the normative requirement, then.

Jens



Thread overview: 21+ messages
2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe
2019-09-09 16:39 ` Daniel Schoepe
2019-09-09 16:51 ` Szabolcs Nagy
2019-09-09 17:55   ` Rich Felker
2019-09-09 17:54 ` Rich Felker
2019-09-10 16:00   ` Daniel Schoepe
2019-09-10 16:31     ` Szabolcs Nagy
2019-09-10 16:44       ` Tim Tassonis
2019-09-10 17:30         ` Rich Felker
2019-09-10 17:10       ` Daniel Schoepe
2019-09-10 17:33         ` Rich Felker
2019-09-10 18:43         ` Szabolcs Nagy
2019-09-10 21:55           ` A. Wilcox
2019-09-11 10:01             ` Szabolcs Nagy
2019-09-11 10:07               ` Jens Gustedt
2019-09-11 11:44                 ` Rich Felker
2019-09-11 12:53                   ` Jens Gustedt
2019-09-11 13:47                     ` Rich Felker
2019-09-11 15:15                       ` Jens Gustedt
2019-09-11 15:38                         ` Rich Felker
2019-09-11 18:08                           ` Jens Gustedt
