* printf doesn't respect locale @ 2019-09-09 16:31 Daniel Schoepe 2019-09-09 16:39 ` Daniel Schoepe ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Daniel Schoepe @ 2019-09-09 16:31 UTC (permalink / raw) To: musl Hi, I think I found a discrepancy between musl's behavior and the POSIX standard: According to the POSIX standard, the decimal separator used when using printf to print floating point numbers should come from the locale (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html): "The radix character is defined in the current locale (category LC_NUMERIC). In the POSIX locale, or in a locale where the radix character is not defined, the radix character shall default to a <period> ( '.' )." However, it seems that in musl, a period is always used for printing floating point numbers. For example, the following program prints "12.0" instead of "12,0" (which is printed when using GNU libc): #include <stdio.h> #include <locale.h> int main(int argc, char **argv) { setlocale(LC_ALL, "DE_de"); printf("%f\n", 12.0f); } This was tested using the latest git checkout of musl (a882841baf42e6a8b74cc33a239b84a9a79493db), compiled on Ubuntu 18.04 using the musl-gcc script. It looks like the usage of "." as a separator is hardcoded in `fmt_fp`, for instance here: https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n392 Best regards, Daniel ^ permalink raw reply [flat|nested] 21+ messages in thread
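The test program above never checks the return value of setlocale(), so it cannot tell "this locale uses a period" apart from "this locale name was not accepted". A minimal sketch of a more diagnostic variant follows; the helper name format_in_locale is made up for illustration. It reports a failed setlocale() and also copies out the radix string that the C library itself advertises through localeconv().

```c
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: switch LC_NUMERIC to `name`, format `v` with %f,
 * and copy out the radix string reported by localeconv().
 * Returns -1 if the locale could not be selected, 0 otherwise.
 */
int format_in_locale(const char *name, double v,
                     char *out, size_t outsz,
                     char *radix, size_t radixsz)
{
    if (setlocale(LC_NUMERIC, name) == NULL)
        return -1;              /* name rejected: the old locale stays active */

    snprintf(radix, radixsz, "%s", localeconv()->decimal_point);
    snprintf(out, outsz, "%f", v);
    return 0;
}
```

Two details are worth keeping in mind when reading the report. %f prints six digits after the radix by default, so in the C locale the output is 12.000000 rather than 12.0. And the conventional spelling of the German locale is de_DE (often de_DE.UTF-8); whether a name such as "DE_de" is accepted depends on the implementation, so a -1 from this helper on one libc and a 0 on another would not be surprising.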
* Re: printf doesn't respect locale 2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe @ 2019-09-09 16:39 ` Daniel Schoepe 2019-09-09 16:51 ` Szabolcs Nagy 2019-09-09 17:54 ` Rich Felker 2 siblings, 0 replies; 21+ messages in thread From: Daniel Schoepe @ 2019-09-09 16:39 UTC (permalink / raw) To: musl Small correction: The example works as the standard suggests on OSX, but with GNU libc it exhibits the same behavior as with musl. On Mon, Sep 9, 2019 at 5:31 PM Daniel Schoepe <daniel@schoepe.org> wrote: > > Hi, > > I think I found a discrepancy between musl's behavior and the POSIX standard: > > According to the POSIX standard, the decimal separator used when using > printf to print floating point numbers should come from the locale > (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html): > > "The radix character is defined in the current locale (category > LC_NUMERIC). In the POSIX locale, or in a locale where the radix > character is not defined, the radix character shall default to a > <period> ( '.' )." > > However, it seems that in musl, a period is always used for printing > floating point numbers. For example, the following program prints > "12.0" instead of "12,0" (which is printed when using GNU libc): > > #include <stdio.h> > #include <locale.h> > > int main(int argc, char **argv) { > setlocale(LC_ALL, "DE_de"); > printf("%f\n", 12.0f); > } > > This was tested using the latest git checkout of musl > (a882841baf42e6a8b74cc33a239b84a9a79493db), compiled on Ubuntu 18.04 > using the musl-gcc script. It looks like the usage of "." as a > separator is hardcoded in `fmt_fp`, for instance here: > https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n392 > > Best regards, > Daniel
* Re: printf doesn't respect locale 2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe 2019-09-09 16:39 ` Daniel Schoepe @ 2019-09-09 16:51 ` Szabolcs Nagy 2019-09-09 17:55 ` Rich Felker 2019-09-09 17:54 ` Rich Felker 2 siblings, 1 reply; 21+ messages in thread From: Szabolcs Nagy @ 2019-09-09 16:51 UTC (permalink / raw) To: musl * Daniel Schoepe <daniel@schoepe.org> [2019-09-09 17:31:01 +0100]: > I think I found a discrepancy between musl's behavior and the POSIX standard: > > According to the POSIX standard, the decimal separator used when using > printf to print floating point numbers should come from the locale > (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html): > > "The radix character is defined in the current locale (category > LC_NUMERIC). In the POSIX locale, or in a locale where the radix > character is not defined, the radix character shall default to a > <period> ( '.' )." > > However, it seems that in musl, a period is always used for printing > floating point numbers. For example, the following program prints > "12.0" instead of "12,0" (which is printed when using GNU libc): musl is posix conform. it just only supports LC_NUMERIC locales where the radix character is a period. if you see a musl based system where LC_NUMERIC is defined otherwise then report the issue to the integrator or distributor of that system. > > #include <stdio.h> > #include <locale.h> > > int main(int argc, char **argv) { > setlocale(LC_ALL, "DE_de"); > printf("%f\n", 12.0f); > } the musl DE_de locale must use . as radix, so the output is expected. > > This was tested using the latest git checkout of musl > (a882841baf42e6a8b74cc33a239b84a9a79493db), compiled on Ubuntu 18.04 > using the musl-gcc script. It looks like the usage of "." 
as a > separator is hardcoded in `fmt_fp`, for instance here: > https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf.c#n392 > > Best regards, > Daniel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-09 16:51 ` Szabolcs Nagy @ 2019-09-09 17:55 ` Rich Felker 0 siblings, 0 replies; 21+ messages in thread From: Rich Felker @ 2019-09-09 17:55 UTC (permalink / raw) To: musl On Mon, Sep 09, 2019 at 06:51:00PM +0200, Szabolcs Nagy wrote: > * Daniel Schoepe <daniel@schoepe.org> [2019-09-09 17:31:01 +0100]: > > I think I found a discrepancy between musl's behavior and the POSIX standard: > > > > According to the POSIX standard, the decimal separator used when using > > printf to print floating point numbers should come from the locale > > (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html): > > > > "The radix character is defined in the current locale (category > > LC_NUMERIC). In the POSIX locale, or in a locale where the radix > > character is not defined, the radix character shall default to a > > <period> ( '.' )." > > > > However, it seems that in musl, a period is always used for printing > > floating point numbers. For example, the following program prints > > "12.0" instead of "12,0" (which is printed when using GNU libc): > > musl is posix conform. > > it just only supports LC_NUMERIC locales where the radix > character is a period. > > if you see a musl based system where LC_NUMERIC is defined > otherwise then report the issue to the integrator or > distributor of that system. I don't understand what that would mean. musl's locale definition system simply has no way to represent different radix point characters, so there can't be such an integration/distribution. Rich
* Re: printf doesn't respect locale 2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe 2019-09-09 16:39 ` Daniel Schoepe 2019-09-09 16:51 ` Szabolcs Nagy @ 2019-09-09 17:54 ` Rich Felker 2019-09-10 16:00 ` Daniel Schoepe 2 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2019-09-09 17:54 UTC (permalink / raw) To: musl On Mon, Sep 09, 2019 at 05:31:01PM +0100, Daniel Schoepe wrote: > Hi, > > I think I found a discrepancy between musl's behavior and the POSIX standard: > > According to the POSIX standard, the decimal separator used when using > printf to print floating point numbers should come from the locale > (https://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html): > > "The radix character is defined in the current locale (category > LC_NUMERIC). In the POSIX locale, or in a locale where the radix > character is not defined, the radix character shall default to a > <period> ( '.' )." > > However, it seems that in musl, a period is always used for printing > floating point numbers. For example, the following program prints > "12.0" instead of "12,0" (which is printed when using GNU libc): It's not a discrepancy; the set of locales supported by an implementation, unless it includes the POSIX localedef utility/option, is implementation-defined. musl's definition does not include locales where the radix point is not '.' I really really really don't like the feature of changing the radix point, and this implementation choice was intentional, but it's come up several times with people being upset that it's not in line with musl's mission of being multilingual-friendly. I think it deserves some consideration again along with upcoming locale improvements. 
There's at least one past thread with design sketches on how it would need to be done (and what needs to be done anyway for LC_MONETARY stuff), and sadly it got no feedback from people interested in improved locale functionality which is why I've kinda let it be for the time being... Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-09 17:54 ` Rich Felker @ 2019-09-10 16:00 ` Daniel Schoepe 2019-09-10 16:31 ` Szabolcs Nagy 0 siblings, 1 reply; 21+ messages in thread From: Daniel Schoepe @ 2019-09-10 16:00 UTC (permalink / raw) To: musl On Mon, Sep 9, 2019 at 6:55 PM Rich Felker <dalias@libc.org> wrote: > It's not a discrepancy; the set of locales supported by an > implementation, unless it includes the POSIX localedef utility/option, > is implementation-defined. musl's definition does not include locales > where the radix point is not '.' Thanks, that makes sense. However, it may make sense to document this assumption in the FAQ entries related to printf. > I really really really don't like the feature of changing the radix > point, and this implementation choice was intentional, but it's come > up several times with people being upset that it's not in line with > musl's mission of being multilingual-friendly. I think it deserves > some consideration again along with upcoming locale improvements. > There's at least one past thread with design sketches on how it would > need to be done (and what needs to be done anyway for LC_MONETARY > stuff), and sadly it got no feedback from people interested in > improved locale functionality which is why I've kinda let it be for > the time being... I'm also not a fan of this behavior, I actually stumbled across this when tracking down a bug the different radix usage caused. Best, Daniel > > Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-10 16:00 ` Daniel Schoepe @ 2019-09-10 16:31 ` Szabolcs Nagy 2019-09-10 16:44 ` Tim Tassonis 2019-09-10 17:10 ` Daniel Schoepe 0 siblings, 2 replies; 21+ messages in thread From: Szabolcs Nagy @ 2019-09-10 16:31 UTC (permalink / raw) To: musl * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]: > I'm also not a fan of this behavior, I actually stumbled across this > when tracking > down a bug the different radix usage caused. i'm interested in how this can cause a bug in correct software. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-10 16:31 ` Szabolcs Nagy @ 2019-09-10 16:44 ` Tim Tassonis 2019-09-10 17:30 ` Rich Felker 2019-09-10 17:10 ` Daniel Schoepe 1 sibling, 1 reply; 21+ messages in thread From: Tim Tassonis @ 2019-09-10 16:44 UTC (permalink / raw) To: musl On 9/10/19 6:31 PM, Szabolcs Nagy wrote: > * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]: >> I'm also not a fan of this behavior, I actually stumbled across this >> when tracking >> down a bug the different radix usage caused. > > i'm interested in how this can cause a bug in correct software. Depends on your definition of "correct software". I'd say correct software has no bugs at all... Anyway, I can think of cases where the usually correct assumption is made that the floating point delimiter is one byte, while some locales maybe need two bytes. This could then of course lead to memory corruption when using sprintf with a too small buffer. Bye Tim ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-10 16:44 ` Tim Tassonis @ 2019-09-10 17:30 ` Rich Felker 0 siblings, 0 replies; 21+ messages in thread From: Rich Felker @ 2019-09-10 17:30 UTC (permalink / raw) To: musl On Tue, Sep 10, 2019 at 06:44:24PM +0200, Tim Tassonis wrote: > On 9/10/19 6:31 PM, Szabolcs Nagy wrote: > >* Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]: > >>I'm also not a fan of this behavior, I actually stumbled across this > >>when tracking > >>down a bug the different radix usage caused. > > > >i'm interested in how this can cause a bug in correct software. > > Depends on your definition of "correct software". I'd say correct > software has no bugs at all... > > Anyway, I can think of cases where the usually correct assumption is > made that the floating point delimiter is one byte, while some > locales maybe need two bytes. This could then of course lead to > memory corruption when using sprintf with a too small buffer. FWIW, if musl does adopt support for locale-variant radix point, it will be a one-bit property switching between '.' and ',' The issue with wrong space reservations for multibyte radix points you raise is definitely one of the motivations. There are also attacks on glibc and other localedef-based implementations where you make a custom locale where the radix point is something else, like a digit or letter, to cause data to be misinterpreted in dangerous ways. Normally attackers don't have control to do this, but it can happen with things like ssh propagating locale environment variables to a git-only remote account or similar. Since there are only two values of the radix point character with any cultural significance, support for anything else is just YAGNI generality for its own sake, at the expense of safety. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
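The buffer-size pitfall discussed above can be avoided without knowing anything about the radix: let snprintf() report how many bytes the result needs and allocate exactly that. The following is a small sketch of that pattern, not code from musl or from any poster; the function name format_fixed2 is invented for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format `v` with two fractional digits into a freshly allocated string.
 * The size comes from snprintf itself, so a radix "character" that takes
 * more than one byte in the current locale cannot overflow the buffer.
 * Returns NULL on failure; the caller frees the result.
 */
char *format_fixed2(double v)
{
    int len = snprintf(NULL, 0, "%.2f", v);   /* bytes needed, excluding NUL */
    if (len < 0)
        return NULL;

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;

    snprintf(buf, (size_t)len + 1, "%.2f", v);
    return buf;
}
```

This only addresses the memory-safety half of the problem. A consumer that later parses the string still has to cope with whatever radix was printed.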
* Re: printf doesn't respect locale 2019-09-10 16:31 ` Szabolcs Nagy 2019-09-10 16:44 ` Tim Tassonis @ 2019-09-10 17:10 ` Daniel Schoepe 2019-09-10 17:33 ` Rich Felker 2019-09-10 18:43 ` Szabolcs Nagy 1 sibling, 2 replies; 21+ messages in thread From: Daniel Schoepe @ 2019-09-10 17:10 UTC (permalink / raw) To: Szabolcs Nagy; +Cc: musl Basically, someone used printf to produce json output and was unaware that the radix used by printf was locale-dependent. When this was run on a system with a non-English locale, it no longer produced valid JSON as output. Best, Daniel On Tue, Sep 10, 2019 at 5:31 PM Szabolcs Nagy <nsz@port70.net> wrote: > > * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]: > > I'm also not a fan of this behavior, I actually stumbled across this > > when tracking > > down a bug the different radix usage caused. > > i'm interested in how this can cause a bug in correct software. ^ permalink raw reply [flat|nested] 21+ messages in thread
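For the JSON case described above, one workaround that stays within ISO C is to format with snprintf() and then rewrite whatever radix string localeconv() reports back to a period. The sketch below assumes the radix string does not also occur elsewhere in the formatted number; that holds for "." and ",", but not for the hostile custom locales mentioned earlier in the thread. The name format_json_number is illustrative only.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `v` as a JSON-style number into buf, whatever LC_NUMERIC says.
 * Returns the length written, or -1 on error or truncation.
 * Does not handle NaN or infinity, which JSON cannot represent.
 */
int format_json_number(double v, char *buf, size_t n)
{
    int len = snprintf(buf, n, "%.17g", v);
    if (len < 0 || (size_t)len >= n)
        return -1;

    const char *dp = localeconv()->decimal_point;
    size_t dplen = strlen(dp);
    if (dplen == 0 || strcmp(dp, ".") == 0)
        return len;             /* already a period */

    char *p = strstr(buf, dp);
    if (p == NULL)
        return len;             /* no fractional part was printed */

    /* Overwrite the locale radix with '.', closing any gap it leaves. */
    *p = '.';
    memmove(p + 1, p + dplen, strlen(p + dplen) + 1);
    return len - (int)dplen + 1;
}
```

Note that localeconv() reads process-wide state, so this is only reliable if no other thread calls setlocale() between the two library calls.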
* Re: printf doesn't respect locale 2019-09-10 17:10 ` Daniel Schoepe @ 2019-09-10 17:33 ` Rich Felker 2019-09-10 18:43 ` Szabolcs Nagy 1 sibling, 0 replies; 21+ messages in thread From: Rich Felker @ 2019-09-10 17:33 UTC (permalink / raw) To: musl On Tue, Sep 10, 2019 at 06:10:20PM +0100, Daniel Schoepe wrote: > Basically, someone used printf to produce json output and was unaware > that the radix used by printf was locale-dependent. When this was run > on a system with a non-English locale, it no longer produced valid > JSON as output. Yes, like you say it's not really a bug in correct software so much as a pitfall programmers are unaware of, that's hard to program around. But it can actually be a bug in correct *application* software due to incorrect library software. Various library software (I think glib or gtk, IIRC, among many others) calls setlocale(LC_ALL,"") behind the application's back, rather than trusting that the application set the locale the way it wants (incidentally, this is not thread-safe or library-safe and makes these libraries unsafe to use via dlopen or anywhere but at the top of main!). If the application only intends to set other categories, but leave LC_NUMERIC as "C", then it should rightfully expect a '.' radix point, but this expectation will be violated if certain third-party libraries are involved. Rich > On Tue, Sep 10, 2019 at 5:31 PM Szabolcs Nagy <nsz@port70.net> wrote: > > > > * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]: > > > I'm also not a fan of this behavior, I actually stumbled across this > > > when tracking > > > down a bug the different radix usage caused. > > > > i'm interested in how this can cause a bug in correct software.
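An application that wants LC_NUMERIC to stay "C" while using libraries that may call setlocale(LC_ALL, "") on their own could re-check the radix after initializing them. This is only a defensive sketch under that assumption, with an invented function name (reassert_c_numeric); it does not fix the underlying thread-safety problem described above, and it must itself run before any other threads are started.

```c
#include <locale.h>
#include <string.h>

/*
 * Call after initializing third-party libraries that may have run
 * setlocale(LC_ALL, "") on their own. If the radix is no longer ".",
 * put LC_NUMERIC back to "C". Returns 1 if a reset was needed, else 0.
 * Not thread-safe: setlocale changes process-wide state.
 */
int reassert_c_numeric(void)
{
    if (strcmp(localeconv()->decimal_point, ".") == 0)
        return 0;
    setlocale(LC_NUMERIC, "C");
    return 1;
}
```

A typical call site would be in main(), right after the last library initialization call and before any number is formatted or parsed.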
* Re: printf doesn't respect locale 2019-09-10 17:10 ` Daniel Schoepe 2019-09-10 17:33 ` Rich Felker @ 2019-09-10 18:43 ` Szabolcs Nagy 2019-09-10 21:55 ` A. Wilcox 1 sibling, 1 reply; 21+ messages in thread From: Szabolcs Nagy @ 2019-09-10 18:43 UTC (permalink / raw) To: Daniel Schoepe; +Cc: musl * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 18:10:20 +0100]: > Basically, someone used printf to produce json output and was unaware > that the radix used by printf was locale-dependent. When this was run > on a system with a non-English locale, it no longer produced valid > JSON as output. ok, i thought using '.' unconditionally caused some problem. i've seen plenty issues with locale dependent radix point when numbers unexpectedly have ',', but the current musl behaviour exactly prevents those types of bugs and i'd prefer to keep it that way. simple scripts parsing some program output will not be tested across different locales. global state dependence is bad in general in systems software which often communicates between machines, not humans, and you cant afford to synchronize that global state or deal with its combinatorics. in particular libraries can't use any api with global state dependence if that state may change asynchronously, thread-local state is a bit better (and since posix2008 locales can be thread-local), but it still has issues e.g. dprintf is implemented to be async-signal-safe, but in a signal handler you can't change the locale setting to get reliable dprintf behaviour and it's inefficient/inconvenient to save/restore tls state around every printf call anyway. i think libc should mainly aim for reliability of systems software and not for friendliness of ui applications. 
> > Best, > Daniel > > On Tue, Sep 10, 2019 at 5:31 PM Szabolcs Nagy <nsz@port70.net> wrote: > > > > * Daniel Schoepe <daniel@schoepe.org> [2019-09-10 17:00:49 +0100]: > > > I'm also not a fan of this behavior, I actually stumbled across this > > > when tracking > > > down a bug the different radix usage caused. > > > > i'm interested in how this can cause a bug in correct software. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-10 18:43 ` Szabolcs Nagy @ 2019-09-10 21:55 ` A. Wilcox 2019-09-11 10:01 ` Szabolcs Nagy 0 siblings, 1 reply; 21+ messages in thread From: A. Wilcox @ 2019-09-10 21:55 UTC (permalink / raw) To: musl On 10/09/2019 13:43, Szabolcs Nagy wrote: > i think libc should mainly aim for reliability of systems > software and not for friendliness of ui applications. While reliability is important, I disagree that reliability should *exclude* UI. musl already causes crashes or other unexpected behaviour when system software isn't written correctly. That's a feature, not a bug. If musl supporting ',' as radix point causes bad software to crash, then that software needs to be fixed. We at Adélie field requests nearly every day wondering why our system doesn't support other locales for things like sort, LC_NUMERIC, LC_MONETARY, etc etc. The only reason I haven't been more active in developing musl's locale support is because I'm too busy doing other important work. We would be *extremely* disappointed if LC_NUMERIC would never be supported in upstream musl. We would have to maintain a patch to add LC_NUMERIC support when the rest of musl's locale support is developed. Best, --arw -- A. Wilcox (awilfox) Project Lead, Adélie Linux https://www.adelielinux.org
* Re: printf doesn't respect locale 2019-09-10 21:55 ` A. Wilcox @ 2019-09-11 10:01 ` Szabolcs Nagy 2019-09-11 10:07 ` Jens Gustedt 0 siblings, 1 reply; 21+ messages in thread From: Szabolcs Nagy @ 2019-09-11 10:01 UTC (permalink / raw) To: musl * A. Wilcox <awilfox@adelielinux.org> [2019-09-10 16:55:52 -0500]: > On 10/09/2019 13:43, Szabolcs Nagy wrote: > > i think libc should mainly aim for reliability of systems > > software and not for friendliness of ui applications. > > > While reliability is important, I disagree that reliability should > *exclude* UI. > > musl already causes crashes or other unexpected behaviour when system > software isn't written correctly. That's a feature, not a bug. If musl > supporting ',' as radix point causes bad software to crash, then that > software needs to be fixed. > > We at Adélie field requests nearly every day wondering why our system > doesn't support other locales for things like sort, LC_NUMERIC, > LC_MONETARY, etc etc. The only reason I haven't been more active in > developing musl's locale support is because I'm too busy doing other > important work. > > We would be *extremely* disappointed if LC_NUMERIC would never be > supported in upstream musl. We would have to maintain a patch to add > LC_NUMERIC support when the rest of musl's locale support is developed. i consider this a posix/iso c bug. there is a need for printf with fixed C.UTF-8 locale in library code that implements a file format, language or protocol that cannot be locale dependent. in iso c there is no way to get this. in posix 2008 you have to jump through very bizarre hoops to get it (in a slow and resource wasting way). so the world is full of printf users that just expect fixed C.UTF-8 locale and hope nobody calls setlocale. telling ppl that their code is wrong does not help unless you provide an alternative, but introducing new api for this would not be portable. ^ permalink raw reply [flat|nested] 21+ messages in thread
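To make the "bizarre hoops" concrete, here is a sketch of what a single locale-independent %f conversion can look like with the POSIX 2008 per-thread locale interfaces (duplocale, newlocale, uselocale, freelocale). It is an illustration, not code from musl; the function name format_f_c_numeric is invented, and on some systems (macOS, for instance) these interfaces are declared in <xlocale.h> instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <stdio.h>

/*
 * Format `v` with %f using "C" numeric conventions, leaving every other
 * locale category of the calling thread as it was.
 * Returns snprintf's result, or -1 if a locale object could not be created.
 */
int format_f_c_numeric(char *buf, size_t n, double v)
{
    /* Copy the locale currently in effect for this thread. */
    locale_t base = duplocale(uselocale((locale_t)0));
    if (base == (locale_t)0)
        return -1;

    /* Replace only LC_NUMERIC; on success `base` is consumed. */
    locale_t c_num = newlocale(LC_NUMERIC_MASK, "C", base);
    if (c_num == (locale_t)0) {
        freelocale(base);
        return -1;
    }

    locale_t old = uselocale(c_num);   /* install for this thread only */
    int r = snprintf(buf, n, "%f", v);
    uselocale(old);                    /* restore whatever was active */
    freelocale(c_num);
    return r;
}
```

Every call creates and destroys a locale object and has two places where it can fail with no good way to proceed, which is the cost being complained about. Caching the locale object per thread reduces the overhead but adds lifetime management of its own.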
* Re: printf doesn't respect locale 2019-09-11 10:01 ` Szabolcs Nagy @ 2019-09-11 10:07 ` Jens Gustedt 2019-09-11 11:44 ` Rich Felker 0 siblings, 1 reply; 21+ messages in thread From: Jens Gustedt @ 2019-09-11 10:07 UTC (permalink / raw) Cc: musl [-- Attachment #1: Type: text/plain, Size: 1434 bytes --] Hello Szabolcs, On Wed, 11 Sep 2019 12:01:59 +0200 Szabolcs Nagy <nsz@port70.net> wrote: > > We would be *extremely* disappointed if LC_NUMERIC would never be > > supported in upstream musl. We would have to maintain a patch to > > add LC_NUMERIC support when the rest of musl's locale support is > > developed. > > i consider this a posix/iso c bug. I agree > there is a need for printf with fixed C.UTF-8 locale in > library code that implements a file format, language or > protocol that cannot be locale dependent. > > in iso c there is no way to get this. > > in posix 2008 you have to jump through very bizarre hoops > to get it (in a slow and resource wasting way). > > so the world is full of printf users that just expect > fixed C.UTF-8 locale and hope nobody calls setlocale. > > telling ppl that their code is wrong does not help unless > you provide an alternative, but introducing new api for > this would not be portable. I think that WG14 would be happy to hear any suggestions how we could get out of this trap, a proposal for C2x would even be better. Thanks Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-11 10:07 ` Jens Gustedt @ 2019-09-11 11:44 ` Rich Felker 2019-09-11 12:53 ` Jens Gustedt 0 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2019-09-11 11:44 UTC (permalink / raw) To: musl On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote: > Hello Szabolcs, > > On Wed, 11 Sep 2019 12:01:59 +0200 Szabolcs Nagy <nsz@port70.net> wrote: > > > > We would be *extremely* disappointed if LC_NUMERIC would never be > > > supported in upstream musl. We would have to maintain a patch to > > > add LC_NUMERIC support when the rest of musl's locale support is > > > developed. > > > > i consider this a posix/iso c bug. > > I agree > > > there is a need for printf with fixed C.UTF-8 locale in > > library code that implements a file format, language or > > protocol that cannot be locale dependent. > > > > in iso c there is no way to get this. > > > > in posix 2008 you have to jump through very bizarre hoops > > to get it (in a slow and resource wasting way). > > > > so the world is full of printf users that just expect > > fixed C.UTF-8 locale and hope nobody calls setlocale. > > > > telling ppl that their code is wrong does not help unless > > you provide an alternative, but introducing new api for > > this would not be portable. > > I think that WG14 would be happy to hear any suggestions how we could > get out of this trap, a proposal for C2x would even be better. The obvious solution is a modifier character to printf/scanf format strings that applies to numeric conversions and means "always format/interpret this as if in the C locale". However this is hard to test for at build time unless there's a macro declaring its availability, so ideally WG14 would also adopt the sort of fine-grained feature availability macros some of us have been proposing for extensions. 
An alternative/additional solution, which I actually might like better, is having a function which sets a thread-local flag to treat certain locale properties (at least the problematic LC_NUMERIC ones) as if the current locale were "C". This is weaker than the uselocale API from POSIX, but doesn't have the problems with the possibility of failure (likely with no way to make forward progress) like it does, and more importantly, would avoid *breaking* m17n/i18n functionality by turning off other unrelated, non-problematic locale features. Application or library code could then just set/restore this flag around *printf/*scanf/strto*/etc calls, or could set it and leave it if they never want to see ',' again. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-11 11:44 ` Rich Felker @ 2019-09-11 12:53 ` Jens Gustedt 2019-09-11 13:47 ` Rich Felker 0 siblings, 1 reply; 21+ messages in thread From: Jens Gustedt @ 2019-09-11 12:53 UTC (permalink / raw) Cc: musl [-- Attachment #1: Type: text/plain, Size: 2385 bytes --] Hello Rich, On Wed, 11 Sep 2019 07:44:37 -0400 Rich Felker <dalias@libc.org> wrote: > On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote: > > I think that WG14 would be happy to hear any suggestions how we > > could get out of this trap, a proposal for C2x would even be > > better. > > The obvious solution is a modifier character to printf/scanf format > strings that applies to numeric conversions and means "always > format/interpret this as if in the C locale". However this is hard to > test for at build time unless there's a macro declaring its > availability, so ideally WG14 would also adopt the sort of > fine-grained feature availability macros some of us have been > proposing for extensions. If such a proposal would be made, it would have to be based on a reference implementation in the field. Would musl be willing to be such a reference implementation? In addition, I would think that it should not switch off all locale feature but should leave the encoding properties such as UTF-8 functional. > An alternative/additional solution, which I actually might like > better, is having a function which sets a thread-local flag to treat > certain locale properties (at least the problematic LC_NUMERIC ones) > as if the current locale were "C". This is weaker than the uselocale > API from POSIX, but doesn't have the problems with the possibility of > failure (likely with no way to make forward progress) like it does, > and more importantly, would avoid *breaking* m17n/i18n functionality > by turning off other unrelated, non-problematic locale features. 
> Application or library code could then just set/restore this flag > around *printf/*scanf/strto*/etc calls, or could set it and leave it > if they never want to see ',' again. Interesting. Would this be difficult to implement in musl? (I guess not) Would you be willing to write this up? Once we have that in musl (even before having it in C2x), it could be easier to convince ourselves to have full locale support. Thanks Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::
* Re: printf doesn't respect locale 2019-09-11 12:53 ` Jens Gustedt @ 2019-09-11 13:47 ` Rich Felker 2019-09-11 15:15 ` Jens Gustedt 0 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2019-09-11 13:47 UTC (permalink / raw) To: musl On Wed, Sep 11, 2019 at 02:53:36PM +0200, Jens Gustedt wrote: > Hello Rich, > > On Wed, 11 Sep 2019 07:44:37 -0400 Rich Felker <dalias@libc.org> wrote: > > > On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote: > > > > I think that WG14 would be happy to hear any suggestions how we > > > could get out of this trap, a proposal for C2x would even be > > > better. > > > > The obvious solution is a modifier character to printf/scanf format > > strings that applies to numeric conversions and means "always > > format/interpret this as if in the C locale". However this is hard to > > test for at build time unless there's a macro declaring its > > availability, so ideally WG14 would also adopt the sort of > > fine-grained feature availability macros some of us have been > > proposing for extensions. > > If such a proposal would be made, it would have to be based on a > reference implementation in the field. Would musl be willing to be > such a reference implementation? Possibly, contingent on some willingness of other parties to be on board with it (even if not implementing it at first). I don't want musl to be in the position of implementing something new that's not standardized and likely to *conflict* with future standards, which custom format flags could do. > In addition, I would think that it should not switch off all locale > feature but should leave the encoding properties such as UTF-8 > functional. Absolutely, but encoding is not relevant to numeric fields. Everything else is strictly specified, at least for formatting (printf). 
For conversion (scanf) implementation-defined locale-specific forms are also allowed, but this is probably not wanted when you're processing data from a serialized form that's intended to be universal. > > An alternative/additional solution, which I actually might like > > better, is having a function which sets a thread-local flag to treat > > certain locale properties (at least the problematic LC_NUMERIC ones) > > as if the current locale were "C". This is weaker than the uselocale > > API from POSIX, but doesn't have the problems with the possibility of > > failure (likely with no way to make forward progress) like it does, > > and more importantly, would avoid *breaking* m17n/i18n functionality > > by turning off other unrelated, non-problematic locale features. > > Application or library code could then just set/restore this flag > > around *printf/*scanf/strto*/etc calls, or could set it and leave it > > if they never want to see ',' again. > > Interesting. > > Would this be difficult to implement in musl? (I guess not) I would think not, but I'd have to look at the details a little more. One other advantage of this approach is that it has a more graceful fallback. If an application needs portable LC_NUMERIC behavior, it can check at build time for the presence of the new interface. If present, LC_NUMERIC can be set to "" (user's preference) and the new interface can be used to get the needed behavior. If absent, the application can refrain from setting LC_NUMERIC, only setting the other categories and leaving it as "C" (default). Note that having it be thread-locally stateful is, in my opinion, much better than having new variants of the affected functions or new formats, since a caller using LC_NUMERIC can set/restore the state to safely call library code that's completely unaware of the new interfaces. Of course there may be complications I haven't thought of. One that comes to mind right away is what localeconv() should return under such conditions. 
> Would you be willing to write this up? What form would it need to be in? > Once we'd have that in musl (even before having it in C2x) it could be > easier to convince ourselves to have full locale support. By "full" you mean variable radix point? I'm not sure it makes a big difference in that it won't help code that's not prepared for the radix point to vary. What it does help is making it so code that is being careful to avoid the breakage can still use LC_NUMERIC when it wants to, without depending on POSIX. Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
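The build-time fallback described in this message could look roughly like the sketch below. It is only an illustration: HAVE_NUMERIC_C_OVERRIDE is a hypothetical feature macro (no such macro or interface exists today), and init_app_locale and radix_is_period are names made up for the example. Without the macro, the program sets every category except LC_NUMERIC from the environment, so the radix character stays '.' as in the "C" locale.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical: HAVE_NUMERIC_C_OVERRIDE stands for "the proposed
 * per-thread override interface is available". It is an assumption
 * for illustration only and is not defined by any real libc. */
static void init_app_locale(void)
{
#ifdef HAVE_NUMERIC_C_OVERRIDE
    /* Safe to honor the user's LC_NUMERIC: code that needs '.'
     * could switch the override on around its own calls. */
    setlocale(LC_ALL, "");
#else
    /* Fallback: take the user's preference for everything except
     * LC_NUMERIC, which stays at its startup value "C". */
    setlocale(LC_CTYPE, "");
    setlocale(LC_COLLATE, "");
    setlocale(LC_TIME, "");
    setlocale(LC_MONETARY, "");
#ifdef LC_MESSAGES
    setlocale(LC_MESSAGES, "");   /* POSIX category, not in ISO C */
#endif
#endif
}

/* Returns 1 if the radix character currently in effect is ".". */
static int radix_is_period(void)
{
    return strcmp(localeconv()->decimal_point, ".") == 0;
}
```

In the fallback branch, serialized output produced with printf("%f") keeps a period as the radix character regardless of the user's environment, at the cost of not honoring LC_NUMERIC anywhere in the program.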
* Re: printf doesn't respect locale 2019-09-11 13:47 ` Rich Felker @ 2019-09-11 15:15 ` Jens Gustedt 2019-09-11 15:38 ` Rich Felker 0 siblings, 1 reply; 21+ messages in thread From: Jens Gustedt @ 2019-09-11 15:15 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 3069 bytes --] Hello Rich, On Wed, 11 Sep 2019 09:47:27 -0400 Rich Felker <dalias@libc.org> wrote: > > > An alternative/additional solution, which I actually might like > > > better, is having a function which sets a thread-local flag to > > > treat certain locale properties (at least the problematic > > > LC_NUMERIC ones) as if the current locale were "C". This is > > > weaker than the uselocale API from POSIX, but doesn't have the > > > problems with the possibility of failure (likely with no way to > > > make forward progress) like it does, and more importantly, would > > > avoid *breaking* m17n/i18n functionality by turning off other > > > unrelated, non-problematic locale features. Application or > > > library code could then just set/restore this flag around > > > *printf/*scanf/strto*/etc calls, or could set it and leave it if > > > they never want to see ',' again. > > > > Interesting. > > > > Would this be difficult to implement in musl? (I guess not) > > I would think not, but I'd have to look at the details a little more. > > One other advantage of this approach is that it has a more graceful > fallback. If an application needs portable LC_NUMERIC behavior, it can > check at build time for the presence of the new interface. If present, > LC_NUMERIC can be set to "" (user's preference) and the new interface > can be used to get the needed behavior. If absent, the application can > refrain from setting LC_NUMERIC, only setting the other categories and > leaving it as "C" (default). 
> > Note that having it be thread-locally stateful is, in my opinion, much > better than having new variants of the affected functions or new > formats, since a caller using LC_NUMERIC can set/restore the state to > safely call library code that's completely unaware of the new > interfaces. > > Of course there may be complications I haven't thought of. One that > comes to mind right away is what localeconv() should return under such > conditions. OK, yes, this path sounds much more promising than having to coordinate with all the different parties to find a free modifier character and agree on the semantics. > > Would you be willing to write this up? > > What form would it need to be in? In the end this should be an N-document to submit to WG14, but that really comes last. Just one or two pages would be good first, to get some discussion going and to make clear what it would imply for and need from musl. Do you think that a high-level implementation using _Thread_local (or tss calls) and setlocale would be doable, such that we could even provide a reference implementation for all POSIX systems that also implement some form of thread-local variables? Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: printf doesn't respect locale 2019-09-11 15:15 ` Jens Gustedt @ 2019-09-11 15:38 ` Rich Felker 2019-09-11 18:08 ` Jens Gustedt 0 siblings, 1 reply; 21+ messages in thread From: Rich Felker @ 2019-09-11 15:38 UTC (permalink / raw) To: musl On Wed, Sep 11, 2019 at 05:15:45PM +0200, Jens Gustedt wrote: > Hello Rich, > > On Wed, 11 Sep 2019 09:47:27 -0400 Rich Felker <dalias@libc.org> wrote: > > > > > An alternative/additional solution, which I actually might like > > > > better, is having a function which sets a thread-local flag to > > > > treat certain locale properties (at least the problematic > > > > LC_NUMERIC ones) as if the current locale were "C". This is > > > > weaker than the uselocale API from POSIX, but doesn't have the > > > > problems with the possibility of failure (likely with no way to > > > > make forward progress) like it does, and more importantly, would > > > > avoid *breaking* m17n/i18n functionality by turning off other > > > > unrelated, non-problematic locale features. Application or > > > > library code could then just set/restore this flag around > > > > *printf/*scanf/strto*/etc calls, or could set it and leave it if > > > > they never want to see ',' again. > > > > > > Interesting. > > > > > > Would this be difficult to implement in musl? (I guess not) > > > > I would think not, but I'd have to look at the details a little more. > > > > One other advantage of this approach is that it has a more graceful > > fallback. If an application needs portable LC_NUMERIC behavior, it can > > check at build time for the presence of the new interface. If present, > > LC_NUMERIC can be set to "" (user's preference) and the new interface > > can be used to get the needed behavior. If absent, the application can > > refrain from setting LC_NUMERIC, only setting the other categories and > > leaving it as "C" (default). 
> > > > Note that having it be thread-locally stateful is, in my opinion, much > > better than having new variants of the affected functions or new > > formats, since a caller using LC_NUMERIC can set/restore the state to > > safely call library code that's completely unaware of the new > > interfaces. > > > > Of course there may be complications I haven't thought of. One that > > comes to mind right away is what localeconv() should return under such > > conditions. > > OK, yes, this path sounds much more promising than having to coordinate > with all the different parties to find a free modifier character and > agree on the semantics. > > > > Would you be willing to write this up? > > > > What form would it need to be in? > > In the end this should be an N-document to submit to WG14, but that > really comes last. Just one or two pages would be good first, to get > some discussion going and to make clear what it would > imply for and need from musl. > > Do you think that a high-level implementation using _Thread_local > (or tss calls) and setlocale would be doable, such that we could even > provide a reference implementation for all POSIX systems that also > implement some form of thread-local variables? It can't be done in terms of setlocale because setlocale is not thread-safe or thread-local. It could be done in terms of POSIX uselocale, but such an implementation would not be fail-safe -- it needs to be able to allocate a locale_t object via duplocale, since the uselocale API works with locale_t objects that describe the values of *all* locale categories, rather than the categories being individually settable on a per-thread basis (this is a design flaw in the POSIX interfaces, and the historic xlocale ones they were based on, IMO). 
So such an implementation could be a pseudo-code/demo of the functionality, but I think I'd want the proposed functionality to be always-succeeds to discourage erroneous code that ignores the result (resulting in wrong formatting/parsing, which is unsafe) or aborts the program (eew). Rich ^ permalink raw reply [flat|nested] 21+ messages in thread
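A minimal sketch of the uselocale-based demo mentioned above, to make the failure mode concrete. The names numeric_c_guard, numeric_c_enter, numeric_c_leave and format_double_c are hypothetical and invented for this illustration; only uselocale, duplocale, newlocale and freelocale are real POSIX.1-2008 interfaces. The point is that entering the "C numeric" state has to allocate a full locale_t, so it can fail and the caller has to handle that.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* locale_t, uselocale, duplocale, newlocale */
#endif
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#ifdef __APPLE__
#include <xlocale.h>             /* macOS declares the locale_t API here */
#endif

/* Hypothetical guard: what to restore and what to free afterwards. */
typedef struct {
    locale_t saved;  /* this thread's locale before the override */
    locale_t temp;   /* private copy with LC_NUMERIC forced to "C" */
} numeric_c_guard;

/* Returns 0 on success, -1 if a locale object could not be allocated. */
static int numeric_c_enter(numeric_c_guard *g)
{
    assert(g != NULL);
    /* uselocale(0) only queries; the result may be LC_GLOBAL_LOCALE. */
    locale_t copy = duplocale(uselocale((locale_t)0));
    if (copy == (locale_t)0)
        return -1;                       /* allocation failed */
    /* Replace only the LC_NUMERIC part of the copy. */
    locale_t forced = newlocale(LC_NUMERIC_MASK, "C", copy);
    if (forced == (locale_t)0) {
        freelocale(copy);                /* copy is unchanged on failure */
        return -1;
    }
    g->temp = forced;
    g->saved = uselocale(forced);        /* install for this thread only */
    return 0;
}

static void numeric_c_leave(numeric_c_guard *g)
{
    assert(g != NULL);
    uselocale(g->saved);
    freelocale(g->temp);
}

/* Formats x with '.' as radix; returns snprintf's result or -1. */
static int format_double_c(char *buf, size_t n, double x)
{
    numeric_c_guard g;
    if (numeric_c_enter(&g) != 0)
        return -1;                       /* caller must cope with failure */
    int len = snprintf(buf, n, "%f", x);
    numeric_c_leave(&g);
    return len;
}
```

Every call site of format_double_c has to decide what to do with -1, which is exactly the situation an always-succeeds interface is meant to avoid.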
* Re: printf doesn't respect locale 2019-09-11 15:38 ` Rich Felker @ 2019-09-11 18:08 ` Jens Gustedt 0 siblings, 0 replies; 21+ messages in thread From: Jens Gustedt @ 2019-09-11 18:08 UTC (permalink / raw) Cc: musl [-- Attachment #1: Type: text/plain, Size: 1762 bytes --] On Wed, 11 Sep 2019 11:38:53 -0400 Rich Felker <dalias@libc.org> wrote: > On Wed, Sep 11, 2019 at 05:15:45PM +0200, Jens Gustedt wrote: > > Do you think that a high-level implementation using _Thread_local > > (or tss calls) and setlocale would be doable, such that we could even > > provide a reference implementation for all POSIX systems that also > > implement some form of thread-local variables? > > It can't be done in terms of setlocale because setlocale is not > thread-safe or thread-local. It could be done in terms of POSIX > uselocale, but such an implementation would not be fail-safe -- it > needs to be able to allocate a locale_t object via duplocale, since > the uselocale API works with locale_t objects that describe the > values of *all* locale categories, rather than the categories being > individually settable on a per-thread basis (this is a design flaw in > the POSIX interfaces, and the historic xlocale ones they were based > on, IMO). OK, yes, this sounds too complicated. > So such an implementation could be a pseudo-code/demo of the > functionality, but I think I'd want the proposed functionality to be > always-succeeds to discourage erroneous code that ignores the result > (resulting in wrong formatting/parsing, which is unsafe) or aborts the > program (eew). Yes, "can't fail" is an important property for such a function. This should be part of the normative requirement, then. 
Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: Digitale Signatur von OpenPGP --] [-- Type: application/pgp-signature, Size: 195 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
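For illustration, an always-succeeds, thread-local flag along the lines discussed in this thread might have the shape below. This is a sketch under stated assumptions, not a proposed or existing API: numeric_c_set and current_radix are hypothetical names, and current_radix only stands in for the decision a libc would make internally inside printf, scanf and strtod. Setting the flag touches one thread-local int, so there is no allocation and no error return.

```c
#include <locale.h>

/* Hypothetical per-thread state: nonzero means "treat LC_NUMERIC as C". */
static _Thread_local int numeric_c_flag;

/* Set the flag and return its previous value so callers can restore it.
 * There is deliberately no failure path. */
static int numeric_c_set(int on)
{
    int prev = numeric_c_flag;
    numeric_c_flag = on ? 1 : 0;
    return prev;
}

/* Stand-in for the radix choice a libc would make when formatting or
 * parsing numbers: the flag wins over the locale's decimal_point. */
static const char *current_radix(void)
{
    if (numeric_c_flag)
        return ".";
    return localeconv()->decimal_point;
}
```

A caller would write int prev = numeric_c_set(1); do its formatting or parsing; then numeric_c_set(prev); to restore the earlier state. What localeconv() itself should report while the flag is set is the open question raised earlier in the thread, and this sketch does not answer it.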
end of thread, other threads:[~2019-09-11 18:08 UTC | newest]

Thread overview: 21+ messages
2019-09-09 16:31 printf doesn't respect locale Daniel Schoepe
2019-09-09 16:39 ` Daniel Schoepe
2019-09-09 16:51 ` Szabolcs Nagy
2019-09-09 17:55 ` Rich Felker
2019-09-09 17:54 ` Rich Felker
2019-09-10 16:00 ` Daniel Schoepe
2019-09-10 16:31 ` Szabolcs Nagy
2019-09-10 16:44 ` Tim Tassonis
2019-09-10 17:30 ` Rich Felker
2019-09-10 17:10 ` Daniel Schoepe
2019-09-10 17:33 ` Rich Felker
2019-09-10 18:43 ` Szabolcs Nagy
2019-09-10 21:55 ` A. Wilcox
2019-09-11 10:01 ` Szabolcs Nagy
2019-09-11 10:07 ` Jens Gustedt
2019-09-11 11:44 ` Rich Felker
2019-09-11 12:53 ` Jens Gustedt
2019-09-11 13:47 ` Rich Felker
2019-09-11 15:15 ` Jens Gustedt
2019-09-11 15:38 ` Rich Felker
2019-09-11 18:08 ` Jens Gustedt