* [musl] [PATCH 0/2] Support printing localized RADIXCHAR @ 2023-12-16 19:36 Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez ` (2 more replies) 0 siblings, 3 replies; 6+ messages in thread From: Pablo Correa Gómez @ 2023-12-16 19:36 UTC (permalink / raw) To: musl; +Cc: Pablo Correa Gómez From: Pablo Correa Gómez <ablocorrea@hotmail.com> Since we've been discussing about translations, I've been looking a bit around, and have found some low-hanging fruit, in the form of improving printf-family output for localized systems. I've tried to do the same for strtof family of functions, but I was not completely sure on how to approach that. Forcing the radix char there has the problem that numeric values as written for programming stop being supported, and treating equally a "." and the localized case seems to not be supported by POSIX. Does anybody have any thoughts about this? Without that, this patch series might be a bit incomplete, since certain localized printf outputs would not be possible to ingest in strtof. Although I'm also unequally unsure if that's a requirement Pablo Correa Gómez (2): langinfo: add support for LC_NUMERIC translations printf: translate RADIXCHAR for floating-point numbers src/locale/langinfo.c | 2 +- src/stdio/vfprintf.c | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations 2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez @ 2023-12-16 19:36 ` Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers Pablo Correa Gómez 2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker 2 siblings, 0 replies; 6+ messages in thread From: Pablo Correa Gómez @ 2023-12-16 19:36 UTC (permalink / raw) To: musl; +Cc: Pablo Correa Gómez From: Pablo Correa Gómez <ablocorrea@hotmail.com> The lack of supports seem to come from the fact that there are no users to it. But we are going to introduce a new user in the next commit, so enable it. --- src/locale/langinfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/locale/langinfo.c b/src/locale/langinfo.c index 14773093..8acc7fd6 100644 --- a/src/locale/langinfo.c +++ b/src/locale/langinfo.c @@ -60,7 +60,7 @@ char *__nl_langinfo_l(nl_item item, locale_t loc) } for (; idx; idx--, str++) for (; *str; str++); - if (cat != LC_NUMERIC && *str) str = LCTRANS(str, cat, loc); + if (*str) str = LCTRANS(str, cat, loc); return (char *)str; } -- 2.43.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers 2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez @ 2023-12-16 19:36 ` Pablo Correa Gómez 2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker 2 siblings, 0 replies; 6+ messages in thread From: Pablo Correa Gómez @ 2023-12-16 19:36 UTC (permalink / raw) To: musl; +Cc: Pablo Correa Gómez From: Pablo Correa Gómez <ablocorrea@hotmail.com> This honors POSIX.1-2017, and allows to get the RADIXCHAR when the locale is correctly translated. --- src/stdio/vfprintf.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c index 497c5e19..33f98532 100644 --- a/src/stdio/vfprintf.c +++ b/src/stdio/vfprintf.c @@ -10,6 +10,7 @@ #include <inttypes.h> #include <math.h> #include <float.h> +#include <langinfo.h> /* Some useful macros */ @@ -389,7 +390,7 @@ static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t) else if (s==buf+9) *--s='0'; out(f, s, buf+9-s); } - if (p || (fl&ALT_FORM)) out(f, ".", 1); + if (p || (fl&ALT_FORM)) out(f, nl_langinfo(RADIXCHAR), 1); for (; d<z && p>0; d++, p-=9) { char *s = fmt_u(*d, buf+9); while (s>buf) *--s='0'; @@ -404,7 +405,7 @@ static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t) if (d!=a) while (s>buf) *--s='0'; else { out(f, s++, 1); - if (p>0||(fl&ALT_FORM)) out(f, ".", 1); + if (p>0||(fl&ALT_FORM)) out(f, nl_langinfo(RADIXCHAR), 1); } out(f, s, MIN(buf+9-s, p)); p -= buf+9-s; -- 2.43.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [musl] [PATCH 0/2] Support printing localized RADIXCHAR 2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers Pablo Correa Gómez @ 2023-12-16 23:10 ` Rich Felker 2023-12-17 7:26 ` Markus Wichmann 2023-12-17 22:25 ` Pablo Correa Gomez 2 siblings, 2 replies; 6+ messages in thread From: Rich Felker @ 2023-12-16 23:10 UTC (permalink / raw) To: Pablo Correa Gómez; +Cc: musl, Pablo Correa Gómez On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote: > From: Pablo Correa Gómez <ablocorrea@hotmail.com> > > Since we've been discussing about translations, I've been looking a bit > around, and have found some low-hanging fruit, in the form of improving > printf-family output for localized systems. > > I've tried to do the same for strtof family of functions, but I was not > completely sure on how to approach that. Forcing the radix char there > has the problem that numeric values as written for programming stop > being supported, and treating equally a "." and the localized case seems > to not be supported by POSIX. Does anybody have any thoughts about this? > Without that, this patch series might be a bit incomplete, since > certain localized printf outputs would not be possible to ingest in > strtof. Although I'm also unequally unsure if that's a requirement > > Pablo Correa Gómez (2): > langinfo: add support for LC_NUMERIC translations > printf: translate RADIXCHAR for floating-point numbers > > src/locale/langinfo.c | 2 +- > src/stdio/vfprintf.c | 5 +++-- > 2 files changed, 4 insertions(+), 3 deletions(-) > > -- > 2.43.0 This is a topic that's been controversial. I have always been against having variable radix character, but I've also been seeking input from users who want localized output whether the lack of this functionality is a serious problem that needs revisiting. Last time it was discussed, I believe my position was that, if we do this, it needs to be a 1-bit setting, where a locale necessarily has either '.' or ',' as the radix. No other values actually appear in real-world conventions, and on other implementations such as glibc, the allowance for arbitrary characters allows doing some ~nasty~ stuff with output and input processing. For example, you could define the radix character to be '1' or something that makes conversions fail to round-trip. As written to support arbitrary radix characters, the patch also fails to handle the case where the radix character is multi-byte, copying only a single byte of it and thereby producing broken output. This is actually a nasty case where printf semantics for field width are not what the caller is likely to expect, and it breaks our wide printf implementation, which assumes when it uses byte-based printf for numbers that the byte count and character count are the same. Supporting only '.' and ',' avoids all of these issues, too. Another detail you've overlooked is that scanf/strto{d,ld,f}/atof need to process the radix point character. This in turn requires making the _l wrappers for strto{d,ld,f} so that they actually apply the locale argument rather than ignoring it. Before proceeding on all of this we should probably try to reach a decision on whether it's really needed/wanted functionality. Rich ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [musl] [PATCH 0/2] Support printing localized RADIXCHAR 2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker @ 2023-12-17 7:26 ` Markus Wichmann 2023-12-17 22:25 ` Pablo Correa Gomez 1 sibling, 0 replies; 6+ messages in thread From: Markus Wichmann @ 2023-12-17 7:26 UTC (permalink / raw) To: musl; +Cc: Pablo Correa Gómez Am Sat, Dec 16, 2023 at 06:10:37PM -0500 schrieb Rich Felker: > This is a topic that's been controversial. I have always been against > having variable radix character, but I've also been seeking input from > users who want localized output whether the lack of this functionality > is a serious problem that needs revisiting. > Speaking as a German, if output is mostly translated (because LC_MESSAGES is at the discretion of gettext) but numbers have a '.' as radix and are output with a precision of 3, it is disorienting and requires re-reading every time. Because in the German locale, '.' is a valid thousands separator (space and inch mark also being acceptable), and the thousands grouping is just 3. And if they're output with a different precision, it is still jarring at least. > Last time it was discussed, I believe my position was that, if we do > this, it needs to be a 1-bit setting, where a locale necessarily has > either '.' or ',' as the radix. No other values actually appear in > real-world conventions, and on other implementations such as glibc, > the allowance for arbitrary characters allows doing some ~nasty~ stuff > with output and input processing. For example, you could define the > radix character to be '1' or something that makes conversions fail to > round-trip. > That is reasonable. Overgeneralization usually leads to bad code or unforseen outcomes. This is also the case for file names, with just recently a bug report being introduced against POSIX to disallow control characters there. (Imagine someone setting '\r' as radix point). Ciao, Markus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [musl] [PATCH 0/2] Support printing localized RADIXCHAR 2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker 2023-12-17 7:26 ` Markus Wichmann @ 2023-12-17 22:25 ` Pablo Correa Gomez 1 sibling, 0 replies; 6+ messages in thread From: Pablo Correa Gomez @ 2023-12-17 22:25 UTC (permalink / raw) To: Rich Felker; +Cc: musl El sab, 16-12-2023 a las 18:10 -0500, Rich Felker escribió: > On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote: > > From: Pablo Correa Gómez <ablocorrea@hotmail.com> > > > > Since we've been discussing about translations, I've been looking a > > bit > > around, and have found some low-hanging fruit, in the form of > > improving > > printf-family output for localized systems. > > > > I've tried to do the same for strtof family of functions, but I was > > not > > completely sure on how to approach that. Forcing the radix char > > there > > has the problem that numeric values as written for programming stop > > being supported, and treating equally a "." and the localized case > > seems > > to not be supported by POSIX. Does anybody have any thoughts about > > this? > > Without that, this patch series might be a bit incomplete, since > > certain localized printf outputs would not be possible to ingest in > > strtof. Although I'm also unequally unsure if that's a requirement > > > > Pablo Correa Gómez (2): > > langinfo: add support for LC_NUMERIC translations > > printf: translate RADIXCHAR for floating-point numbers > > > > src/locale/langinfo.c | 2 +- > > src/stdio/vfprintf.c | 5 +++-- > > 2 files changed, 4 insertions(+), 3 deletions(-) > > > > -- > > 2.43.0 > > This is a topic that's been controversial. I have always been against > having variable radix character, but I've also been seeking input > from > users who want localized output whether the lack of this > functionality > is a serious problem that needs revisiting. > > Last time it was discussed, I believe my position was that, if we do > this, it needs to be a 1-bit setting, where a locale necessarily has > either '.' or ',' as the radix. No other values actually appear in > real-world conventions, and on other implementations such as glibc, > the allowance for arbitrary characters allows doing some ~nasty~ > stuff > with output and input processing. For example, you could define the > radix character to be '1' or something that makes conversions fail to > round-trip. Makes total sense. I came from the wrong assumption that Spanish might have use an appostrophe as number separator. But seems like that has changed since I went to primary school, and certainly the comma is what I'm used to online in Spanish. All the technical comments you have make sense, I certainly put this together a bit too fast, but I'm happy that it spark a discussion on how to do it right. > > As written to support arbitrary radix characters, the patch also > fails > to handle the case where the radix character is multi-byte, copying > only a single byte of it and thereby producing broken output. This is > actually a nasty case where printf semantics for field width are not > what the caller is likely to expect, and it breaks our wide printf > implementation, which assumes when it uses byte-based printf for > numbers that the byte count and character count are the same. > Supporting only '.' and ',' avoids all of these issues, too. > > Another detail you've overlooked is that scanf/strto{d,ld,f}/atof > need > to process the radix point character. This in turn requires making > the > _l wrappers for strto{d,ld,f} so that they actually apply the locale > argument rather than ignoring it. > > Before proceeding on all of this we should probably try to reach a > decision on whether it's really needed/wanted functionality. I really think so. This was indeed a part of Alastair's original comment on setlocale (https://www.openwall.com/lists/musl/2023/08/10/3) So it's a thing in Frech, as well as in Spanish, where we have same problem that Markus mentions in German. For me personally, I really thing getting these sort of things functional and well integrated in musl (the way you want to do it), are pretty important for the postmarketOS project being able to reach a wider audience :) So is this convincing enough, that a well-put patch with the changes you request here and in the other message would make it? If so, I'm happy to give this a try once the setlocale changes from Alastair get merged (I already contacted a Polish user from postmarketOS with which we're going to test a protocol to help users add support to musl- locales for their language). It would certainly be my first time trying to write something this low- level, so might need some guidance on how to approach the changes like you've written in your other message. Best, Pablo. > > Rich ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2023-12-18 17:12 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez 2023-12-16 19:36 ` [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers Pablo Correa Gómez 2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker 2023-12-17 7:26 ` Markus Wichmann 2023-12-17 22:25 ` Pablo Correa Gomez
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).