mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] [PATCH 0/2] Support printing localized RADIXCHAR
@ 2023-12-16 19:36 Pablo Correa Gómez
  2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Pablo Correa Gómez @ 2023-12-16 19:36 UTC (permalink / raw)
  To: musl; +Cc: Pablo Correa Gómez

From: Pablo Correa Gómez <ablocorrea@hotmail.com>

Since we've been discussing about translations, I've been looking a bit
around, and have found some low-hanging fruit, in the form of improving
printf-family output for localized systems.

I've tried to do the same for strtof family of functions, but I was not
completely sure on how to approach that. Forcing the radix char there
has the problem that numeric values as written for programming stop
being supported, and treating equally a "." and the localized case seems
to not be supported by POSIX. Does anybody have any thoughts about this?
Without that, this patch series might be a bit incomplete, since
certain localized printf outputs would not be possible to ingest in
strtof. Although I'm also unequally unsure if that's a requirement

Pablo Correa Gómez (2):
  langinfo: add support for LC_NUMERIC translations
  printf: translate RADIXCHAR for floating-point numbers

 src/locale/langinfo.c | 2 +-
 src/stdio/vfprintf.c  | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

--
2.43.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations
  2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez
@ 2023-12-16 19:36 ` Pablo Correa Gómez
  2023-12-16 19:36 ` [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers Pablo Correa Gómez
  2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker
  2 siblings, 0 replies; 6+ messages in thread
From: Pablo Correa Gómez @ 2023-12-16 19:36 UTC (permalink / raw)
  To: musl; +Cc: Pablo Correa Gómez

From: Pablo Correa Gómez <ablocorrea@hotmail.com>

The lack of supports seem to come from the fact that there are no
users to it. But we are going to introduce a new user in the next
commit, so enable it.
---
 src/locale/langinfo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/locale/langinfo.c b/src/locale/langinfo.c
index 14773093..8acc7fd6 100644
--- a/src/locale/langinfo.c
+++ b/src/locale/langinfo.c
@@ -60,7 +60,7 @@ char *__nl_langinfo_l(nl_item item, locale_t loc)
 	}
 
 	for (; idx; idx--, str++) for (; *str; str++);
-	if (cat != LC_NUMERIC && *str) str = LCTRANS(str, cat, loc);
+	if (*str) str = LCTRANS(str, cat, loc);
 	return (char *)str;
 }
 
-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers
  2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez
  2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez
@ 2023-12-16 19:36 ` Pablo Correa Gómez
  2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker
  2 siblings, 0 replies; 6+ messages in thread
From: Pablo Correa Gómez @ 2023-12-16 19:36 UTC (permalink / raw)
  To: musl; +Cc: Pablo Correa Gómez

From: Pablo Correa Gómez <ablocorrea@hotmail.com>

This honors POSIX.1-2017, and allows to get the RADIXCHAR when the
locale is correctly translated.
---
 src/stdio/vfprintf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/stdio/vfprintf.c b/src/stdio/vfprintf.c
index 497c5e19..33f98532 100644
--- a/src/stdio/vfprintf.c
+++ b/src/stdio/vfprintf.c
@@ -10,6 +10,7 @@
 #include <inttypes.h>
 #include <math.h>
 #include <float.h>
+#include <langinfo.h>
 
 /* Some useful macros */
 
@@ -389,7 +390,7 @@ static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t)
 			else if (s==buf+9) *--s='0';
 			out(f, s, buf+9-s);
 		}
-		if (p || (fl&ALT_FORM)) out(f, ".", 1);
+		if (p || (fl&ALT_FORM)) out(f, nl_langinfo(RADIXCHAR), 1);
 		for (; d<z && p>0; d++, p-=9) {
 			char *s = fmt_u(*d, buf+9);
 			while (s>buf) *--s='0';
@@ -404,7 +405,7 @@ static int fmt_fp(FILE *f, long double y, int w, int p, int fl, int t)
 			if (d!=a) while (s>buf) *--s='0';
 			else {
 				out(f, s++, 1);
-				if (p>0||(fl&ALT_FORM)) out(f, ".", 1);
+				if (p>0||(fl&ALT_FORM)) out(f, nl_langinfo(RADIXCHAR), 1);
 			}
 			out(f, s, MIN(buf+9-s, p));
 			p -= buf+9-s;
-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] [PATCH 0/2] Support printing localized RADIXCHAR
  2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez
  2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez
  2023-12-16 19:36 ` [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers Pablo Correa Gómez
@ 2023-12-16 23:10 ` Rich Felker
  2023-12-17  7:26   ` Markus Wichmann
  2023-12-17 22:25   ` Pablo Correa Gomez
  2 siblings, 2 replies; 6+ messages in thread
From: Rich Felker @ 2023-12-16 23:10 UTC (permalink / raw)
  To: Pablo Correa Gómez; +Cc: musl, Pablo Correa Gómez

On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote:
> From: Pablo Correa Gómez <ablocorrea@hotmail.com>
> 
> Since we've been discussing about translations, I've been looking a bit
> around, and have found some low-hanging fruit, in the form of improving
> printf-family output for localized systems.
> 
> I've tried to do the same for strtof family of functions, but I was not
> completely sure on how to approach that. Forcing the radix char there
> has the problem that numeric values as written for programming stop
> being supported, and treating equally a "." and the localized case seems
> to not be supported by POSIX. Does anybody have any thoughts about this?
> Without that, this patch series might be a bit incomplete, since
> certain localized printf outputs would not be possible to ingest in
> strtof. Although I'm also unequally unsure if that's a requirement
> 
> Pablo Correa Gómez (2):
>   langinfo: add support for LC_NUMERIC translations
>   printf: translate RADIXCHAR for floating-point numbers
> 
>  src/locale/langinfo.c | 2 +-
>  src/stdio/vfprintf.c  | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> --
> 2.43.0

This is a topic that's been controversial. I have always been against
having variable radix character, but I've also been seeking input from
users who want localized output whether the lack of this functionality
is a serious problem that needs revisiting.

Last time it was discussed, I believe my position was that, if we do
this, it needs to be a 1-bit setting, where a locale necessarily has
either '.' or ',' as the radix. No other values actually appear in
real-world conventions, and on other implementations such as glibc,
the allowance for arbitrary characters allows doing some ~nasty~ stuff
with output and input processing. For example, you could define the
radix character to be '1' or something that makes conversions fail to
round-trip.

As written to support arbitrary radix characters, the patch also fails
to handle the case where the radix character is multi-byte, copying
only a single byte of it and thereby producing broken output. This is
actually a nasty case where printf semantics for field width are not
what the caller is likely to expect, and it breaks our wide printf
implementation, which assumes when it uses byte-based printf for
numbers that the byte count and character count are the same.
Supporting only '.' and ',' avoids all of these issues, too.

Another detail you've overlooked is that scanf/strto{d,ld,f}/atof need
to process the radix point character. This in turn requires making the
_l wrappers for strto{d,ld,f} so that they actually apply the locale
argument rather than ignoring it.

Before proceeding on all of this we should probably try to reach a
decision on whether it's really needed/wanted functionality.

Rich

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] [PATCH 0/2] Support printing localized RADIXCHAR
  2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker
@ 2023-12-17  7:26   ` Markus Wichmann
  2023-12-17 22:25   ` Pablo Correa Gomez
  1 sibling, 0 replies; 6+ messages in thread
From: Markus Wichmann @ 2023-12-17  7:26 UTC (permalink / raw)
  To: musl; +Cc: Pablo Correa Gómez

Am Sat, Dec 16, 2023 at 06:10:37PM -0500 schrieb Rich Felker:
> This is a topic that's been controversial. I have always been against
> having variable radix character, but I've also been seeking input from
> users who want localized output whether the lack of this functionality
> is a serious problem that needs revisiting.
>

Speaking as a German, if output is mostly translated (because
LC_MESSAGES is at the discretion of gettext) but numbers have a '.' as
radix and are output with a precision of 3, it is disorienting and
requires re-reading every time. Because in the German locale, '.' is a
valid thousands separator (space and inch mark also being acceptable),
and the thousands grouping is just 3.

And if they're output with a different precision, it is still jarring at
least.

> Last time it was discussed, I believe my position was that, if we do
> this, it needs to be a 1-bit setting, where a locale necessarily has
> either '.' or ',' as the radix. No other values actually appear in
> real-world conventions, and on other implementations such as glibc,
> the allowance for arbitrary characters allows doing some ~nasty~ stuff
> with output and input processing. For example, you could define the
> radix character to be '1' or something that makes conversions fail to
> round-trip.
>

That is reasonable. Overgeneralization usually leads to bad code or
unforseen outcomes. This is also the case for file names, with just
recently a bug report being introduced against POSIX to disallow control
characters there. (Imagine someone setting '\r' as radix point).

Ciao,
Markus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [musl] [PATCH 0/2] Support printing localized RADIXCHAR
  2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker
  2023-12-17  7:26   ` Markus Wichmann
@ 2023-12-17 22:25   ` Pablo Correa Gomez
  1 sibling, 0 replies; 6+ messages in thread
From: Pablo Correa Gomez @ 2023-12-17 22:25 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

El sab, 16-12-2023 a las 18:10 -0500, Rich Felker escribió:
> On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote:
> > From: Pablo Correa Gómez <ablocorrea@hotmail.com>
> > 
> > Since we've been discussing about translations, I've been looking a
> > bit
> > around, and have found some low-hanging fruit, in the form of
> > improving
> > printf-family output for localized systems.
> > 
> > I've tried to do the same for strtof family of functions, but I was
> > not
> > completely sure on how to approach that. Forcing the radix char
> > there
> > has the problem that numeric values as written for programming stop
> > being supported, and treating equally a "." and the localized case
> > seems
> > to not be supported by POSIX. Does anybody have any thoughts about
> > this?
> > Without that, this patch series might be a bit incomplete, since
> > certain localized printf outputs would not be possible to ingest in
> > strtof. Although I'm also unequally unsure if that's a requirement
> > 
> > Pablo Correa Gómez (2):
> >   langinfo: add support for LC_NUMERIC translations
> >   printf: translate RADIXCHAR for floating-point numbers
> > 
> >  src/locale/langinfo.c | 2 +-
> >  src/stdio/vfprintf.c  | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > --
> > 2.43.0
> 
> This is a topic that's been controversial. I have always been against
> having variable radix character, but I've also been seeking input
> from
> users who want localized output whether the lack of this
> functionality
> is a serious problem that needs revisiting.
> 
> Last time it was discussed, I believe my position was that, if we do
> this, it needs to be a 1-bit setting, where a locale necessarily has
> either '.' or ',' as the radix. No other values actually appear in
> real-world conventions, and on other implementations such as glibc,
> the allowance for arbitrary characters allows doing some ~nasty~
> stuff
> with output and input processing. For example, you could define the
> radix character to be '1' or something that makes conversions fail to
> round-trip.

Makes total sense. I came from the wrong assumption that Spanish might
have use an appostrophe as number separator. But seems like that has
changed since I went to primary school, and certainly the comma is what
I'm used to online in Spanish. All the technical comments you have make
sense, I certainly put this together a bit too fast, but I'm happy that
it spark a discussion on how to do it right.

> 
> As written to support arbitrary radix characters, the patch also
> fails
> to handle the case where the radix character is multi-byte, copying
> only a single byte of it and thereby producing broken output. This is
> actually a nasty case where printf semantics for field width are not
> what the caller is likely to expect, and it breaks our wide printf
> implementation, which assumes when it uses byte-based printf for
> numbers that the byte count and character count are the same.
> Supporting only '.' and ',' avoids all of these issues, too.
> 
> Another detail you've overlooked is that scanf/strto{d,ld,f}/atof
> need
> to process the radix point character. This in turn requires making
> the
> _l wrappers for strto{d,ld,f} so that they actually apply the locale
> argument rather than ignoring it.
> 
> Before proceeding on all of this we should probably try to reach a
> decision on whether it's really needed/wanted functionality.

I really think so. This was indeed a part of Alastair's original
comment on setlocale (https://www.openwall.com/lists/musl/2023/08/10/3)
So it's a thing in Frech, as well as in Spanish, where we have same
problem that Markus mentions in German.

For me personally, I really thing getting these sort of things
functional and well integrated in musl (the way you want to do it), are
pretty important for the postmarketOS project being able to reach a
wider audience :)

So is this convincing enough, that a well-put patch with the changes
you request here and in the other message would make it? If so, I'm
happy to give this a try once the setlocale changes from Alastair get
merged (I already contacted a Polish user from postmarketOS with which
we're going to test a protocol to help users add support to musl-
locales for their language). 

It would certainly be my first time trying to write something this low-
level, so might need some guidance on how to approach the changes like
you've written in your other message.

Best,
Pablo.

> 
> Rich


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-12-18 17:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-16 19:36 [musl] [PATCH 0/2] Support printing localized RADIXCHAR Pablo Correa Gómez
2023-12-16 19:36 ` [musl] [PATCH 1/2] langinfo: add support for LC_NUMERIC translations Pablo Correa Gómez
2023-12-16 19:36 ` [musl] [PATCH 2/2] printf: translate RADIXCHAR for floating-point numbers Pablo Correa Gómez
2023-12-16 23:10 ` [musl] [PATCH 0/2] Support printing localized RADIXCHAR Rich Felker
2023-12-17  7:26   ` Markus Wichmann
2023-12-17 22:25   ` Pablo Correa Gomez

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).