mailing list of musl libc
* Re: [musl] Planned locale work and community thoughts
       [not found]   ` <20250618192847.GA1827@brightrain.aerifal.cx>
@ 2025-06-18 21:23     ` Rich Felker
  2025-06-18 22:42       ` Thorsten Glaser
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2025-06-18 21:23 UTC (permalink / raw)
  To: Pablo Correa Gomez; +Cc: musl

On Wed, Jun 18, 2025 at 03:28:47PM -0400, Rich Felker wrote:
> > * Implement RADIXCHAR so that "." is not the only possible separator.
> > THOUSEP will in principle not be implemented due to it breaking quite
> > some assumptions, and it being less critical for users.
> 
> To give some background on this: from the start I was largely opposed
> to having the radix char be localizable at all, as this has been a
> source of perpetual problems for parsing and generating text-based
> data formats intended for interchange, and I didn't really think there
> was any modern demand for it.
> 
> However, in past discussions of the topic, it's come up that some
> people do want it, and I don't want us to be the bad guys who are
> being stubborn dismissing someone else's cultural expectations, so the
> tentative plan has been to offer this with 1-bit degree of freedom
> between '.' and ',' as the only choices.
> 
> I've been made aware that, at least historically prior to use in
> computer systems, there have been other notations for radix point, but
> it's not clear if there's any modern expectation to be able to do
> that. What I think would be a useful next step is to grep the Unicode
> CLDR for whether there are non-'.' non-',' radix chars in any locale
> definitions. If there are none, I think that already settles it. If
> there are any, we should attempt to figure out whether there are
> real-world systems that support them and precedent for users to expect
> they work.
> 
> Note that supporting basically anything plausible other than '.' and
> ',' as radix characters has major technical issues that may introduce
> vulns into programs not expecting it, so in the absence of both strong
> evidence of necessity and research into what would break and whether
> unsafe breakage is unlikely, I want to just say no to this.
> 
> It may however make sense for the on-disk data format to allow for the
> possibility, and for musl to just treat anything but "," as if it were

I've run a textual grep on the data from cldr-47.0.0-json-full.zip:

    grep '"decimal": *"[^,.]"' cldr-numbers-full/main/*/numbers.json

and the only results seem to be for alternative-numerals Arabic
profiles under "symbols-numberSystem-arabext", which is not
used/usable in the C/POSIX locale system.

(There is an alternate symbol, but it's only used with alternate
numeral characters, and C/POSIX can't use alternate numeral characters
in their locale model.)

Theoretically it's possible the textual grep missed things if there is
inconsistent json formatting anywhere, so if anyone familiar with jq
wants to conduct a search using it instead to confirm, go ahead. I
think we're good though.
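
A structural search that does not depend on how the JSON is formatted could look roughly like the following. This is a minimal sketch, assuming the archive is unpacked so that the files sit at cldr-numbers-full/main/*/numbers.json; the function names `unusual_decimals` and `scan` are made up for illustration.

```python
import json
from pathlib import Path

ASCII_RADIX = {".", ","}


def unusual_decimals(node, path=()):
    """Yield (path, value) for every "decimal" string that is not '.' or ','."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "decimal" and isinstance(value, str):
                if value not in ASCII_RADIX:
                    yield path + (key,), value
            else:
                # Descend into nested objects, remembering how we got here.
                yield from unusual_decimals(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from unusual_decimals(value, path + (str(index),))


def scan(root):
    """Print every hit under root/*/numbers.json (root is an assumed layout)."""
    for file in sorted(Path(root).glob("*/numbers.json")):
        with open(file, encoding="utf-8") as handle:
            data = json.load(handle)
        for path, value in unusual_decimals(data):
            print(".".join(path), repr(value))
```

Calling `scan("cldr-numbers-full/main")` prints one line per non-'.' non-',' decimal symbol together with the full key path, so the numbering system each hit belongs to is visible.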

Rich


* Re: [musl] Planned locale work and community thoughts
  2025-06-18 21:23     ` [musl] Planned locale work and community thoughts Rich Felker
@ 2025-06-18 22:42       ` Thorsten Glaser
  2025-06-18 23:14         ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Thorsten Glaser @ 2025-06-18 22:42 UTC (permalink / raw)
  To: musl; +Cc: Pablo Correa Gomez

On Wed, 18 Jun 2025, Rich Felker wrote:

>Theoretically it's possible the textual grep missed things if there is
>inconsistent json formatting anywhere, so if anyone familiar with jq
>wants to conduct a search using it instead to confirm, go ahead. I

My jq-foo is not very good, but I managed this:

tg@x61p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^  "[^.,]"' -e '^  ".[^"]' | uniq
  "٫"

So yes, U+066B is the only other one, and no multi-char ones.
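
To double-check which character a symbol is and how long it is once encoded, something like this works. A minimal sketch; the helper name `describe` is made up for illustration, and it only reports what Python's bundled Unicode database knows.

```python
import unicodedata


def describe(symbol):
    """Return (code points, Unicode names, UTF-8 byte length) for a symbol string."""
    code_points = [f"U+{ord(ch):04X}" for ch in symbol]
    names = [unicodedata.name(ch, "<unnamed>") for ch in symbol]
    return code_points, names, len(symbol.encode("utf-8"))


# The symbol found above, written as an escape to avoid copy/paste mistakes.
print(describe("\u066b"))
```

A result with more than one code point would indicate a multi-char symbol, and the byte length shows that this one is not a single byte in UTF-8, unlike "." and ",".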

tg@x61p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]'

… shows all the occurrences, but a quick filter shows that we have
both symbols-numberSystem-arabext and symbols-numberSystem-arab but
assuming both are out of scope…

tg@x61p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
  >>main.bgn-AE.numbers.symbols-numberSystem-latn",
  >>main.bgn-AF.numbers.symbols-numberSystem-latn",
  >>main.bgn-IR.numbers.symbols-numberSystem-latn",
  >>main.bgn-OM.numbers.symbols-numberSystem-latn",
  >>main.bgn.numbers.symbols-numberSystem-latn",

… leaves us with this; bgn/numbers.json as an example:

{
  "main": {
    "bgn": {
      "numbers": {
        "symbols-numberSystem-arabext": {
          "decimal": "٫",
          "group": "٬",
          "list": "؛",
…
        },
        "symbols-numberSystem-latn": {
          "decimal": "٫",
          "group": "،",
          "list": ";",
…

So, if the bgn locales are ever going to be relevant…
unsure what that exactly is, but my acronyms database says…
	[ISO 639-3] Western Balochi (cf. bal)
… which seems to fit.
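
The filtering step above (drop arab and arabext, keep whatever else has an unusual decimal symbol) can also be sketched in Python on one parsed numbers.json. This is only an illustration, assuming the "main" / locale / "numbers" / "symbols-numberSystem-*" layout shown in the excerpt; the names `radix_by_system` and `suspicious` are hypothetical.

```python
PREFIX = "symbols-numberSystem-"


def radix_by_system(data):
    """Map (locale, number system) -> decimal symbol for one parsed numbers.json."""
    result = {}
    for locale, body in data.get("main", {}).items():
        for key, symbols in body.get("numbers", {}).items():
            if key.startswith(PREFIX) and isinstance(symbols, dict) and "decimal" in symbols:
                result[(locale, key[len(PREFIX):])] = symbols["decimal"]
    return result


def suspicious(data, skip=("arab", "arabext")):
    """Keep entries outside the skipped systems whose decimal is not '.' or ','."""
    return {
        key: value
        for key, value in radix_by_system(data).items()
        if key[1] not in skip and value not in (".", ",")
    }
```

For the bgn excerpt this leaves only the ("bgn", "latn") entry, which matches the grep/fgrep result.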

bye,
//mirabilos
-- 
<ch> you introduced a merge commit        │<mika> % g rebase -i HEAD^^
<mika> sorry, no idea and rebasing just fscked │<mika> Segmentation
<ch> should have cloned into a clean repo      │  fault (core dumped)
<ch> if I rebase that now, it's really ugh     │<mika:#grml> wuahhhhhh


* Re: [musl] Planned locale work and community thoughts
  2025-06-18 22:42       ` Thorsten Glaser
@ 2025-06-18 23:14         ` Rich Felker
  0 siblings, 0 replies; 3+ messages in thread
From: Rich Felker @ 2025-06-18 23:14 UTC (permalink / raw)
  To: Thorsten Glaser; +Cc: musl, Pablo Correa Gomez

On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> On Wed, 18 Jun 2025, Rich Felker wrote:
> 
> >Theoretically it's possible the textual grep missed things if there is
> >inconsistent json formatting anywhere, so if anyone familiar with jq
> >wants to conduct a search using it instead to confirm, go ahead. I
> 
> My jq-foo is not very good, but I managed this:
> 
> tg@x61p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^  "[^.,]"' -e '^  ".[^"]' | uniq
>   "٫"
> 
> So yes, U+066B is the only other one, and no multi-char ones.
> 
> tg@x61p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]'
> 
> … shows all the occurrences, but a quick filter shows that we have
> both symbols-numberSystem-arabext and symbols-numberSystem-arab but
> assuming both are out of scope…
> 
> tg@x61p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
>   >>main.bgn-AE.numbers.symbols-numberSystem-latn",
>   >>main.bgn-AF.numbers.symbols-numberSystem-latn",
>   >>main.bgn-IR.numbers.symbols-numberSystem-latn",
>   >>main.bgn-OM.numbers.symbols-numberSystem-latn",
>   >>main.bgn.numbers.symbols-numberSystem-latn",
> 
> … leaves us with this; bgn/numbers.json as an example:
> 
> {
>   "main": {
>     "bgn": {
>       "numbers": {
>         "symbols-numberSystem-arabext": {
>           "decimal": "٫",
>           "group": "٬",
>           "list": "؛",
> …
>         },
>         "symbols-numberSystem-latn": {
>           "decimal": "٫",
>           "group": "،",
>           "list": ";",
> …
> 
> So, if the bgn locales are ever going to be relevant…
> unsure what that exactly is, but my acronyms database says…
> 	[ISO 639-3] Western Balochi (cf. bal)
> … which seems to fit.

Thanks. My grepping seems to have overlooked that, just because it was
the same character that would normally only be used in an alt-digits
context. I wonder if the above is intentional or a mistake, and whether
any systems are actually doing that.


