mailing list of musl libc
* [musl] Selecting locale source format
@ 2025-09-17  1:14 Rich Felker
  2025-09-17  1:23 ` A. Wilcox
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Rich Felker @ 2025-09-17  1:14 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2627 bytes --]

I have a proposed binary format for new locale files that I'm in the
process of writing up, but Pablo brought it to my attention that,
while binary format (ABI) is what's important to have down and stable
at the time we integrate into musl, pinning down the source format is
what's important/blocking for collaboration with localization folks.

I have two candidate formats in the works right now for this:



Option 1: subset+extension of POSIX localedef format.

The basis for this format is described in
https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html

If we go this way, it would be a "subset" because (1) some parts are
not relevant, like LC_CTYPE, which does not vary by locale, (2) some
parts will necessarily be represented in different ways, like
collation where we're using UCA rather than the POSIX form, and (3)
the format just has a lot of gratuitous cruft like symbolic character
names. It will also necessarily be extended because POSIX localedef
has no way to represent translated error strings etc. - keys for them
have to be added.

Going this route would have the source data in a fairly compact and
"well-known" (to certain audiences) form, but requires that the
tooling to produce binary locale files be aware of how these fields
translate to the data model for the binary form.
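
As a rough illustration of the kind of awareness the tooling would need,
here is a minimal sketch of splitting one operand list of the form used
in the attached sample ("Sun";"Mon";...) into individual strings. The
name split_operands and the fixed 64-byte limit are hypothetical, and
the escape sequences, line continuations, and comment characters of the
POSIX format are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a semicolon-separated list of
 * double-quoted strings, e.g. "Sun";"Mon";"Tue", into out[0..n-1].
 * Returns the number of strings found, or -1 on malformed input.
 * Escape sequences and line continuations are not handled. */
int split_operands(const char *s, char out[][64], int max)
{
	int n = 0;
	while (*s) {
		if (*s != '"' || n == max) return -1;
		const char *end = strchr(++s, '"');
		if (!end || (size_t)(end - s) >= 64) return -1;
		memcpy(out[n], s, end - s);
		out[n][end - s] = 0;
		n++;
		s = end + 1;
		if (*s == ';') {
			s++;
			if (!*s) return -1; /* trailing ';' with nothing after it */
		} else if (*s) {
			return -1; /* junk after the closing quote */
		}
	}
	return n;
}
```

A real converter would then have to place each string under the right
key of the binary data model, which is the part this sketch leaves out.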

A sample (should be roughly correct C/POSIX locale) is attached for
reference.




Option 2: human-readable/text representation of the binary form

Describing this requires a basic intro to the binary form, which is a
multi-level hierarchical table mapping a path of integer key values to
a data blob. In text we can represent keys with symbolic constants,
but they're just a way of writing the underlying numbers. For example
the path strerror/0 leads to the "No error information" text,
strerror/EACCES leads to the "Permission denied" text, etc. Here
"strerror" just represents a number for the first-level path component
where strerror strings are stored, subindexed by (the arch/generic
versions of) the errno codes.
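
To make the path idea concrete, here is a small in-memory sketch of a
two-level lookup. It is only an illustration of "path of integer keys
leads to a blob": the struct layout, the CAT_* numbers, and lookup_path
are all hypothetical, and the actual binary layout is not described in
this message.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical first-level key numbers; the real values are not given. */
enum { CAT_STRERROR = 1, CAT_GAI_STRERROR = 2 };

struct leaf { int key; const char *blob; };
struct node { int key; const struct leaf *leaves; size_t nleaves; };

static const struct leaf strerror_leaves[] = {
	{ 0, "No error information" },
	{ EACCES, "Permission denied" },
};
static const struct leaf gai_leaves[] = {
	{ 0, "Unknown error" },
};
static const struct node root[] = {
	{ CAT_STRERROR, strerror_leaves, 2 },
	{ CAT_GAI_STRERROR, gai_leaves, 1 },
};

/* Resolve the two-component path cat/key to its blob, or NULL if the
 * path does not exist. Linear scans are used only to keep this short. */
const char *lookup_path(int cat, int key)
{
	for (size_t i = 0; i < sizeof root / sizeof *root; i++) {
		if (root[i].key != cat) continue;
		for (size_t j = 0; j < root[i].nleaves; j++)
			if (root[i].leaves[j].key == key)
				return root[i].leaves[j].blob;
	}
	return NULL;
}
```

With this, lookup_path(CAT_STRERROR, EACCES) corresponds to the textual
path strerror/EACCES.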

Going this route mostly avoids the need for smarts in the tooling, and
"has more flexibility" to encode things. But this also potentially
makes the encoding seem more arbitrary to localization folks.

Like in option 1, a sample (some hybrid between C/POSIX and a
hypothetical US-English locale, whipped up quick by hand as an
example) of one way this format could look is attached for reference.
An obvious variant that might be friendlier/more-familiar to folks
working with the data would be representing the same in json (which is
easy).




My leaning is towards option 1.


[-- Attachment #2: sample_posix_localedef.txt --]
[-- Type: text/plain, Size: 1459 bytes --]

LC_TIME
abday "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat"
day "Sunday";"Monday";"Tuesday";"Wednesday";"Thursday";"Friday";"Saturday"
abmon "Jan";"Feb";"Mar";"Apr";"May";"Jun";"Jul";"Aug";"Sep";"Oct";"Nov";"Dec"
mon "January";"February";"March";"April";"May";"June";"July";"August";"September";"October";"November";"December"
d_t_fmt "%a %b %e %H:%M:%S %Y"
d_fmt "%m/%d/%y"
t_fmt "%H:%M:%S"
t_fmt_ampm "%I:%M:%S %p"
am_pm "AM";"PM"
END LC_TIME

LC_MESSAGES
yesexpr "^[yY]"
noexpr "^[nN]"
EILSEQ "Illegal byte sequence"
EDOM "Domain error"
ERANGE "Result not representable"
ENOTTY "Not a tty"
EACCES "Permission denied"
EPERM "Operation not permitted"
...
EAI_0 "Unknown error"
EAI_BADFLAGS "Invalid flags"
EAI_NONAME "Name does not resolve"
EAI_AGAIN "Try again"
...
REG_OK "No error"
REG_NOMATCH "No match"
REG_BADPAT "Invalid regexp"
...
END LC_MESSAGES

LC_NUMERIC
decimal_point "."
thousands_sep ""
grouping -1
END LC_NUMERIC

LC_MONETARY
int_curr_symbol      ""
currency_symbol      ""
mon_decimal_point    ""
mon_thousands_sep    ""
mon_grouping         -1
positive_sign        ""
negative_sign        ""
int_frac_digits      -1
frac_digits          -1
p_cs_precedes        -1
p_sep_by_space       -1
n_cs_precedes        -1
n_sep_by_space       -1
p_sign_posn          -1
n_sign_posn          -1
int_p_cs_precedes    -1
int_p_sep_by_space   -1
int_n_cs_precedes    -1
int_n_sep_by_space   -1
int_p_sign_posn      -1
int_n_sign_posn      -1
END LC_MONETARY

[-- Attachment #3: sample_binary_as_text.txt --]
[-- Type: text/plain, Size: 4349 bytes --]

[langinfo/LC_TIME]
ABDAY1="Sun"
ABDAY2="Mon"
ABDAY3="Tue"
ABDAY4="Wed"
ABDAY5="Thu"
ABDAY6="Fri"
ABDAY7="Sat"
DAY1="Sunday"
DAY2="Monday"
DAY3="Tuesday"
DAY4="Wednesday"
DAY5="Thursday"
DAY6="Friday"
DAY7="Saturday"
...

[langinfo/LC_NUMERIC]
decimal_point="."
thousands_sep=""
grouping_sc="\177"
grouping_uc="\377"

[langinfo/LC_MONETARY]
mon_grouping_sc="\3\177"
mon_grouping_uc="\3\377"
mon_thousands_sep=","
mon_decimal_point="."
int_curr_symbol="USD "
currency_symbol="$"

[strerror]
0="No error information"
EILSEQ="Illegal byte sequence"
EDOM="Domain error"
ERANGE="Result not representable"
ENOTTY="Not a tty"
EACCES="Permission denied"
EPERM="Operation not permitted"
ENOENT="No such file or directory"
ESRCH="No such process"
EEXIST="File exists"
EOVERFLOW="Value too large for data type"
ENOSPC="No space left on device"
ENOMEM="Out of memory"
EBUSY="Resource busy"
EINTR="Interrupted system call"
EAGAIN="Resource temporarily unavailable"
ESPIPE="Invalid seek"
EXDEV="Cross-device link"
EROFS="Read-only file system"
ENOTEMPTY="Directory not empty"
ECONNRESET="Connection reset by peer"
ETIMEDOUT="Operation timed out"
ECONNREFUSED="Connection refused"
EHOSTDOWN="Host is down"
EHOSTUNREACH="Host is unreachable"
EADDRINUSE="Address in use"
EPIPE="Broken pipe"
EIO="I/O error"
ENXIO="No such device or address"
ENOTBLK="Block device required"
ENODEV="No such device"
ENOTDIR="Not a directory"
EISDIR="Is a directory"
ETXTBSY="Text file busy"
ENOEXEC="Exec format error"
EINVAL="Invalid argument"
E2BIG="Argument list too long"
ELOOP="Symbolic link loop"
ENAMETOOLONG="Filename too long"
ENFILE="Too many open files in system"
EMFILE="No file descriptors available"
EBADF="Bad file descriptor"
ECHILD="No child process"
EFAULT="Bad address"
EFBIG="File too large"
EMLINK="Too many links"
ENOLCK="No locks available"
EDEADLK="Resource deadlock would occur"
ENOTRECOVERABLE="State not recoverable"
EOWNERDEAD="Previous owner died"
ECANCELED="Operation canceled"
ENOSYS="Function not implemented"
ENOMSG="No message of desired type"
EIDRM="Identifier removed"
ENOSTR="Device not a stream"
ENODATA="No data available"
ETIME="Device timeout"
ENOSR="Out of streams resources"
ENOLINK="Link has been severed"
EPROTO="Protocol error"
EBADMSG="Bad message"
EBADFD="File descriptor in bad state"
ENOTSOCK="Not a socket"
EDESTADDRREQ="Destination address required"
EMSGSIZE="Message too large"
EPROTOTYPE="Protocol wrong type for socket"
ENOPROTOOPT="Protocol not available"
EPROTONOSUPPORT="Protocol not supported"
ESOCKTNOSUPPORT="Socket type not supported"
ENOTSUP="Not supported"
EPFNOSUPPORT="Protocol family not supported"
EAFNOSUPPORT="Address family not supported by protocol"
EADDRNOTAVAIL="Address not available"
ENETDOWN="Network is down"
ENETUNREACH="Network unreachable"
ENETRESET="Connection reset by network"
ECONNABORTED="Connection aborted"
ENOBUFS="No buffer space available"
EISCONN="Socket is connected"
ENOTCONN="Socket not connected"
ESHUTDOWN="Cannot send after socket shutdown"
EALREADY="Operation already in progress"
EINPROGRESS="Operation in progress"
ESTALE="Stale file handle"
EUCLEAN="Data consistency error"
ENAVAIL="Resource not available"
EREMOTEIO="Remote I/O error"
EDQUOT="Quota exceeded"
ENOMEDIUM="No medium found"
EMEDIUMTYPE="Wrong medium type"
EMULTIHOP="Multihop attempted"
ENOKEY="Required key not available"
EKEYEXPIRED="Key has expired"
EKEYREVOKED="Key has been revoked"
EKEYREJECTED="Key was rejected by service"

[gai_strerror]
0="Unknown error"
EAI_BADFLAGS="Invalid flags"
EAI_NONAME="Name does not resolve"
EAI_AGAIN="Try again"
EAI_FAIL="Non-recoverable error"
EAI_NODATA="Name has no usable address"
EAI_FAMILY="Unrecognized address family or invalid length"
EAI_SOCKTYPE="Unrecognized socket type"
EAI_SERVICE="Unrecognized service"
EAI_MEMORY="Out of memory"
EAI_SYSTEM="System error"
EAI_OVERFLOW="Overflow"

[regerror]
REG_OK="No error"
REG_NOMATCH="No match"
REG_BADPAT="Invalid regexp"
REG_ECOLLATE="Unknown collating element"
REG_ECTYPE="Unknown character class name"
REG_EESCAPE="Trailing backslash"
REG_ESUBREG="Invalid back reference"
REG_EBRACK="Missing ']'"
REG_EPAREN="Missing ')'"
REG_EBRACE="Missing '}'"
REG_BADBR="Invalid contents of {}"
REG_ERANGE="Invalid character range"
REG_ESPACE="Out of memory"
REG_BADRPT="Repetition not preceded by valid expression"
???="Unknown error"


* Re: [musl] Selecting locale source format
  2025-09-17  1:14 [musl] Selecting locale source format Rich Felker
@ 2025-09-17  1:23 ` A. Wilcox
  2025-09-17  1:36   ` Rich Felker
  2025-09-17 15:43 ` enh
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: A. Wilcox @ 2025-09-17  1:23 UTC
  To: musl

Hi Rich,

Thanks for continuing the locale work - very happy to see it
progressing!

I definitely prefer option 1 as well.  This will allow an easy
migration path for people using other Unix or Unix-like systems
(Solaris, AIX, glibc Linux) where localedef is also used.  It also
means there is a large corpus of existing files we can use,
both for testing the tooling and for initial drafts at porting musl
to other locales.

I think it is reasonable to extend the file to handle translations
for days of the week/months.  Is there a reason the existing system
of gettext(3) can’t be used for strerror_l?

Best,
-Anna



* Re: [musl] Selecting locale source format
  2025-09-17  1:23 ` A. Wilcox
@ 2025-09-17  1:36   ` Rich Felker
  2025-09-19 14:06     ` Pablo Correa Gomez
  2026-03-02 13:22     ` Pablo Correa Gomez
  0 siblings, 2 replies; 15+ messages in thread
From: Rich Felker @ 2025-09-17  1:36 UTC
  To: A. Wilcox; +Cc: musl

On Tue, Sep 16, 2025 at 08:23:09PM -0500, A. Wilcox wrote:
> Hi Rich,
> 
> Thanks for continuing the locale work - very happy to see it
> progressing!
> 
> I definitely prefer option 1 as well.  This will allow an easy
> migration path for people using other Unix or Unix-like systems
> (Solaris, AIX, glibc Linux) where localedef is also used.  It also
> means there is also a large corpus of existing files we can use,
> both for testing the tooling and for initial drafts at porting musl
> to other locales.
> 
> I think it is reasonable to extend the file to handle translations
> for days of the week/months.  Is there a reason the existing system
> of gettext(3) can’t be used for strerror_l?

The fundamental problem with the current system we have is gettext
keying off of the English string. That was fatal for [AB]MON_5 "May",
but it's also less than ideal for error messages. For example it's
plausible we might use the same text for an errno code as for a regex
or getaddrinfo error message, and then the keys would clash. And of
course if the messages are changed at all, translation files get
invalidated.
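
A minimal sketch of the clash, assuming a toy catalog rather than real
gettext: when the English text is the key, the full and abbreviated
month names for May cannot be told apart, while a symbolic identifier
keeps them distinct. All names here (text_catalog, lookup_by_text,
ITEM_MON_5, and so on) are hypothetical, and the "translations" are
placeholder tags.

```c
#include <stddef.h>
#include <string.h>

/* Catalog keyed by the English text, gettext-style. */
struct by_text { const char *msgid, *msgstr; };
static const struct by_text text_catalog[] = {
	{ "May", "<full month name>" },
	{ "May", "<abbreviated month name>" }, /* unreachable: same key */
};

/* Returns the first entry whose key matches, or the key itself. */
const char *lookup_by_text(const char *msgid)
{
	for (size_t i = 0; i < sizeof text_catalog / sizeof *text_catalog; i++)
		if (!strcmp(text_catalog[i].msgid, msgid))
			return text_catalog[i].msgstr;
	return msgid;
}

/* Catalog keyed by a symbolic identifier instead of the English text. */
enum item { ITEM_MON_5, ITEM_ABMON_5 };
static const char *const id_catalog[] = {
	[ITEM_MON_5] = "<full month name>",
	[ITEM_ABMON_5] = "<abbreviated month name>",
};

const char *lookup_by_id(enum item id)
{
	return id_catalog[id];
}
```

The same thing happens with the "Out of memory" text, which the sample
in this thread uses for ENOMEM, EAI_MEMORY, and REG_ESPACE alike.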

I'll go over the proposed new binary format more when I finish writing
it up, but on top of avoiding all these issues, it lets us get rid of
all the repetitive linear-search-multistring operations in musl and
replace them with efficient O(1) lookup regardless of whether a locale
file or internal messages in libc are being used.
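
For readers unfamiliar with the distinction, here is a simplified sketch
of the two lookup styles over the same packed message data. It is not
musl's actual code: the message list is shortened, the names nth_linear
and nth_indexed are hypothetical, and the offsets are written by hand
where a generator tool would normally emit them.

```c
#include <stddef.h>
#include <string.h>

/* Messages packed back to back, each terminated by a NUL byte. */
static const char msgs[] =
	"No error information\0"
	"Illegal byte sequence\0"
	"Domain error\0"
	"Result not representable";

/* Linear search: walk past idx strings one at a time.
 * Cost grows with the total length of the skipped strings.
 * idx must be less than 4; there is no bounds check. */
const char *nth_linear(size_t idx)
{
	const char *s = msgs;
	while (idx--) s += strlen(s) + 1;
	return s;
}

/* Offset table: a single array access regardless of idx, i.e. O(1).
 * Each offset is the start of a message within msgs. */
static const unsigned short offs[] = { 0, 21, 43, 56 };

const char *nth_indexed(size_t idx)
{
	return msgs + offs[idx];
}
```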

Rich


* Re: [musl] Selecting locale source format
  2025-09-17  1:14 [musl] Selecting locale source format Rich Felker
  2025-09-17  1:23 ` A. Wilcox
@ 2025-09-17 15:43 ` enh
  2025-09-17 17:37   ` Rich Felker
  2025-09-17 20:31 ` Rich Felker
  2025-09-19 13:59 ` Pablo Correa Gomez
  3 siblings, 1 reply; 15+ messages in thread
From: enh @ 2025-09-17 15:43 UTC
  To: musl

On Tue, Sep 16, 2025 at 9:14 PM Rich Felker <dalias@libc.org> wrote:
>
> I have a proposed binary format for new locale files that I'm in the
> process of writing up, but Pablo brought it to my attention that,
> while binary format (ABI) is what's important to have down and stable
> at the time we integrate into musl, pinning down the source format is
> what's important/blocking for collaboration with localization folks.
>
> I have two candidate formats in the works right now for this:
>
>
>
> Option 1: subset+extension of POSIX localedef format.
>
> The basis for this format is described in
> https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html
>
> If we go this way, it would be a "subset" because (1) some parts are
> not relevant, like LC_CTYPE, which does not vary by locale,

note that that's not true for 'i' in turkish/azeri locales, for
example. (unless you meant that you plan on using the unicode cldr
data directly here.)

see the "Language-Sensitive Mappings" section of SpecialCasing.txt for
all the special cases.



* Re: [musl] Selecting locale source format
  2025-09-17 15:43 ` enh
@ 2025-09-17 17:37   ` Rich Felker
  0 siblings, 0 replies; 15+ messages in thread
From: Rich Felker @ 2025-09-17 17:37 UTC
  To: enh; +Cc: musl

On Wed, Sep 17, 2025 at 11:43:46AM -0400, enh wrote:
> On Tue, Sep 16, 2025 at 9:14 PM Rich Felker <dalias@libc.org> wrote:
> >
> > I have a proposed binary format for new locale files that I'm in the
> > process of writing up, but Pablo brought it to my attention that,
> > while binary format (ABI) is what's important to have down and stable
> > at the time we integrate into musl, pinning down the source format is
> > what's important/blocking for collaboration with localization folks.
> >
> > I have two candidate formats in the works right now for this:
> >
> >
> >
> > Option 1: subset+extension of POSIX localedef format.
> >
> > The basis for this format is described in
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html
> >
> > If we go this way, it would be a "subset" because (1) some parts are
> > not relevant, like LC_CTYPE, which does not vary by locale,
> 
> note that that's not true

This was a statement about musl and musl's LC_CTYPE, not about what
you could theoretically do.

> for 'i' in turkish/azeri locales, for
> example. (unless you meant that you plan on using the unicode cldr
> data directly here.)
> 
> see the "Language-Sensitive Mappings" section of SpecialCasing.txt for
> all the special cases.

There really is not a way to support this except in legacy 8bit
encodings, which are out-of-scope for musl. This is because the
interface doesn't have any way for toupper() or tolower() to map to a
multibyte sequence. AFAICT tolower/toupper and towlower/towupper have
to be consistent with each other, but can't be.
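
To illustrate the interface limit: toupper() takes and returns a single
value representable as unsigned char (or EOF), but the Turkish/Azeri
uppercase of 'i' is U+0130, which needs two bytes in UTF-8. The encoder
below is a plain sketch written for this example (utf8_encode is a
hypothetical name, not a libc function); it only serves to show the
byte count.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 if c is a surrogate or
 * is above U+10FFFF. */
size_t utf8_encode(unsigned long c, unsigned char *out)
{
	if (c < 0x80) {
		out[0] = c;
		return 1;
	}
	if (c < 0x800) {
		out[0] = 0xC0 | (c >> 6);
		out[1] = 0x80 | (c & 0x3F);
		return 2;
	}
	if (c < 0x10000) {
		if (c >= 0xD800 && c < 0xE000) return 0;
		out[0] = 0xE0 | (c >> 12);
		out[1] = 0x80 | ((c >> 6) & 0x3F);
		out[2] = 0x80 | (c & 0x3F);
		return 3;
	}
	if (c < 0x110000) {
		out[0] = 0xF0 | (c >> 18);
		out[1] = 0x80 | ((c >> 12) & 0x3F);
		out[2] = 0x80 | ((c >> 6) & 0x3F);
		out[3] = 0x80 | (c & 0x3F);
		return 4;
	}
	return 0;
}
```

Since 'i' encodes to one byte and U+0130 to two, no single-byte return
value from toupper() can express that mapping in a UTF-8 locale.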

In any case re-litigating this is not in the scope of the project at
hand.

There is all sorts of complexity in transforming the case of
natural-language text that none of the standard C interfaces can
adequately support; it requires a more expressive framework. The
standard interfaces are really not suitable for
anything more than case-insensitive comparisons (if even that; they
don't suffice even for that in the case of ß vs SS) or other very
basic uses.

Rich


* Re: [musl] Selecting locale source format
  2025-09-17  1:14 [musl] Selecting locale source format Rich Felker
  2025-09-17  1:23 ` A. Wilcox
  2025-09-17 15:43 ` enh
@ 2025-09-17 20:31 ` Rich Felker
  2026-03-02 13:54   ` Pablo Correa Gomez
  2025-09-19 13:59 ` Pablo Correa Gomez
  3 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2025-09-17 20:31 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3186 bytes --]

On Tue, Sep 16, 2025 at 09:14:07PM -0400, Rich Felker wrote:
> I have a proposed binary format for new locale files that I'm in the
> process of writing up, but Pablo brought it to my attention that,
> while binary format (ABI) is what's important to have down and stable
> at the time we integrate into musl, pinning down the source format is
> what's important/blocking for collaboration with localization folks.
> 
> I have two candidate formats in the works right now for this:
> 
> 
> 
> Option 1: subset+extension of POSIX localedef format.
> 
> The basis for this format is described in
> https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html
> 
> If we go this way, it would be a "subset" because (1) some parts are
> not relevant, like LC_CTYPE, which does not vary by locale, (2) some
> parts will necessarily be represented in different ways, like
> collation where we're using UCA rather than the POSIX form, and (3)
> the format just has a lot of gratuitous cruft like symbolic character
> names. It will also necessarily be extended because POSIX localedef
> has no way to represent translated error strings etc. - keys for them
> have to be added.
> 
> Going this route would have the source data in a fairly compact and
> "well-known" (to certain audiences) form, but requires that the
> tooling to produce binary locale files be aware of how these fields
> translate to the data model for the binary form.
> 
> A sample (should be roughly correct C/POSIX locale) is attached for
> reference.

Based on my and others' preference so far being this option 1, I've
been putting together a short program to programmatically generate a
file in this format from the active host locale. This seems useful
both as a source of the template and as a means to verify that all of
the existing information is represented/representable.

The attached version should be dumping all should-be-localizable data
from musl except signal descriptions (strsignal). These require some
consideration since the set of signals that need naming is very
slightly arch-specific (there is a largely unused "SIGEMT" on mips* in
place of the also largely unused "SIGSTKFLT" on other archs), and
there is fundamentally no way to extract the string for the one that's
not present on the host arch.
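
The host-arch limitation can be seen at compile time. In the sketch
below (arch_specific_signal is a hypothetical name), only one branch is
ever compiled in, so a dump program built on a given arch can only ask
the C library to describe the signal that exists there.

```c
#include <signal.h>
#include <stddef.h>

/* Report which of the two arch-dependent signals this host defines.
 * Stores its number in *sig and returns its name, or stores 0 and
 * returns NULL if the headers define neither. */
const char *arch_specific_signal(int *sig)
{
#if defined(SIGSTKFLT)
	*sig = SIGSTKFLT;
	return "SIGSTKFLT";
#elif defined(SIGEMT)
	*sig = SIGEMT;
	return "SIGEMT";
#else
	*sig = 0;
	return NULL;
#endif
}
```

Passing the returned number to strsignal() yields the host's text for
that one signal; the text for the other has to come from somewhere
else.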

Another slight omission that needs consideration is having keys for
the "unknown error" cases. For strerror we just treat unknowns the
same as 0 ("No error information", not "Success"), but for regerror,
REG_OK is treated distinctly from invalid error codes. gai_strerror
and hstrerror are like this too, but by choice; we could assign "0" as
"unknown" easily for them. Signals already use 0 as "unknown".

Running the program also exposes some errors in musl's built-in
C/C.UTF-8 locale, such as the LC_TIME era and alt digits stuff
containing copies of the non-era/normal-digits data rather than ""
indicating "not available". I don't know why this was done; aside from
ALT_DIGITS it doesn't even work for the old gettext-type locale
support because duplicating the non-era strings as keys inherently
gives duplicate keys.

Current draft of the generation program is attached.

Rich

[-- Attachment #2: dumplocale.c --]
[-- Type: text/plain, Size: 6763 bytes --]

#include <stdio.h>
#include <locale.h>
#include <limits.h>
#include <time.h>
#include <langinfo.h>
#include <stddef.h>
#include <errno.h>
#include <string.h>
#include <regex.h>
#include <netdb.h>

#define E0 0
#define H0 0
#define E(x) { #x, x }
struct errkey {
	const char *name;
	int key;
};

static struct errkey errors[] = {
E(E0),
E(EPERM),E(ENOENT),E(ESRCH),E(EINTR),E(EIO),E(ENXIO),E(E2BIG),E(ENOEXEC),
E(EBADF),E(ECHILD),E(EAGAIN),E(ENOMEM),E(EACCES),E(EFAULT),E(ENOTBLK),
E(EBUSY),E(EEXIST),E(EXDEV),E(ENODEV),E(ENOTDIR),E(EISDIR),E(EINVAL),
E(ENFILE),E(EMFILE),E(ENOTTY),E(ETXTBSY),E(EFBIG),E(ENOSPC),E(ESPIPE),
E(EROFS),E(EMLINK),E(EPIPE),E(EDOM),E(ERANGE),E(EDEADLK),E(ENAMETOOLONG),
E(ENOLCK),E(ENOSYS),E(ENOTEMPTY),E(ELOOP),E(EWOULDBLOCK),E(ENOMSG),
E(EIDRM),E(ECHRNG),E(EL2NSYNC),E(EL3HLT),E(EL3RST),E(ELNRNG),E(EUNATCH),
E(ENOCSI),E(EL2HLT),E(EBADE),E(EBADR),E(EXFULL),E(ENOANO),E(EBADRQC),
E(EBADSLT),E(EDEADLOCK),E(EBFONT),E(ENOSTR),E(ENODATA),E(ETIME),E(ENOSR),
E(ENONET),E(ENOPKG),E(EREMOTE),E(ENOLINK),E(EADV),E(ESRMNT),E(ECOMM),
E(EPROTO),E(EMULTIHOP),E(EDOTDOT),E(EBADMSG),E(EOVERFLOW),E(ENOTUNIQ),
E(EBADFD),E(EREMCHG),E(ELIBACC),E(ELIBBAD),E(ELIBSCN),E(ELIBMAX),
E(ELIBEXEC),E(EILSEQ),E(ERESTART),E(ESTRPIPE),E(EUSERS),E(ENOTSOCK),
E(EDESTADDRREQ),E(EMSGSIZE),E(EPROTOTYPE),E(ENOPROTOOPT),
E(EPROTONOSUPPORT),E(ESOCKTNOSUPPORT),E(EOPNOTSUPP),E(ENOTSUP),
E(EPFNOSUPPORT),E(EAFNOSUPPORT),E(EADDRINUSE),E(EADDRNOTAVAIL),
E(ENETDOWN),E(ENETUNREACH),E(ENETRESET),E(ECONNABORTED),E(ECONNRESET),
E(ENOBUFS),E(EISCONN),E(ENOTCONN),E(ESHUTDOWN),E(ETOOMANYREFS),
E(ETIMEDOUT),E(ECONNREFUSED),E(EHOSTDOWN),E(EHOSTUNREACH),
E(EALREADY),E(EINPROGRESS),E(ESTALE),E(EUCLEAN),E(ENOTNAM),E(ENAVAIL),
E(EISNAM),E(EREMOTEIO),E(EDQUOT),E(ENOMEDIUM),E(EMEDIUMTYPE),
E(ECANCELED),E(ENOKEY),E(EKEYEXPIRED),E(EKEYREVOKED),E(EKEYREJECTED),
E(EOWNERDEAD),E(ENOTRECOVERABLE),E(ERFKILL),E(EHWPOISON),
};

static struct errkey regerrors[] = {
E(REG_OK),E(REG_NOMATCH),E(REG_BADPAT),E(REG_ECOLLATE),E(REG_ECTYPE),
E(REG_EESCAPE),E(REG_ESUBREG),E(REG_EBRACK),E(REG_EPAREN),E(REG_EBRACE),
E(REG_BADBR),E(REG_ERANGE),E(REG_ESPACE),E(REG_BADRPT),
};

static struct errkey gaierrors[] = {
E(EAI_BADFLAGS),E(EAI_NONAME),E(EAI_AGAIN),E(EAI_FAIL),E(EAI_NODATA),
E(EAI_FAMILY),E(EAI_SOCKTYPE),E(EAI_SERVICE),E(EAI_MEMORY),E(EAI_SYSTEM),
E(EAI_OVERFLOW),
};

static struct errkey herrors[] = {
E(H0),E(HOST_NOT_FOUND),E(TRY_AGAIN),E(NO_RECOVERY),E(NO_DATA),
};

static int lc_remap(int v)
{
	return v==CHAR_MAX ? -1 : v;
}

static void emit_time_strings(const char *label, size_t offset, int start, int count, const char *fmt)
{
	printf("%s ", label);
	for (int i=0; i<count; i++) {
		char buf[100];
		struct tm tm = { 0 };
		*(int *)((char *)&tm + offset) = i+start;
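		/* strftime returns 0 if the result is empty or does not fit;
		 * in the latter case buf is unspecified, so emit "" instead */
		if (!strftime(buf, sizeof buf, fmt, &tm)) buf[0] = 0;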
		printf("\"%s\"", buf);
		if (i+1<count) putchar(';');
	}
	putchar('\n');
}

static const char *wrap_strerror(int err, char *buf, size_t size)
{
	return strerror(err);
}

static const char *wrap_regerror(int err, char *buf, size_t size)
{
	regerror(err, 0, buf, size);
	return buf;
}

static const char *wrap_gai_strerror(int err, char *buf, size_t size)
{
	return gai_strerror(err);
}

static const char *wrap_hstrerror(int err, char *buf, size_t size)
{
	return hstrerror(err);
}


static void emit_error_strings(struct errkey *ek, size_t count, const char *(*f)(int, char *, size_t))
{
	for (size_t i=0; i<count; i++) {
		char buf[200];
		printf("%s \"%s\"\n", ek[i].name, f(ek[i].key, buf, sizeof buf));
	}
}

int main(void)
{
	setlocale(LC_ALL, "");
	struct lconv *lc = localeconv();

	puts("LC_TIME");
	emit_time_strings("abday", offsetof(struct tm, tm_wday), 0, 7, "%a");
	emit_time_strings("day", offsetof(struct tm, tm_wday), 0, 7, "%A");
	emit_time_strings("abmon", offsetof(struct tm, tm_mon), 0, 12, "%b");
	emit_time_strings("mon", offsetof(struct tm, tm_mon), 0, 12, "%B");
	emit_time_strings("am_pm", offsetof(struct tm, tm_hour), 11, 2, "%p");
	printf("d_t_fmt \"%s\"\n", nl_langinfo(D_T_FMT));
	printf("d_fmt \"%s\"\n", nl_langinfo(D_FMT));
	printf("t_fmt \"%s\"\n", nl_langinfo(T_FMT));
	printf("t_fmt_ampm \"%s\"\n", nl_langinfo(T_FMT_AMPM));
	printf("era \"%s\"\n", nl_langinfo(ERA));
	printf("era_d_t_fmt \"%s\"\n", nl_langinfo(ERA_D_T_FMT));
	printf("era_d_fmt \"%s\"\n", nl_langinfo(ERA_D_FMT));
	printf("era_t_fmt \"%s\"\n", nl_langinfo(ERA_T_FMT));
	printf("alt_digits \"%s\"\n", nl_langinfo(ALT_DIGITS));
	puts("END LC_TIME");
	puts("");

	puts("LC_NUMERIC");
	printf("decimal_point \"%s\"\n", lc->decimal_point);
	printf("thousands_sep \"%s\"\n", lc->thousands_sep);
	printf("grouping ");
	if (!lc->grouping[0]) printf("%d", -1);
	for (char *p = lc->grouping; *p; p++) {
		printf("%d", lc_remap(*p));
		if (p[1]) putchar(';');
	}
	putchar('\n');
	puts("END LC_NUMERIC");
	puts("");
	
	puts("LC_MONETARY");
	printf("int_curr_symbol \"%s\"\n", lc->int_curr_symbol);
	printf("currency_symbol \"%s\"\n", lc->currency_symbol);
	printf("mon_decimal_point \"%s\"\n", lc->mon_decimal_point);
	printf("mon_thousands_sep \"%s\"\n", lc->mon_thousands_sep);
	printf("mon_grouping ");
	if (!lc->mon_grouping[0]) printf("%d", -1);
	for (char *p = lc->mon_grouping; *p; p++) {
		printf("%d", lc_remap(*p));
		if (p[1]) putchar(';');
	}
	putchar('\n');
	printf("positive_sign \"%s\"\n", lc->positive_sign);
	printf("negative_sign \"%s\"\n", lc->negative_sign);
	printf("int_frac_digits %d\n", lc_remap(lc->int_frac_digits));
	printf("frac_digits %d\n", lc_remap(lc->frac_digits));
	printf("p_cs_precedes %d\n", lc_remap(lc->p_cs_precedes));
	printf("p_sep_by_space %d\n", lc_remap(lc->p_sep_by_space));
	printf("n_cs_precedes %d\n", lc_remap(lc->n_cs_precedes));
	printf("n_sep_by_space %d\n", lc_remap(lc->n_sep_by_space));
	printf("p_sign_posn %d\n", lc_remap(lc->p_sign_posn));
	printf("n_sign_posn %d\n", lc_remap(lc->n_sign_posn));
	printf("int_p_cs_precedes %d\n", lc_remap(lc->int_p_cs_precedes));
	printf("int_p_sep_by_space %d\n", lc_remap(lc->int_p_sep_by_space));
	printf("int_n_cs_precedes %d\n", lc_remap(lc->int_n_cs_precedes));
	printf("int_n_sep_by_space %d\n", lc_remap(lc->int_n_sep_by_space));
	printf("int_p_sign_posn %d\n", lc_remap(lc->int_p_sign_posn));
	printf("int_n_sign_posn %d\n", lc_remap(lc->int_n_sign_posn));
	puts("END LC_MONETARY");
	puts("");

	puts("LC_MESSAGES");
	printf("yesexpr \"%s\"\n", nl_langinfo(YESEXPR));
	printf("noexpr \"%s\"\n", nl_langinfo(NOEXPR));
	emit_error_strings(errors, sizeof errors/sizeof *errors, wrap_strerror);
	emit_error_strings(regerrors, sizeof regerrors/sizeof *regerrors, wrap_regerror);
	emit_error_strings(gaierrors, sizeof gaierrors/sizeof *gaierrors, wrap_gai_strerror);
	emit_error_strings(herrors, sizeof herrors/sizeof *herrors, wrap_hstrerror);
	puts("END LC_MESSAGES");
	puts("");
}


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-09-17  1:14 [musl] Selecting locale source format Rich Felker
                   ` (2 preceding siblings ...)
  2025-09-17 20:31 ` Rich Felker
@ 2025-09-19 13:59 ` Pablo Correa Gomez
  2025-10-01 13:55   ` Pablo Correa Gomez
  3 siblings, 1 reply; 15+ messages in thread
From: Pablo Correa Gomez @ 2025-09-19 13:59 UTC (permalink / raw)
  To: Rich Felker, musl

Thanks a lot, Rich, for the follow-up.

I have now brought this to the attention of the translators and asked
them some further questions, most importantly whether there is anything
in their language that they think the format won't accommodate.

Personally, I do not see anything in the current format that might
break Spanish.

I also like option (1) best, mostly because it is more compact and
similar to what other people are doing. IIRC glibc locale translations
look very similar.

Best,
Pablo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-09-17  1:36   ` Rich Felker
@ 2025-09-19 14:06     ` Pablo Correa Gomez
  2026-03-02 13:22     ` Pablo Correa Gomez
  1 sibling, 0 replies; 15+ messages in thread
From: Pablo Correa Gomez @ 2025-09-19 14:06 UTC (permalink / raw)
  To: Rich Felker, A. Wilcox; +Cc: musl

On Tue, 16-09-2025 at 21:36 -0400, Rich Felker wrote:
> On Tue, Sep 16, 2025 at 08:23:09PM -0500, A. Wilcox wrote:
> > Hi Rich,
> > 
> > Thanks for continuing the locale work - very happy to see it
> > progressing!
> > 
> > I definitely prefer option 1 as well.  This will allow an easy
> > migration path for people using other Unix or Unix-like systems
> > (Solaris, AIX, glibc Linux) where localedef is also used.  It also
> > means there is also a large corpus of existing files we can use,
> > both for testing the tooling and for initial drafts at porting musl
> > to other locales.
> > 
> > I think it is reasonable to extend the file to handle translations
> > for days of the week/months.  Is there a reason the existing system
> > of gettext(3) can’t be used for strerror_l?
> 
> The fundamental problem with the current system we have is gettext
> keying off of the English string. That was fatal for [AB]MON_5 "May",
> but it's also less than ideal for error messages. For example it's
> plausible we might use the same text for an errno code as for a regex
> or getaddrinfo error message, and then the keys would clash. And of
> course if the messages are changed at all, translation files get
> invalidated.
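
To make the key clash concrete, here is a minimal sketch. It is not musl
code: the table layouts, the CAT_* names and the Spanish strings are
assumptions chosen purely for illustration. With the English text as the
key, the month name and its abbreviation both start from "May" and cannot
be translated differently; with a (category, index) key they are two
distinct entries.

```c
#include <stddef.h>
#include <string.h>

/* gettext-style catalog: the English text itself is the lookup key. */
struct str_entry { const char *msgid, *msgstr; };

static const struct str_entry by_string[] = {
	/* MON_5 and ABMON_5 are both "May" in English, so one entry
	 * has to serve both the full and the abbreviated name. */
	{ "May", "mayo" },
};

static const char *lookup_by_string(const char *msgid)
{
	for (size_t i=0; i<sizeof by_string/sizeof *by_string; i++)
		if (!strcmp(by_string[i].msgid, msgid))
			return by_string[i].msgstr;
	return msgid; /* untranslated fallback */
}

/* Integer-keyed catalog: (category, index) identifies each string.
 * The category names are hypothetical, not the proposed binary format. */
enum { CAT_ABMON, CAT_MON, CAT_COUNT };

static const char *const by_key[CAT_COUNT][12] = {
	[CAT_ABMON] = { [4] = "may" },
	[CAT_MON]   = { [4] = "mayo" },
};

/* Returns a null pointer for entries this toy table leaves unset. */
static const char *lookup_by_key(int cat, int idx)
{
	return by_key[cat][idx];
}
```

Here lookup_by_string("May") can only ever yield one answer, while
lookup_by_key(CAT_ABMON, 4) and lookup_by_key(CAT_MON, 4) are independent.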

@A.Wilcox, in case you missed it, the decision to go for this kind of
representation was discussed in
https://www.openwall.com/lists/musl/2025/06/02/2, point 1. Sorry that
ended up being a bit of a long email.

Best,
Pablo


> 
> I'll go over the proposed new binary format more when I finish
> writing
> it up, but on top of avoiding all these issues, it lets us get rid of
> all the repetitive linear-search-multistring operations in musl and
> replace them with efficient O(1) lookup regardless of whether a
> locale
> file or internal messages in libc are being used.
> 
> Rich

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-09-19 13:59 ` Pablo Correa Gomez
@ 2025-10-01 13:55   ` Pablo Correa Gomez
  2025-10-01 17:21     ` Markus Wichmann
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Pablo Correa Gomez @ 2025-10-01 13:55 UTC (permalink / raw)
  To: Rich Felker, musl

We have now received a few replies from translators, and the most
notable issue raised is how to deal with natural-language text whose
translation might change depending on context. Both plural forms and
declensions were brought up.

After discussing this a bit with Rich, it seems that such things will
not be an issue for strings related to the libc API, which is the main
concern of the work we are doing now. However, there are
implementation-dependent strings in libc, like dynamic linker messages,
which could potentially be added in the future. Since we are setting
the file format now, it is important to make sure that whatever we
come up with is flexible enough not to block future development. Any
thoughts?

Pablo 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-10-01 13:55   ` Pablo Correa Gomez
@ 2025-10-01 17:21     ` Markus Wichmann
  2026-03-02 13:35       ` Pablo Correa Gomez
  2025-10-01 17:51     ` Demi Marie Obenour
  2025-10-02  2:34     ` Rich Felker
  2 siblings, 1 reply; 15+ messages in thread
From: Markus Wichmann @ 2025-10-01 17:21 UTC (permalink / raw)
  To: musl

On Wed, Oct 01, 2025 at 03:55:59PM +0200, Pablo Correa Gomez wrote:
> We have now received a few replies from translators, and the most
> notable issue raised is how to deal with natural-language text whose
> translation might change depending on context. Both plural forms and
> declensions were brought up.
> 
> After discussing this a bit with Rich, it seems that such things will
> not be an issue for strings related to the libc API, which is the main
> concern of the work we are doing now. However, there are
> implementation-dependent strings in libc, like dynamic linker messages,
> which could potentially be added in the future. Since we are setting
> the file format now, it is important to make sure that whatever we
> come up with is flexible enough not to block future development. Any
> thoughts?

The msgfmt source format specified by POSIX allows multiple plural forms
and an arbitrary C expression to select the correct one. So that is one
way to go. The alternative is to stay agnostic to numbers and always
write both forms using parentheses, e.g. "Loaded %d file(s)".

While I appreciate that the need for an expression parser might
increase complexity by a lot, what little I remember of Russian
suggests that not many simpler alternatives exist. In Russian, for
example, there are three counted forms, namely the nominative singular,
the genitive singular, and the genitive plural. The nominative singular
is used whenever the number in question ends in 1 but not 11, the
genitive singular is used when it ends in 2-4 but not 12-14, and the
genitive plural is used in all other cases.

That logic is specific to one language (although I suspect a lot of it
is shared with other Slavic languages), so it must be specified in the
source format if the feature is desired at all.
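
For illustration, the three-way rule above can be written as a small C
function. This is only a sketch of the selection logic; the function name
is made up, and nothing here is a proposed musl interface. It computes
the same index as the Plural-Forms expression commonly used for Russian
in gettext catalogs.

```c
/* Select the Russian plural form for a count n:
 *   0: number ends in 1, but not in 11      (e.g. 1, 21, 101)
 *   1: number ends in 2-4, but not in 12-14 (e.g. 2, 23, 104)
 *   2: everything else                      (e.g. 0, 5, 11, 12, 100)
 * The name ru_plural_index is hypothetical. */
static int ru_plural_index(unsigned long n)
{
	unsigned long ones = n % 10, tens = n % 100;

	if (ones == 1 && tens != 11)
		return 0;
	if (ones >= 2 && ones <= 4 && !(tens >= 12 && tens <= 14))
		return 1;
	return 2;
}
```

A locale file that hard-codes a handful of such rules by name would avoid
the expression parser, at the cost of needing a libc change for every rule
not yet covered; whether that trade-off is acceptable is an open question.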

Ciao,
Markus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-10-01 13:55   ` Pablo Correa Gomez
  2025-10-01 17:21     ` Markus Wichmann
@ 2025-10-01 17:51     ` Demi Marie Obenour
  2025-10-02  2:34     ` Rich Felker
  2 siblings, 0 replies; 15+ messages in thread
From: Demi Marie Obenour @ 2025-10-01 17:51 UTC (permalink / raw)
  To: musl, Pablo Correa Gomez, Rich Felker


[-- Attachment #1.1.1: Type: text/plain, Size: 1099 bytes --]

On 10/1/25 09:55, Pablo Correa Gomez wrote:
> We have now received a few replies from translators, and the most
> notable issue raised is how to deal with natural-language text whose
> translation might change depending on context. Both plural forms and
> declensions were brought up.
> 
> After discussing this a bit with Rich, it seems that such things will
> not be an issue for strings related to the libc API, which is the main
> concern of the work we are doing now. However, there are
> implementation-dependent strings in libc, like dynamic linker messages,
> which could potentially be added in the future. Since we are setting
> the file format now, it is important to make sure that whatever we
> come up with is flexible enough not to block future development. Any
> thoughts?

https://projectfluent.org is the best I know of, but it has no
tooling for C that I am aware of.  I do prefer something that avoids
translation files containing executable code, as it lowers the risk of
accepting translation patches.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-10-01 13:55   ` Pablo Correa Gomez
  2025-10-01 17:21     ` Markus Wichmann
  2025-10-01 17:51     ` Demi Marie Obenour
@ 2025-10-02  2:34     ` Rich Felker
  2 siblings, 0 replies; 15+ messages in thread
From: Rich Felker @ 2025-10-02  2:34 UTC (permalink / raw)
  To: Pablo Correa Gomez; +Cc: musl

On Wed, Oct 01, 2025 at 03:55:59PM +0200, Pablo Correa Gomez wrote:
> We have now received a few replies from translators, and the most
> notable issue raised is how to deal with natural-language text whose
> translation might change depending on context. Both plural forms and
> declensions were brought up.
> 
> After discussing this a bit with Rich, it seems that such things will
> not be an issue for strings related to the libc API, which is the main
> concern of the work we are doing now. However, there are
> implementation-dependent strings in libc, like dynamic linker messages,
> which could potentially be added in the future. Since we are setting
> the file format now, it is important to make sure that whatever we
> come up with is flexible enough not to block future development. Any
> thoughts?

To summarize that discussion, most of the translatable strings in
musl/libc are fixed-form messages returned to the caller to use as it
sees fit. These inherently don't have any plural or other contextual
forms because no context is available to us.

What's left are strings that are not themselves part of any standard
interface surface but where we're reporting things directly to the
user (something we mostly choose very intentionally not to do, with
the main exception being dynamic linking failure conditions at
startup) or where the interface allows us to construct a more detailed
and contextualized message (presently this is just dlerror).

The reason the first attempt at supporting localized text did not make
the dynamic linker (startup and dlopen) strings translatable via
gettext was pretty much exactly this: I did not want
safety/correctness to depend on having type-matched format strings in
the locale definition file. My intent then was that, if/when we make
them translatable, we adjust the messages to be less
"natural-language" in form and instead consist of
separately-translatable fields, something like:

	Relocation error: foo.so: symbol: [cause of failure]

I'm not entirely committed to this if other folks disagree, but I
think it both makes translation cleaner and makes it easier to
understand bug reports with messages in a language you might not read.
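
As a rough sketch of what the field-based form could look like in code
(all names here, including the msg_key enum, tr() and
format_reloc_error(), are invented for illustration and are not existing
musl internals): the format string stays fixed in libc, and translated
text only ever enters as a "%s" argument, so a bad locale file cannot
cause a type mismatch.

```c
#include <stdio.h>

/* Hypothetical integer keys for separately translatable fields. */
enum msg_key { MSG_RELOC_ERROR, MSG_SYM_NOT_FOUND, MSG_COUNT };

/* Built-in (untranslated) text for each field. */
static const char *const builtin[MSG_COUNT] = {
	[MSG_RELOC_ERROR]   = "Relocation error",
	[MSG_SYM_NOT_FOUND] = "symbol not found",
};

/* Look a field up in an optional locale table, falling back to the
 * built-in text when the table or the entry is missing. */
static const char *tr(const char *const *locale_tab, enum msg_key k)
{
	return locale_tab && locale_tab[k] ? locale_tab[k] : builtin[k];
}

/* Compose "<heading>: <library>: <symbol>: <cause>". The format string
 * is a constant; translations are passed strictly as string data. */
static int format_reloc_error(char *buf, size_t size,
	const char *const *locale_tab,
	const char *lib, const char *sym, enum msg_key cause)
{
	return snprintf(buf, size, "%s: %s: %s: %s",
		tr(locale_tab, MSG_RELOC_ERROR), lib, sym,
		tr(locale_tab, cause));
}
```

Because the library and symbol names stay in fixed positions, a report
like this remains greppable even when the heading and cause are in a
language the reader does not know.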

If we don't do it this way, I'd want to have an internal interface for
validating that format strings are type-matched before using them. In
that case, if there are variants needed, we'd have to enumerate them
and assign them each an integer key.
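
A minimal sketch of such a type-match check, under stated assumptions:
the names next_sig() and fmt_compatible() are made up, and the parser is
deliberately strict. It accepts flags, numeric width/precision and length
modifiers, and rejects "*", positional "%n$" arguments and "%n" outright,
treating anything it does not understand as a mismatch.

```c
#include <string.h>

/* Find the next conversion in *s and store its "type signature"
 * (length modifier plus conversion character, e.g. "ld") in sig.
 * Returns 1 if one was found, 0 at end of string, -1 if the
 * conversion is malformed or unsupported. */
static int next_sig(const char **s, char sig[4])
{
	const char *p = *s;
	while ((p = strchr(p, '%'))) {
		p++;
		if (*p == '%') { p++; continue; }   /* literal percent */
		p += strspn(p, "-+ #0");            /* flags */
		p += strspn(p, "0123456789");       /* width */
		if (*p == '.') p += 1 + strspn(p+1, "0123456789");
		size_t n = strspn(p, "hljztL");     /* length modifier */
		if (n > 2 || !p[n] || !strchr("diouxXeEfFgGaAcsp", p[n]))
			return -1;
		memcpy(sig, p, n+1);
		sig[n+1] = 0;
		*s = p + n + 1;
		return 1;
	}
	return 0;
}

/* Return 1 if cand consumes the same argument types, in the same
 * order, as the trusted reference format ref; otherwise return 0. */
static int fmt_compatible(const char *ref, const char *cand)
{
	char a[4], b[4];
	for (;;) {
		int ra = next_sig(&ref, a), rb = next_sig(&cand, b);
		if (ra < 0 || rb < 0 || ra != rb) return 0;
		if (!ra) return 1;
		if (strcmp(a, b)) return 0;
	}
}
```

Rejecting reordered arguments is the simplest safe choice here, but it is
also restrictive for translators; supporting "%n$" would need a more
careful comparison than this sketch attempts.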

Rich

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-09-17  1:36   ` Rich Felker
  2025-09-19 14:06     ` Pablo Correa Gomez
@ 2026-03-02 13:22     ` Pablo Correa Gomez
  1 sibling, 0 replies; 15+ messages in thread
From: Pablo Correa Gomez @ 2026-03-02 13:22 UTC (permalink / raw)
  To: Rich Felker, A. Wilcox; +Cc: musl

On Tue, 16-09-2025 at 21:36 -0400, Rich Felker wrote:
> On Tue, Sep 16, 2025 at 08:23:09PM -0500, A. Wilcox wrote:
> > Hi Rich,
> > 
> > Thanks for continuing the locale work - very happy to see it
> > progressing!
> > 
> > I definitely prefer option 1 as well.  This will allow an easy
> > migration path for people using other Unix or Unix-like systems
> > (Solaris, AIX, glibc Linux) where localedef is also used.  It also
> > means there is also a large corpus of existing files we can use,
> > both for testing the tooling and for initial drafts at porting musl
> > to other locales.
> > 
> > I think it is reasonable to extend the file to handle translations
> > for days of the week/months.  Is there a reason the existing system
> > of gettext(3) can’t be used for strerror_l?
> 
> The fundamental problem with the current system we have is gettext
> keying off of the English string. That was fatal for [AB]MON_5 "May",
> but it's also less than ideal for error messages. For example it's
> plausible we might use the same text for an errno code as for a regex
> or getaddrinfo error message, and then the keys would clash. And of
> course if the messages are changed at all, translation files get
> invalidated.
> 
> I'll go over the proposed new binary format more when I finish writing
> it up, but on top of avoiding all these issues, it lets us get rid of
> all the repetitive linear-search-multistring operations in musl and
> replace them with efficient O(1) lookup regardless of whether a locale
> file or internal messages in libc are being used.

A. Wilcox, now that there is a proposed binary format in
https://www.openwall.com/lists/musl/2026/02/25/1, do you have any
further thoughts on this?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-10-01 17:21     ` Markus Wichmann
@ 2026-03-02 13:35       ` Pablo Correa Gomez
  0 siblings, 0 replies; 15+ messages in thread
From: Pablo Correa Gomez @ 2026-03-02 13:35 UTC (permalink / raw)
  To: Markus Wichmann, musl

On Wed, 01-10-2025 at 19:21 +0200, Markus Wichmann wrote:
> On Wed, Oct 01, 2025 at 03:55:59PM +0200, Pablo Correa Gomez wrote:
> > We have now received a few replies from translators, and the most
> > notable issue raised is how to deal with natural-language text whose
> > translation might change depending on context. Both plural forms and
> > declensions were brought up.
> > 
> > After discussing this a bit with Rich, it seems that such things will
> > not be an issue for strings related to the libc API, which is the main
> > concern of the work we are doing now. However, there are
> > implementation-dependent strings in libc, like dynamic linker messages,
> > which could potentially be added in the future. Since we are setting
> > the file format now, it is important to make sure that whatever we
> > come up with is flexible enough not to block future development. Any
> > thoughts?
> 
> The msgfmt source format specified by POSIX allows multiple plural forms
> and an arbitrary C expression to select the correct one. So that is one
> way to go. The alternative is to stay agnostic to numbers and always
> write both forms using parentheses, e.g. "Loaded %d file(s)".
> 
> While I appreciate that the need for an expression parser might
> increase complexity by a lot, what little I remember of Russian
> suggests that not many simpler alternatives exist. In Russian, for
> example, there are three counted forms, namely the nominative singular,
> the genitive singular, and the genitive plural. The nominative singular
> is used whenever the number in question ends in 1 but not 11, the
> genitive singular is used when it ends in 2-4 but not 12-14, and the
> genitive plural is used in all other cases.
> 
> That logic is specific to one language (although I suspect a lot of it
> is shared with other Slavic languages), so it must be specified in the
> source format if the feature is desired at all.

Rich, do you have any thoughts on this? On my side, if we think this is
too complicated, I would be happy to just wait and tackle it in a
follow-up project, especially if handling it now would delay the
current work further.

Best,
Pablo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] Selecting locale source format
  2025-09-17 20:31 ` Rich Felker
@ 2026-03-02 13:54   ` Pablo Correa Gomez
  0 siblings, 0 replies; 15+ messages in thread
From: Pablo Correa Gomez @ 2026-03-02 13:54 UTC (permalink / raw)
  To: Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 3744 bytes --]

On Wed, 17-09-2025 at 16:31 -0400, Rich Felker wrote:
> On Tue, Sep 16, 2025 at 09:14:07PM -0400, Rich Felker wrote:
> > I have a proposed binary format for new locale files that I'm in the
> > process of writing up, but Pablo brought it to my attention that,
> > while binary format (ABI) is what's important to have down and stable
> > at the time we integrate into musl, pinning down the source format is
> > what's important/blocking for collaboration with localization folks.
> > 
> > I have two candidate formats in the works right now for this:
> > 
> > 
> > 
> > Option 1: subset+extension of POSIX localedef format.
> > 
> > The basis for this format is described in
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html
> > 
> > If we go this way, it would be a "subset" because (1) some parts are
> > not relevant, like LC_CTYPE, which does not vary by locale, (2) some
> > parts will necessarily be represented in different ways, like
> > collation where we're using UCA rather than the POSIX form, and (3)
> > the format just has a lot of gratuitous cruft like symbolic character
> > names. It will also necessarily be extended because POSIX localedef
> > has no way to represent translated error strings etc. - keys for them
> > have to be added.
> > 
> > Going this route would have the source data in a fairly compact and
> > "well-known" (to certain audiences) form, but requires that the
> > tooling to produce binary locale files be aware of how these fields
> > translate to the data model for the binary form.
> > 
> > A sample (should be roughly correct C/POSIX locale) is attached for
> > reference.
> 
> Based on my and others' preference so far being this option 1, I've
> been putting together a short program to programmatically generate a
> file in this format from the active host locale. This seems useful
> both as a source of the template and as a means to verify that all of
> the existing information is represented/representable.
> 
> The attached version should be dumping all should-be-localizable data
> from musl except signal descriptions (strsignal). These require some
> consideration since the set of signals that need naming is very
> slightly arch-specific (there is a largely unused "SIGEMT" on mips* in
> place of the also largely unused "SIGSTKFLT" on other archs), and
> there is fundamentally no way to extract the string for the one that's
> not present on the host arch.
> 
> Another slight omission that needs consideration is having keys for
> the "unknown error" cases. For strerror we just treat unknowns the
> same as 0 ("No error information", not "Success"), but for regerror,
> REG_OK is treated distinctly from invalid error codes. gai_strerror
> and hstrerror are like this too, but by choice; we could assign "0" as
> "unknown" easily for them. Signals already use 0 as "unknown".
> 
> Running the program also exposes some errors in musl's built-in
> C/C.UTF-8 locale, such as the LC_TIME era and alt digits stuff
> containing copies of the non-era/normal-digits data rather than ""
> indicating "not available". I don't know why this was done; aside from
> ALT_DIGITS it doesn't even work for the old gettext-type locale
> support because duplicating the non-era strings as keys inherently
> gives duplicate keys.
> 
> Current draft of the generation program is attached.
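
For readers without the attachment, the general idea can be sketched as
below. This is not the attached program, only a minimal illustration that
assumes the data is read through nl_langinfo; dump_items and abday_items
are hypothetical names, and quoting of '"' or '\' inside values is left
out.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdio.h>

/* Write `key "a";"b";...` for n nl_langinfo items into buf.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if buf is too small. Values are not escaped. */
static int dump_items(char *buf, size_t size, const char *key,
                      const nl_item *items, size_t n)
{
	size_t off;
	int r;

	assert(buf && key && items);
	r = snprintf(buf, size, "%s ", key);
	if (r < 0 || (size_t)r >= size)
		return -1;
	off = (size_t)r;
	for (size_t i = 0; i < n; i++) {
		/* Separate list elements with ';' as in the dumps. */
		r = snprintf(buf + off, size - off, "%s\"%s\"",
		             i ? ";" : "", nl_langinfo(items[i]));
		if (r < 0 || (size_t)r >= size - off)
			return -1;
		off += (size_t)r;
	}
	return (int)off;
}

/* The seven abbreviated day names, Sunday first. */
static const nl_item abday_items[] = {
	ABDAY_1, ABDAY_2, ABDAY_3, ABDAY_4, ABDAY_5, ABDAY_6, ABDAY_7
};
```

Calling setlocale(LC_ALL, "") first and then
dump_items(buf, sizeof buf, "abday", abday_items, 7) should give a line of
the same shape as the abday lines in the attached dumps.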

I ran this program with LANG=es_ES.UTF-8 under both musl and Debian bookworm
glibc to compare the results. The May bug in musl's Spanish locale is clearly
visible: abmon has "Mayo" where glibc has "may". To compile the program under
glibc I had to remove the REG_OK and EAI_NODATA macros; otherwise it "just
worked".
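
My understanding of the May bug, sketched here under that assumption: the
old gettext-type support looks translations up by the C-locale string, and
"May" is both the abbreviated and the full month name, so a catalog has
only one slot for the two. The table and toy_lookup below are toy
stand-ins, not musl's actual lookup code or real locale data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy catalog keyed by the C-locale string. Entries are illustrative. */
struct entry {
	const char *msgid;   /* C-locale string used as the key */
	const char *msgstr;  /* translation */
};

static const struct entry es_catalog[] = {
	{ "Apr",   "abr"   },
	{ "April", "abril" },
	{ "May",   "Mayo"  },  /* one slot shared by abmon and mon */
	{ "Jun",   "jun"   },
	{ "June",  "junio" },
};

/* Return the translation for msgid, or msgid itself if none is found. */
static const char *toy_lookup(const char *msgid)
{
	assert(msgid);
	for (size_t i = 0; i < sizeof es_catalog / sizeof *es_catalog; i++)
		if (strcmp(es_catalog[i].msgid, msgid) == 0)
			return es_catalog[i].msgstr;
	return msgid;
}
```

With this model, April can be translated as "abr" and "abril" separately,
but the abbreviated and full May both come back as whatever the single
"May" entry holds. A source format with distinct abmon and mon keys does
not have that collision.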

> 
> Rich

[-- Attachment #2: gcc.dump --]
[-- Type: text/plain, Size: 6648 bytes --]

LC_TIME
abday "dom";"lun";"mar";"mié";"jue";"vie";"sáb"
day "domingo";"lunes";"martes";"miércoles";"jueves";"viernes";"sábado"
abmon "ene";"feb";"mar";"abr";"may";"jun";"jul";"ago";"sep";"oct";"nov";"dic"
mon "enero";"febrero";"marzo";"abril";"mayo";"junio";"julio";"agosto";"septiembre";"octubre";"noviembre";"diciembre"
am_pm "";""
d_t_fmt "%a %d %b %Y %T"
d_fmt "%d/%m/%y"
t_fmt "%T"
t_fmt_ampm ""
era ""
era_d_t_fmt ""
era_d_fmt ""
era_t_fmt ""
alt_digits ""
END LC_TIME

LC_NUMERIC
decimal_point ","
thousands_sep "."
grouping 3;3
END LC_NUMERIC

LC_MONETARY
int_curr_symbol "EUR "
currency_symbol "€"
mon_decimal_point ","
mon_thousands_sep "."
mon_grouping 3;3
positive_sign ""
negative_sign "-"
int_frac_digits 2
frac_digits 2
p_cs_precedes 0
p_sep_by_space 1
n_cs_precedes 0
n_sep_by_space 1
p_sign_posn 1
n_sign_posn 1
int_p_cs_precedes 0
int_p_sep_by_space 1
int_n_cs_precedes 0
int_n_sep_by_space 1
int_p_sign_posn 1
int_n_sign_posn 1
END LC_MONETARY

LC_MESSAGES
yesexpr "^[+1sSyY]"
noexpr "^[-0nN]"
E0 "Success"
EPERM "Operation not permitted"
ENOENT "No such file or directory"
ESRCH "No such process"
EINTR "Interrupted system call"
EIO "Input/output error"
ENXIO "No such device or address"
E2BIG "Argument list too long"
ENOEXEC "Exec format error"
EBADF "Bad file descriptor"
ECHILD "No child processes"
EAGAIN "Resource temporarily unavailable"
ENOMEM "Cannot allocate memory"
EACCES "Permission denied"
EFAULT "Bad address"
ENOTBLK "Block device required"
EBUSY "Device or resource busy"
EEXIST "File exists"
EXDEV "Invalid cross-device link"
ENODEV "No such device"
ENOTDIR "Not a directory"
EISDIR "Is a directory"
EINVAL "Invalid argument"
ENFILE "Too many open files in system"
EMFILE "Too many open files"
ENOTTY "Inappropriate ioctl for device"
ETXTBSY "Text file busy"
EFBIG "File too large"
ENOSPC "No space left on device"
ESPIPE "Illegal seek"
EROFS "Read-only file system"
EMLINK "Too many links"
EPIPE "Broken pipe"
EDOM "Numerical argument out of domain"
ERANGE "Numerical result out of range"
EDEADLK "Resource deadlock avoided"
ENAMETOOLONG "File name too long"
ENOLCK "No locks available"
ENOSYS "Function not implemented"
ENOTEMPTY "Directory not empty"
ELOOP "Too many levels of symbolic links"
EWOULDBLOCK "Resource temporarily unavailable"
ENOMSG "No message of desired type"
EIDRM "Identifier removed"
ECHRNG "Channel number out of range"
EL2NSYNC "Level 2 not synchronized"
EL3HLT "Level 3 halted"
EL3RST "Level 3 reset"
ELNRNG "Link number out of range"
EUNATCH "Protocol driver not attached"
ENOCSI "No CSI structure available"
EL2HLT "Level 2 halted"
EBADE "Invalid exchange"
EBADR "Invalid request descriptor"
EXFULL "Exchange full"
ENOANO "No anode"
EBADRQC "Invalid request code"
EBADSLT "Invalid slot"
EDEADLOCK "Resource deadlock avoided"
EBFONT "Bad font file format"
ENOSTR "Device not a stream"
ENODATA "No data available"
ETIME "Timer expired"
ENOSR "Out of streams resources"
ENONET "Machine is not on the network"
ENOPKG "Package not installed"
EREMOTE "Object is remote"
ENOLINK "Link has been severed"
EADV "Advertise error"
ESRMNT "Srmount error"
ECOMM "Communication error on send"
EPROTO "Protocol error"
EMULTIHOP "Multihop attempted"
EDOTDOT "RFS specific error"
EBADMSG "Bad message"
EOVERFLOW "Value too large for defined data type"
ENOTUNIQ "Name not unique on network"
EBADFD "File descriptor in bad state"
EREMCHG "Remote address changed"
ELIBACC "Can not access a needed shared library"
ELIBBAD "Accessing a corrupted shared library"
ELIBSCN ".lib section in a.out corrupted"
ELIBMAX "Attempting to link in too many shared libraries"
ELIBEXEC "Cannot exec a shared library directly"
EILSEQ "Invalid or incomplete multibyte or wide character"
ERESTART "Interrupted system call should be restarted"
ESTRPIPE "Streams pipe error"
EUSERS "Too many users"
ENOTSOCK "Socket operation on non-socket"
EDESTADDRREQ "Destination address required"
EMSGSIZE "Message too long"
EPROTOTYPE "Protocol wrong type for socket"
ENOPROTOOPT "Protocol not available"
EPROTONOSUPPORT "Protocol not supported"
ESOCKTNOSUPPORT "Socket type not supported"
EOPNOTSUPP "Operation not supported"
ENOTSUP "Operation not supported"
EPFNOSUPPORT "Protocol family not supported"
EAFNOSUPPORT "Address family not supported by protocol"
EADDRINUSE "Address already in use"
EADDRNOTAVAIL "Cannot assign requested address"
ENETDOWN "Network is down"
ENETUNREACH "Network is unreachable"
ENETRESET "Network dropped connection on reset"
ECONNABORTED "Software caused connection abort"
ECONNRESET "Connection reset by peer"
ENOBUFS "No buffer space available"
EISCONN "Transport endpoint is already connected"
ENOTCONN "Transport endpoint is not connected"
ESHUTDOWN "Cannot send after transport endpoint shutdown"
ETOOMANYREFS "Too many references: cannot splice"
ETIMEDOUT "Connection timed out"
ECONNREFUSED "Connection refused"
EHOSTDOWN "Host is down"
EHOSTUNREACH "No route to host"
EALREADY "Operation already in progress"
EINPROGRESS "Operation now in progress"
ESTALE "Stale file handle"
EUCLEAN "Structure needs cleaning"
ENOTNAM "Not a XENIX named type file"
ENAVAIL "No XENIX semaphores available"
EISNAM "Is a named type file"
EREMOTEIO "Remote I/O error"
EDQUOT "Disk quota exceeded"
ENOMEDIUM "No medium found"
EMEDIUMTYPE "Wrong medium type"
ECANCELED "Operation canceled"
ENOKEY "Required key not available"
EKEYEXPIRED "Key has expired"
EKEYREVOKED "Key has been revoked"
EKEYREJECTED "Key was rejected by service"
EOWNERDEAD "Owner died"
ENOTRECOVERABLE "State not recoverable"
ERFKILL "Operation not possible due to RF-kill"
EHWPOISON "Memory page has hardware error"
REG_NOMATCH "No match"
REG_BADPAT "Invalid regular expression"
REG_ECOLLATE "Invalid collation character"
REG_ECTYPE "Invalid character class name"
REG_EESCAPE "Trailing backslash"
REG_ESUBREG "Invalid back reference"
REG_EBRACK "Unmatched [, [^, [:, [., or [="
REG_EPAREN "Unmatched ( or \("
REG_EBRACE "Unmatched \{"
REG_BADBR "Invalid content of \{\}"
REG_ERANGE "Invalid range end"
REG_ESPACE "Memory exhausted"
REG_BADRPT "Invalid preceding regular expression"
EAI_BADFLAGS "Bad value for ai_flags"
EAI_NONAME "Name or service not known"
EAI_AGAIN "Temporary failure in name resolution"
EAI_FAIL "Non-recoverable failure in name resolution"
EAI_FAMILY "ai_family not supported"
EAI_SOCKTYPE "ai_socktype not supported"
EAI_SERVICE "Servname not supported for ai_socktype"
EAI_MEMORY "Memory allocation failure"
EAI_SYSTEM "System error"
EAI_OVERFLOW "Unknown error"
H0 "Resolver Error 0 (no error)"
HOST_NOT_FOUND "Unknown host"
TRY_AGAIN "Host name lookup failure"
NO_RECOVERY "Unknown server error"
END LC_MESSAGES


[-- Attachment #3: musl.dump --]
[-- Type: text/plain, Size: 6273 bytes --]

LC_TIME
abday "dom";"lun";"mar";"mie";"jue";"vie";"sab"
day "domingo";"lunes";"martes";"miercoles";"jueves";"viernes";"sabado"
abmon "en";"feb";"mar";"abr";"Mayo";"jun";"jul";"ago";"sep";"oct";"nov";"dec"
mon "Enero";"Febrero";"Marzo";"Abril";"Mayo";"Junio";"Julio";"Agosto";"Septiembre";"Octubre";"Noviembre";"Diciembre"
am_pm "AM";"PM"
d_t_fmt "%a %b %e %T %Y"
d_fmt "%d-%m-%y"
t_fmt "%H:%M:%S"
t_fmt_ampm "%I:%M:%S %p"
era ""
era_d_t_fmt "%a %b %e %T %Y"
era_d_fmt "%d-%m-%y"
era_t_fmt "%H:%M:%S"
alt_digits "0123456789"
END LC_TIME

LC_NUMERIC
decimal_point "."
thousands_sep ""
grouping -1
END LC_NUMERIC

LC_MONETARY
int_curr_symbol ""
currency_symbol ""
mon_decimal_point ""
mon_thousands_sep ""
mon_grouping -1
positive_sign ""
negative_sign ""
int_frac_digits -1
frac_digits -1
p_cs_precedes -1
p_sep_by_space -1
n_cs_precedes -1
n_sep_by_space -1
p_sign_posn -1
n_sign_posn -1
int_p_cs_precedes -1
int_p_sep_by_space -1
int_n_cs_precedes -1
int_n_sep_by_space -1
int_p_sign_posn -1
int_n_sign_posn -1
END LC_MONETARY

LC_MESSAGES
yesexpr "^[yY]"
noexpr "^[nN]"
E0 "No error information"
EPERM "Operation not permitted"
ENOENT "No such file or directory"
ESRCH "No such process"
EINTR "Interrupted system call"
EIO "I/O error"
ENXIO "No such device or address"
E2BIG "Argument list too long"
ENOEXEC "Exec format error"
EBADF "Bad file descriptor"
ECHILD "No child process"
EAGAIN "Resource temporarily unavailable"
ENOMEM "Out of memory"
EACCES "Permission denied"
EFAULT "Bad address"
ENOTBLK "Block device required"
EBUSY "Resource busy"
EEXIST "File exists"
EXDEV "Cross-device link"
ENODEV "No such device"
ENOTDIR "Not a directory"
EISDIR "Is a directory"
EINVAL "Invalid argument"
ENFILE "Too many open files in system"
EMFILE "No file descriptors available"
ENOTTY "Not a tty"
ETXTBSY "Text file busy"
EFBIG "File too large"
ENOSPC "No space left on device"
ESPIPE "Invalid seek"
EROFS "Read-only file system"
EMLINK "Too many links"
EPIPE "Broken pipe"
EDOM "Domain error"
ERANGE "Result not representable"
EDEADLK "Resource deadlock would occur"
ENAMETOOLONG "Filename too long"
ENOLCK "No locks available"
ENOSYS "Function not implemented"
ENOTEMPTY "Directory not empty"
ELOOP "Symbolic link loop"
EWOULDBLOCK "Resource temporarily unavailable"
ENOMSG "No message of desired type"
EIDRM "Identifier removed"
ECHRNG "No error information"
EL2NSYNC "No error information"
EL3HLT "No error information"
EL3RST "No error information"
ELNRNG "No error information"
EUNATCH "No error information"
ENOCSI "No error information"
EL2HLT "No error information"
EBADE "No error information"
EBADR "No error information"
EXFULL "No error information"
ENOANO "No error information"
EBADRQC "No error information"
EBADSLT "No error information"
EDEADLOCK "Resource deadlock would occur"
EBFONT "No error information"
ENOSTR "Device not a stream"
ENODATA "No data available"
ETIME "Device timeout"
ENOSR "Out of streams resources"
ENONET "No error information"
ENOPKG "No error information"
EREMOTE "No error information"
ENOLINK "Link has been severed"
EADV "No error information"
ESRMNT "No error information"
ECOMM "No error information"
EPROTO "Protocol error"
EMULTIHOP "Multihop attempted"
EDOTDOT "No error information"
EBADMSG "Bad message"
EOVERFLOW "Value too large for data type"
ENOTUNIQ "No error information"
EBADFD "File descriptor in bad state"
EREMCHG "No error information"
ELIBACC "No error information"
ELIBBAD "No error information"
ELIBSCN "No error information"
ELIBMAX "No error information"
ELIBEXEC "No error information"
EILSEQ "Illegal byte sequence"
ERESTART "No error information"
ESTRPIPE "No error information"
EUSERS "No error information"
ENOTSOCK "Not a socket"
EDESTADDRREQ "Destination address required"
EMSGSIZE "Message too large"
EPROTOTYPE "Protocol wrong type for socket"
ENOPROTOOPT "Protocol not available"
EPROTONOSUPPORT "Protocol not supported"
ESOCKTNOSUPPORT "Socket type not supported"
EOPNOTSUPP "Not supported"
ENOTSUP "Not supported"
EPFNOSUPPORT "Protocol family not supported"
EAFNOSUPPORT "Address family not supported by protocol"
EADDRINUSE "Address in use"
EADDRNOTAVAIL "Address not available"
ENETDOWN "Network is down"
ENETUNREACH "Network unreachable"
ENETRESET "Connection reset by network"
ECONNABORTED "Connection aborted"
ECONNRESET "Connection reset by peer"
ENOBUFS "No buffer space available"
EISCONN "Socket is connected"
ENOTCONN "Socket not connected"
ESHUTDOWN "Cannot send after socket shutdown"
ETOOMANYREFS "No error information"
ETIMEDOUT "Operation timed out"
ECONNREFUSED "Connection refused"
EHOSTDOWN "Host is down"
EHOSTUNREACH "Host is unreachable"
EALREADY "Operation already in progress"
EINPROGRESS "Operation in progress"
ESTALE "Stale file handle"
EUCLEAN "No error information"
ENOTNAM "No error information"
ENAVAIL "No error information"
EISNAM "No error information"
EREMOTEIO "Remote I/O error"
EDQUOT "Quota exceeded"
ENOMEDIUM "No medium found"
EMEDIUMTYPE "Wrong medium type"
ECANCELED "Operation cancelled"
ENOKEY "Required key not available"
EKEYEXPIRED "Key has expired"
EKEYREVOKED "Key has been revoked"
EKEYREJECTED "Key was rejected by service"
EOWNERDEAD "Previous owner died"
ENOTRECOVERABLE "State not recoverable"
ERFKILL "No error information"
EHWPOISON "No error information"
REG_OK "No error"
REG_NOMATCH "No match"
REG_BADPAT "Invalid regexp"
REG_ECOLLATE "Unknown collating element"
REG_ECTYPE "Unknown character class name"
REG_EESCAPE "Trailing backslash"
REG_ESUBREG "Invalid back reference"
REG_EBRACK "Missing ']'"
REG_EPAREN "Missing ')'"
REG_EBRACE "Missing '}'"
REG_BADBR "Invalid contents of {}"
REG_ERANGE "Invalid character range"
REG_ESPACE "Out of memory"
REG_BADRPT "Repetition not preceded by valid expression"
EAI_BADFLAGS "Invalid flags"
EAI_NONAME "Name does not resolve"
EAI_AGAIN "Try again"
EAI_FAIL "Non-recoverable error"
EAI_NODATA "Name has no usable address"
EAI_FAMILY "Unrecognised address family or invalid length"
EAI_SOCKTYPE "Unrecognised socket type"
EAI_SERVICE "Unrecognised service"
EAI_MEMORY "Out of memory"
EAI_SYSTEM "System error"
EAI_OVERFLOW "Overflow"
H0 "Unknown error"
HOST_NOT_FOUND "Host not found"
TRY_AGAIN "Try again"
NO_RECOVERY "Non-recoverable error"
NO_DATA "Address not available"
END LC_MESSAGES



