From: Rich Felker <dalias@libc.org>
To: enh <enh@google.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Selecting locale source format
Date: Wed, 17 Sep 2025 13:37:45 -0400
Message-ID: <20250917173745.GV1827@brightrain.aerifal.cx>
In-Reply-To: <CAJgzZooiidR18yF3jY0098_ugguiwB59dT2NXs4MYg8tfAF1BQ@mail.gmail.com>
On Wed, Sep 17, 2025 at 11:43:46AM -0400, enh wrote:
> On Tue, Sep 16, 2025 at 9:14 PM Rich Felker <dalias@libc.org> wrote:
> >
> > I have a proposed binary format for new locale files that I'm in the
> > process of writing up, but Pablo brought it to my attention that,
> > while binary format (ABI) is what's important to have down and stable
> > at the time we integrate into musl, pinning down the source format is
> > what's important/blocking for collaboration with localization folks.
> >
> > I have two candidate formats in the works right now for this:
> >
> >
> >
> > Option 1: subset+extension of POSIX localedef format.
> >
> > The basis for this format is described in
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html
> >
> > If we go this way, it would be a "subset" because (1) some parts are
> > not relevant, like LC_CTYPE, which does not vary by locale,
>
> note that that's not true

This was a statement about musl and musl's LC_CTYPE, not about what
you could theoretically do.

> for 'i' in turkish/azeri locales, for
> example. (unless you meant that you plan on using the unicode cldr
> data directly here.)
>
> see the "Language-Sensitive Mappings" section of SpecialCasing.txt for
> all the special cases.

There really is no way to support this except in legacy 8-bit
encodings, which are out of scope for musl. This is because the
interface has no way for toupper() or tolower() to map to a
multibyte sequence. AFAICT tolower/toupper and towlower/towupper have
to be consistent with each other, but in this case they can't be.
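
To make the multibyte point concrete, here is a minimal sketch, assuming a
UTF-8 locale. The helpers utf8_encode() and fits_in_toupper_result() are
hypothetical names for illustration, not musl code. toupper() returns an
int that must be representable as unsigned char (or EOF), so in UTF-8 its
result can only be a complete character if that character encodes to one
byte. The Turkish uppercase of 'i', U+0130, encodes to two bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not part of musl): encode one Unicode scalar
 * value as UTF-8. Returns the number of bytes written (1 to 4), or 0
 * if cp is a surrogate or out of range. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; /* surrogates are not scalar values */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical check: a toupper()/tolower() result is a single
 * unsigned char value, so under UTF-8 it can only stand for a
 * character whose encoding is exactly one byte. */
int fits_in_toupper_result(unsigned long cp)
{
    unsigned char buf[4];
    return utf8_encode(cp, buf) == 1;
}
```

With these helpers, fits_in_toupper_result('I') is nonzero, while
fits_in_toupper_result(0x0130) is zero because U+0130 encodes as the two
bytes 0xC4 0xB0.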

In any case, re-litigating this is not in scope for the project at
hand.

There are all sorts of complexities in transforming the case of
natural-language text that none of the standard C interfaces can
adequately support and that require a more expressive framework. The
standard interfaces are really not suitable for anything more than
case-insensitive comparisons (and not always even that; they fail for
ß vs SS) or other very basic uses.
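
As a rough illustration of the ß vs SS limitation, the sketch below
compares two wide strings by folding each element with towlower(). The
function name simple_casefold_equal() is made up for this example. Because
towlower() and towupper() map one wide character to exactly one wide
character, no such element-by-element comparison can treat the
one-character ß as equal to the two-character SS.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: compare two wide strings after folding each
 * element with towlower(). Returns 1 if they are equal under that
 * one-to-one folding, 0 otherwise. */
int simple_casefold_equal(const wchar_t *a, const wchar_t *b)
{
    for (; *a && *b; a++, b++) {
        if (towlower((wint_t)*a) != towlower((wint_t)*b))
            return 0;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

Here simple_casefold_equal(L"Strasse", L"STRASSE") reports a match, but
simple_casefold_equal(L"stra\u00dfe", L"STRASSE") does not, since the
strings differ in length and ß never folds to a single 's'.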
Rich