mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: enh <enh@google.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Selecting locale source format
Date: Wed, 17 Sep 2025 13:37:45 -0400
Message-ID: <20250917173745.GV1827@brightrain.aerifal.cx>
In-Reply-To: <CAJgzZooiidR18yF3jY0098_ugguiwB59dT2NXs4MYg8tfAF1BQ@mail.gmail.com>

On Wed, Sep 17, 2025 at 11:43:46AM -0400, enh wrote:
> On Tue, Sep 16, 2025 at 9:14 PM Rich Felker <dalias@libc.org> wrote:
> >
> > I have a proposed binary format for new locale files that I'm in the
> > process of writing up, but Pablo brought it to my attention that,
> > while binary format (ABI) is what's important to have down and stable
> > at the time we integrate into musl, pinning down the source format is
> > what's important/blocking for collaboration with localization folks.
> >
> > I have two candidate formats in the works right now for this:
> >
> >
> >
> > Option 1: subset+extension of POSIX localedef format.
> >
> > The basis for this format is described in
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html
> >
> > If we go this way, it would be a "subset" because (1) some parts are
> > not relevant, like LC_CTYPE, which does not vary by locale,
> 
> note that that's not true

This was a statement about musl and musl's LC_CTYPE, not about what
you could theoretically do.

> for 'i' in turkish/azeri locales, for
> example. (unless you meant that you plan on using the unicode cldr
> data directly here.)
> 
> see the "Language-Sensitive Mappings" section of SpecialCasing.txt for
> all the special cases.

There really is no way to support this except in legacy 8-bit
encodings, which are out of scope for musl. This is because the
interface doesn't have any way for toupper() or tolower() to map to a
multibyte sequence. AFAICT tolower/toupper and towlower/towupper have
to be consistent with each other, but can't be.
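
As a minimal sketch of the constraint described above (not musl code;
utf8_encode() and fits_toupper_result() are hypothetical helpers
introduced only for illustration): toupper() and tolower() take and
return a single unsigned char value, so in a UTF-8 locale a case
mapping is expressible only if its target encodes as one byte. The
Turkish/Azeri targets U+0130 (dotted capital I) and U+0131 (dotless
small i) each need two bytes.

```c
#include <stddef.h>

/* Hypothetical helper: encode a Unicode code point as UTF-8.
 * Returns the number of bytes written (1-4), or 0 if cp is not a
 * valid scalar value (surrogate or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical check: could a byte-oriented toupper()/tolower()
 * return this code point in a UTF-8 locale? Only if it is a single
 * byte, since the return value is one unsigned char (or EOF). */
int fits_toupper_result(unsigned long cp)
{
    unsigned char buf[4];
    return utf8_encode(cp, buf) == 1;
}
```

Under these assumptions, fits_toupper_result(0x49) holds for plain 'I',
while fits_toupper_result(0x130) does not: the wide-character function
could in principle return U+0130, but the byte-oriented one has no way
to, which is the inconsistency noted above.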

In any case, re-litigating this is not in the scope of the project at
hand.

There are all sorts of complexities in transforming the case of
natural-language text that cannot adequately be supported by any of
the standard C interfaces and that require a more expressive
framework. The standard interfaces are really not suitable for
anything more than case-insensitive comparisons (if even that; they
don't suffice even for that in the case of ß vs SS) or other very
basic uses.
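
The ß vs SS case can be sketched as follows. This is an illustrative
example only; naive_casecmp() is a hypothetical helper, not an
existing libc function. It compares two wide strings character by
character after towlower(). Because towlower() and towupper() map
exactly one wide character to one wide character, the comparison can
never equate strings of different lengths, whatever the locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: case-insensitive comparison built only on the
 * one-to-one towlower() mapping. Returns 0 if the strings compare
 * equal, a negative or positive value otherwise. */
int naive_casecmp(const wchar_t *a, const wchar_t *b)
{
    for (;; a++, b++) {
        wint_t ca = towlower((wint_t)*a);
        wint_t cb = towlower((wint_t)*b);
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)    /* both strings ended together */
            return 0;
    }
}
```

With this sketch, naive_casecmp(L"Hello", L"hELLO") returns 0, but
naive_casecmp(L"stra\u00dfe", L"STRASSE") is nonzero: the first string
has six characters and the second seven, so no per-character mapping
can make them match.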

Rich

