mailing list of musl libc
From: Markus Wichmann <>
Subject: Re: [musl] A journey of weird file sorting and desktop systems
Date: Fri, 28 Jan 2022 20:47:33 +0100
Message-ID: <20220128194733.GA1960@voyager>
In-Reply-To: <>

On Fri, Jan 28, 2022 at 01:01:04PM -0500, Rich Felker wrote:
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.

But basic functionality *is* missing from libc, and by design. By the
standard. For example, toupper and towupper can only return a single
code point. That doesn't work with German's ß character, which has the
capital form SS. If you were transforming some general German word group
into block capitals for a headline or something, that is the
transformation you would use. Now, some people have invented a capital
version of ß, that is still new enough to make blocks appear in many
programs (test your mail program here: ẞ), but that letter is not widely
used.

Also, many applications expect towupper and towlower to be inverse
functions of each other, but here, not all instances of SS ought to be
transformed to ß when passing them through towlower, even if the
interface did support such a thing.
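
To make the ß problem concrete, here is a minimal sketch of a one-to-many
uppercase routine. upper_full and put are hypothetical names, not part of any
libc, and the code assumes wchar_t holds Unicode code points (true on musl and
glibc, not guaranteed by the standard). Everything other than ß is still passed
through towupper, which can only map one wide character to one wide character.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Store c at dst[*n] if it fits (leaving room for the terminator),
 * and count it either way so the caller learns the full length. */
static void put(wchar_t *dst, size_t cap, size_t *n, wchar_t c)
{
    if (dst && *n + 1 < cap)
        dst[*n] = c;
    (*n)++;
}

/* Hypothetical helper, not a libc function: uppercase src into dst,
 * expanding ß (U+00DF) to "SS". Returns the length the full result
 * needs, excluding the terminator (similar to snprintf). */
size_t upper_full(wchar_t *dst, size_t cap, const wchar_t *src)
{
    size_t n = 0;

    assert(src != NULL);
    for (; *src; src++) {
        if (*src == 0x00DF) {       /* one code point in, two out */
            put(dst, cap, &n, L'S');
            put(dst, cap, &n, L'S');
        } else {
            put(dst, cap, &n, (wchar_t)towupper((wint_t)*src));
        }
    }
    if (dst && cap)
        dst[n < cap ? n : cap - 1] = L'\0';
    return n;
}
```

Called on L"Stra\u00dfe" with a large enough buffer, this produces L"STRASSE"
and returns 7. Running towlower over that result character by character gives
"strasse", not the original word, which is the inverse-function problem
described above.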

My point is that the development of interfaces that deal with
internationalization might be better put into a library with an
interface less rigid than libc, where any adjustment moves at the
glacial pace of the Austin Group or WG14, and in any case, breaking
changes are completely out of the question. That is also why we still
have gets() and strchr().

Whether ICU is a suitable library for that purpose I lack the expertise
to say. However, all I have heard about it so far is either that one
should use it to cure all i18n ills, or that it is an abomination unto
the Lord. But even the people in the second camp fail to recommend a
superior alternative. So I'm guessing there isn't one.

As to the actual function in question: Simply having a way to make
strcoll behave like strcasecmp instead of strcmp would probably already
be the 80% solution for most European languages.
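
As a rough illustration of the difference for file sorting, the sketch below
sorts an array of names with qsort using either strcmp or strcasecmp.
sort_names and the two comparators are hypothetical names made up for this
example; it does not touch strcoll or any locale setting, and strcasecmp only
folds case as the current locale's single-byte rules allow (ASCII in the
default "C" locale).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* qsort comparators over an array of const char *. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static int cmp_nocase(const void *a, const void *b)
{
    return strcasecmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sort names bytewise, or ignoring case. */
void sort_names(const char **names, size_t n, int ignore_case)
{
    assert(names != NULL);
    qsort(names, n, sizeof *names, ignore_case ? cmp_nocase : cmp_bytes);
}
```

For the names "banana", "Zebra", "apple", the bytewise order is Zebra, apple,
banana, because every uppercase ASCII letter sorts before every lowercase one.
The case-insensitive order is apple, banana, Zebra, which is what most people
expect from a file listing.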

Yeah, it won't work with umlauts, but we Germans are used to that. "It
is <current year> and we still can't do umlauts" is a common curse
levelled at information technology, and for the most part it is apt. I
routinely counsel against using umlauts in file names or pass phrases,
because you never know what character set they will be saved or
transmitted in later, and that just causes avoidable problems. I really doubt
this issue will ever be solved within my lifetime.
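
To show why the encoding matters, the sketch below lists three byte sequences
that all render as "ä" and compares them bytewise, which is roughly what a file
system lookup or a pass phrase hash does. The variable names and same_bytes are
made up for this example; the byte values are the standard ISO-8859-1 and UTF-8
encodings.

```c
#include <assert.h>
#include <string.h>

/* Three byte sequences that all display as "ä". */
const char a_latin1[]   = "\xE4";       /* ISO-8859-1 */
const char a_utf8_nfc[] = "\xC3\xA4";   /* UTF-8, precomposed U+00E4 */
const char a_utf8_nfd[] = "a\xCC\x88";  /* UTF-8, 'a' + combining U+0308 */

/* Bytewise equality: no normalization, no charset conversion. */
int same_bytes(const char *x, const char *y)
{
    assert(x != NULL && y != NULL);
    return strcmp(x, y) == 0;
}
```

No two of the three compare equal, so a file name or pass phrase typed on one
system may simply not match the "same" text stored by another.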



Thread overview: 10+ messages
2022-01-28 13:41 ellie
2022-01-28 14:10 ` Rich Felker
2022-01-28 14:57   ` ellie
2022-01-28 16:58     ` enh
2022-01-28 18:01       ` Rich Felker
2022-01-28 18:33         ` enh
2022-01-28 19:22           ` Rich Felker
2022-01-28 19:47         ` Markus Wichmann [this message]
2022-01-28 18:01     ` Ariadne Conill
2022-01-28 17:54   ` Ariadne Conill
