mailing list of musl libc
* [musl] A journey of weird file sorting and desktop systems
@ 2022-01-28 13:41 ellie
  2022-01-28 14:10 ` Rich Felker
  0 siblings, 1 reply; 10+ messages in thread
From: ellie @ 2022-01-28 13:41 UTC (permalink / raw)
  To: musl

After spending a bit wondering why files like "elder1" and "Elder2" end 
up at completely different spots in the file list on my postmarketOS 
(=Alpine-based) system, I filed a ticket with the Nemo file manager. 
Turns out Nemo just uses locale-dependent sorting, so I spent an hour 
trying to set LC_COLLATE to fix this, until I stumbled across the remark 
on musl's website that LC_COLLATE sorting is simply not supported. So I 
seem to be stuck with this, which I did not expect.

This to me seems kind of disastrous on a desktop system. I just fail to 
see any average default user (who doesn't know ASCII in their head) 
expecting "elder1" and "Elder2" to be miles apart in a sorted listing 
even as a default US person, let alone in some other language that may 
be expected to use a different sorting for whatever reason. (This 
affects umlauts too, I assume? So that'd be most European languages 
having file lists entirely messed up, too.) The sorting shouldn't be 
stuck as something that just makes sense to programmers and balks at any 
special vowels, and it appears at least as of now there is just no way 
to fix this.
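
To illustrate the ordering described above: without LC_COLLATE support, locale-aware sorting ends up as plain byte (codepoint) order, which is what strcmp() gives. A minimal C sketch, assuming ASCII file names; sort_bytewise is a hypothetical helper name:

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain strcmp(), i.e. byte (codepoint) order. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of file names in place, in byte order. */
void sort_bytewise(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, cmp_bytes);
}
```

Sorting {"elder1", "apple", "Elder2", "Makefile", "banana"} this way yields Elder2, Makefile, apple, banana, elder1: every uppercase ASCII letter sorts before every lowercase one, so the two "elder" files land at opposite ends of the list.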

Should desktop file managers like Nemo not be using this sorting 
function? Or is musl not intended for desktop use, and postmarketOS 
should switch? Otherwise, this omission in musl seems like kind of a big
deal. Or is it really just me who is constantly confused as to where
any file is in any file list...?

In other words, it would be kind of cool if this could be changed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 13:41 [musl] A journey of weird file sorting and desktop systems ellie
@ 2022-01-28 14:10 ` Rich Felker
  2022-01-28 14:57   ` ellie
  2022-01-28 17:54   ` Ariadne Conill
  0 siblings, 2 replies; 10+ messages in thread
From: Rich Felker @ 2022-01-28 14:10 UTC (permalink / raw)
  To: ellie; +Cc: musl

On Fri, Jan 28, 2022 at 02:41:38PM +0100, ellie wrote:
> After spending a bit wondering why files like "elder1" and "Elder2"
> end up at completely different spots in the file list on my
> postmarketOS (=Alpine-based) system, I filed a ticket with the Nemo
> file manager. Turns out Nemo just uses locale-dependent sorting, so
> I spent an hour trying to set LC_COLLATE to fix this, until I
> stumbled across the remark on musl's website that LC_COLLATE sorting
> is simply not supported. So I seem to be stuck with this, which I
> did not expect.
> 
> This to me seems kind of disastrous on a desktop system. I just fail
> to see any average default user (who doesn't know ASCII in their
> head) expecting "elder1" and "Elder2" to be miles apart in a sorted
> listing even as a default US person, let alone in some other
> language that may be expected to use a different sorting for
> whatever reason. (This affects umlauts too, I assume? So that'd be
> most European languages having file lists entirely messed up, too.)
> The sorting shouldn't be stuck as something that just makes sense to
> programmers and balks at any special vowels, and it appears at least
> as of now there is just no way to fix this.
> 
> Should desktop file managers like Nemo not be using this sorting
> function? Or is musl not intended for desktop use, and postmarketOS
> should switch? Otherwise, this omission in musl seems like kind of
> a big deal. Or is it really just me who is constantly confused as to
> where any file is in any file list...?
> 
> In other words, it would be kind of cool if this could be changed.

LC_COLLATE functionality is just not designed or implemented yet, due
to lack of interest/participation from folks who want it to happen. I
very much do want it to happen, but I don't want to design something
(data model for efficient collation tables & code to use them) only to
have it turn out not to meet everyone's/anyone's needs because there
was nobody to bounce questions/testing/what-if's off during the
design.

A big part of this is probably that, historically, *nix users tend to
be happy with (or even prefer, which they can explicitly set via
exporting LC_COLLATE=C) codepoint-order sorting of directory entries,
like Makefile and README appearing at the top. So to get these folks
to care you have to have another setting where collation order
matters.

I'm happy to restart the process for getting this done if ppl are
interested.

Rich

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 14:10 ` Rich Felker
@ 2022-01-28 14:57   ` ellie
  2022-01-28 16:58     ` enh
  2022-01-28 18:01     ` Ariadne Conill
  2022-01-28 17:54   ` Ariadne Conill
  1 sibling, 2 replies; 10+ messages in thread
From: ellie @ 2022-01-28 14:57 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

I don't think nowadays the majority of users should be expected to be 
traditional *nix users with terminal knowledge anymore. And most modern 
desktop distros don't default to such a sorting as far as I can tell, 
and instead to en_US or the like, but all those which use musl are left 
stranded with "C" sorting. The type of users who are hit most by this 
are not going to be the type who know what a terminal is, what musl is, 
or how to voice their opinion on LC_COLLATE because their file manager 
looks so weird. So if you want them to show up here that probably won't 
happen. Beyond myself, I suppose.

I think for a typical user-friendly desktop the need is kinda clear, so 
I'm not sure what other sort of setting would need to be introduced 
still. If musl is meant to be used on desktop distros, this just seems 
kind of mandatory, or I'm not really getting why it wouldn't be.

My apologies if I'm misunderstanding, but that was basically your
question, or rather what you're saying is delaying it, right? Sorry if
you didn't want further input from me on this; I hope I read your
e-mail right.

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 14:57   ` ellie
@ 2022-01-28 16:58     ` enh
  2022-01-28 18:01       ` Rich Felker
  2022-01-28 18:01     ` Ariadne Conill
  1 sibling, 1 reply; 10+ messages in thread
From: enh @ 2022-01-28 16:58 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

(Android's libc maintainer here...)

i'd argue this isn't a musl bug. on Android we make a clear distinction between:

1. libc's responsibilities which, to paraphrase rich, are basically
"be unsurprising because your audience is OS/app developers who don't
speak all the languages their users use anyway". that is: "code point
order".

2. icu's responsibilities which cover all the user-facing (as opposed
to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
to be blunt, not fit for *that* purpose. there's a reason why all of
Android/macOS/Windows (and all the browsers) ship copies of icu.

the bug here is that a desktop file manager is assuming "i just want
telephone book order --- how hard can it be?". the answer turns out to
be "hard". especially when you get into fun stuff like users who *do*
speak multiple languages and have strong expectations for how they
sort. or places where there are multiple sort orders in common use.
you don't even need to be in very "exotic" languages to start hitting
these things. German and Spanish will do fine. see
https://unicode-org.github.io/icu/userguide/collation/ for a handful
of specific examples.

(as the maintainer of Android's Java i18n stuff before i ended up
owning bionic, you'd be surprised at the extent to which even Java --
which tried pretty hard by 1990s standards -- doesn't really cover
everything you need, not even for languages like Russian. so i don't
think C/POSIX could have done a great job in the 1990s, and one of
icu's main benefits is that it's been able to evolve to better support
existing languages/support more languages rather than being ossified
by an insufficient standard.)

"if you care about your users, you need icu/CLDR" is the easy side of
the argument. the flip side -- that libc *shouldn't* get involved --
is trickier. what convinced me was the amount of *breakage* you cause
if you try to be "good guy greg"... it turns out no-one wants dotless
i breaking their build just because their locale is a turkish/azeri
locale, for example. (dotted/dotless i is by far the most common
real-world issue i've seen.) but it's exactly that kind of "text
manipulation tool used during builds" that is most likely to use libc
functionality, and although, sure, we can chase *everyone* making sure
they set their locale to "C" when building ... are we helping at that
point, or just making more work for everyone? (without actually
solving the real problem for the folks who just want to use their file
browser.)
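
A common way for build-time text tools to sidestep the dotted/dotless i problem is to avoid locale-dependent case mapping altogether: tolower()/towlower() follow LC_CTYPE, and POSIX leaves strcasecmp() unspecified outside the POSIX locale. A minimal sketch, assuming ASCII identifiers and an ASCII-compatible execution character set; ascii_casecmp is a hypothetical helper, not a libc function:

```c
/* Lowercase only 'A'..'Z', independent of LC_CTYPE, so locale rules for
   I/i (such as Turkish/Azeri dotted and dotless i) never apply. */
static int ascii_tolower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Locale-independent, ASCII-only case-insensitive comparison. Returns
   <0, 0 or >0 like strcmp(). Bytes >= 0x80 are compared unchanged. */
int ascii_casecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (;; p++, q++) {
        int ca = ascii_tolower(*p);
        int cb = ascii_tolower(*q);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}
```

The trade-off is deliberate: such a function is useless for user-facing text, but it gives identical results for "INCLUDE" and "include" in every locale, which is what a build tool wants.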

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 14:10 ` Rich Felker
  2022-01-28 14:57   ` ellie
@ 2022-01-28 17:54   ` Ariadne Conill
  1 sibling, 0 replies; 10+ messages in thread
From: Ariadne Conill @ 2022-01-28 17:54 UTC (permalink / raw)
  To: musl; +Cc: ellie

Hi,

On Fri, 28 Jan 2022, Rich Felker wrote:

> LC_COLLATE functionality is just not designed or implemented yet, due
> to lack of interest/participation from folks who want it to happen. I
> very much do want it to happen, but I don't want to design something
> (data model for efficient collation tables & code to use them) only to
> have it turn out not to meet everyone's/anyone's needs because there
> was nobody to bounce questions/testing/what-if's off during the
> design.
>
> A big part of this is probably that, historically, *nix users tend to
> be happy with (or even prefer, which they can explicitly set via
> exporting LC_COLLATE=C) codepoint-order sorting of directory entries,
> like Makefile and README appearing at the top. So to get these folks
> to care you have to have another setting where collation order
> matters.

A case study might be PostgreSQL, but I believe we solved collation there 
by using the ICU library instead.

Ariadne

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 16:58     ` enh
@ 2022-01-28 18:01       ` Rich Felker
  2022-01-28 18:33         ` enh
  2022-01-28 19:47         ` Markus Wichmann
  0 siblings, 2 replies; 10+ messages in thread
From: Rich Felker @ 2022-01-28 18:01 UTC (permalink / raw)
  To: enh; +Cc: musl

On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote:
> (Android's libc maintainer here...)
> 
> i'd argue this isn't a musl bug. on Android we make a clear distinction between:
> 
> 1. libc's responsibilities which, to paraphrase rich, are basically
> "be unsurprising because your audience is OS/app developers who don't
> speak all the languages their users use anyway". that is: "code point
> order".

That's not what I said. I speculated that part of the difficulty with
getting people to care is that a large number of users personally
prefer LC_COLLATE=C. Not that we should punt because of that.

> 2. icu's responsibilities which cover all the user-facing (as opposed
> to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
> to be blunt, not fit for *that* purpose. there's a reason why all of
> Android/macOS/Windows (and all the browsers) ship copies of icu.

ICU is really, *really* bad. I don't want to be encouraging people to
use it because basic functionality is missing from libc.

> the bug here is that a desktop file manager is assuming "i just want
> telephone book order --- how hard can it be?". the answer turns out to
> be "hard". especially when you get into fun stuff like users who *do*
> speak multiple languages and have strong expectations for how they
> sort. or places where there are multiple sort orders in common use.

Absolutely. That's why I don't want to treat the problem half-assedly,
but make sure we design or choose a format for the collation tables
that's simultaneously (1) efficient, (2) sufficiently expressive to
give the behaviors users may want, and (3) easy enough to understand
that users can customize it if needed. The POSIX localedef format (an
option group musl intentionally does not support) does not have any of
those properties except maybe #2. The standard Unicode format may
translate directly into something that can meet all 3; I'm not sure.
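
As a rough illustration of what "sufficiently expressive" collation data involves, here is a toy three-level comparison in the spirit of the Unicode Collation Algorithm: primary weights for the base letter, secondary for accents, tertiary for case. This is a hypothetical sketch, not musl's design and not the real DUCET table; it only knows ASCII digits and letters plus ä, ö and ü, and orders everything else by code point after the letters.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical collation element with three UCA-style levels. */
struct coll_weights {
    unsigned primary;   /* base character: digits, then letters a..z */
    unsigned secondary; /* 0 = no accent, 1 = umlaut */
    unsigned tertiary;  /* 0 = lowercase, 1 = uppercase */
};

static struct coll_weights lookup(wchar_t c)
{
    struct coll_weights w = {0, 0, 0};
    switch (c) {           /* umlauts share their base letter's primary */
    case 0x00C4: w.tertiary = 1; /* fall through */
    case 0x00E4: c = L'a'; w.secondary = 1; break;
    case 0x00D6: w.tertiary = 1; /* fall through */
    case 0x00F6: c = L'o'; w.secondary = 1; break;
    case 0x00DC: w.tertiary = 1; /* fall through */
    case 0x00FC: c = L'u'; w.secondary = 1; break;
    }
    if (c >= L'0' && c <= L'9') {
        w.primary = 1 + (unsigned)(c - L'0');
    } else if (c >= L'a' && c <= L'z') {
        w.primary = 11 + (unsigned)(c - L'a');
    } else if (c >= L'A' && c <= L'Z') {
        w.primary = 11 + (unsigned)(c - L'A');
        w.tertiary = 1;
    } else {
        w.primary = 100 + (unsigned)c;  /* unknown: code point order */
    }
    return w;
}

static unsigned level_weight(struct coll_weights w, int level)
{
    return level == 0 ? w.primary : level == 1 ? w.secondary : w.tertiary;
}

/* Compare level by level: any primary difference wins; accents only
   matter if all primaries are equal; case only if accents are too. */
int toy_collate(const wchar_t *a, const wchar_t *b)
{
    for (int level = 0; level < 3; level++) {
        for (size_t i = 0;; i++) {
            if (a[i] == L'\0' || b[i] == L'\0') {
                if (a[i] != b[i])          /* a prefix sorts first */
                    return a[i] == L'\0' ? -1 : 1;
                break;
            }
            unsigned wa = level_weight(lookup(a[i]), level);
            unsigned wb = level_weight(lookup(b[i]), level);
            if (wa != wb)
                return wa < wb ? -1 : 1;
        }
    }
    return 0;
}
```

With this table "elder1" sorts next to "Elder2", and "äpfel" lands right after "apfel" and before "birne" instead of after "z". Even this toy shows the tension Rich describes: the weights must be compact to be efficient, yet every language-specific rule is more table data.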

Rich

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 14:57   ` ellie
  2022-01-28 16:58     ` enh
@ 2022-01-28 18:01     ` Ariadne Conill
  1 sibling, 0 replies; 10+ messages in thread
From: Ariadne Conill @ 2022-01-28 18:01 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

Hi,

On Fri, 28 Jan 2022, ellie wrote:

> I don't think nowadays the majority of users should be expected to be 
> traditional *nix users with terminal knowledge anymore. And most modern 
> desktop distros don't default to such a sorting as far as I can tell, and 
> instead to en_US or the like, but all those which use musl are left stranded 
> with "C" sorting. The type of users who are hit most by this are not going to 
> be the type who know what a terminal is, what musl is, or how to voice their 
> opinion on LC_COLLATE because their file manager looks so weird. So if you 
> want them to show up here that probably won't happen. Beyond myself, I 
> suppose.
>
> I think for a typical user-friendly desktop the need is kinda clear, so I'm 
> not sure what other sort of setting would need to be introduced still. If 
> musl is meant to be used on desktop distros, this just seems kind of 
> mandatory, or I'm not really getting why it wouldn't be.
>
> My apologies if I'm misunderstanding, but that was basically your
> question, or rather what you're saying is delaying it, right? Sorry if you
> didn't want further input from me on this; I hope I read your e-mail right.

LC_COLLATE is a desired feature in musl, but getting it right is going to 
take some work. We need to be careful about it, because we want to 
avoid giant tables, or a plug-in architecture like the one GLIBC has, 
which was recently at the center of the pwnkit debacle.

Ariadne

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 18:01       ` Rich Felker
@ 2022-01-28 18:33         ` enh
  2022-01-28 19:22           ` Rich Felker
  2022-01-28 19:47         ` Markus Wichmann
  1 sibling, 1 reply; 10+ messages in thread
From: enh @ 2022-01-28 18:33 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Fri, Jan 28, 2022 at 10:01 AM Rich Felker <dalias@libc.org> wrote:
>
> On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote:
> > (Android's libc maintainer here...)
> >
> > i'd argue this isn't a musl bug. on Android we make a clear distinction between:
> >
> > 1. libc's responsibilities which, to paraphrase rich, are basically
> > "be unsurprising because your audience is OS/app developers who don't
> > speak all the languages their users use anyway". that is: "code point
> > order".
>
> That's not what I said. I speculated that part of the difficulty with
> getting people to care is that a large number of users personally
> prefer LC_COLLATE=C. Not that we should punt because of that.
>
> > 2. icu's responsibilities which cover all the user-facing (as opposed
> > to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
> > to be blunt, not fit for *that* purpose. there's a reason why all of
> > Android/macOS/Windows (and all the browsers) ship copies of icu.
>
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.

human languages are really really messy. a lot of the complexity is inherent.

as for the non-inherent, https://github.com/unicode-org/icu4x seems
like a good start.

> > the bug here is that a desktop file manager is assuming "i just want
> > telephone book order --- how hard can it be?". the answer turns out to
> > be "hard". especially when you get into fun stuff like users who *do*
> > speak multiple languages and have strong expectations for how they
> > sort. or places where there are multiple sort orders in common use.
>
> Absolutely. That's why I don't want to treat the problem half-assedly,

but that's my point --- it's not the *implementation* that's the
issue, it's that the C/POSIX *interfaces* are insufficient. the bar on
how good a job you _can_ do within those constraints is horribly low.

> but make sure we design or choose a format for the collation tables
> that's simultaneously (1) efficient, (2) sufficiently expressive to
> give the behaviors users may want, and (3) easy enough to understand
> that users can customize it if needed. The POSIX localedef format (an
> option group musl intentionally does not support) does not have any of
> those properties except maybe #2. The standard Unicode format may
> translate directly into something that can meet all 3; I'm not sure.
>
> Rich

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 18:33         ` enh
@ 2022-01-28 19:22           ` Rich Felker
  0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2022-01-28 19:22 UTC (permalink / raw)
  To: enh; +Cc: musl

On Fri, Jan 28, 2022 at 10:33:53AM -0800, enh wrote:
> On Fri, Jan 28, 2022 at 10:01 AM Rich Felker <dalias@libc.org> wrote:
> >
> > On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote:
> > > (Android's libc maintainer here...)
> > >
> > > i'd argue this isn't a musl bug. on Android we make a clear distinction between:
> > >
> > > 1. libc's responsibilities which, to paraphrase rich, are basically
> > > "be unsurprising because your audience is OS/app developers who don't
> > > speak all the languages their users use anyway". that is: "code point
> > > order".
> >
> > That's not what I said. I speculated that part of the difficulty with
> > getting people to care is that a large number of users personally
> > prefer LC_COLLATE=C. Not that we should punt because of that.
> >
> > > 2. icu's responsibilities which cover all the user-facing (as opposed
> > > to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
> > > to be blunt, not fit for *that* purpose. there's a reason why all of
> > > Android/macOS/Windows (and all the browsers) ship copies of icu.
> >
> > ICU is really, *really* bad. I don't want to be encouraging people to
> > use it because basic functionality is missing from libc.
> 
> human languages are really really messy. a lot of the complexity is inherent.
> 
> as for the non-inherent, https://github.com/unicode-org/icu4x seems
> like a good start.

The problems with ICU are all software engineering problems not
problem-domain complexity problems. Bad resource-hungry choices with
poor safety properties all over.

> > > the bug here is that a desktop file manager is assuming "i just want
> > > telephone book order --- how hard can it be?". the answer turns out to
> > > be "hard". especially when you get into fun stuff like users who *do*
> > > speak multiple languages and have strong expectations for how they
> > > sort. or places where there are multiple sort orders in common use.
> >
> > Absolutely. That's why I don't want to treat the problem half-assedly,
> 
> but that's my point --- it's not the *implementation* that's the
> issue, it's that the C/POSIX *interfaces* are insufficient. the bar on
> how good a job you _can_ do within those constraints is horribly low.

I'm not sure what you mean by "the interfaces are insufficient" here.
They're insufficient to do things they weren't meant to do (e.g. deal
with data with multiple cultural conventions where the data has to be
tagged with which conventions apply to it), but giving listings in a
user's chosen collation order convention is something they're
perfectly capable of doing. Most applications do not want to deal with
(and do not even have the necessary metadata to deal with, since the
raw data is plain text) the sort of mix the standard interfaces can't
handle. They just want to give decent, culturally-non-surprising UX.
Applications that do want to go beyond this can of course use the full
Unicode data (via ICU or ideally a better alternative).
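
For the common case described above, the standard C interfaces already cover the mechanics: a program calls setlocale(LC_COLLATE, "") once and then compares with strcoll(), or uses strxfrm() to precompute keys so that each name is transformed once rather than on every comparison. A minimal sketch; sort_with_strxfrm is a hypothetical helper. Given the missing LC_COLLATE support discussed in this thread, on musl at the time this would still yield plain byte order.

```c
#include <stdlib.h>
#include <string.h>

/* A name paired with its precomputed collation key. */
struct keyed_name {
    const char *name;
    char *key;
};

/* strxfrm() key: strcmp() on two keys has the same sign as strcoll()
   on the original strings in the current LC_COLLATE locale. */
static char *make_key(const char *s)
{
    size_t len = strxfrm(NULL, s, 0);   /* length needed, excluding NUL */
    char *key = malloc(len + 1);
    if (key != NULL)
        strxfrm(key, s, len + 1);
    return key;
}

static int cmp_keyed(const void *a, const void *b)
{
    const struct keyed_name *x = a, *y = b;
    return strcmp(x->key, y->key);
}

/* Sort names in place by the current collation, transforming each name
   once. Returns 0 on success, -1 if memory allocation fails. */
int sort_with_strxfrm(const char **names, size_t n)
{
    struct keyed_name *tmp;
    size_t i;

    if (n == 0)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        tmp[i].name = names[i];
        tmp[i].key = make_key(names[i]);
        if (tmp[i].key == NULL) {
            while (i-- > 0)             /* free keys built so far */
                free(tmp[i].key);
            free(tmp);
            return -1;
        }
    }
    qsort(tmp, n, sizeof *tmp, cmp_keyed);
    for (i = 0; i < n; i++) {
        names[i] = tmp[i].name;
        free(tmp[i].key);
    }
    free(tmp);
    return 0;
}
```

Without a setlocale() call the program stays in the "C" locale, so {"main.c", "README", "apple"} comes out as README, apple, main.c; with a libc that implements LC_COLLATE, the same code would follow the user's chosen convention.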

Rich

* Re: [musl] A journey of weird file sorting and desktop systems
  2022-01-28 18:01       ` Rich Felker
  2022-01-28 18:33         ` enh
@ 2022-01-28 19:47         ` Markus Wichmann
  1 sibling, 0 replies; 10+ messages in thread
From: Markus Wichmann @ 2022-01-28 19:47 UTC (permalink / raw)
  To: musl

On Fri, Jan 28, 2022 at 01:01:04PM -0500, Rich Felker wrote:
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.
>

But basic functionality *is* missing from libc, and by design. By the
standard. For example, toupper and towupper can only return a single
code point. That doesn't work with the German ß character, whose
capital form is SS. If you were transforming some general German text
into block capitals for a headline or something, that is the
transformation you would use. Now, some people have invented a capital
version of ß, which is still new enough to show up as empty boxes in many
programs (test your mail program here: ẞ), but that letter is not widely
used.

Also, many applications expect towupper and towlower to be inverse
functions of each other, but here, not all instances of SS ought to be
transformed to ß when passing them through towlower, even if the
interface did support such a thing.
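
Because towupper() returns a single wint_t, a one-to-many mapping like ß to SS has to happen at the string level. A minimal sketch: upcase_wcs and downcase_wcs are hypothetical helpers that special-case only U+00DF (Unicode's SpecialCasing data contains more such mappings) and otherwise defer to towupper()/towlower() in the current locale. The round trip also illustrates the second point: lowercasing "STRASSE" gives "strasse", not "straße".

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Uppercase src into dst (capacity cap wide chars, including the
   terminator). U+00DF becomes the two characters "SS", which a
   per-character towupper() cannot express. Returns the length written,
   or (size_t)-1 if dst is too small (dst contents are then unspecified). */
size_t upcase_wcs(wchar_t *dst, size_t cap, const wchar_t *src)
{
    size_t n = 0;

    if (cap == 0)
        return (size_t)-1;
    for (; *src != L'\0'; src++) {
        if (*src == 0x00DF) {            /* LATIN SMALL LETTER SHARP S */
            if (n + 2 >= cap)
                return (size_t)-1;
            dst[n++] = L'S';
            dst[n++] = L'S';
        } else {
            if (n + 1 >= cap)
                return (size_t)-1;
            dst[n++] = (wchar_t)towupper((wint_t)*src);
        }
    }
    dst[n] = L'\0';
    return n;
}

/* Lowercase s in place, one character at a time. Nothing here can tell
   which "SS" came from a ß, so the round trip is lossy. */
void downcase_wcs(wchar_t *s)
{
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towlower((wint_t)*s);
}
```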

My point is that the development of interfaces that deal with
internationalization might be better put into a library with an
interface less rigid than libc's. In libc, any adjustment moves at the
glacial pace of the Austin Group or WG14, and breaking changes are
completely out of the question. That is also why we still have gets()
and strchr().

Whether ICU is a suitable library for that purpose I lack the expertise
to say. However, all I have heard about it so far is either that one
should use it to cure all i18n ills, or that it is an abomination unto
the Lord. But even the people in the second camp fail to recommend a
superior alternative. So I'm guessing there isn't one.

As to the actual function in question: simply having the option to
make strcoll behave like strcasecmp instead of strcmp would
probably already be the 80% solution for most European languages.
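
The "80% solution" suggested above could look roughly like this: a hypothetical strcoll() substitute that compares case-insensitively and falls back to strcmp() only to break ties between names that differ solely in case. Since POSIX specifies strcasecmp() only for the POSIX locale, this realistically covers ASCII letters only, and it does nothing for umlauts.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Case-insensitive order first; strcmp() breaks ties so the result is
   still a total order (e.g. "Elder" before "elder"). */
int casefirst_coll(const char *a, const char *b)
{
    int r = strcasecmp(a, b);
    return r != 0 ? r : strcmp(a, b);
}

/* qsort() adapter for an array of const char * file names. */
static int cmp_casefirst(const void *a, const void *b)
{
    return casefirst_coll(*(const char *const *)a, *(const char *const *)b);
}

void sort_casefirst(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, cmp_casefirst);
}
```

Applied to {"elder1", "apple", "Elder2", "Makefile", "banana"}, this gives apple, banana, elder1, Elder2, Makefile, which keeps the two "elder" files together.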

Yeah, it won't work with umlauts, but we Germans are used to that. "It
is <current year> and we still can't do umlauts" is a common curse
levelled at information technology, and for the most part it is apt. I
routinely counsel against using umlauts in file names or pass phrases,
because you never know which character set they will be saved or
transmitted in later, and it just causes avoidable problems. I really doubt
this issue will ever be solved within my lifetime.

JM2C,
Markus

end of thread [~2022-01-28 19:47 UTC]

Thread overview: 10+ messages
2022-01-28 13:41 [musl] A journey of weird file sorting and desktop systems ellie
2022-01-28 14:10 ` Rich Felker
2022-01-28 14:57   ` ellie
2022-01-28 16:58     ` enh
2022-01-28 18:01       ` Rich Felker
2022-01-28 18:33         ` enh
2022-01-28 19:22           ` Rich Felker
2022-01-28 19:47         ` Markus Wichmann
2022-01-28 18:01     ` Ariadne Conill
2022-01-28 17:54   ` Ariadne Conill

Code repositories for project(s) associated with this inbox:

	https://git.vuxu.org/mirror/musl/
