* [musl] A journey of weird file sorting and desktop systems @ 2022-01-28 13:41 ellie 2022-01-28 14:10 ` Rich Felker 0 siblings, 1 reply; 10+ messages in thread From: ellie @ 2022-01-28 13:41 UTC (permalink / raw) To: musl

After spending a while wondering why files like "elder1" and "Elder2" end up at completely different spots in the file list on my postmarketOS (=Alpine-based) system, I filed a ticket with the Nemo file manager. Turns out Nemo just uses locale-dependent sorting, so I spent an hour trying to set LC_COLLATE to fix this, until I stumbled across the remark on musl's website that LC_COLLATE sorting is simply not supported. So I seem to be stuck with this, which I did not expect.

This to me seems kind of disastrous on a desktop system. I just fail to see any average default user (who doesn't know ASCII in their head) expecting "elder1" and "Elder2" to be miles apart in a sorted listing even as a default US person, let alone in some other language that may be expected to use a different sorting for whatever reason. (This affects umlauts too, I assume? So that'd be most European languages having file lists entirely messed up, too.) The sorting shouldn't be stuck as something that just makes sense to programmers and balks at any special vowels, and it appears at least as of now there is just no way to fix this.

Should desktop file managers like Nemo not be using this sorting function? Or is musl not intended for desktop use, and postmarketOS should switch? Otherwise, this omission in musl seems like kind of a big deal. Or is it really just me who is constantly confused as to where any file is in any file list...?

Or in other words, it would be kind of cool if this could be changed

^ permalink raw reply [flat|nested] 10+ messages in thread
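To make the reported ordering concrete, here is a minimal sketch of codepoint-order sorting in C. It assumes the file list ends up being compared the way strcmp() compares strings, which is what the message describes for a libc without LC_COLLATE support. The helper names compare_bytewise and sort_names_bytewise are made up for this illustration and are not taken from Nemo or musl.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: plain byte-wise (codepoint-order) comparison. */
static int compare_bytewise(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Sort an array of file names in place, in strcmp() order. */
static void sort_names_bytewise(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 0)
        qsort(names, count, sizeof *names, compare_bytewise);
}
```

Sorting the names "elder1", "apple", "Zebra", "Elder2" this way gives Elder2, Zebra, apple, elder1: every ASCII uppercase letter has a lower code than every lowercase letter, so "Elder2" and "elder1" land at opposite ends of the list.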
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 13:41 [musl] A journey of weird file sorting and desktop systems ellie @ 2022-01-28 14:10 ` Rich Felker 2022-01-28 14:57 ` ellie 2022-01-28 17:54 ` Ariadne Conill 0 siblings, 2 replies; 10+ messages in thread From: Rich Felker @ 2022-01-28 14:10 UTC (permalink / raw) To: ellie; +Cc: musl On Fri, Jan 28, 2022 at 02:41:38PM +0100, ellie wrote: > After spending a bit wondering why files like "elder1" and "Elder2" > end up at completely different spots in the file list on my > postmarketOS (=Alpine-based) system, I filed a ticket with the Nemo > file manager. Turns out Nemo just uses locale-dependent sorting, so > I spent an hour trying to set LC_COLLATE to fix this, until I > stumbled across the remark on musl's website that LC_COLLATE sorting > is simply not supported. So I seem to be stuck with this, which I > did not expect. > > This to me seems kind of disastrous on a desktop system. I just fail > to see any average default user (who doesn't know ASCII in their > head) expecting "elder1" and "Elder2" to be miles apart in a sorted > listing even as a default US person, let alone in some other > language that may be expected to use a different sorting for > whatever reason. (This affects umlauts too, I assume? So that'd be > most European languages having file lists entirely messed up, too.) > The sorting shouldn't be stuck as something that just makes sense to > programmers and balks at any special vowels, and it appears at least > as of now there is just no way to fix this. > > Should desktop file managers like Nemo not be using this sorting > function? Or is musl not intended for desktop use, and postmarketOS > should switch? Otherwise, it seems like this omission in musl seems > like kind of a big deal. Or is it really just me who is constantly > confused as to where any file is at in any file lists...? 
> > Or in other words, would be kind of cool if this could be changed LC_COLLATE functionality is just not designed or implemented yet, due to lack of interest/participation from folks who want it to happen. I very much do want it to happen, but I don't want to design something (data model for efficient collation tables & code to use them) only to have it turn out not to meet everyone's/anyone's needs because there was nobody to bounce questions/testing/what-if's off during the design. A big part of this is probably that, historically, *nix users tend to be happy with (or even prefer, which they can explicitly set via exporting LC_COLLATE=C) codepoint-order sorting of directory entries, like Makefile and README appearing at the top. So to get these folks to care you have to have another setting where collation order matters. I'm happy to restart the process for getting this done if ppl are interested. Rich ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 14:10 ` Rich Felker @ 2022-01-28 14:57 ` ellie 2022-01-28 16:58 ` enh 2022-01-28 18:01 ` Ariadne Conill 2022-01-28 17:54 ` Ariadne Conill 1 sibling, 2 replies; 10+ messages in thread From: ellie @ 2022-01-28 14:57 UTC (permalink / raw) To: Rich Felker; +Cc: musl I don't think nowadays the majority of users should be expected to be traditional *nix users with terminal knowledge anymore. And most modern desktop distros don't default to such a sorting as far as I can tell, and instead to en_US or alike - but all those which use musl are left stranded with "C" sorting. The type of users who are hit most by this are not going to be the type who know what a terminal is, what musl is, or how to voice their opinion on LC_COLLATE because their file manager looks so weird. So if you want them to show up here that probably won't happen. Beyond myself, I suppose. I think for a typical user-friendly desktop the need is kinda clear, so I'm not sure what other sort of setting would need to be introduced still. If musl is meant to be used on desktop distros, this just seems kind of mandatory, or I'm not really getting why it wouldn't be. My apologies however if I'm misunderstanding, but that was basically your question/what you're saying is delaying it, right? Sorry if you didn't want further input from me on this, I hope I read your e-mail right ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 14:57 ` ellie @ 2022-01-28 16:58 ` enh 2022-01-28 18:01 ` Rich Felker 2022-01-28 18:01 ` Ariadne Conill 1 sibling, 1 reply; 10+ messages in thread From: enh @ 2022-01-28 16:58 UTC (permalink / raw) To: musl; +Cc: Rich Felker (Android's libc maintainer here...) i'd argue this isn't a musl bug. on Android we make a clear distinction between: 1. libc's responsibilities which, to paraphrase rich, are basically "be unsurprising because your audience is OS/app developers who don't speak all the languages their users use anyway". that is: "code point order". 2. icu's responsibilities which cover all the user-facing (as opposed to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are, to be blunt, not fit for *that* purpose. there's a reason why all of Android/macOS/Windows (and all the browsers) ship copies of icu. the bug here is that a desktop file manager is assuming "i just want telephone book order --- how hard can it be?". the answer turns out to be "hard". especially when you get into fun stuff like users who *do* speak multiple languages and have strong expectations for how they sort. or places where there are multiple sort orders in common use. you don't even need to be in very "exotic" languages to start hitting these things. German and Spanish will do fine. see https://unicode-org.github.io/icu/userguide/collation/ for a handful of specific examples. (as the maintainer of Android's Java i18n stuff before i ended up owning bionic, you'd be surprised at the extent to which even Java -- which tried pretty hard by 1990s standards -- doesn't really cover everything you need, not even for languages like Russian. so i don't think C/POSIX could have done a great job in the 1990s, and one of icu's main benefits is that it's been able to evolve to better support existing languages/support more languages rather than being ossified by an insufficient standard.) "if you care about your users, you need icu/CLDR" is the easy side of the argument. the flip side -- that libc *shouldn't* get involved -- is trickier. what convinced me was the amount of *breakage* you cause if you try to be "good guy greg"... it turns out no-one wants dotless i breaking their build just because their locale is a turkish/azeri locale, for example. (dotted/dotless i is by far the most common real-world issue i've seen.) but it's that kind of "text manipulation tool used during builds" that are most likely to use libc functionality, and although, sure, we can chase *everyone* making sure they set their locale to "C" when building ... are we helping at that point, or just making more work for everyone? (without actually solving the real problem for the folks who just want to use their file browser.) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 16:58 ` enh @ 2022-01-28 18:01 ` Rich Felker 2022-01-28 18:33 ` enh 2022-01-28 19:47 ` Markus Wichmann 0 siblings, 2 replies; 10+ messages in thread From: Rich Felker @ 2022-01-28 18:01 UTC (permalink / raw) To: enh; +Cc: musl On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote: > (Android's libc maintainer here...) > > i'd argue this isn't a musl bug. on Android we make a clear distinction between: > > 1. libc's responsibilities which, to paraphrase rich, are basically > "be unsurprising because your audience is OS/app developers who don't > speak all the languages their users use anyway". that is: "code point > order". That's not what I said. I speculated that part of the difficulty with getting people to care is that a large number of users personally prefer LC_COLLATE=C. Not that we should punt because of that. > 2. icu's responsibilities which cover all the user-facing (as opposed > to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are, > to be blunt, not fit for *that* purpose. there's a reason why all of > Android/macOS/Windows (and all the browsers) ship copies of icu. ICU is really, *really* bad. I don't want to be encouraging people to use it because basic functionality is missing from libc. > the bug here is that a desktop file manager is assuming "i just want > telephone book order --- how hard can it be?". the answer turns out to > be "hard". especially when you get into fun stuff like users who *do* > speak multiple languages and have strong expectations for how they > sort. or places where there are multiple sort orders in common use. Absolutely. That's why I don't want to treat the problem half-assedly, but make sure we design or choose a format for the collation tables that's simultaneously (1) efficient, (2) sufficiently expressive to give the behaviors users may want, and (3) easy enough to understand that users can customize it if needed. 
The POSIX localedef format (an option group musl intentionally does not support) does not have any of those properties except maybe #2. The standard Unicode format may translate directly into something that can meet all 3; I'm not sure. Rich ^ permalink raw reply [flat|nested] 10+ messages in thread
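For context on what any collation-table format would ultimately have to serve: in the standard C interface, LC_COLLATE data is consumed through strcoll() and strxfrm(). The sketch below shows the strxfrm() side, where a name is turned into a sort key once so that later comparisons can use plain strcmp(). The helper name make_collation_key is hypothetical, and nothing here reflects a table format or design that musl has chosen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a sort key for `name` under the current LC_COLLATE locale.
 * Returns a malloc()ed string that the caller must free(), or NULL
 * if memory could not be allocated.
 */
static char *make_collation_key(const char *name)
{
    assert(name != NULL);

    /* With a size of 0, strxfrm() only reports the key length (without NUL). */
    size_t needed = strxfrm(NULL, name, 0);

    char *key = malloc(needed + 1);
    if (key == NULL)
        return NULL;

    strxfrm(key, name, needed + 1);
    return key;
}
```

Comparing two such keys with strcmp() is specified to give the same ordering as calling strcoll() on the original strings, which is why precomputed keys are a common way to sort long listings. In the "C" locale the key order is simply the byte order of the names themselves.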
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 18:01 ` Rich Felker @ 2022-01-28 18:33 ` enh 2022-01-28 19:22 ` Rich Felker 2022-01-28 19:47 ` Markus Wichmann 1 sibling, 1 reply; 10+ messages in thread From: enh @ 2022-01-28 18:33 UTC (permalink / raw) To: Rich Felker; +Cc: musl

On Fri, Jan 28, 2022 at 10:01 AM Rich Felker <dalias@libc.org> wrote:
>
> On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote:
> > (Android's libc maintainer here...)
> >
> > i'd argue this isn't a musl bug. on Android we make a clear distinction between:
> >
> > 1. libc's responsibilities which, to paraphrase rich, are basically
> > "be unsurprising because your audience is OS/app developers who don't
> > speak all the languages their users use anyway". that is: "code point
> > order".
>
> That's not what I said. I speculated that part of the difficulty with
> getting people to care is that a large number of users personally
> prefer LC_COLLATE=C. Not that we should punt because of that.
>
> > 2. icu's responsibilities which cover all the user-facing (as opposed
> > to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
> > to be blunt, not fit for *that* purpose. there's a reason why all of
> > Android/macOS/Windows (and all the browsers) ship copies of icu.
>
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.

human languages are really really messy. a lot of the complexity is inherent.

as for the non-inherent, https://github.com/unicode-org/icu4x seems like a good start.

> > the bug here is that a desktop file manager is assuming "i just want
> > telephone book order --- how hard can it be?". the answer turns out to
> > be "hard". especially when you get into fun stuff like users who *do*
> > speak multiple languages and have strong expectations for how they
> > sort. or places where there are multiple sort orders in common use.
>
> Absolutely. That's why I don't want to treat the problem half-assedly,

but that's my point --- it's not the *implementation* that's the issue, it's that the C/POSIX *interfaces* are insufficient. the bar on how good a job you _can_ do within those constraints is horribly low.

> but make sure we design or choose a format for the collation tables
> that's simultaneously (1) efficient, (2) sufficiently expressive to
> give the behaviors users may want, and (3) easy enough to understand
> that users can customize it if needed. The POSIX localedef format (an
> option group musl intentionally does not support) does not have any of
> those properties except maybe #2. The standard Unicode format may
> translate directly into something that can meet all 3; I'm not sure.
>
> Rich

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 18:33 ` enh @ 2022-01-28 19:22 ` Rich Felker 0 siblings, 0 replies; 10+ messages in thread From: Rich Felker @ 2022-01-28 19:22 UTC (permalink / raw) To: enh; +Cc: musl On Fri, Jan 28, 2022 at 10:33:53AM -0800, enh wrote: > On Fri, Jan 28, 2022 at 10:01 AM Rich Felker <dalias@libc.org> wrote: > > > > On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote: > > > (Android's libc maintainer here...) > > > > > > i'd argue this isn't a musl bug. on Android we make a clear distinction between: > > > > > > 1. libc's responsibilities which, to paraphrase rich, are basically > > > "be unsurprising because your audience is OS/app developers who don't > > > speak all the languages their users use anyway". that is: "code point > > > order". > > > > That's not what I said. I speculated that part of the difficulty with > > getting people to care is that a large number of users personally > > prefer LC_COLLATE=C. Not that we should punt because of that. > > > > > 2. icu's responsibilities which cover all the user-facing (as opposed > > > to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are, > > > to be blunt, not fit for *that* purpose. there's a reason why all of > > > Android/macOS/Windows (and all the browsers) ship copies of icu. > > > > ICU is really, *really* bad. I don't want to be encouraging people to > > use it because basic functionality is missing from libc. > > human languages are really really messy. a lot of the complexity is inherent. > > as for the non-inherent, https://github.com/unicode-org/icu4x seems > like a good start. The problems with ICU are all software engineering problems not problem-domain complexity problems. Bad resource-hungry choices with poor safety properties all over. > > > the bug here is that a desktop file manager is assuming "i just want > > > telephone book order --- how hard can it be?". the answer turns out to > > > be "hard". 
especially when you get into fun stuff like users who *do* > > > speak multiple languages and have strong expectations for how they > > > sort. or places where there are multiple sort orders in common use. > > > > Absolutely. That's why I don't want to treat the problem half-assedly, > > but that's my point --- it's not the *implementation* that's the > issue, it's that the C/POSIX *interfaces* are insufficient. the bar on > how good a job you _can_ do within those constraints is horribly low. I'm not sure what you mean by "the interfaces are insufficient" here. They're insufficient to do things they weren't meant to do (e.g. deal with data with multiple cultural conventions where the data has to be tagged with which conventions apply to it), but giving listings in a user's chosen collation order convention is something they're perfectly capable of doing. Most applications do not want to deal with (and do not even have the necessary metadata to deal with, since the raw data is plain text) the sort of mix the standard interfaces can't handle. They just want to give decent, culturally-non-surprising UX. Applications that do want to go beyond this can of course use the full Unicode data (via ICU or ideally a better alternative). Rich ^ permalink raw reply [flat|nested] 10+ messages in thread
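As an illustration of the standard interface discussed in this message, the following sketch sorts a list of names in whatever collation order the selected locale provides, using setlocale() and strcoll(). The function names compare_collated and sort_names_collated are invented for this example, and it is only an assumption that a file manager would structure its sorting this way. Per the thread, on a libc without LC_COLLATE support the result is the same as codepoint order regardless of the locale requested.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two names under the current LC_COLLATE locale. */
static int compare_collated(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcoll(*left, *right);
}

/*
 * Select a collation locale and sort `names` in place.
 * Pass "" to take the locale from the environment (LC_ALL, LC_COLLATE, LANG).
 * Returns 0 on success, -1 if the requested locale is unavailable.
 */
static int sort_names_collated(const char **names, size_t count,
                               const char *locale_name)
{
    assert(locale_name != NULL);
    if (setlocale(LC_COLLATE, locale_name) == NULL)
        return -1;
    if (count > 0)
        qsort(names, count, sizeof *names, compare_collated);
    return 0;
}
```

With the locale name "C" this behaves like a strcmp() sort. What order another locale name produces depends entirely on the collation data the libc ships for it, so no particular output is claimed here.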
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 18:01 ` Rich Felker 2022-01-28 18:33 ` enh @ 2022-01-28 19:47 ` Markus Wichmann 1 sibling, 0 replies; 10+ messages in thread From: Markus Wichmann @ 2022-01-28 19:47 UTC (permalink / raw) To: musl

On Fri, Jan 28, 2022 at 01:01:04PM -0500, Rich Felker wrote:
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.
>

But basic functionality *is* missing from libc, and by design. By the standard. For example, toupper and towupper can only return a single code point. That doesn't work with German's ß character, which has the capital form SS. If you were transforming some general German word group into block capitals for a headline or something, that is the transformation you would use. Now, some people have invented a capital version of ß, that is still new enough to make blocks appear in many programs (test your mail program here: ẞ), but that letter is not widely used.

Also, many applications expect towupper and towlower to be inverse functions of each other, but here, not all instances of SS ought to be transformed to ß when passing them through towlower, even if the interface did support such a thing.

My point is that the development of interfaces that deal with internationalization might be better put into a library with an interface less rigid than libc, where any adjustment moves at the glacial pace of the Austin Group or WG14, and in any case, breaking changes are completely out of the question. That is also why we still have gets() and strchr().

Whether ICU is a suitable library for that purpose I lack the expertise to say. However, all I have heard about it so far is either that one should use it to cure all i18n ills, or that it is an abomination unto the Lord. But even the people in the second camp fail to recommend a superior alternative. So I'm guessing there isn't one.
As to the actual function in question: Simply having a possibility to switch strcoll to be the same as strcasecmp instead of strcmp would probably already be the 80% solution for most European languages. Yeah, it won't work with umlauts, but we Germans are used to that. "It is <current year> and we still can't do umlauts" is a common curse levelled at information technology, and for the most part it is apt. I routinely counsel against using umlauts in file names or pass phrases, because you never know what character set it gets saved in or transmitted later, and it just causes avoidable problems. I really doubt this issue will ever be solved within my lifetime. JM2C, Markus ^ permalink raw reply [flat|nested] 10+ messages in thread
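The "80% solution" suggested in this message can be sketched on the application side as a comparator built on strcasecmp(). This is only an illustration with made-up helper names (compare_ignoring_case, sort_names_ignoring_case); it is not an existing musl option, and the strcmp() tie-break is an added assumption so that names differing only in case still get a deterministic order. As the message says, it does nothing for umlauts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() is a POSIX function declared here */

/* qsort comparator: case-insensitive first, byte order as a tie-break. */
static int compare_ignoring_case(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    int result = strcasecmp(*left, *right);
    if (result != 0)
        return result;
    /* Names equal apart from case: fall back to byte order. */
    return strcmp(*left, *right);
}

/* Sort an array of file names in place, ignoring ASCII case. */
static void sort_names_ignoring_case(const char **names, size_t count)
{
    assert(names != NULL || count == 0);
    if (count > 0)
        qsort(names, count, sizeof *names, compare_ignoring_case);
}
```

Applied to "elder1", "apple", "Zebra", "Elder2", this yields apple, elder1, Elder2, Zebra, so the two "elder" files end up next to each other.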
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 14:57 ` ellie 2022-01-28 16:58 ` enh @ 2022-01-28 18:01 ` Ariadne Conill 1 sibling, 0 replies; 10+ messages in thread From: Ariadne Conill @ 2022-01-28 18:01 UTC (permalink / raw) To: musl; +Cc: Rich Felker Hi, On Fri, 28 Jan 2022, ellie wrote: > I don't think nowadays the majority of users should be expected to be > traditional *nix users with terminal knowledge anymore. And most modern > desktop distros don't default to such a sorting as far as I can tell, and > instead to en_US or alike - but all those which use musl are left stranded > with "C" sorting. The type of users who are hit most by this are not going to > be the type who know what a terminal is, what musl is, or how to voice their > opinion on LC_COLLATE because their file manager looks so weird. So if you > want them to show up here that probably won't happen. Beyond myself, I > suppose. > > I think for a typical user-friendly desktop the need is kinda clear, so I'm > not sure what other sort of setting would need to be introduced still. If > musl is meant to be used on desktop distros, this just seems kind of > mandatory, or I'm not really getting why it wouldn't be. > > My apologies however if I'm misunderstanding, but that was basically your > question/what you're saying is delaying it, right? Sorry if you didn't want > further input from me on this, I hope I read your e-mail right LC_COLLATE is a desired feature in musl, but getting it right is going to take some work. We should want to be careful about it because we want to avoid having giant tables, or some plug-in architecture like GLIBC has, which was recently at the center of the pwnkit debacle. Ariadne ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [musl] A journey of weird file sorting and desktop systems 2022-01-28 14:10 ` Rich Felker 2022-01-28 14:57 ` ellie @ 2022-01-28 17:54 ` Ariadne Conill 1 sibling, 0 replies; 10+ messages in thread From: Ariadne Conill @ 2022-01-28 17:54 UTC (permalink / raw) To: musl; +Cc: ellie Hi, On Fri, 28 Jan 2022, Rich Felker wrote: > On Fri, Jan 28, 2022 at 02:41:38PM +0100, ellie wrote: >> After spending a bit wondering why files like "elder1" and "Elder2" >> end up at completely different spots in the file list on my >> postmarketOS (=Alpine-based) system, I filed a ticket with the Nemo >> file manager. Turns out Nemo just uses locale-dependent sorting, so >> I spent an hour trying to set LC_COLLATE to fix this, until I >> stumbled across the remark on musl's website that LC_COLLATE sorting >> is simply not supported. So I seem to be stuck with this, which I >> did not expect. >> >> This to me seems kind of disastrous on a desktop system. I just fail >> to see any average default user (who doesn't know ASCII in their >> head) expecting "elder1" and "Elder2" to be miles apart in a sorted >> listing even as a default US person, let alone in some other >> language that may be expected to use a different sorting for >> whatever reason. (This affects umlauts too, I assume? So that'd be >> most European languages having file lists entirely messed up, too.) >> The sorting shouldn't be stuck as something that just makes sense to >> programmers and balks at any special vowels, and it appears at least >> as of now there is just no way to fix this. >> >> Should desktop file managers like Nemo not be using this sorting >> function? Or is musl not intended for desktop use, and postmarketOS >> should switch? Otherwise, it seems like this omission in musl seems >> like kind of a big deal. Or is it really just me who is constantly >> confused as to where any file is at in any file lists...? 
>> >> Or in other words, would be kind of cool if this could be changed > > LC_COLLATE functionality is just not designed or implemented yet, due > to lack of interest/participation from folks who want it to happen. I > very much do want it to happen, but I don't want to design something > (data model for efficient collation tables & code to use them) only to > have it turn out not to meet everyone's/anyone's needs because there > was nobody to bounce questions/testing/what-if's off during the > design. > > A big part of this is probably that, historically, *nix users tend to > be happy with (or even prefer, which they can explicitly set via > exporting LC_COLLATE=C) codepoint-order sorting of directory entries, > like Makefile and README appearing at the top. So to get these folks > to care you have to have another setting where collation order > matters. A case-study might be PostgreSQL, but I believe we solved collation there by using the ICU library instead. Ariadne ^ permalink raw reply [flat|nested] 10+ messages in thread