* [discuss] Sorting mixed alphabets with different locales
From: Christoph Binner via illumos-discuss @ 2025-02-06 0:52 UTC (permalink / raw)
To: illumos-discuss
I have a directory containing filenames in Latin, Cyrillic and Chinese scripts.
After listing the directory under a German locale, it took me a good while
to figure out how the files were sorted:
• Latin filenames are sorted alphabetically (as expected)
• purely non-Latin filenames are sorted by their Unicode code points
(resulting in a fairly plausible order)
• in filenames containing both Latin and non-Latin characters, the
non-Latin ones are completely ignored, e.g. "三丐丑D" is treated like "D"
• most confusingly, non-Latin filenames containing spaces (0x20, not
non-breaking ones) end up in a separate block, resulting in this order:
$ LANG=de_DE.UTF-8 ls -1
«мно»
абв
абв где
вгд
где
эюя
一丁丂
七七丅
三丐丑
абв где
вгд ежз
丁丂 丆万丈
丒专 且丕
123
456
789
abc
bc de
bcd
бвC
cde
三丐丑D
def
xyz
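
One reading that seems consistent with the German-locale listing above is sketched below. It is only a model of the observed order, not the actual illumos collation code: every non-ASCII character is assumed to be ignored on the first pass, the remaining ASCII part (including 0x20) is compared case-insensitively, and ties are broken by code point. The name `model_key` is hypothetical, and the first "абв где" entry is assumed to contain a non-breaking space (U+00A0), which the listing cannot show.

```python
def model_key(name: str):
    """Hypothetical model of the observed de_DE order (not illumos code)."""
    # First pass: keep only ASCII characters, compared case-insensitively.
    ascii_part = "".join(ch for ch in name if ch.isascii()).casefold()
    # Tie-break: plain code point order of the full name.
    return (ascii_part, name)

# The listing as observed under LANG=de_DE.UTF-8.
# ASSUMPTION: the third entry uses U+00A0 instead of a regular space.
observed = [
    "«мно»", "абв", "абв\u00a0где", "вгд", "где", "эюя",
    "一丁丂", "七七丅", "三丐丑",
    "абв где", "вгд ежз", "丁丂 丆万丈", "丒专 且丕",
    "123", "456", "789", "abc", "bc de", "bcd", "бвC", "cde",
    "三丐丑D", "def", "xyz",
]

# Sort a scrambled copy with the model and compare with the observation.
modelled = sorted(reversed(observed), key=model_key)
print(modelled == observed)
```

Under this model a purely non-Latin name without a regular space has an empty first-pass key, while one with a regular space has the key " ", which sorts after the empty key but before the digits. That would account for the separate block of names containing spaces.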
The C.UTF-8 locale doesn’t treat the spaces differently, resulting in a
less confusing list:
$ LANG=C.UTF-8 ls -1
123
456
789
abc
bc de
bcd
cde
def
xyz
«мно»
абв
абв где
абв где
бвC
вгд
вгд ежз
где
эюя
一丁丂
丁丂 丆万丈
七七丅
三丐丑
三丐丑D
丒专 且丕
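
The C.UTF-8 order above looks like a plain code point comparison, which for UTF-8 is the same as comparing the encoded bytes. A minimal sketch, using a subset of the names from the listing (the variable names are illustrative only):

```python
# A subset of the filenames from the listing, in arbitrary order.
names = ["三丐丑D", "бвC", "bcd", "абв", "bc de", "«мно»", "123", "一丁丂"]

# Python compares str objects code point by code point.
by_code_point = sorted(names)

# Comparing the UTF-8 encodings byte by byte yields the same order.
by_utf8_bytes = sorted(names, key=lambda s: s.encode("utf-8"))

for name in by_code_point:
    print(name)
```

Here the space (0x20) is just another character that sorts below the letters, so "bc de" precedes "bcd" and no separate block appears.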
With a Russian locale the mixed Chinese/Latin filenames are still
interspersed with the Latin ones, but the Cyrillic filenames are no longer
split into separate blocks with/without spaces:
$ LANG=ru_RU.UTF-8 ls -1
一丁丂
七七丅
三丐丑
丁丂 丆万丈
丒专 且丕
123
456
789
abc
bc de
bcd
cde
三丐丑D
def
xyz
абв
абв где
абв где
бвC
вгд
вгд ежз
где
«мно»
эюя
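
To compare locales without going through ls, something like the following sketch may help. It assumes that ls orders names with strcoll()-style collation, which is not confirmed here, and that the locale names are installed on the system. The helper `sort_in_locale` is hypothetical and returns None when a locale is unavailable.

```python
import locale
from functools import cmp_to_key

def sort_in_locale(names, locale_name):
    """Sort names with the collation rules of locale_name (None if unavailable)."""
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        # locale.strcoll wraps the C library's collation comparison.
        return sorted(names, key=cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the previous setting

sample = ["bcd", "bc de", "三丐丑D", "def", "абв где", "абв", "丁丂 丆万丈"]
for loc in ("C", "de_DE.UTF-8", "ru_RU.UTF-8"):
    print(loc, sort_in_locale(sample, loc))
```

The result depends on the collation tables of the C library in use, so other systems may well order the same names differently.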
While I assume the output above is allowed or even required by a standard, I
think it would be more helpful if filenames with and without spaces were
sorted together rather than in separate blocks.