mailing list of musl libc
* wcscoll does not collate properly, even en_US
@ 2017-11-26 21:33 A. Wilcox
  2017-11-26 22:32 ` Rich Felker
  0 siblings, 1 reply; 2+ messages in thread
From: A. Wilcox @ 2017-11-26 21:33 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 1022 bytes --]

Hi.

My understanding is that musl does not want to support collation in
non-English languages (at least, not yet), but collation is supported in
American English.

glib's test suite is failing on musl now because the locale code is just
functional enough to make glib not skip the tests entirely (1.1.16
failed the 'setlocale is giving us the locale we set back' test), yet
collation doesn't work.  wcscoll is giving the same result as wcscmp.
This is wrong; a simple test case is attached.  Run on a glibc machine,
a FreeBSD machine, and a Solaris machine, it will output:

Amy
bug
cat
Gaz
Tom

On musl it (incorrectly) currently outputs:

Amy
Gaz
Tom
bug
cat

Does this mean my understanding was wrong and musl does not even support
American English collation?  This is going to affect everything from `ls` to GUI file
managers like Dolphin or Nautilus to email software sorting by sender or
subject.

Regards,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org

[-- Attachment #1.2: wcscoll-test.c --]
[-- Type: text/x-csrc; name="wcscoll-test.c", Size: 900 bytes --]

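#include <locale.h>	/* setlocale, LC_ALL */
#include <stdio.h>	/* fputs, stderr */
#include <stdlib.h>	/* calloc, free, mbstowcs, qsort, EXIT_* */
#include <string.h>	/* strcmp */
#include <wchar.h>	/* wcscoll, wprintf */

#define NSTRS 5

/* qsort comparator: the array elements are wchar_t pointers. */
static int my_collate(const void *p1, const void *p2)
{
	return wcscoll(*(const wchar_t *const *)p1, *(const wchar_t *const *)p2);
}

int main(void)
{
	char *loc;
	const char *stuff[NSTRS] = { "bug", "Amy", "Tom", "Gaz", "cat" };
	wchar_t *strs[NSTRS];
	setlocale(LC_ALL, "en_US.UTF-8");
	loc = setlocale(LC_ALL, NULL);
	if(loc == NULL || strcmp(loc, "en_US.UTF-8") != 0)
	{
		/* setlocale is not specified to set errno, so perror could mislead. */
		fputs("setlocale: en_US.UTF-8 was not applied\n", stderr);
		return EXIT_FAILURE;
	}

	for(int i = 0; i < NSTRS; i++)
	{
		/* Room for 3 wide characters plus the terminating L'\0'. */
		strs[i] = calloc(4, sizeof(wchar_t));
		if(strs[i] == NULL || mbstowcs(strs[i], stuff[i], 4) == (size_t)-1)
		{
			fputs("allocation or conversion failed\n", stderr);
			return EXIT_FAILURE;
		}
	}

	qsort(strs, NSTRS, sizeof(wchar_t *), my_collate);

	for(int i = 0; i < NSTRS; i++)
	{
		wprintf(L"%ls\n", strs[i]);
		free(strs[i]);
	}
	return EXIT_SUCCESS;
}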

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]


* Re: wcscoll does not collate properly, even en_US
  2017-11-26 21:33 wcscoll does not collate properly, even en_US A. Wilcox
@ 2017-11-26 22:32 ` Rich Felker
  0 siblings, 0 replies; 2+ messages in thread
From: Rich Felker @ 2017-11-26 22:32 UTC (permalink / raw)
  To: musl

On Sun, Nov 26, 2017 at 03:33:00PM -0600, A. Wilcox wrote:
> Hi.
> 
> My understanding is that musl does not want to support collation in
> non-English languages (at least, not yet), but collation is supported in
> American English.

I'm not sure where you got that understanding. The information on the
wiki states that proper LC_COLLATE functionality (anything but raw
codepoint order) is intended future functionality but not yet done.
Nowhere in musl is "American English is supported but nothing else is"
an acceptable policy (and as such a hypothetical patch to add
hard-coded LC_COLLATE for American English without a general framework
capable of supporting arbitrary languages would be rejected by me).

> glib's test suite is failing on musl now because the locale code is just
> functional enough to make glib not skip the tests entirely (1.1.16
> failed the 'setlocale is giving us the locale we set back' test), yet
> collation doesn't work.  wcscoll is giving the same result as wcscmp.
> This is wrong; a simple test case is attached.  Run on a glibc machine,
> a FreeBSD machine, and a Solaris machine, it will output:

This is known. kaniini (from Alpine) and others have brought similar
things to my attention and the issue of how current setlocale behavior
affects applications/tests is under discussion in the thread
"setlocale behavior with 'missing' locales". I'd really like further
feedback on it so that the proposed changes don't end up being a worse
problem that we have to revert or throw away.

While we're on the topic of getting things in a state so that locale
functionality is actually usable, the thread "Bikeshed invitation for
nl_langinfo ambiguities" also needs some attention.

Rich




Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
