mailing list of musl libc
* Tests needed for byte-based C locale
@ 2015-06-14  2:53 Rich Felker
  2015-06-14  4:21 ` Rich Felker
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Rich Felker @ 2015-06-14  2:53 UTC
  To: musl

Here are some basic tests I'd like to have for the byte-based C
locale, preferably most of them before committing the code and all
before a release containing it. Any help writing them (for the
libc-test framework) would be much appreciated. Short of that, even
just some quick sanity checks using existing programs (e.g. busybox
utils with regex/fnmatch usage) would be helpful.

Regex & fnmatch:

- Literals with arbitrary high bytes match.
- Brackets match byte values/ranges.
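
As a rough sketch (not taken from libc-test), the two checks above could be exercised with plain regcomp/regexec and fnmatch. The helper names re_matches, glob_matches and check_high_bytes, and the sample string, are made up for illustration; a real test would loop over all high byte values instead of probing one.

```c
#include <fnmatch.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if the POSIX basic regex `pat` matches anywhere
   in `str`, 0 if it does not, -1 if the pattern fails to compile. */
static int re_matches(const char *pat, const char *str)
{
	regex_t re;
	int r;

	if (regcomp(&re, pat, REG_NOSUB) != 0) return -1;
	r = regexec(&re, str, 0, NULL, 0);
	regfree(&re);
	return r == 0;
}

/* Hypothetical helper: 1 if the whole of `str` matches the glob `pat`. */
static int glob_matches(const char *pat, const char *str)
{
	return fnmatch(pat, str, 0) == 0;
}

/* Runs both checks on one sample string containing byte 0x80 and returns
   the number of checks that failed (0 through 4). */
static int check_high_bytes(void)
{
	/* Octal escapes avoid "\x80b" being parsed as one long hex escape. */
	const char *subject = "foo\200bar";
	int failures = 0;

	setlocale(LC_CTYPE, "C");

	/* Literals with arbitrary high bytes match. */
	failures += re_matches("o\200b", subject) != 1;
	failures += glob_matches("*o\200b*", subject) != 1;

	/* Brackets match byte values/ranges (here 0x7f through 0x81). */
	failures += re_matches("o[\177-\201]b", subject) != 1;
	failures += glob_matches("foo[\177-\201]bar", subject) != 1;

	return failures;
}
```

With a byte-based C locale, check_high_bytes() would be expected to return 0; a nonzero result says how many of the four probes did not behave that way.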

Multibyte functions (test r/non-r, string funcs, and btowc/wctob too):

- Successful round-trip for arbitrary bytes.
- Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.

Iconv:

- Conversions to/from UTF-8 don't break in C locale.
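
A minimal sketch of such an iconv check, assuming UTF-16LE as the other side of the conversion (the list does not name one). The helpers convert and utf8_round_trip_in_c_locale are hypothetical names, and the sample text is arbitrary.

```c
#include <iconv.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: convert `inlen` bytes at `in` from encoding `from`
   to encoding `to`, writing into `out`. Returns the number of bytes
   written, or (size_t)-1 on any failure or incomplete conversion. */
static size_t convert(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outsize)
{
	iconv_t cd = iconv_open(to, from);
	char *ip = (char *)in, *op = out;
	size_t il = inlen, ol = outsize, r;

	if (cd == (iconv_t)-1) return (size_t)-1;
	r = iconv(cd, &ip, &il, &op, &ol);
	iconv_close(cd);
	if (r == (size_t)-1 || il != 0) return (size_t)-1;
	return outsize - ol;
}

/* Returns 1 if UTF-8 -> UTF-16LE -> UTF-8 reproduces the input while
   LC_CTYPE is "C", 0 otherwise. */
static int utf8_round_trip_in_c_locale(void)
{
	/* U+00E9, U+20AC and U+1F600 encoded as UTF-8 (2, 3 and 4 bytes). */
	static const char utf8[] = "\303\251\342\202\254\360\237\230\200";
	char mid[64], back[64];
	size_t n, m;

	setlocale(LC_CTYPE, "C");
	n = convert("UTF-16LE", "UTF-8", utf8, sizeof utf8 - 1, mid, sizeof mid);
	if (n == (size_t)-1) return 0;
	m = convert("UTF-8", "UTF-16LE", mid, n, back, sizeof back);
	return m == sizeof utf8 - 1 && !memcmp(back, utf8, m);
}
```

The sample deliberately mixes 2-, 3- and 4-byte sequences so that a conversion path that happened to depend on the locale's multibyte functions would show up as a failed round trip.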

Stdio:

- Encoding rule bound at time stream becomes oriented.
- All wide functions cause orientation/binding of encoding rule.
- Byte printf/scanf use current locale for %ls/%lc/%l[, not file's.
- Wide printf/scanf use current locale for %s/%c/%[, not file's.
- Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.
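
Two of the stdio items could be probed along these lines. This is only a sketch: the helper names fputwc_orients_stream and wide_write_errno are invented here, tmpfile() is used just to get a fresh unoriented stream, and the locale-switching items (which need a second, non-C locale) are not covered.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: 1 if a fresh stream is unoriented and fputwc makes
   it wide-oriented, 0 if not, -1 if no temporary file could be opened. */
static int fputwc_orients_stream(void)
{
	FILE *f = tmpfile();
	int before, after;

	if (!f) return -1;
	before = fwide(f, 0);   /* 0 means no orientation yet */
	fputwc(L'a', f);
	after = fwide(f, 0);    /* >0 means wide-oriented */
	fclose(f);
	return before == 0 && after > 0;
}

/* Hypothetical helper: writes one wide character to a fresh stream with
   LC_CTYPE set to "C". Returns 0 on success, the errno value if the write
   or flush fails, or -1 if no temporary file could be opened. */
static int wide_write_errno(wchar_t wc)
{
	FILE *f;
	int err = 0;

	setlocale(LC_CTYPE, "C");
	f = tmpfile();
	if (!f) return -1;
	errno = 0;
	/* Some implementations convert only when the buffer is flushed. */
	if (fputwc(wc, f) == WEOF || fflush(f) == EOF) err = errno;
	fclose(f);
	return err;
}
```

Going by the last item in the list, wide_write_errno(0x100) would be expected to yield EILSEQ under the byte-based C locale, since 0x100 lies outside both 0-7f and df80-dfff. The same orientation check would have to be repeated for every other wide function to cover the second item.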



* Re: Tests needed for byte-based C locale
  2015-06-14  2:53 Tests needed for byte-based C locale Rich Felker
@ 2015-06-14  4:21 ` Rich Felker
  2015-06-14 12:13 ` Christian Neukirchen
  2015-06-14 14:21 ` Christian Neukirchen
  2 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2015-06-14  4:21 UTC
  To: musl

[-- Attachment #1: Type: text/plain, Size: 770 bytes --]

On Sat, Jun 13, 2015 at 10:53:42PM -0400, Rich Felker wrote:
> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
> 
> Multibyte functions (test r/non-r, string funcs, and btowc/wctob too):
> 
> - Successful round-trip for arbitrary bytes.
> - Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.

These (mb/wc funcs) are largely covered by the attached file, in
libc-test form. Ideas for additional checks are welcome.

Rich

[-- Attachment #2: clocale_mbfuncs.c --]
[-- Type: text/plain, Size: 1925 bytes --]

#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
#include <langinfo.h>
#include <limits.h>
#include "test.h"

int main(void)
{
	int i, j;
	mbstate_t st, st2;
	wchar_t wc, map[257], wtmp[257];
	char s[MB_LEN_MAX*256];
	size_t rv;
	int c;
	int ni_errors=0;

	setlocale(LC_CTYPE, "C");

	if (MB_CUR_MAX != 1) t_error("MB_CUR_MAX = %d, expected 1\n", (int)MB_CUR_MAX);

	for (i=0; i<256; i++) {
		st = (mbstate_t){0};
		if (mbrtowc(&wc, &(char){i}, 1, &st) != !!i)
			t_error("mbrtowc failed to convert byte %.2x to wchar_t\n", i);
		if ((map[i]=btowc(i)) == WEOF) {
			t_error("btowc failed to convert byte %.2x to wchar_t\n", i);
			continue;
		}
		for (j=0; j<i; j++) {
			if (map[j]==map[i])
				t_error("bytes %.2x and %.2x map to same wchar_t %.4x\n", j, i, (unsigned)map[i]);
		}
	}

	for (i=0; i<256; i++) {
		if (map[i]==WEOF) continue;
		if (wctob(map[i]) != i)
			t_error("wctob failed to convert wchar_t %.4x back to byte %.2x\n", (unsigned)map[i], i);
	}

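	/* terminate map first: wcschr below scans map+1 up to the first null wide character */
	map[256] = 0;

	/* covering whole 32-bit range would be too slow... maybe add random high tests? */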
	for (i=0; i<0x110000; i++) {
		if (wcschr(map+1, i)) continue;
		if ((c=wctob(i)) != WEOF && ni_errors++ < 50)
			t_error("wctob accepted non-image wchar_t %.4x as byte %.2x\n", i, c);
		st = (mbstate_t){0};
		if (wcrtomb(s, i, &st) != -1  && ni_errors++ < 50)
			t_error("wcrtomb accepted non-image wchar_t %.4x\n", i);
	}
	if (ni_errors > 50)
		t_error("additional %d non-image errors (not printed)\n", ni_errors-50);

	map[256] = 0;
	st = (mbstate_t){0};
	if ((rv=wcsrtombs(s, &(const wchar_t *){map+1}, sizeof s, &st)) != 255)
		t_error("wcsrtombs returned %zd, expected 255\n", rv);
	if ((rv=mbsrtowcs(wtmp, &(const char *){s}, 256, &st)) != 255)
		t_error("mbsrtowcs returned %zd, expected 255\n", rv);
	if (memcmp(map+1, wtmp, 256*sizeof(*map)))
		t_error("wcsrtombs/mbsrtowcs round trip failed\n");

	return t_status;
}


* Re: Tests needed for byte-based C locale
  2015-06-14  2:53 Tests needed for byte-based C locale Rich Felker
  2015-06-14  4:21 ` Rich Felker
@ 2015-06-14 12:13 ` Christian Neukirchen
  2015-06-14 13:22   ` Rich Felker
  2015-06-14 14:21 ` Christian Neukirchen
  2 siblings, 1 reply; 7+ messages in thread
From: Christian Neukirchen @ 2015-06-14 12:13 UTC
  To: Rich Felker; +Cc: musl

Rich Felker <dalias@libc.org> writes:

> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Regex & fnmatch:
>
> - Literals with arbitrary high bytes match.
> - Brackets match byte values/ranges.

With GNU grep 2.21 + these patches:

$ printf 'foo\x80bar' | LANG=C grep f                  
foo�bar
$ printf 'foo\x80bar' | LC_ALL=C grep 'o.b'
foo�bar
$ printf 'foo\x80bar' | LC_ALL=C grep $'[\177-\201]' 
foo�bar

(I guess this only tests binary detection, tho.)

-- 
Christian Neukirchen  <chneukirchen@gmail.com>  http://chneukirchen.org



* Re: Tests needed for byte-based C locale
  2015-06-14 12:13 ` Christian Neukirchen
@ 2015-06-14 13:22   ` Rich Felker
  0 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2015-06-14 13:22 UTC
  To: musl

On Sun, Jun 14, 2015 at 02:13:49PM +0200, Christian Neukirchen wrote:
> Rich Felker <dalias@libc.org> writes:
> 
> > Here are some basic tests I'd like to have for the byte-based C
> > locale, preferably most of them before committing the code and all
> > before a release containing it. Any help writing them (for the
> > libc-test framework) would be much appreciated. Short of that, even
> > just some quick sanity checks using existing programs (e.g. busybox
> > utils with regex/fnmatch usage) would be helpful.
> >
> > Regex & fnmatch:
> >
> > - Literals with arbitrary high bytes match.
> > - Brackets match byte values/ranges.
> 
> With GNU grep 2.21 + these patches:
> 
> $ printf 'foo\x80bar' | LANG=C grep f                  
> foo�bar
> $ printf 'foo\x80bar' | LC_ALL=C grep 'o.b'
> foo�bar
> $ printf 'foo\x80bar' | LC_ALL=C grep $'[\177-\201]' 
> foo�bar
> 
> (I guess this only tests binary detection, tho.)

Obviously this isn't comprehensive, but it's nice to see confirmation
that one of the desired use cases is working as intended. Thanks!

Rich



* Re: Tests needed for byte-based C locale
  2015-06-14  2:53 Tests needed for byte-based C locale Rich Felker
  2015-06-14  4:21 ` Rich Felker
  2015-06-14 12:13 ` Christian Neukirchen
@ 2015-06-14 14:21 ` Christian Neukirchen
  2015-06-14 14:37   ` Szabolcs Nagy
  2 siblings, 1 reply; 7+ messages in thread
From: Christian Neukirchen @ 2015-06-14 14:21 UTC
  To: Rich Felker; +Cc: musl

Rich Felker <dalias@libc.org> writes:

> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Regex & fnmatch:
>
> - Literals with arbitrary high bytes match.
> - Brackets match byte values/ranges.

printf 'foo\x80bar\n' | busybox grep f.*b
<nothing>
printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
foo�bar
printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
<nothing again, why?>

Breakpoint 1, regexec (preg=0x7ffff7ffe668, 
    string=0x7ffff7ffec00 "foo\200bar", nmatch=1, pmatch=0x7ffff7ffe6a8, 
    eflags=0) at src/regex/regexec.c:983
then returns 1.

-- 
Christian Neukirchen  <chneukirchen@gmail.com>  http://chneukirchen.org



* Re: Tests needed for byte-based C locale
  2015-06-14 14:21 ` Christian Neukirchen
@ 2015-06-14 14:37   ` Szabolcs Nagy
  2015-06-14 15:09     ` Christian Neukirchen
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2015-06-14 14:37 UTC
  To: musl; +Cc: Rich Felker

* Christian Neukirchen <chneukirchen@gmail.com> [2015-06-14 16:21:56 +0200]:
> printf 'foo\x80bar\n' | busybox grep f.*b
> <nothing>
> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
> foo�bar
> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
> <nothing again, why?>
> 

foo\x80bar does not match f.b
it should match foo.b
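
This can be checked mechanically. In the sketch below, first_match_offset is a made-up helper that reports where a basic regex first matches; since `.` consumes exactly one character, "f.b" needs a 'b' two positions after the 'f', while the sample string has "oo\200" in between.

```c
#include <regex.h>

/* Hypothetical helper: byte offset at which the POSIX basic regex `pat`
   first matches in `str`, or -1 if it does not match or fails to compile. */
static long first_match_offset(const char *pat, const char *str)
{
	regex_t re;
	regmatch_t m;
	long off = -1;

	if (regcomp(&re, pat, 0) != 0) return -1;
	if (regexec(&re, str, 1, &m, 0) == 0) off = (long)m.rm_so;
	regfree(&re);
	return off;
}
```

first_match_offset("f.b", "foo\200bar") gives -1 regardless of how the high byte is treated, because no 'b' follows the 'f' at the right distance. Whether first_match_offset("foo.b", "foo\200bar") gives 0 depends on `.` accepting the byte 0x80, which is the behaviour the byte-based C locale is meant to provide.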



* Re: Tests needed for byte-based C locale
  2015-06-14 14:37   ` Szabolcs Nagy
@ 2015-06-14 15:09     ` Christian Neukirchen
  0 siblings, 0 replies; 7+ messages in thread
From: Christian Neukirchen @ 2015-06-14 15:09 UTC
  To: musl; +Cc: Rich Felker

Szabolcs Nagy <nsz@port70.net> writes:

> * Christian Neukirchen <chneukirchen@gmail.com> [2015-06-14 16:21:56 +0200]:
>> printf 'foo\x80bar\n' | busybox grep f.*b
>> <nothing>
>> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
>> foo�bar
>> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
>> <nothing again, why?>
>> 
>
> foo\x80bar does not match f.b
> it should match foo.b

I'm stupid.  It works fine then. :)

-- 
Christian Neukirchen  <chneukirchen@gmail.com>  http://chneukirchen.org



