* Tests needed for byte-based C locale
From: Rich Felker @ 2015-06-14 2:53 UTC
To: musl

Here are some basic tests I'd like to have for the byte-based C
locale, preferably most of them before committing the code and all
before a release containing it. Any help writing them (for the
libc-test framework) would be much appreciated. Short of that, even
just some quick sanity checks using existing programs (e.g. busybox
utils with regex/fnmatch usage) would be helpful.

Regex & fnmatch:

- Literals with arbitrary high bytes match.
- Brackets match byte values/ranges.

Multibyte functions (test r/non-r, string funcs, and btowc/wctob too):

- Successful round-trip for arbitrary bytes.
- Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.

Iconv:

- Conversions to/from UTF-8 don't break in C locale.

Stdio:

- Encoding rule bound at time stream becomes oriented.
- All wide functions cause orientation/binding of encoding rule.
- Byte printf/scanf use current locale for %ls/%lc/%l[, not file's.
- Wide printf/scanf use current locale for %s/%c/%[, not file's.
- Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.

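A minimal sketch of how the first two Stdio items above might be probed, assuming only standard C: fwide() with a mode of 0 reports a stream's orientation without changing it. The helper name orientation_after and its op codes are hypothetical, chosen for illustration. A full test would also switch locales after the stream is oriented and check which encoding rule is applied, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper (not from the thread): open a fresh temporary stream,
 * optionally perform one write on it, and report its orientation.
 *   op == 0: no I/O, op == 1: fputwc, op == 2: fputc
 * Returns 1 (wide), -1 (byte), 0 (unoriented), or -2 if tmpfile() fails. */
static int orientation_after(int op)
{
	FILE *f;
	int o;

	assert(op >= 0 && op <= 2);
	f = tmpfile();
	if (!f) return -2;
	if (op == 1) fputwc(L'x', f);
	if (op == 2) fputc('x', f);
	o = fwide(f, 0);	/* mode 0 only queries the orientation */
	fclose(f);
	return (o > 0) - (o < 0);
}
```

On a conforming implementation, orientation_after(0) should report an unoriented stream, orientation_after(1) a wide-oriented one, and orientation_after(2) a byte-oriented one.
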
* Re: Tests needed for byte-based C locale
From: Rich Felker @ 2015-06-14 4:21 UTC
To: musl

[-- Attachment #1: Type: text/plain, Size: 770 bytes --]

On Sat, Jun 13, 2015 at 10:53:42PM -0400, Rich Felker wrote:
> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Multibyte functions (test r/non-r, string funcs, and btowc/wctob too):
>
> - Successful round-trip for arbitrary bytes.
> - Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.

These (mb/wc funcs) are largely covered by the attached file, in
libc-test form. Ideas for additional checks are welcome.

Rich

[-- Attachment #2: clocale_mbfuncs.c --]
[-- Type: text/plain, Size: 1925 bytes --]

#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
#include <langinfo.h>
#include <limits.h>
#include "test.h"

int main(void)
{
	int i, j;
	mbstate_t st;
	wchar_t wc, map[257], wtmp[257];
	char s[MB_LEN_MAX*256];
	size_t rv;
	int c;
	int ni_errors=0;

	setlocale(LC_CTYPE, "C");

	if (MB_CUR_MAX != 1) t_error("MB_CUR_MAX = %d, expected 1\n", (int)MB_CUR_MAX);

	/* every byte must convert, and to a distinct wchar_t */
	for (i=0; i<256; i++) {
		st = (mbstate_t){0};
		if (mbrtowc(&wc, &(char){i}, 1, &st) != !!i)
			t_error("mbrtowc failed to convert byte %.2x to wchar_t\n", i);

		if ((map[i]=btowc(i)) == WEOF) {
			t_error("btowc failed to convert byte %.2x to wchar_t\n", i);
			continue;
		}
		for (j=0; j<i; j++) {
			if (map[j]==map[i])
				t_error("bytes %.2x and %.2x map to same wchar_t %.4x\n", j, i, (unsigned)map[i]);
		}
	}

	/* each image must convert back to the byte it came from */
	for (i=0; i<256; i++) {
		if (map[i]==WEOF) continue;
		if (wctob(map[i]) != i)
			t_error("wctob failed to convert wchar_t %.4x back to byte %.2x\n", (unsigned)map[i], i);
	}

	/* terminate map before it is searched with wcschr below */
	map[256] = 0;

	/* covering whole 32-bit range would be too slow... maybe add random high tests? */
	for (i=0; i<0x110000; i++) {
		if (wcschr(map+1, i)) continue;
		/* wctob reports failure as EOF, not WEOF */
		if ((c=wctob(i)) != EOF && ni_errors++ < 50)
			t_error("wctob accepted non-image wchar_t %.4x as byte %.2x\n", i, c);
		st = (mbstate_t){0};
		if (wcrtomb(s, i, &st) != -1 && ni_errors++ < 50)
			t_error("wcrtomb accepted non-image wchar_t %.4x\n", i);
	}
	if (ni_errors > 50)
		t_error("additional %d non-image errors (not printed)\n", ni_errors-50);

	/* round trip of all nonzero bytes through the string functions */
	st = (mbstate_t){0};
	if ((rv=wcsrtombs(s, &(const wchar_t *){map+1}, sizeof s, &st)) != 255)
		t_error("wcsrtombs returned %zu, expected 255\n", rv);
	if ((rv=mbsrtowcs(wtmp, &(const char *){s}, 256, &st)) != 255)
		t_error("mbsrtowcs returned %zu, expected 255\n", rv);
	if (memcmp(map+1, wtmp, 256*sizeof(*map)))
		t_error("wcsrtombs/mbsrtowcs round trip failed\n");

	return t_status;
}

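The comment in the attachment raises the idea of random tests for wchar_t values above the Unicode range. One way that might look is sketched below. The helper name count_accepted_high, the use of rand(), and the sampled range are all assumptions for illustration, and the sketch assumes a 32-bit wchar_t as on musl.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of the attachment): probe n pseudo-random
 * wchar_t values in [0x110000, 0x7ffffffe] and count how many times
 * wctob() or wcrtomb() accepts one. Assumes a 32-bit wchar_t. */
static int count_accepted_high(unsigned seed, int n)
{
	char buf[MB_LEN_MAX];
	int k, accepted = 0;

	assert(n >= 0);
	setlocale(LC_CTYPE, "C");
	srand(seed);
	for (k = 0; k < n; k++) {
		/* mix two rand() results, since RAND_MAX may be only 32767 */
		unsigned long r = ((unsigned long)rand() << 15) ^ (unsigned long)rand();
		wchar_t wc = (wchar_t)(0x110000UL + r % (0x7fffffffUL - 0x110000UL));
		mbstate_t st = {0};

		if (wctob(wc) != EOF) accepted++;
		if (wcrtomb(buf, wc, &st) != (size_t)-1) accepted++;
	}
	return accepted;
}
```

In libc-test form, a nonzero return value would presumably be reported through t_error, in the same way as the non-image errors in the attachment.
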
* Re: Tests needed for byte-based C locale
From: Christian Neukirchen @ 2015-06-14 12:13 UTC
To: Rich Felker; Cc: musl

Rich Felker <dalias@libc.org> writes:

> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Regex & fnmatch:
>
> - Literals with arbitrary high bytes match.
> - Brackets match byte values/ranges.

With GNU grep 2.21 + these patches:

$ printf 'foo\x80bar' | LANG=C grep f
foo�bar
$ printf 'foo\x80bar' | LC_ALL=C grep 'o.b'
foo�bar
$ printf 'foo\x80bar' | LC_ALL=C grep $'[\177-\201]'
foo�bar

(I guess this only tests binary detection, tho.)

-- 
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

* Re: Tests needed for byte-based C locale
From: Rich Felker @ 2015-06-14 13:22 UTC
To: musl

On Sun, Jun 14, 2015 at 02:13:49PM +0200, Christian Neukirchen wrote:
> Rich Felker <dalias@libc.org> writes:
>
> > Here are some basic tests I'd like to have for the byte-based C
> > locale, preferably most of them before committing the code and all
> > before a release containing it. Any help writing them (for the
> > libc-test framework) would be much appreciated. Short of that, even
> > just some quick sanity checks using existing programs (e.g. busybox
> > utils with regex/fnmatch usage) would be helpful.
> >
> > Regex & fnmatch:
> >
> > - Literals with arbitrary high bytes match.
> > - Brackets match byte values/ranges.
>
> With GNU grep 2.21 + these patches:
>
> $ printf 'foo\x80bar' | LANG=C grep f
> foo�bar
> $ printf 'foo\x80bar' | LC_ALL=C grep 'o.b'
> foo�bar
> $ printf 'foo\x80bar' | LC_ALL=C grep $'[\177-\201]'
> foo�bar
>
> (I guess this only tests binary detection, tho.)

Obviously this isn't comprehensive but it's nice to see a confirmation
that one of the desired usage cases is working as intended. Thanks!

Rich

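The grep runs quoted above pass through grep's own input handling. A more direct check of the two "Regex & fnmatch" items might call fnmatch() itself, as in the sketch below. The helper names c_locale_fnmatch and check_high_bytes are hypothetical, and the three cases are only examples of what such a test could cover.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>

/* Hypothetical helper: returns 1 if str matches the glob pat under the
 * "C" locale, 0 if it does not, and -1 on any other error. */
static int c_locale_fnmatch(const char *pat, const char *str)
{
	int r;

	assert(pat && str);
	if (!setlocale(LC_CTYPE, "C")) return -1;
	r = fnmatch(pat, str, 0);
	if (r == 0) return 1;
	if (r == FNM_NOMATCH) return 0;
	return -1;
}

/* Hypothetical driver: counts how many high-byte checks do not match.
 * The literals are split ("\x80" "bar") so that the hex escape does not
 * swallow the following 'b' and 'a' as hex digits. */
static int check_high_bytes(void)
{
	int failures = 0;

	/* literal high byte in both pattern and string */
	failures += c_locale_fnmatch("foo\x80" "bar", "foo\x80" "bar") != 1;
	/* bracket range 0x7f-0x81 should cover byte 0x80 */
	failures += c_locale_fnmatch("foo[\x7f-\x81]bar", "foo\x80" "bar") != 1;
	/* '?' should consume the single high byte */
	failures += c_locale_fnmatch("foo?bar", "foo\x80" "bar") != 1;
	return failures;
}
```

With a byte-based C locale, check_high_bytes() would be expected to return 0. Other implementations may differ, which is what such a test is meant to expose.
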
* Re: Tests needed for byte-based C locale
From: Christian Neukirchen @ 2015-06-14 14:21 UTC
To: Rich Felker; Cc: musl

Rich Felker <dalias@libc.org> writes:

> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Regex & fnmatch:
>
> - Literals with arbitrary high bytes match.
> - Brackets match byte values/ranges.

printf 'foo\x80bar\n' | busybox grep f.*b
<nothing>
printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
foo�bar
printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
<nothing again, why?>

Breakpoint 1, regexec (preg=0x7ffff7ffe668, string=0x7ffff7ffec00 "foo\200bar", nmatch=1, pmatch=0x7ffff7ffe6a8, eflags=0) at src/regex/regexec.c:983

then returns 1.

-- 
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

* Re: Tests needed for byte-based C locale
From: Szabolcs Nagy @ 2015-06-14 14:37 UTC
To: musl; Cc: Rich Felker

* Christian Neukirchen <chneukirchen@gmail.com> [2015-06-14 16:21:56 +0200]:
> printf 'foo\x80bar\n' | busybox grep f.*b
> <nothing>
> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
> foo???bar
> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
> <nothing again, why?>
>

foo\x80bar does not match f.b
it should match foo.b

* Re: Tests needed for byte-based C locale
From: Christian Neukirchen @ 2015-06-14 15:09 UTC
To: musl; Cc: Rich Felker

Szabolcs Nagy <nsz@port70.net> writes:

> * Christian Neukirchen <chneukirchen@gmail.com> [2015-06-14 16:21:56 +0200]:
>> printf 'foo\x80bar\n' | busybox grep f.*b
>> <nothing>
>> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
>> foo???bar
>> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
>> <nothing again, why?>
>>
>
> foo\x80bar does not match f.b
> it should match foo.b

I'm stupid. It works fine then. :)

-- 
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org

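The resolution above comes down to '.' matching exactly one character: f.b needs a 'b' two positions after an 'f', which foo\x80bar never offers, while foo.b and f.*b can match. A sketch of the same check made directly against regcomp()/regexec(), without grep in between, might look as follows. The helper name c_locale_regex_match is hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the basic regular expression pattern
 * matches somewhere in text under the "C" locale, 0 if it does not, and
 * -1 if regcomp() or regexec() reports an error. */
static int c_locale_regex_match(const char *pattern, const char *text)
{
	regex_t re;
	int r;

	assert(pattern && text);
	setlocale(LC_ALL, "C");
	if (regcomp(&re, pattern, REG_NOSUB) != 0) return -1;
	r = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	if (r == 0) return 1;
	if (r == REG_NOMATCH) return 0;
	return -1;
}
```

Called with the text "foo\x80" "bar" (split so the hex escape ends after 80), the pattern "f.b" gives no match whatever the locale does with byte 0x80, whereas "foo.b" would be expected to match when high bytes are treated as single characters.
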