* Tests needed for byte-based C locale
@ 2015-06-14 2:53 Rich Felker
2015-06-14 4:21 ` Rich Felker
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Rich Felker @ 2015-06-14 2:53 UTC (permalink / raw)
To: musl
Here are some basic tests I'd like to have for the byte-based C
locale, preferably most of them before committing the code and all
before a release containing it. Any help writing them (for the
libc-test framework) would be much appreciated. Short of that, even
just some quick sanity checks using existing programs (e.g. busybox
utils with regex/fnmatch usage) would be helpful.
Regex & fnmatch:
- Literals with arbitrary high bytes match.
- Brackets match byte values/ranges.
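
The two regex/fnmatch checks above could be sketched roughly as follows. This is a standalone illustration, not libc-test code: the helper names `re_matches`, `fn_matches` and `check_high_bytes` are made up for the example, and the octal escape `\200` stands for the high byte 0x80.

```c
#include <fnmatch.h>
#include <locale.h>
#include <regex.h>

/* Hypothetical helper: 1 if the basic regex pat matches somewhere in s,
 * 0 if it does not, -1 if the pattern fails to compile. */
static int re_matches(const char *pat, const char *s)
{
	regex_t re;
	int r;

	if (regcomp(&re, pat, REG_NOSUB) != 0) return -1;
	r = regexec(&re, s, 0, NULL, 0);
	regfree(&re);
	return r == 0;
}

/* Hypothetical helper: 1 if the whole string s matches the glob pat. */
static int fn_matches(const char *pat, const char *s)
{
	return fnmatch(pat, s, 0) == 0;
}

/* Runs the checks in the C locale; returns the number that failed. */
static int check_high_bytes(void)
{
	const char *s = "foo\200bar";
	int failed = 0;

	setlocale(LC_ALL, "C");
	failed += re_matches("o\200b", s) != 1;        /* literal high byte */
	failed += re_matches("[\177-\201]", s) != 1;   /* range containing 0x80 */
	failed += re_matches("[\201-\203]", s) != 0;   /* range excluding 0x80 */
	failed += fn_matches("foo\200bar", s) != 1;    /* literal high byte */
	failed += fn_matches("*[\177-\201]*", s) != 1; /* range containing 0x80 */
	return failed;
}
```

Calling `check_high_bytes()` from a test's `main` and reporting a nonzero result as a failure would be one way to wire this in.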
Multibyte functions (test r/non-r, string funcs, and btowc/wctob too):
- Successful round-trip for arbitrary bytes.
- Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.
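
For the non-restartable functions, a check along these lines might do. It is only a sketch under assumptions: `count_roundtrip_failures` and `rejects_wchar` are hypothetical names, and on an implementation whose C locale is not byte-based the high bytes are expected to show up as failures.

```c
#include <errno.h>
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: counts the byte values 1..255 that do not survive
 * an mbtowc -> wctomb round trip in the C locale. */
static int count_roundtrip_failures(void)
{
	int i, failures = 0;

	setlocale(LC_CTYPE, "C");
	for (i = 1; i < 256; i++) {
		char in = (char)i, out[MB_LEN_MAX];
		wchar_t wc;

		mbtowc(NULL, NULL, 0); /* reset the internal conversion state */
		if (mbtowc(&wc, &in, 1) != 1) {
			failures++;
			continue;
		}
		wctomb(NULL, 0); /* reset the internal conversion state */
		if (wctomb(out, wc) != 1 || out[0] != in)
			failures++;
	}
	return failures;
}

/* Hypothetical helper: 1 if wctomb refuses wc with EILSEQ, else 0. */
static int rejects_wchar(wchar_t wc)
{
	char out[MB_LEN_MAX];

	setlocale(LC_CTYPE, "C");
	errno = 0;
	return wctomb(out, wc) == -1 && errno == EILSEQ;
}
```

With the byte-based C locale, the expectation would be zero round-trip failures, and rejection of values such as 0x100 that lie outside the 0-7f and df80-dfff ranges.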
Iconv:
- Conversions to/from UTF-8 don't break in C locale.
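
A rough sketch of such an iconv check, assuming the implementation knows the charset names "UTF-8" and "UTF-16LE"; the `convert` helper is hypothetical and exists only for this illustration.

```c
#include <iconv.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: converts inlen bytes of `in` from charset `from`
 * to charset `to` with the C locale active. Returns the number of bytes
 * written to `out`, or (size_t)-1 on any failure. */
static size_t convert(const char *to, const char *from,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
	iconv_t cd;
	char *ip = (char *)in, *op = out; /* iconv takes non-const char ** */
	size_t il = inlen, ol = outcap, r;

	setlocale(LC_CTYPE, "C");
	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1) return (size_t)-1;
	r = iconv(cd, &ip, &il, &op, &ol);
	iconv_close(cd);
	if (r == (size_t)-1 || il != 0) return (size_t)-1;
	return outcap - ol;
}
```

For instance, converting the two UTF-8 bytes `\303\251` (U+00E9) to UTF-16LE should produce the bytes E9 00, and converting those back should reproduce the original two bytes, regardless of the locale's own encoding.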
Stdio:
- Encoding rule bound at time stream becomes oriented.
- All wide functions cause orientation/binding of encoding rule.
- Byte printf/scanf use current locale for %ls/%lc/%l[, not file's.
- Wide printf/scanf use current locale for %s/%c/%[, not file's.
- Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.
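
The orientation part of the stdio items could be probed with something like the sketch below. It only checks that the first operation orients the stream, not which encoding rule gets bound; `orientation_after_first_write` is a hypothetical name, and the use of `tmpfile()` assumes a writable temporary directory.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: opens a fresh stream, performs one wide or byte
 * write, and reports the resulting orientation as 1 (wide), -1 (byte)
 * or 0 (still unoriented). Returns -2 if the stream could not be set up
 * or was already oriented before the write. */
static int orientation_after_first_write(int use_wide)
{
	FILE *f = tmpfile();
	int before, after;

	if (!f) return -2;
	before = fwide(f, 0); /* mode 0 queries without changing orientation */
	if (use_wide) fputwc(L'a', f);
	else fputc('a', f);
	after = fwide(f, 0);
	fclose(f);
	if (before != 0) return -2;
	return after > 0 ? 1 : after < 0 ? -1 : 0;
}
```

A fuller test would repeat this for every wide function and would switch locales between opening the stream and the first wide operation, to confirm that the encoding rule is taken from the locale in effect when the stream becomes oriented.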
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Tests needed for byte-based C locale
From: Rich Felker @ 2015-06-14 4:21 UTC (permalink / raw)
To: musl
On Sat, Jun 13, 2015 at 10:53:42PM -0400, Rich Felker wrote:
> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Multibyte functions (test r/non-r, string funcs, and btowc/wctob too):
>
> - Successful round-trip for arbitrary bytes.
> - Wchar values outside 0-7f & df80-dfff ranges give EILSEQ.
These (mb/wc funcs) are largely covered by the attached file, in
libc-test form. Ideas for additional checks are welcome.
Rich
[-- Attachment #2: clocale_mbfuncs.c --]
[-- Type: text/plain, Size: 1925 bytes --]
#include <stdio.h>
#include <string.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>
#include <langinfo.h>
#include <limits.h>
#include "test.h"

int main(void)
{
	int i, j;
	mbstate_t st;
	wchar_t wc, map[257], wtmp[257];
	char s[MB_LEN_MAX*256];
	size_t rv;
	int c;
	int ni_errors=0;

	setlocale(LC_CTYPE, "C");

	if (MB_CUR_MAX != 1) t_error("MB_CUR_MAX = %d, expected 1\n", (int)MB_CUR_MAX);

	/* Every byte must convert, and no two bytes may share an image. */
	for (i=0; i<256; i++) {
		st = (mbstate_t){0};
		/* mbrtowc returns 0 for the null byte and 1 for any other byte */
		if (mbrtowc(&wc, &(char){i}, 1, &st) != !!i)
			t_error("mbrtowc failed to convert byte %.2x to wchar_t\n", i);
		if ((map[i]=btowc(i)) == WEOF) {
			t_error("btowc failed to convert byte %.2x to wchar_t\n", i);
			continue;
		}
		for (j=0; j<i; j++) {
			if (map[j]==map[i])
				t_error("bytes %.2x and %.2x map to same wchar_t %.4x\n", j, i, (unsigned)map[i]);
		}
	}

	/* Every image must convert back to the byte it came from. */
	for (i=0; i<256; i++) {
		if (map[i]==WEOF) continue;
		if (wctob(map[i]) != i)
			t_error("wctob failed to convert wchar_t %.4x back to byte %.2x\n", (unsigned)map[i], i);
	}

	/* Terminate map before it is searched with wcschr below. */
	map[256] = 0;

	/* covering whole 32-bit range would be too slow... maybe add random high tests? */
	for (i=0; i<0x110000; i++) {
		/* skip values that are the image of some byte (0 hits the terminator) */
		if (wcschr(map+1, i)) continue;
		if ((c=wctob(i)) != WEOF && ni_errors++ < 50)
			t_error("wctob accepted non-image wchar_t %.4x as byte %.2x\n", i, c);
		st = (mbstate_t){0};
		if (wcrtomb(s, i, &st) != -1 && ni_errors++ < 50)
			t_error("wcrtomb accepted non-image wchar_t %.4x\n", i);
	}
	/* only the first 50 non-image errors are printed individually */
	if (ni_errors > 50)
		t_error("additional %d non-image errors (not printed)\n", ni_errors-50);

	/* String round trip over the images of bytes 1..255. */
	st = (mbstate_t){0};
	if ((rv=wcsrtombs(s, &(const wchar_t *){map+1}, sizeof s, &st)) != 255)
		t_error("wcsrtombs returned %zu, expected 255\n", rv);
	if ((rv=mbsrtowcs(wtmp, &(const char *){s}, 256, &st)) != 255)
		t_error("mbsrtowcs returned %zu, expected 255\n", rv);
	if (memcmp(map+1, wtmp, 256*sizeof(*map)))
		t_error("wcsrtombs/mbsrtowcs round trip failed\n");

	return t_status;
}
* Re: Tests needed for byte-based C locale
From: Christian Neukirchen @ 2015-06-14 12:13 UTC (permalink / raw)
To: Rich Felker; +Cc: musl
Rich Felker <dalias@libc.org> writes:
> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Regex & fnmatch:
>
> - Literals with arbitrary high bytes match.
> - Brackets match byte values/ranges.
With GNU grep 2.21 + these patches:
$ printf 'foo\x80bar' | LANG=C grep f
foo�bar
$ printf 'foo\x80bar' | LC_ALL=C grep 'o.b'
foo�bar
$ printf 'foo\x80bar' | LC_ALL=C grep $'[\177-\201]'
foo�bar
(I guess this only tests binary detection, tho.)
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org
* Re: Tests needed for byte-based C locale
From: Rich Felker @ 2015-06-14 13:22 UTC (permalink / raw)
To: musl
On Sun, Jun 14, 2015 at 02:13:49PM +0200, Christian Neukirchen wrote:
> Rich Felker <dalias@libc.org> writes:
>
> > Here are some basic tests I'd like to have for the byte-based C
> > locale, preferably most of them before committing the code and all
> > before a release containing it. Any help writing them (for the
> > libc-test framework) would be much appreciated. Short of that, even
> > just some quick sanity checks using existing programs (e.g. busybox
> > utils with regex/fnmatch usage) would be helpful.
> >
> > Regex & fnmatch:
> >
> > - Literals with arbitrary high bytes match.
> > - Brackets match byte values/ranges.
>
> With GNU grep 2.21 + these patches:
>
> $ printf 'foo\x80bar' | LANG=C grep f
> foo�bar
> $ printf 'foo\x80bar' | LC_ALL=C grep 'o.b'
> foo�bar
> $ printf 'foo\x80bar' | LC_ALL=C grep $'[\177-\201]'
> foo�bar
>
> (I guess this only tests binary detection, tho.)
Obviously this isn't comprehensive but it's nice to see a confirmation
that one of the desired usage cases is working as intended. Thanks!
Rich
* Re: Tests needed for byte-based C locale
From: Christian Neukirchen @ 2015-06-14 14:21 UTC (permalink / raw)
To: Rich Felker; +Cc: musl
Rich Felker <dalias@libc.org> writes:
> Here are some basic tests I'd like to have for the byte-based C
> locale, preferably most of them before committing the code and all
> before a release containing it. Any help writing them (for the
> libc-test framework) would be much appreciated. Short of that, even
> just some quick sanity checks using existing programs (e.g. busybox
> utils with regex/fnmatch usage) would be helpful.
>
> Regex & fnmatch:
>
> - Literals with arbitrary high bytes match.
> - Brackets match byte values/ranges.
printf 'foo\x80bar\n' | busybox grep f.*b
<nothing>
printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
foo�bar
printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
<nothing again, why?>
Breakpoint 1, regexec (preg=0x7ffff7ffe668,
string=0x7ffff7ffec00 "foo\200bar", nmatch=1, pmatch=0x7ffff7ffe6a8,
eflags=0) at src/regex/regexec.c:983
then returns 1.
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org
* Re: Tests needed for byte-based C locale
From: Szabolcs Nagy @ 2015-06-14 14:37 UTC (permalink / raw)
To: musl; +Cc: Rich Felker
* Christian Neukirchen <chneukirchen@gmail.com> [2015-06-14 16:21:56 +0200]:
> printf 'foo\x80bar\n' | busybox grep f.*b
> <nothing>
> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
> foo???bar
> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
> <nothing again, why?>
>
foo\x80bar does not match f.b
it should match foo.b
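
To spell that out: `f.b` needs an `f`, exactly one character, then a `b`, but in this line the only `f` is followed by `oo\x80` before the `b`. A minimal sketch of the three cases, with `line_matches` as a hypothetical stand-in for what grep does per line and `\200` as the octal spelling of 0x80:

```c
#include <locale.h>
#include <regex.h>

/* Hypothetical helper: 1 if the basic regex pat matches somewhere in
 * line under the C locale, 0 if not, -1 if pat fails to compile. */
static int line_matches(const char *pat, const char *line)
{
	regex_t re;
	int r;

	setlocale(LC_ALL, "C");
	if (regcomp(&re, pat, REG_NOSUB) != 0) return -1;
	r = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return r == 0;
}
```

With the line `"foo\200bar"`, the patterns `foo.b` and `f.*b` are expected to match and `f.b` is not, so the regexec return value of 1 (REG_NOMATCH) seen in the debugger is the correct result.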
* Re: Tests needed for byte-based C locale
From: Christian Neukirchen @ 2015-06-14 15:09 UTC (permalink / raw)
To: musl; +Cc: Rich Felker
Szabolcs Nagy <nsz@port70.net> writes:
> * Christian Neukirchen <chneukirchen@gmail.com> [2015-06-14 16:21:56 +0200]:
>> printf 'foo\x80bar\n' | busybox grep f.*b
>> <nothing>
>> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.*b
>> foo???bar
>> printf 'foo\x80bar\n' | LANG=C LC_ALL=C busybox grep f.b
>> <nothing again, why?>
>>
>
> foo\x80bar does not match f.b
> it should match foo.b
I'm stupid. It works fine then. :)
--
Christian Neukirchen <chneukirchen@gmail.com> http://chneukirchen.org