From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] a strange bug in grep
Date: Sat, 29 Mar 2014 19:54:15 -0400
Message-ID: <938abc1c40e15468aa034d34b07a2d49@brasstown.quanstro.net>
In-Reply-To: <76A0A13C-0D91-4620-A282-A581C206A9FA@ar.aichi-u.ac.jp>
> Hello,
>
> I found a strange bug in grep.
> some Japanese runes do not match '[^0-9]'.
>
> for example 'ま' (307e) and 'み' (307f).
>
i can't replicate here with 9atom's fixes to grep.
with the same t3 file as you've got,
; wc -l /tmp/t3
21 /tmp/t3
; grep -v '^[0-9]' /tmp/t3 | wc -l
21
i have some other differences in grep, including -I (same
as -i, except fold runes), but i think the differences in
comp.c are what cause the bug. in particular, you really
need that 0xffff entry in the tabs.
/n/sources/plan9/sys/src/cmd/grep/comp.c:135,145 - comp.c:135,147
{
0x007f,
0x07ff,
+ 0xffff,
};
Rune tab2[] =
{
0x003f,
0x0fff,
+ 0xffff,
};
Re2
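
A minimal sketch of why the extra table entry could matter, assuming the first table lists the largest rune for each UTF-8 encoding length and that grep splits a rune range at those boundaries before compiling it. The names bounds, Range and splitrange are hypothetical and are not taken from comp.c.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune */

typedef struct Range Range;
struct Range { Rune lo, hi; };

/* assumed meaning: largest rune encodable in 1, 2 and 3 bytes of UTF-8 */
static const Rune bounds[] = { 0x007f, 0x07ff, 0xffff };

/*
 * split the inclusive range [lo, hi] so that no piece crosses an
 * encoding-length boundary.  writes at most max pieces to out and
 * returns the number written; pieces beyond max are dropped.
 */
size_t
splitrange(Rune lo, Rune hi, Range *out, size_t max)
{
	size_t i, n;

	n = 0;
	for(i = 0; i < sizeof bounds / sizeof bounds[0] && n < max; i++){
		if(lo <= bounds[i] && hi > bounds[i]){
			out[n].lo = lo;
			out[n].hi = bounds[i];
			n++;
			lo = bounds[i] + 1;
		}
	}
	if(n < max && lo <= hi){
		out[n].lo = lo;
		out[n].hi = hi;
		n++;
	}
	return n;
}
```

Under these assumptions, a range that runs past 0xffff is only cut there if 0xffff is in the table. Without that entry, the runes from 0x0800 up (which include 307e and 307f) would share a piece with runes that need a longer encoding.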
the additional pairs and the correction to the combining case
here were not accepted to sources, but they allow for the large character
classes generated by folding. many of the characters are contiguous,
so getting the contiguous case right is important.
/n/sources/plan9/sys/src/cmd/grep/comp.c:215,221 - comp.c:217,223
Re2
re2class(char *s)
{
- Rune pairs[200+2], *p, *q, ov;
+ Rune pairs[400+2], *p, *q, ov;
int nc;
Re2 x;
/n/sources/plan9/sys/src/cmd/grep/comp.c:234,240 - comp.c:236,242
break;
p[1] = *p;
p += 2;
- if(p >= pairs + nelem(pairs) - 2)
+ if(p == pairs + nelem(pairs) - 2)
error("class too big");
s += chartorune(p, s);
if(*p != '-')
/n/sources/plan9/sys/src/cmd/grep/comp.c:254,260 - comp.c:256,262
for(p=pairs+2; *p; p+=2) {
if(p[0] > p[1])
continue;
- if(p[0] > q[1] || p[1] < q[0]) {
+ if(p[0] > q[1]+1 || p[1] < q[0]) {
q[2] = p[0];
q[3] = p[1];
q += 2;
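
The effect of the q[1]+1 change can be sketched as follows. This is a simplified stand-alone version, not the code in comp.c: it assumes the pairs are already sorted by their low end, that the first pair is non-empty, and that every value is well below the largest unsigned int. The name mergepairs is hypothetical.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune */

/*
 * merge n inclusive [lo, hi] pairs, stored flat as p[0]=lo, p[1]=hi, ...
 * and sorted by lo, in place.  pairs that overlap or that are merely
 * adjacent (lo == previous hi + 1) are combined.
 * returns the number of pairs left.
 */
size_t
mergepairs(Rune *p, size_t n)
{
	size_t i, q;

	if(n == 0)
		return 0;
	q = 0;	/* index of the pair currently being grown */
	for(i = 1; i < n; i++){
		Rune lo = p[2*i], hi = p[2*i+1];

		if(lo > hi)
			continue;	/* empty pair, skip */
		if(lo > p[2*q+1] + 1){
			/* real gap: start a new pair */
			q++;
			p[2*q] = lo;
			p[2*q+1] = hi;
		}else if(hi > p[2*q+1])
			p[2*q+1] = hi;	/* overlapping or adjacent: extend */
	}
	return q + 1;
}
```

With the +1, the pairs a-c and d-f collapse into a-f. Comparing against the previous high end without the +1 would keep them as two pairs, which adds up when folding produces long runs of contiguous characters.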
i believe this case is also critical: split the bmp off, so that a
negated class ends at 0xffff rather than at Runemask.
/n/sources/plan9/sys/src/cmd/grep/comp.c:275,281 - comp.c:277,283
x = re2or(x, rclass(ov, p[0]-1));
ov = p[1]+1;
}
- x = re2or(x, rclass(ov, Runemask));
+ x = re2or(x, rclass(ov, 0xffff));
} else {
x = rclass(p[0], p[1]);
for(p+=2; *p; p+=2)
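
A minimal sketch of the negated-class case, assuming the pairs have already been sorted and merged and that all of them lie at or below 0xffff. It computes the complement as a list of pairs instead of building Re2 nodes, and the names Bmpmax and complement are hypothetical.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune */

enum { Bmpmax = 0xffff };	/* upper bound used by the patched line */

/*
 * write the complement over [0, Bmpmax] of n sorted, non-overlapping
 * inclusive pairs in p (flat: lo, hi, lo, hi, ...) to out.
 * out needs room for 2*(n+1) runes.  returns the number of pairs written.
 */
size_t
complement(const Rune *p, size_t n, Rune *out)
{
	size_t i, m;
	Rune ov;	/* first rune not yet covered by the output */

	m = 0;
	ov = 0;
	for(i = 0; i < n; i++){
		if(p[2*i] > ov){
			out[2*m] = ov;
			out[2*m+1] = p[2*i] - 1;
			m++;
		}
		ov = p[2*i+1] + 1;
	}
	if(ov <= Bmpmax){
		/* the tail stops at the end of the bmp, not at Runemask */
		out[2*m] = ov;
		out[2*m+1] = Bmpmax;
		m++;
	}
	return m;
}
```

For the class [^0-9] this gives two pairs, 0x0000-0x002f and 0x003a-0xffff, and the second one contains 307e and 307f.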
- erik