From: arisawa
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Date: Sun, 30 Mar 2014 09:51:07 +0900
Subject: Re: [9fans] a strange bug in grep
Message-Id: <1225405A-877D-45E9-B8CB-03ADED337703@ar.aichi-u.ac.jp>
In-Reply-To: <938abc1c40e15468aa034d34b07a2d49@brasstown.quanstro.net>
References: <76A0A13C-0D91-4620-A282-A581C206A9FA@ar.aichi-u.ac.jp> <938abc1c40e15468aa034d34b07a2d49@brasstown.quanstro.net>

Thanks, erik. That fixed the problems with my sample data!

Kenji Arisawa

On 2014/03/30 8:54, erik quanstrom wrote:

>> Hello,
>>
>> I found a strange bug in grep.
>> some Japanese runes do not match ‘[^0-9]’.
>>
>> for example ‘ま’ (307e) and ‘み’ (307f).
>>
>
> i can't replicate here with 9atom's fixes to grep.
> with the same t3 file as you've got,
>
> ; wc -l /tmp/t3
> 21 /tmp/t3
> ; grep -v '^[0-9]' /tmp/t3 | wc -l
> 21
>
> i have some other differences in grep, including -I (same
> as -i, except fold runes), but i think the differences in
> comp.c are what cause the bug. in particular, you really
> need that 0xffff entry in the tabs.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:135,145 - comp.c:135,147
>   {
>   	0x007f,
>   	0x07ff,
> + 	0xffff,
>   };
>   Rune tab2[] =
>   {
>   	0x003f,
>   	0x0fff,
> + 	0xffff,
>   };
>
>   Re2
>
> the additional pairs and the correction to the combining case
> here were not accepted to sources, but they allow for large character
> classes generated by folding. many of the characters are contiguous,
> so getting the contiguous case right is important.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:215,221 - comp.c:217,223
>   Re2
>   re2class(char *s)
>   {
> - 	Rune pairs[200+2], *p, *q, ov;
> + 	Rune pairs[400+2], *p, *q, ov;
>   	int nc;
>   	Re2 x;
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:234,240 - comp.c:236,242
>   			break;
>   		p[1] = *p;
>   		p += 2;
> - 		if(p >= pairs + nelem(pairs) - 2)
> + 		if(p == pairs + nelem(pairs) - 2)
>   			error("class too big");
>   		s += chartorune(p, s);
>   		if(*p != '-')
> /n/sources/plan9/sys/src/cmd/grep/comp.c:254,260 - comp.c:256,262
>   	for(p=pairs+2; *p; p+=2) {
>   		if(p[0] > p[1])
>   			continue;
> - 		if(p[0] > q[1] || p[1] < q[0]) {
> + 		if(p[0] > q[1]+1 || p[1] < q[0]) {
>   			q[2] = p[0];
>   			q[3] = p[1];
>   			q += 2;
>
> i believe this case is also critical. split the bmp off.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:275,281 - comp.c:277,283
>   			x = re2or(x, rclass(ov, p[0]-1));
>   			ov = p[1]+1;
>   		}
> - 		x = re2or(x, rclass(ov, Runemask));
> + 		x = re2or(x, rclass(ov, 0xffff));
>   	} else {
>   		x = rclass(p[0], p[1]);
>   		for(p+=2; *p; p+=2)
>
> - erik
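A minimal standalone sketch in C of the three steps the patch touches when a negated class such as `[^0-9]` is compiled: merge the listed ranges, treating touching ranges as contiguous (the `q[1]+1` change); complement them only up to 0xFFFF (splitting off the BMP); and break each resulting range at the UTF-8 length boundaries 0x7F, 0x7FF and 0xFFFF (the values in the patched tab1). The names `Range`, `mergeranges`, `complement`, `splitbylen` and `utf8len` are hypothetical. This is not the code in comp.c, which builds regexp nodes with `re2or` and `rclass` instead of returning arrays of ranges, and it leaves out everything else `re2class` does.

```c
#include <stdlib.h>

/* An inclusive rune range [lo, hi]. Hypothetical type for this sketch. */
typedef struct {
	unsigned long lo, hi;
} Range;

/* Upper bounds of the 1-, 2- and 3-byte UTF-8 ranges, like the patched tab1. */
static const unsigned long lenbound[] = { 0x7F, 0x7FF, 0xFFFF };

/* Number of bytes UTF-8 needs for rune r (r assumed <= 0x10FFFF). */
static int
utf8len(unsigned long r)
{
	if(r <= 0x7F)
		return 1;
	if(r <= 0x7FF)
		return 2;
	if(r <= 0xFFFF)
		return 3;
	return 4;
}

static int
cmprange(const void *a, const void *b)
{
	const Range *x = a, *y = b;

	if(x->lo == y->lo)
		return 0;
	return x->lo < y->lo ? -1 : 1;
}

/*
 * Sort the ranges and merge any that overlap or touch.
 * The +1 lets [a-c] and [d-f] merge into [a-f], the case
 * the q[1]+1 change in the patch addresses.
 * Assumes lo <= hi for every input range; returns the new count.
 */
static int
mergeranges(Range *r, int n)
{
	int i, k = 0;

	if(n == 0)
		return 0;
	qsort(r, n, sizeof r[0], cmprange);
	for(i = 1; i < n; i++){
		if(r[i].lo > r[k].hi + 1)
			r[++k] = r[i];		/* gap: start a new range */
		else if(r[i].hi > r[k].hi)
			r[k].hi = r[i].hi;	/* overlap or adjacency: extend */
	}
	return k + 1;
}

/*
 * Complement sorted, merged ranges over [0, max]. Passing max = 0xFFFF
 * keeps a negated class inside the BMP, as rclass(ov, 0xffff) does.
 * out needs room for n+1 ranges; returns the number written.
 */
static int
complement(const Range *r, int n, unsigned long max, Range *out)
{
	unsigned long ov = 0;	/* first rune not yet covered */
	int i, k = 0;

	for(i = 0; i < n && r[i].lo <= max; i++){
		if(r[i].lo > ov)
			out[k++] = (Range){ov, r[i].lo - 1};
		ov = r[i].hi + 1;
	}
	if(ov <= max)
		out[k++] = (Range){ov, max};
	return k;
}

/*
 * Split [lo, hi] at the lenbound values so that every piece
 * encodes to UTF-8 sequences of a single length.
 * out needs room for 4 ranges; returns the number written.
 */
static int
splitbylen(unsigned long lo, unsigned long hi, Range *out)
{
	size_t i;
	int k = 0;

	for(i = 0; i < sizeof lenbound / sizeof lenbound[0]; i++){
		if(lo <= lenbound[i] && hi > lenbound[i]){
			out[k++] = (Range){lo, lenbound[i]};
			lo = lenbound[i] + 1;
		}
	}
	out[k++] = (Range){lo, hi};
	return k;
}
```

For `[^0-9]` the complement is [0, 0x2F] and [0x3A, 0xFFFF], and the second range splits into [0x3A, 0x7F], [0x80, 0x7FF] and [0x800, 0xFFFF]. The last piece is a run of 3-byte sequences that contains U+307E (ま) and U+307F (み). A range running on to 0x10FFFF would need a further split at 0x10000, because everything above 0xFFFF takes 4 bytes.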