From: arisawa <arisawa@ar.aichi-u.ac.jp>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] a strange bug in grep
Date: Sun, 30 Mar 2014 09:51:07 +0900
Message-ID: <1225405A-877D-45E9-B8CB-03ADED337703@ar.aichi-u.ac.jp>
In-Reply-To: <938abc1c40e15468aa034d34b07a2d49@brasstown.quanstro.net>
thanks erik.
that fixed the problems with my sample data!
Kenji Arisawa
On 2014/03/30 8:54, erik quanstrom <quanstro@quanstro.net> wrote:
>> Hello,
>>
>> I found a strange bug in grep.
>> some Japanese runes do not match ‘[^0-9]’.
>>
>> for example ‘ま’ (307e) and ‘み’ (307f).
>>
>
> i can't replicate here with 9atom's fixes to grep.
> with the same t3 file as you've got,
>
> ; wc -l /tmp/t3
> 21 /tmp/t3
> ; grep -v '^[0-9]' /tmp/t3 | wc -l
> 21
>
> i have some other differences in grep, including -I (same
> as -i, except fold runes), but i think the differences in
> comp.c are what cause the bug. in particular, you really
> need that 0xffff entry in the tabs.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:135,145 - comp.c:135,147
> {
> 0x007f,
> 0x07ff,
> + 0xffff,
> };
> Rune tab2[] =
> {
> 0x003f,
> 0x0fff,
> + 0xffff,
> };
>
> Re2
>
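A note on the table values in the hunk above: 0x007f, 0x07ff and 0xffff are the largest code points that UTF-8 encodes in 1, 2 and 3 bytes, and ま (307e) is a 3-byte rune. The sketch below is not grep's code. utf8len, utf8max and the ntab parameter are hypothetical names made up for illustration, and it assumes the table is read as a list of per-length upper bounds. It only shows what a lookup against such a table does when the last entry is missing.

```c
#include <assert.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune (assumption) */

/* largest code point that fits in 1, 2 and 3 bytes of UTF-8 */
static const Rune utf8max[] = { 0x007f, 0x07ff, 0xffff };

/*
 * Hypothetical helper: look r up in the first ntab entries of
 * utf8max and return its UTF-8 length, or 0 if no entry covers it.
 */
int
utf8len(Rune r, int ntab)
{
	int i;

	assert(ntab >= 0 && ntab <= 3);
	for(i = 0; i < ntab; i++)
		if(r <= utf8max[i])
			return i+1;
	return 0;	/* r is above every entry that was consulted */
}
```

With only the first two entries, utf8len(0x307e, 2) returns 0, so the rune falls outside the table. With all three, utf8len(0x307e, 3) returns 3.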
> the additional pairs and the correction to the combining case
> here were not accepted to sources, but they allow for large character
> classes generated by folding. many of the characters are contiguous
> so getting the contiguous case right is important.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:215,221 - comp.c:217,223
> Re2
> re2class(char *s)
> {
> - Rune pairs[200+2], *p, *q, ov;
> + Rune pairs[400+2], *p, *q, ov;
> int nc;
> Re2 x;
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:234,240 - comp.c:236,242
> break;
> p[1] = *p;
> p += 2;
> - if(p >= pairs + nelem(pairs) - 2)
> + if(p == pairs + nelem(pairs) - 2)
> error("class too big");
> s += chartorune(p, s);
> if(*p != '-')
> /n/sources/plan9/sys/src/cmd/grep/comp.c:254,260 - comp.c:256,262
> for(p=pairs+2; *p; p+=2) {
> if(p[0] > p[1])
> continue;
> - if(p[0] > q[1] || p[1] < q[0]) {
> + if(p[0] > q[1]+1 || p[1] < q[0]) {
> q[2] = p[0];
> q[3] = p[1];
> q += 2;
>
> i believe this case is also critical. split the bmp off.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:275,281 - comp.c:277,283
> x = re2or(x, rclass(ov, p[0]-1));
> ov = p[1]+1;
> }
> - x = re2or(x, rclass(ov, Runemask));
> + x = re2or(x, rclass(ov, 0xffff));
> } else {
> x = rclass(p[0], p[1]);
> for(p+=2; *p; p+=2)
>
> - erik
>
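To see what the q[1]+1 change in the quoted diff buys, here is a minimal sketch of merging sorted [lo,hi] pairs with and without treating touching ranges as one. It is not the code from comp.c. mergepairs and its adjacent flag are hypothetical, the Rune typedef is a portable stand-in, and the input is assumed to be sorted by lo with lo <= hi in every pair.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune (assumption) */

/*
 * Hypothetical helper: merge n sorted [lo,hi] pairs in place and
 * return the number of pairs left.  With adjacent set, ranges that
 * merely touch (next lo == hi+1) are merged as well, which is the
 * effect of comparing against q[1]+1 instead of q[1].
 */
size_t
mergepairs(Rune *pairs, size_t n, int adjacent)
{
	size_t i, out;
	Rune lo, hi, qhi, limit;

	assert(pairs != NULL);
	if(n == 0)
		return 0;
	out = 0;
	for(i = 1; i < n; i++){
		lo = pairs[2*i];
		hi = pairs[2*i+1];
		qhi = pairs[2*out+1];
		limit = qhi;
		if(adjacent && qhi != (Rune)-1)
			limit = qhi + 1;	/* guard against wraparound */
		if(lo > limit){
			/* gap between the ranges: start a new output pair */
			out++;
			pairs[2*out] = lo;
			pairs[2*out+1] = hi;
		}else if(hi > qhi)
			pairs[2*out+1] = hi;	/* extend the current pair */
	}
	return out+1;
}
```

For the pairs a-a, b-b, c-c, the call with adjacent set to 0 leaves three pairs, while the call with adjacent set to 1 leaves the single pair a-c. That matters when folding produces long runs of single-rune pairs.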