9fans - fans of the OS Plan 9 from Bell Labs
From: arisawa <arisawa@ar.aichi-u.ac.jp>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] a strange bug in grep
Date: Sun, 30 Mar 2014 09:51:07 +0900
Message-ID: <1225405A-877D-45E9-B8CB-03ADED337703@ar.aichi-u.ac.jp>
In-Reply-To: <938abc1c40e15468aa034d34b07a2d49@brasstown.quanstro.net>

thanks erik.
that fixed the problems with my sample data!

Kenji Arisawa

On 2014/03/30 8:54, erik quanstrom <quanstro@quanstro.net> wrote:

>> Hello,
>> 
>> I found a strange bug in grep.
>> some Japanese runes do not match ‘[^0-9]’.
>> 
>> for example ‘ま’ (307e) and ‘み’ (307f).
>> 
> 
> i can't replicate here with 9atom's fixes to grep.
> with the same t3 file as you've got,
> 
> 	; wc -l /tmp/t3
> 	     21 /tmp/t3
> 	; grep -v '^[0-9]' /tmp/t3 | wc -l
> 	     21
> 
> i have some other differences in grep, including -I (same
> as -i, except fold runes), but i think the differences in
> comp.c are what cause the bug.  in particular, you really
> need that 0xffff entry in the tabs.
> 
> /n/sources/plan9/sys/src/cmd/grep/comp.c:135,145 - comp.c:135,147
>  {
>  	0x007f,
>  	0x07ff,
> + 	0xffff,
>  };
>  Rune	tab2[] =
>  {
>  	0x003f,
>  	0x0fff,
> + 	0xffff,
>  };
> 
>  Re2
> 
> the additional pairs and the correction to the combining case
> here were not accepted to sources, but they allow for the large character
> classes generated by folding.  many of the characters are contiguous,
> so getting the contiguous case right is important.
> 
> /n/sources/plan9/sys/src/cmd/grep/comp.c:215,221 - comp.c:217,223
>  Re2
>  re2class(char *s)
>  {
> - 	Rune pairs[200+2], *p, *q, ov;
> + 	Rune pairs[400+2], *p, *q, ov;
>  	int nc;
>  	Re2 x;
> 
> /n/sources/plan9/sys/src/cmd/grep/comp.c:234,240 - comp.c:236,242
>  			break;
>  		p[1] = *p;
>  		p += 2;
> - 		if(p >= pairs + nelem(pairs) - 2)
> + 		if(p == pairs + nelem(pairs) - 2)
>  			error("class too big");
>  		s += chartorune(p, s);
>  		if(*p != '-')
> /n/sources/plan9/sys/src/cmd/grep/comp.c:254,260 - comp.c:256,262
>  	for(p=pairs+2; *p; p+=2) {
>  		if(p[0] > p[1])
>  			continue;
> - 		if(p[0] > q[1] || p[1] < q[0]) {
> + 		if(p[0] > q[1]+1 || p[1] < q[0]) {
>  			q[2] = p[0];
>  			q[3] = p[1];
>  			q += 2;
> 
> i believe this case is also critical.  split the bmp off.
> 
> /n/sources/plan9/sys/src/cmd/grep/comp.c:275,281 - comp.c:277,283
>  			x = re2or(x, rclass(ov, p[0]-1));
>  			ov = p[1]+1;
>  		}
> - 		x = re2or(x, rclass(ov, Runemask));
> + 		x = re2or(x, rclass(ov, 0xffff));
>  	} else {
>  		x = rclass(p[0], p[1]);
>  		for(p+=2; *p; p+=2)
> 
> - erik
> 
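One possible reading of the tab1 values in the first hunk is that they are the largest code points that fit in 1, 2 and 3 bytes of UTF-8. That is an interpretation, not something stated in the message. The sketch below is portable C rather than Plan 9 C; `tablen`, `tab1_old` and `tab1_new` are hypothetical names, and the lookup is only an illustration of why a table that stops at 0x07ff has no slot for ま (307e), not a copy of how grep uses the table.

```c
typedef unsigned int Rune;	/* stand-in for Plan 9's Rune (assumption) */

/* the bounds quoted in the diff, before and after the change */
static const Rune tab1_old[] = { 0x007f, 0x07ff };
static const Rune tab1_new[] = { 0x007f, 0x07ff, 0xffff };

/*
 * return the 1-based index of the first bound that covers r,
 * or 0 if no entry in the table covers it.
 * with the three-entry table the result equals the UTF-8
 * byte length for runes in the BMP.
 */
int
tablen(Rune r, const Rune *tab, int n)
{
	int i;

	for(i = 0; i < n; i++)
		if(r <= tab[i])
			return i+1;
	return 0;
}
```

With the two-entry table, `tablen(0x307e, tab1_old, 2)` finds no covering entry, while the three-entry table puts the rune in the third slot. That agrees with the UTF-8 encoding of ま, which is the three bytes e3 81 be.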

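The change from `p[0] > q[1]` to `p[0] > q[1]+1` in the quoted diff makes ranges that merely touch (for example a-c followed by d-f) collapse into one. The following is a minimal sketch of that idea in portable C, not the code from comp.c: `Range`, `rangecmp` and `mergeranges` are hypothetical names, and the `adjacent` flag exists only so both behaviours can be compared. It assumes every range has lo <= hi.

```c
#include <stdlib.h>

typedef unsigned int Rune;	/* stand-in for Plan 9's Rune (assumption) */

typedef struct Range Range;
struct Range
{
	Rune	lo;
	Rune	hi;
};

/* order ranges by their lower bound */
static int
rangecmp(const void *a, const void *b)
{
	const Range *x = a, *y = b;

	if(x->lo != y->lo)
		return x->lo < y->lo ? -1 : 1;
	return 0;
}

/*
 * sort r[0..n-1] and merge in place; return the new count.
 * overlapping ranges are always merged.  if adjacent is
 * nonzero, ranges that touch (hi+1 == next lo) are merged too,
 * which corresponds to the q[1]+1 comparison in the diff.
 */
int
mergeranges(Range *r, int n, int adjacent)
{
	int i, m;
	Rune gap;

	if(n <= 0)
		return 0;
	qsort(r, n, sizeof r[0], rangecmp);
	m = 0;
	for(i = 1; i < n; i++){
		/* avoid wrapping when hi is already the largest value */
		gap = (adjacent && r[m].hi != (Rune)-1) ? 1 : 0;
		if(r[i].lo > r[m].hi + gap)
			r[++m] = r[i];		/* real gap: start a new range */
		else if(r[i].hi > r[m].hi)
			r[m].hi = r[i].hi;	/* overlap or contact: extend */
	}
	return m+1;
}
```

Given the two ranges d-f and a-c, the call with `adjacent` set leaves the single range a-f, while the call without it keeps two ranges. Fewer, wider ranges matter when a folded class expands into many contiguous runes, as the message describes.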


