9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] grep bug?
From: arisawa @ 2013-07-11 13:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

It seems the -f option of grep is buggy.
Or are there limitations on the REs it accepts?

term% wc MD5dir
   4584    9168  388756 MD5dir
term% wc x
   4582    4582  151206 x
term% grep -f x MD5dir | wc
   4580    9160  388463
term%
term% grep e54272690d513f8b2403568a7574b1ba MD5dir
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep e54272690d513f8b2403568a7574b1ba x
e54272690d513f8b2403568a7574b1ba
term% grep -v -f x MD5dir
7b6d7ae369226b6d0195ac3fe4487ce7 /usr/arisawa/src/elnfs/WWW/
d44d788ad1237311d8282bbabca65977 /usr/arisawa/src/hg/python-2.5.1-ape/Modules/_ctypes/libffi/src/darwin/
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
84a0f83f5020f16d0b277e8b19407791 /usr/arisawa/src/trans
term% 

Kenji Arisawa






* Re: [9fans] grep bug?
From: erik quanstrom @ 2013-07-11 14:05 UTC (permalink / raw)
  To: 9fans

On Thu Jul 11 09:13:10 EDT 2013, arisawa@ar.aichi-u.ac.jp wrote:
> Hello,
>
> It seems the -f option of grep is buggy.
> Or are there limitations on the REs it accepts?
>
> term% wc MD5dir
>    4584    9168  388756 MD5dir
> term% wc x
>    4582    4582  151206 x
> term% grep -f x MD5dir | wc
>    4580    9160  388463
> term%
> term% grep e54272690d513f8b2403568a7574b1ba MD5dir
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> term% grep e54272690d513f8b2403568a7574b1ba x
> e54272690d513f8b2403568a7574b1ba
> term% grep -v -f x MD5dir
> 7b6d7ae369226b6d0195ac3fe4487ce7 /usr/arisawa/src/elnfs/WWW/
> d44d788ad1237311d8282bbabca65977 /usr/arisawa/src/hg/python-2.5.1-ape/Modules/_ctypes/libffi/src/darwin/
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> 84a0f83f5020f16d0b277e8b19407791 /usr/arisawa/src/trans
> term%

a trick i often use for many fixed strings is sort + uniq.
(internally, grep/comp.c:/^increment does O(n^2)
qsorts on the patterns.)  perhaps the trick could be used to
double-check grep's output.

to find the md5 hashes that only appear in one file or the other
(uniq compares whole lines, so cut each line down to its first
field before sorting),

	awk '{print $1}' x MD5dir | sort | uniq -c | sed '/^ *2 /d'

to count the hashes that appear in both

	awk '{print $1}' x MD5dir | sort | uniq -c | grep '^ *2 ' | wc -l
or
			...	| awk '$1==2{n++}END{print n}'

can you find a smaller test case that has the same issue?  this
should be fixed.

- erik
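
The sort + uniq cross-check can be tried on small files with a sketch like the one below. It is written for a POSIX sh rather than rc, and the function names only_in_one and in_both are made up for illustration. Since uniq compares whole lines, the sketch first reduces every line to its first field, and it removes duplicates within each file so that a count of two really means "present in both".

```shell
#!/bin/sh
# only_in_one PATTERNS TARGET (hypothetical helper):
# print the hashes that occur in exactly one of the two files.
# Only the first whitespace-separated field of each line is compared.
only_in_one() {
	{
		awk '{print $1}' "$1" | sort -u
		awk '{print $1}' "$2" | sort -u
	} | sort | uniq -u
}

# in_both PATTERNS TARGET (hypothetical helper):
# count the hashes that occur in both files.
in_both() {
	{
		awk '{print $1}' "$1" | sort -u
		awk '{print $1}' "$2" | sort -u
	} | sort | uniq -d | awk 'END {print NR}'
}
```

With the files from the first message this would be called as only_in_one x MD5dir and in_both x MD5dir; the second number can then be compared against what grep -f x MD5dir | wc reports.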




* Re: [9fans] grep bug?
From: arisawa @ 2013-07-11 21:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Thanks erik,

It is not easy to get a small sample.
I tried to make a smaller sample by discarding the first line of the pattern file and of the target file.
The problem depends not only on the pattern file but also on the target file!
It is curious that grep does not reach a verdict on each line independently as it is read from the target file.

My results are shown below.
t0 and t1 are target files;
z0 and z1 are pattern files.

term% echo $t
e54272690d513f8b2403568a7574b1ba
term% grep -n $t z0 z1 t0 t1
z0:4081: e54272690d513f8b2403568a7574b1ba
z1:4082: e54272690d513f8b2403568a7574b1ba
t0:4298: e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
t1:4299: e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% wc z0 z1 t0 t1
   4575    4575  150975 z0
   4576    4576  151008 z1
   4583    9166  388702 t0
   4584    9168  388756 t1
  18318   27485 1079441 total
term% diff -c z0 z1
z0:1,3 - z1:1,4
+ 00775d6a004acb79a2cd3ec30f743a9e
  008de90ca02f6c4f10e2f172e0511105
  00cfd51ea3ea1152f98c1f90130be7d0
  00cff9d5f2b4d08753f96564dac50a58
term% diff -c t0 t1
t0:1,3 - t1:1,4
+ f104706a82b7c20e0f2c3cf83033958c /usr/arisawa/bin/rc/
  521db7c46291f0785d8d77f8e614350a /usr/arisawa/bin/rc/photo/
  e76b64877aea9c8b4dcc24069436d52d /usr/arisawa/lib/
  1d6ca6d83f5f51ebfdd94fea02a9fb8b /usr/arisawa/lib/cookies/
term% grep -f z0 t0 | grep $t
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep -f z1 t0 | grep $t
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep -f z0 t1 | grep $t
e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
term% grep -f z1 t1 | grep $t
term% 

Kenji Arisawa
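
One way to get a verdict per target line that does not depend on grep's pattern compilation is an exact table lookup in awk. The sketch below assumes a POSIX sh (not rc) with mktemp and comm available, and assumes, as in this thread, that each pattern is a bare hash and that the hash is the first field of each target line. The name missed is hypothetical.

```shell
#!/bin/sh
# missed PATTERNS TARGET (hypothetical helper):
# print target lines whose first field is listed in PATTERNS
# but which `grep -f PATTERNS TARGET` did not report.
# Assumes PATTERNS is non-empty (the NR==FNR idiom relies on it).
missed() {
	expected=$(mktemp)
	actual=$(mktemp)
	# Reference answer: exact lookup of field 1 in an awk table,
	# no regular expressions involved.
	awk 'NR==FNR {pat[$1]; next} $1 in pat' "$1" "$2" | sort > "$expected"
	# What grep actually reports for the same inputs.
	grep -f "$1" "$2" | sort > "$actual"
	# Lines present only in the reference answer are the ones grep dropped.
	comm -23 "$expected" "$actual"
	rm -f "$expected" "$actual"
}
```

With a correct grep, missed z1 t1 prints nothing; with the behaviour shown above it would be expected to print the task.387.a line.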





* Re: [9fans] grep bug again
From: arisawa @ 2013-07-21  3:29 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hello,

grep -vf z z
is a good test: every line of z is itself one of the patterns, so the output should be empty.

term% p z
850f815f90e7498364c668b9e0774d96
851cef1c518242f977be3ce40714b7f8
852142a8b4f175d4fa9003ab30743106
8522f3bc66efa1edaa1e6c495e2e7b89
852684fef763a4e36d57b99a85ab366b
85377206b2292dd884a5f502f846c7d1
...
term% wc z
   2209    2209   72897 z
term% grep -vf z z
feeaa668ee79fbbde6f7539dac41a2ed
term% grep feeaa668ee79fbbde6f7539dac41a2ed z
feeaa668ee79fbbde6f7539dac41a2ed
term%

The sample file is:
http://plan9.aichi-u.ac.jp/z.gz

Kenji Arisawa
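
Because the self-match test needs only one file, it lends itself to automatic reduction. The following is a rough sketch of a line-deletion minimizer, assuming a POSIX sh with mktemp rather than rc; the names fails and shrink are hypothetical. It deletes chunks of lines, keeps a deletion only while the reduced file still misbehaves, and halves the chunk size until single lines have been tried. It is not guaranteed to find the smallest possible case.

```shell
#!/bin/sh
# fails FILE (hypothetical predicate): succeed while FILE still shows the bug.
# For the self-match test, any output from grep -vf counts as a failure.
fails() {
	[ -n "$(grep -vf "$1" "$1")" ]
}

# shrink FILE (hypothetical helper): print a reduced copy of FILE
# that still satisfies fails.  FILE itself is not modified.
shrink() {
	cur=$(mktemp)
	try=$(mktemp)
	cp "$1" "$cur"
	n=$(awk 'END {print NR}' "$cur")
	chunk=$(( (n + 1) / 2 ))
	while [ "$chunk" -ge 1 ]; do
		start=1
		while [ "$start" -le "$(awk 'END {print NR}' "$cur")" ]; do
			end=$((start + chunk - 1))
			sed "${start},${end}d" "$cur" > "$try"
			if fails "$try"; then
				cp "$try" "$cur"            # chunk not needed, keep it deleted
			else
				start=$((start + chunk))    # chunk matters, move past it
			fi
		done
		[ "$chunk" -eq 1 ] && break
		chunk=$((chunk / 2))
	done
	cat "$cur"
	rm -f "$cur" "$try"
}
```

On a system that shows the problem, shrink z > z.min would be the intended use; on a correct grep, fails z is false from the start and the file comes back unchanged.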




