From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Thu, 11 Jul 2013 10:05:18 -0400
To: 9fans@9fans.net
Message-ID: <24ccd68343090eaea81aabf6e539c2b1@coraid.com>
In-Reply-To: <18706FFE-E7CD-4A93-B69E-604AC2370B1E@ar.aichi-u.ac.jp>
References: <18706FFE-E7CD-4A93-B69E-604AC2370B1E@ar.aichi-u.ac.jp>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] grep bug?
Topicbox-Message-UUID: 6bba428e-ead8-11e9-9d60-3106f5b1d025

On Thu Jul 11 09:13:10 EDT 2013, arisawa@ar.aichi-u.ac.jp wrote:
> Hello,
>
> It seems f option of grep is buggy.
> or any limitations in using the RE?
>
> term% wc MD5dir
> 4584 9168 388756 MD5dir
> term% wc x
> 4582 4582 151206 x
> term% grep -f x MD5dir | wc
> 4580 9160 388463
> term%
> term% grep e54272690d513f8b2403568a7574b1ba MD5dir
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> term% grep e54272690d513f8b2403568a7574b1ba x
> e54272690d513f8b2403568a7574b1ba
> term% grep -v -f x MD5dir
> 7b6d7ae369226b6d0195ac3fe4487ce7 /usr/arisawa/src/elnfs/WWW/
> d44d788ad1237311d8282bbabca65977 /usr/arisawa/src/hg/python-2.5.1-ape/Modules/_ctypes/libffi/src/darwin/
> e54272690d513f8b2403568a7574b1ba /usr/arisawa/src/taskfs/Q/task.387.a/
> 84a0f83f5020f16d0b277e8b19407791 /usr/arisawa/src/trans
> term%

a trick i often use for many fixed strings is sort + uniq.
(internally, grep/comp.c:/^increment does O(n^2) qsorts on the
patterns) perhaps it could be used to double-check.

to find the md5 hashes that only appear in one file or the other
(only the first field is considered by uniq),

	cat x MD5dir | sort | uniq -c | sed '/^ *2 /d'

to count the fields that appear in both

	cat x MD5dir | sort | uniq -c | grep '^ *2 ' | wc -l

or

	... | awk '$1==2{n++}END{print n}'

can you find a smaller test case that has the same issue.

this should be fixed

- erik
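
A minimal sketch of the sort + uniq cross-check, written as POSIX sh functions rather than rc. It assumes x holds one hash per line, MD5dir holds "hash path" lines, and no hash repeats within either file. The function names hashcounts, onlyone, and inboth are hypothetical. Because uniq compares whole lines unless told to skip fields, the sketch cuts each line down to its first field with awk before sorting.

```shell
#!/bin/sh
# Sketch only: cross-check grep -f results by counting how often each
# hash (the first whitespace-separated field) occurs across the given files.
# Assumption: a hash never repeats within a single file, so a count of 2
# means "present in both files" and a count of 1 means "present in only one".

# hashcounts FILE...: print "count hash" for the first field of every
# non-empty line in the given files.
hashcounts() {
	cat "$@" | awk 'NF {print $1}' | sort | uniq -c
}

# onlyone FILE...: print the hashes that occur exactly once overall.
onlyone() {
	hashcounts "$@" | awk '$1 == 1 {print $2}'
}

# inboth FILE...: print how many hashes occur exactly twice overall.
# n+0 forces a numeric 0 when nothing matches.
inboth() {
	hashcounts "$@" | awk '$1 == 2 {n++} END {print n+0}'
}
```

With the files from the quoted session this would be run as `onlyone x MD5dir` and `inboth x MD5dir`. If the same hash can occur more than once within MD5dir, the counts no longer map cleanly onto "one file" versus "both files", so each file would need deduplicating on the hash field first.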