From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Sun, 14 Nov 2010 17:16:55 -0500
To: 9fans@9fans.net
Message-ID: <156f86de45ec6bfd35c0bc7375f4815e@plug.quanstro.net>
In-Reply-To:
References: <703b2539-027e-4f9f-a739-00b59f6d3d82@v28g2000vbb.googlegroups.com> <20101113192425.GC22589@nibiru.local> <284949CC-81F5-4791-91C1-13357BC23E7D@9srv.net> <464879d2da7b991d8d56371eab78f08e@plug.quanstro.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] Plan9 development
Topicbox-Message-UUID: 7e3274c4-ead6-11e9-9d60-3106f5b1d025

> > unfortunately, the last i checked, gnu grep mallocs
> > for each byte of input when using a utf-8 locale.
>
> that bug was fixed in gnu grep years ago,
> probably before you found and reported it.
> unfortunately, linux distributions were for
> many years not updating their copies of
> gnu grep to the latest version, so very few
> '/bin/grep's had the bug fix.

if i recall correctly, i found that in 2004 or 2005
and fixed it directly from the gnu.org source.
perhaps you remember something i don't.

in any event, it's still not really fixed.  utf-8
performance still sucks:

; grep --version >[2=1] | sed 1q
GNU grep 2.5.4
; time grep missingstring1 mail.tar
0.00u 0.00s 0.01r 	 grep missingstring1 mail.tar  # status=1
LANG=en_US.UTF-8 time grep missingstring1 mail.tar
0.44u 0.00s 0.53r 	 grep missingstring1 mail.tar  # status=1

- erik