From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Sun, 14 Nov 2010 17:16:55 -0500
To: 9fans@9fans.net
Message-ID: <156f86de45ec6bfd35c0bc7375f4815e@plug.quanstro.net>
In-Reply-To: <AANLkTinWRGEcvk9N5A4ejGarsuw9b0E5LVjUP6ySO+1T@mail.gmail.com>
References: <703b2539-027e-4f9f-a739-00b59f6d3d82@v28g2000vbb.googlegroups.com>
	<AANLkTinAxGdn9_rPkfQd3e8kZUDaXGb0dwm9m6WgKyXz@mail.gmail.com>
	<AANLkTi=JoL-0Fv2ZFZKu2YKBb9RUasMJcMLcBB-WLuBJ@mail.gmail.com>
	<AANLkTikyyRnQCQho62LpxYoGCwbTc66c45Q-zCWYrGS_@mail.gmail.com>
	<AANLkTi=qZKFznwTLLCLqbAzgX9kzkoGeg_esY3x59bY6@mail.gmail.com>
	<20101113192425.GC22589@nibiru.local>
	<AA291601-9BEF-444D-B3D1-D7CCA75AF316@vaughan.pe>
	<284949CC-81F5-4791-91C1-13357BC23E7D@9srv.net>
	<AANLkTimhephZwyJZ6CdmnLhMRDDVEExPPpqgLKmCWL8L@mail.gmail.com>
	<464879d2da7b991d8d56371eab78f08e@plug.quanstro.net>
	<AANLkTinWRGEcvk9N5A4ejGarsuw9b0E5LVjUP6ySO+1T@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] Plan9 development
Topicbox-Message-UUID: 7e3274c4-ead6-11e9-9d60-3106f5b1d025

> > unfortunately, the last i checked, gnu grep mallocs
> > for each byte of input when using a utf-8 locale.
>
> that bug was fixed in gnu grep years ago,
> probably before you found and reported it.
> unfortunately, linux distributions were for
> many years not updating their copies of
> gnu grep to the latest version, so very few
> '/bin/grep's had the bug fix.

if i recall correctly, i found that in 2004 or 2005
and fixed it in source taken directly from gnu.org.
perhaps you remember something i don't.

in any event, it's still not really fixed.  utf-8
performance still sucks.  the same search takes about
50 times longer in a utf-8 locale (0.53r against 0.01r):

; grep --version >[2=1] | sed 1q
GNU grep 2.5.4
; time grep missingstring1 mail.tar
0.00u 0.00s 0.01r 	 grep missingstring1 mail.tar  # status=1
; LANG=en_US.UTF-8 time grep missingstring1 mail.tar
0.44u 0.00s 0.53r 	 grep missingstring1 mail.tar  # status=1
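
for anyone who wants to repeat the comparison from a script, here
is a rough sketch, not what i ran above.  it assumes python 3; the
names time_command and compare_locales are made up for illustration,
and using "C" as the baseline locale is an assumption (the first run
above simply had no LANG set).

```python
import os
import subprocess
import time


def time_command(argv, lang=None):
    """Run argv once and return (elapsed_seconds, exit_status).

    If lang is given, LANG is set to it and LC_ALL/LC_CTYPE are removed
    from the child's environment so that LANG decides the locale.
    """
    env = dict(os.environ)
    if lang is not None:
        env.pop("LC_ALL", None)
        env.pop("LC_CTYPE", None)
        env["LANG"] = lang
    start = time.perf_counter()
    proc = subprocess.run(argv, env=env,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


def compare_locales(argv, langs=("C", "en_US.UTF-8")):
    """Time the same command under each locale.

    Returns a dict mapping each locale name to (seconds, status).
    """
    return {lang: time_command(argv, lang) for lang in langs}
```

calling compare_locales(["grep", "missingstring1", "mail.tar"]) would
give wall-clock time and exit status for each locale.  a single run is
noisy, so repeat it a few times and make sure the file is already in
the cache before believing the numbers.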

- erik