From: erik quanstrom
Date: Sun, 29 Nov 2009 23:29:30 -0500
To: 9fans@9fans.net
Subject: Re: [9fans] grëp (rhymes with creep) and cptmp

On Sun Nov 29 14:03:23 EST 2009, jason.catena@gmail.com wrote:
> I wrote a wrapper around grep to search for words regardless of
> accents. I didn't want to worry about whether I used accents on
> characters (I sometimes use them inconsistently, and others decidedly
> do), but I still wanted to limit the results to exact matches if I
> supplied an accent. Here's an example run.

hey, this is great stuff! i really like the approach.

i played with this a little bit, but quickly ran into
problems. the patterns get really big in a hurry.
"reasonable" re size limits of say 300 characters just
don't work if you're doing expansion. expanding
"cooperate" results in a 460-byte string!

so i went back to an old idea. i hope you won't
accuse me of topperism, but you finally motivated me to
work on something i threatened to do at iwp9 2e:
add folding to grep. it was right up my alley since
i just recently redid the rune tables that i've been using.
they're built directly from UnicodeData.txt.

it wasn't too hard to use the unicode data to build a
table that folds modified letters to their base letter.
from there, i reused the same technique used for case
folding. since the table i'm using doesn't fold case,
"grep -Ii" makes sense.

performance is pretty good. worst case is about 2x the
user time. there's no overhead when the I flag isn't given.

the source is in /n/sources/contrib/quanstro/src/grepfold.
please let me know of any bugs. i'm sure there are a few
weird cases.

- erik
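
The folding idea in the message (map each modified letter to its base letter using the unicode data, then match as usual) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the grepfold source: it reads canonical decompositions through Python's `unicodedata` module instead of a table generated from UnicodeData.txt, it folds both the pattern and the text at match time, and the names `fold_rune`, `fold_text` and `grep_fold` are made up for the example.

```python
import re
import unicodedata


def fold_rune(ch: str) -> str:
    """Fold one character to its base letter, leaving case alone.

    Assumption: "folding" means following canonical decompositions and
    keeping only the first code point (e.g. U+00EB -> U+0065).
    """
    while True:
        decomp = unicodedata.decomposition(ch)
        # No decomposition, or a compatibility one (tagged "<...>"): stop.
        if not decomp or decomp.startswith("<"):
            return ch
        ch = chr(int(decomp.split()[0], 16))


def fold_text(s: str) -> str:
    """Fold every character; the result has the same length as s."""
    return "".join(fold_rune(c) for c in s)


def grep_fold(pattern: str, lines: list[str], ignore_case: bool = False) -> list[str]:
    """Return the lines matching pattern once both sides are folded.

    Hypothetical stand-in for "grep -I"; ignore_case=True mimics "-Ii".
    Only precomposed characters are handled; standalone combining
    marks in the input are left as they are.
    """
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(fold_text(pattern), flags)
    return [line for line in lines if rx.search(fold_text(line))]


if __name__ == "__main__":
    sample = ["gr\u00ebp rhymes with creep", "co\u00f6perate", "Cooperate"]
    print(grep_fold("grep", sample))
    print(grep_fold("cooperate", sample, ignore_case=True))
```

Because each character folds to exactly one character, the pattern stays the same size, which avoids the pattern growth that expansion causes. Note that this sketch folds the pattern as well, so an accented pattern also matches unaccented text; the message does not say whether grepfold behaves that way.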