From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Sun, 29 Nov 2009 23:29:30 -0500
To: 9fans@9fans.net
Message-ID: <cb038fc012830f0a8f6cad5b76beb980@ladd.quanstro.net>
In-Reply-To: <d50d7d460911291101k7420eb0fna61f87646606e991@mail.gmail.com>
References: <d50d7d460911291101k7420eb0fna61f87646606e991@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] =?utf-8?q?gr=C3=ABp_=28rhymes_with_creep=29_and_cptmp?=
Topicbox-Message-UUID: a50ebdc4-ead5-11e9-9d60-3106f5b1d025

On Sun Nov 29 14:03:23 EST 2009, jason.catena@gmail.com wrote:

> I wrote a wrapper around grep to search for words regardless of
> accents.  I didn't want to worry about whether I used accents on
> characters (I sometimes use them inconsistently, and others decidedly
> do), but I still wanted to limit the results to exact matches if I
> supplied an accent.  Here's an example run.

hey, this is great stuff!  i really like the approach.  i played with
this a little bit, but quickly ran into problems.  the patterns get
really big in a hurry.  "reasonable" re size limits of say 300 characters
just don't work if you're doing expansion.  expanding "cooperate"
results in a 460-byte string!
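
a rough sketch of why expansion blows up, in python rather than the
original wrapper (which isn't shown here).  base_of, variants and expand
are made-up names, the data comes from python's unicodedata module, and
only code points below 0x2000 are scanned, so the sizes won't match the
460 bytes above.

```python
import re
import unicodedata

def base_of(ch):
    """follow canonical decompositions down to the first (base) character."""
    d = unicodedata.decomposition(ch)
    # stop when there is no decomposition, or only a compatibility one (<tag> ...)
    while d and not d.startswith("<"):
        ch = chr(int(d.split()[0], 16))
        d = unicodedata.decomposition(ch)
    return ch

# base letter -> list of modified letters that reduce to it
variants = {}
for cp in range(0x80, 0x2000):          # assumption: a small range is enough for a demo
    ch = chr(cp)
    if unicodedata.category(ch).startswith("L"):
        b = base_of(ch)
        if b != ch:
            variants.setdefault(b, []).append(ch)

def expand(word):
    """turn each plain letter into a bracket class of itself plus its variants."""
    out = []
    for c in word:
        if c in variants:
            out.append("[" + c + "".join(variants[c]) + "]")
        else:
            out.append(re.escape(c))    # accented input stays an exact match
    return "".join(out)

if __name__ == "__main__":
    p = expand("cooperate")
    print(len("cooperate"), "bytes in,", len(p.encode("utf-8")), "bytes out")
```

every plain letter drags its whole list of variants into the pattern, so
the size tracks the table, not the word.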

so i went back to an old idea.  i hope you won't accuse me of topperism,
but you finally motivated me to work on something i threatened
to do at iwp9 2e: add folding to grep.

it was right up my alley since i just recently redid the rune tables
that i've been using.  they're built directly from UnicodeData.txt.
it wasn't too hard to build a table that folds modified letters to
a base with the unicode data.  from there, i reused the same
technique used for case folding.  since the table i'm using doesn't
fold case, "grep -Ii" makes sense.
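
a minimal sketch of the folding idea, in python and not the grepfold
source.  it assumes the table maps each modified letter to the first
code point of its canonical decomposition, which is a guess at what
"folds to a base" means.  FOLD, base_rune, fold and grep_fold are
made-up names, and folding each line before matching is a stand-in for
whatever grep does internally.

```python
import re
import unicodedata

def base_rune(cp):
    """follow canonical decompositions down to the first (base) code point."""
    while True:
        d = unicodedata.decomposition(chr(cp))
        if not d or d.startswith("<"):  # none, or compatibility-only
            return cp
        cp = int(d.split()[0], 16)

# table: modified letter -> base letter.  case is left alone, so
# upper-case letters fold to upper-case bases.
FOLD = {}
for cp in range(0x80, 0x10000):         # assumption: BMP only, to keep the demo quick
    if unicodedata.category(chr(cp)).startswith("L"):
        b = base_rune(cp)
        if b != cp:
            FOLD[cp] = b

def fold(s):
    """replace every modified letter in s with its base letter."""
    return s.translate(FOLD)

def grep_fold(pattern, lines):
    """return the lines whose folded form matches pattern."""
    rx = re.compile(pattern)
    return [l for l in lines if rx.search(fold(l))]

if __name__ == "__main__":
    for l in grep_fold("cooperate", ["we co\u00f6perate", "we compete"]):
        print(l)
```

the pattern stays its original size; only the table lookup is added per
character.  passing re.IGNORECASE to re.compile would be the analogue of
combining the two flags.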

performance is pretty good.  worst case is about 2x the user time.
there's no overhead when the -I flag isn't given.

the source is in /n/sources/contrib/quanstro/src/grepfold.
please let me know of any bugs.  i'm sure there are a few weird
cases.

- erik