From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] grëp (rhymes with creep) and cptmp
Date: Sun, 29 Nov 2009 23:29:30 -0500
Message-ID: <cb038fc012830f0a8f6cad5b76beb980@ladd.quanstro.net>
In-Reply-To: <<d50d7d460911291101k7420eb0fna61f87646606e991@mail.gmail.com>>
On Sun Nov 29 14:03:23 EST 2009, jason.catena@gmail.com wrote:
> I wrote a wrapper around grep to search for words regardless of
> accents. I didn't want to worry about whether I used accents on
> characters (I sometimes use them inconsistently, and others decidedly
> do), but I still wanted to limit the results to exact matches if I
> supplied an accent. Here's an example run.
hey, this is great stuff! i really like the approach. i played with
this a little bit, but quickly ran into problems. the patterns get
really big in a hurry. "reasonable" re size limits of say 300 characters
just don't work if you're doing expansion. expanding "cooperate"
results in a 460-byte string!
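a rough illustration of the growth, as a python sketch. this is not jason's wrapper; it assumes each unaccented letter is expanded into a bracket class of every code point below a cutoff whose canonical decomposition starts with that letter. the names base_letter, variants and expand and the 0x250 cutoff are made up for the example, and the byte count it prints depends on the cutoff, so it will not match the 460 bytes quoted above.

```python
import re
import unicodedata

def base_letter(ch):
    """First code point of ch's canonical decomposition (ch itself if it has none)."""
    return unicodedata.normalize("NFD", ch)[0]

def variants(ch, limit=0x250):
    """Code points below limit whose base letter is ch, including ch itself."""
    return [chr(c) for c in range(limit) if base_letter(chr(c)) == ch]

def expand(word, limit=0x250):
    """Turn each unaccented letter into a bracket class of its accented variants.

    A letter typed with an accent has no variants under this rule,
    so it stays literal and only matches exactly.
    """
    out = []
    for ch in word:
        alts = variants(ch, limit)
        out.append("[" + "".join(alts) + "]" if len(alts) > 1 else ch)
    return "".join(out)

pattern = expand("cooperate")
# compare the size of the word with the size of the expanded pattern
print(len("cooperate"), len(pattern.encode("utf-8")))
# the expanded pattern still matches an accented spelling
print(bool(re.fullmatch(pattern, "co\u00f6perate")))
```

raising the cutoff pulls in more variants per letter, and every one costs two or three bytes in utf-8, which is why a fixed limit on pattern size is hit so quickly.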
so i went back to an old idea. i hope you won't accuse me of topperism,
but you finally motivated me to work on something i threatened
to do at iwp9 2e: add folding to grep.
it was right up my alley since i just recently redid the rune tables
that i've been using. they're built directly from UnicodeData.txt.
it wasn't too hard to build a table that folds modified letters to
a base with the unicode data. from there, i reused the same
technique used for case folding. since the tables i'm using don't
fold case, "grep -Ii" makes sense.
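the general idea of such a table can be sketched in python. this is not the code in grepfold; it assumes the base of a letter is found by following the canonical decomposition field (the sixth semicolon-separated field) of UnicodeData.txt until a code point with no decomposition is reached, and that compatibility mappings tagged like <super> are skipped. SAMPLE is a handful of lines in the UnicodeData.txt layout reproduced for the example, and build_fold_table and fold are hypothetical names.

```python
# a few lines in the UnicodeData.txt layout; the real file has one per code point
SAMPLE = """\
00AA;FEMININE ORDINAL INDICATOR;Lo;0;L;<super> 0061;;;;N;;;;;
00C9;LATIN CAPITAL LETTER E WITH ACUTE;Lu;0;L;0045 0301;;;;N;LATIN CAPITAL LETTER E ACUTE;;;00E9;
00EA;LATIN SMALL LETTER E WITH CIRCUMFLEX;Ll;0;L;0065 0302;;;;N;LATIN SMALL LETTER E CIRCUMFLEX;;00CA;;00CA
00F6;LATIN SMALL LETTER O WITH DIAERESIS;Ll;0;L;006F 0308;;;;N;LATIN SMALL LETTER O DIAERESIS;;00D6;;00D6
1EBF;LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00EA 0301;;;;N;;;1EBE;;1EBE
"""

def build_fold_table(lines):
    """Map each code point with a canonical decomposition to its base code point."""
    decomp = {}
    for line in lines:
        fields = line.split(";")
        # skip lines with no decomposition or with a compatibility (<tag>) mapping
        if len(fields) < 6 or not fields[5] or fields[5].startswith("<"):
            continue
        decomp[int(fields[0], 16)] = int(fields[5].split()[0], 16)
    table = {}
    for code in decomp:
        base = code
        while base in decomp:   # 1EBF -> 00EA -> 0065
            base = decomp[base]
        table[code] = base
    return table

def fold(text, table):
    """Replace every modified letter in text with its base letter."""
    return text.translate(table)

table = build_fold_table(SAMPLE.splitlines())
print(fold("co\u00f6perate", table))
```

note that the sample table sends capital E with acute to capital E rather than to e: case is left alone, so folding accents and folding case stay independent.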
performance is pretty good. worst case is about 2x the user time.
there's no overhead when the I flag isn't given.
the source is in /n/sources/contrib/quanstro/src/grepfold.
please let me know of any bugs. i'm sure there are a few weird
cases.
- erik