Date: Mon, 30 Nov 2009 09:00:25 +0000
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <954FF94C2C131285456E5657@[192.168.1.2]>
In-Reply-To: <d50d7d460911292352j7cbcbc7erefa21b3b7f29f20a@mail.gmail.com>
References: <cb038fc012830f0a8f6cad5b76beb980@ladd.quanstro.net>
	<d50d7d460911292352j7cbcbc7erefa21b3b7f29f20a@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Subject: Re: [9fans] grëp (rhymes with creep) and cptmp
Topicbox-Message-UUID: a529a8d2-ead5-11e9-9d60-3106f5b1d025

> $ time grëp Obergruppenfuhrersaal *

Touché :-)


--On Monday, November 30, 2009 01:52 -0600 Jason Catena <jason.catena@gmail.com> wrote:

>> hey, this is great stuff!  i really like the approach.
>
> Thank you.  It started as a way to cut and paste character classes,
> and grew into applying them automatically so I could test them.  I
> suppose the character classes file could be useful in other
> applications that want to ignore accents selectively.
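A minimal sketch of the idea in POSIX sh (not rc, and not the script from this thread): each plain letter in the search term is replaced by a bracket class listing its accented variants. The function name expand_pattern and the class table are hypothetical and deliberately tiny; the actual classes file is not shown in the thread.

```sh
#!/bin/sh
# Hypothetical accent-folding expansion.  The classes are a small made-up
# sample, not the classes file discussed in the thread.
# Each sed command turns one plain letter into a bracket class.  No class
# contains another class's key letter, so later substitutions never
# rewrite the output of earlier ones.
expand_pattern() {
	printf '%s\n' "$1" | sed \
		-e 's/a/[aàáâãäå]/g' \
		-e 's/c/[cç]/g' \
		-e 's/e/[eèéêë]/g' \
		-e 's/i/[iìíîï]/g' \
		-e 's/n/[nñ]/g' \
		-e 's/o/[oòóôõöø]/g' \
		-e 's/u/[uùúûü]/g'
}
```

For example, expand_pattern fuhrer prints f[uùúûü]hr[eèéêë]r, which can then be passed to grep -e in a UTF-8 locale so that it also matches "führer".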
>
> I added a dash-and-hyphen class, keyed to the hyphen-minus as the
> first character (since it's overused), so I had to change the sed
> command.
>
> sed '/^\[.+-/d;...
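Putting the hyphen-minus first also matters for regex syntax: inside a POSIX bracket expression, a '-' that comes first (or last) is a literal character rather than a range operator. Here is a hedged POSIX sh sketch; the class contents and the name match_dashed are hypothetical, and the thread's actual sed command is not reproduced.

```sh
#!/bin/sh
# Hypothetical dash-and-hyphen class.  The hyphen-minus comes first, where
# POSIX bracket expressions treat it as a literal '-', not as a range.
dash_class='[-‐‑‒–—―]'

# match_dashed PATTERN FILE...: run grep after widening every '-' in
# PATTERN to the whole dash class.
match_dashed() {
	expanded=$(printf '%s\n' "$1" | sed "s/-/$dash_class/g")
	shift
	grep -e "$expanded" "$@"
}
```

With this class, match_dashed co-op searches for co[-‐‑‒–—―]op, which matches "co-op" and its en- or em-dash variants but not "coop".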
>
> I also now run "rm $classes" at the end, of course, though I guess the
> script then no longer exits with grep's exit status.  I should
> probably save $status right after the grep command and exit with it.
> Or I could save the expanded regex in a new shell variable, rm
> $classes, and then grep with that variable, so that grep is the last
> command.
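The second option can be sketched in POSIX sh as follows. This is not rc: rc keeps the last exit status in $status, and sh keeps it in $?. The name grep_folded and the two-line class script are hypothetical.

```sh
#!/bin/sh
# Sketch only: grep_folded and the class script are hypothetical.
# The function body is a subshell, so its variables stay local.
grep_folded() (
	pattern=$1
	shift
	classes=$(mktemp) || exit 2
	# A tiny made-up class table, written as a sed script.
	printf '%s\n' 's/u/[uùúûü]/g' 's/o/[oòóôõöø]/g' > "$classes"
	# Expand the pattern while the temporary file still exists...
	expanded=$(printf '%s\n' "$pattern" | sed -f "$classes")
	# ...remove the file before searching...
	rm -f "$classes"
	# ...so grep is the last command, and its status (0 = match,
	# 1 = no match, 2 = error) becomes the function's status.
	grep -e "$expanded" "$@"
)
```

The first option works just as well: save the status immediately after grep (st=$? in sh), run rm, and then exit with $st.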
>
>> the patterns get really big in a hurry.
>
> Agreed.  Part of grep's job is to be a regex engine, so I thought in
> general it would be okay to push it here.
>
>> i played with this a little bit, but quickly ran into problems.
>
>> "reasonable" re size limits of say 300 characters
>> just don't work if you're doing expansion.  expanding "cooperate"
>> results in a 460-byte string!
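Measured in bytes, these patterns grow faster than a character count suggests, because every accented letter in a class takes two bytes in UTF-8. The 460-byte figure depends on classes that aren't shown, so the sketch below does not try to reproduce it. It only shows, in POSIX sh, how a byte-based limit could be checked. check_re_size is a hypothetical helper, and re_limit simply mirrors the 300 quoted above.

```sh
#!/bin/sh
# Sketch only: check_re_size is hypothetical; 300 mirrors the limit above.
re_limit=300

check_re_size() {
	# wc -c counts bytes, so each two-byte UTF-8 letter (é, ü, ...)
	# in a class counts twice.
	n=$(printf '%s' "$1" | wc -c)
	n=$(($n + 0))	# drop the padding some wc implementations print
	if [ "$n" -gt "$re_limit" ]; then
		echo "regex too long: $n bytes (limit $re_limit)" >&2
		return 1
	fi
	echo "$n"
}
```

For example, the two-letter expansion [cç][oòóôõöø] is 20 bytes long, although it contains only 13 characters.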
>
> Where does this 300-character limit come from?  If you write regexes
> by hand, I agree that a 300-character one could be hard to understand
> fully.  The regexes this script generates are very simple in
> structure and (ahem) regular, so I'd be inclined to exempt them from a
> size restriction that exists for the sake of style.  As for the time
> and space needed to wade through the character sets, I haven't yet
> run into performance problems or outright failures in my tests.
>
> $ which grep
> /usr/local/plan9/bin/grep
>
> $ wc *|tail -1
>   17655  118910  774237 total
>
> $ time grëp Obergruppenfuhrersaal *
> wewelsburg:155: (1938–1943): The "Obergruppenführersaal" (SS Generals' Hall) and
> wewelsburg:161: floor of the "Obergruppenführersaal" lie on this axis.  Both redesigned
> wewelsburg:180: The "Obergruppenführersaal" (SS Generals' Hall).  On the ground
> wewelsburg:181: floor the "Obergruppenführersaal" (literally translated:
> wewelsburg:236: castle, in the so-called Obergruppenführersaal ("Obergruppenführer
> 0.00u 0.03s 0.03r 	 grëp Obergruppenfuhrersaal 0–31acme 0–31i850 1920s ...
>
> 0.03s was the largest time I saw in practice; the first run showed
> 0.02s of user time.  That seems negligible to me, so I don't think
> this string (lots of vowels and other characters with big classes) is
> pushing grep's performance limits yet, at least on this data set (a
> collection of notes largely cut and pasted from the web).
>
>> - erik
>
> Jason Catena
>