9fans - fans of the OS Plan 9 from Bell Labs
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] grëp (rhymes with creep) and cptmp
Date: Mon, 30 Nov 2009 09:00:25 +0000
Message-ID: <954FF94C2C131285456E5657@[192.168.1.2]>
In-Reply-To: <d50d7d460911292352j7cbcbc7erefa21b3b7f29f20a@mail.gmail.com>

> $ time grëp Obergruppenfuhrersaal *

Touché :-)


--On Monday, November 30, 2009 01:52 -0600 Jason Catena 
<jason.catena@gmail.com> wrote:

>> hey, this is great stuff!  i really like the approach.
>
> Thank you.  It evolved from wanting to cut and paste character
> classes, to applying them automatically in order to test them.  I
> suppose the character classes file could be useful in other
> applications that selectively don't want to care about accents.
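
A minimal sketch of the expansion step described above, written in POSIX sh and awk rather than the rc of the actual script (which isn't shown in this thread). It assumes a hypothetical class file with one bracket expression per line, keyed by the ASCII character right after the "[" (the same "key is the first character" convention Jason mentions for the hyphen-minus class). The function name expand_classes is made up for illustration.

```shell
# expand_classes CLASSFILE PATTERN  (hypothetical helper, not grëp itself)
# CLASSFILE: one bracket expression per line, e.g.  [uùúûü]
# The key is the character right after '['; keys are assumed to be ASCII.
# Each pattern character that has a class is replaced by the whole class;
# every other character, regex metacharacters included, passes through.
expand_classes() {
	# Pass the pattern through the environment, not awk -v, so that
	# backslashes in the regex are not treated as escape sequences.
	PAT=$2 awk '
		/^\[/ { cls[substr($0, 2, 1)] = $0 }
		END {
			pat = ENVIRON["PAT"]
			out = ""
			for (i = 1; i <= length(pat); i++) {
				c = substr(pat, i, 1)
				if (c in cls)
					out = out cls[c]
				else
					out = out c
			}
			print out
		}
	' "$1"
}
```

With such a file, a call like grep "$(expand_classes classes Obergruppenfuhrersaal)" * is roughly the idea behind grëp as described here; "classes" is a hypothetical file name. The sketch does not handle pattern characters that already sit inside a bracket expression, and GNU grep needs a UTF-8 locale for the accented class members to match as single characters.
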
>
> I added a dash-and-hyphen class, keyed to the hyphen-minus as the
> first character (since it's overused), so I had to change the sed
> command.
>
> sed '/^\[.+-/d;...
>
> I also now run "rm $classes" at the end, of course, though that means
> the script no longer exits with grep's exit status.  I should probably
> save $status right after the grep command and exit with it.  Or save
> the expanded regex in a new shell variable, rm $classes, then grep
> with that variable so the grep is the last command.
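
Both options from the paragraph above, sketched as POSIX sh functions; in rc the saved variable would be $status rather than $?. The function names and argument order are hypothetical, and each takes the temporary class file as its first argument.

```shell
# Option 1: run grep, save its exit status, clean up, then return the
# saved status (0 = match, 1 = no match, 2 = error).
grep_then_cleanup() {
	classes=$1
	shift
	grep "$@"
	st=$?
	rm -f "$classes"
	return "$st"
}

# Option 2: the caller has already expanded the regex into a variable,
# so the class file can be removed first and grep runs last; the
# function's status is then simply grep's status.
cleanup_then_grep() {
	classes=$1
	re=$2
	shift 2
	rm -f "$classes"
	grep -e "$re" "$@"
}
```

Option 2 avoids the extra variable entirely, which is why making grep the last command is the tidier of the two.
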
>
>> the patterns get really big in a hurry.
>
> Agreed.  Part of grep's job is to be a regex engine, so I thought in
> general it would be okay to push it here.
>
>> i played with this a little bit, but quickly ran into problems.
>
>> "reasonable" re size limits of say 300 characters
>> just don't work if you're doing expansion.  expanding "cooperate"
>> results in a 460-byte string!
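
When comparing expansions against a limit, characters and bytes are different units once the classes contain accented letters, which take two bytes each in UTF-8. A small sketch for checking an expanded regex against a byte limit; the helper names are hypothetical, and 300 is only the figure from erik's message, not a documented limit of any grep.

```shell
# Size of a string in bytes, not characters: [uùúûü] is 7 characters
# but 11 bytes in UTF-8. tr strips the padding some wc versions print.
re_bytes() {
	printf '%s' "$1" | wc -c | tr -d ' \t'
}

# Succeeds when the regex fits within the given byte limit.
re_within_limit() {
	[ "$(re_bytes "$1")" -le "$2" ]
}
```

For example, re_within_limit "$re" 300 || echo 'regex too long' >&2. In a UTF-8 locale, wc -m counts characters instead of bytes.
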
>
> Where does this 300-character limit come from?  If you write them by
> hand, I agree that a 300-character regex could be hard to fully
> understand.  The regexes this script generates are very simple in
> structure and (ahem) regular, so I'd be inclined to let them past a
> size restriction based on style.  As for the time and space required
> to wade through the character sets, I haven't yet run into performance
> problems or actual failures in my tests.
>
> $ which grep
> /usr/local/plan9/bin/grep
>
> $ wc *|tail -1
>   17655  118910  774237 total
>
> $ time grëp Obergruppenfuhrersaal *
> wewelsburg:155: (1938–1943): The "Obergruppenführersaal" (SS Generals' Hall) and
> wewelsburg:161: floor of the "Obergruppenführersaal" lie on this axis.  Both redesigned
> wewelsburg:180: The "Obergruppenführersaal" (SS Generals' Hall).  On the ground
> wewelsburg:181: floor the "Obergruppenführersaal" (literally translated:
> wewelsburg:236: castle, in the so-called Obergruppenführersaal ("Obergruppenführer
> 0.00u 0.03s 0.03r 	 grëp Obergruppenfuhrersaal 0–31acme 0–31i850 1920s ...
>
> 0.03 s was the largest time I got in practice; the first run had 0.02 s
> of user time.  This seems negligible to me: even this string (lots of
> vowels and other characters with bigger classes) on this data set (a
> collection of notes largely cut-and-pasted from the web) is not yet
> pushing its performance boundaries.
>
>> - erik
>
> Jason Catena
>
