9fans - fans of the OS Plan 9 from Bell Labs
From: Bruce Ellis <bruce.ellis@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] grëp (rhymes with creep) and cptmp
Date: Mon, 30 Nov 2009 15:51:51 +1100
Message-ID: <775b8d190911292051g57001bf7p3deb7439858b9e4b@mail.gmail.com>
In-Reply-To: <d50d7d460911291101k7420eb0fna61f87646606e991@mail.gmail.com>

i like the approach. back in basser computational linguistics days
frank was indexing a greek verb dictionary. to sort the keys - he used
tr | sort | tr.

i'm glad you didn't screw with grep. it's brilliant but the
implementation is not easily understood. i was in the room at the
time, so i have a head start.

brucee

On 11/30/09, Jason Catena <jason.catena@gmail.com> wrote:
> I wrote a wrapper around grep to search for words regardless of
> accents.  I didn't want to worry about whether I used accents on
> characters (I sometimes use them inconsistently, and others decidedly
> do), but I still wanted to limit the results to exact matches if I
> supplied an accent.  Here's an example run.
>
>
> $ grep facade word
> treatment <a museum's east facade>.  A false, superficial, or artificial
>
> $ grëp facade word
> 89: to bow to man. façade. circa 1681.  French façade, from Italian
> 92: treatment <a museum's east facade>.  A false, superficial, or artificial
>
> $ grëp façade *
> style:21: crucial difference to pronunciation: cliché, soupçon, façade, café,
> wabisabi:51: or the crumbling stone façade of an old building.   Transience,
> word:89: to bow to man. façade. circa 1681.  French façade, from Italian
>
>
> Note that line word:92 (output by the second command) is not output by
> the third command, since I supplied an accent on that particular
> character (ç) in my input pattern.  I chose the umlaut or diæresis to
> remind me that grëp provides the -n option by default, so I'll get a
> line number and : in the output.  (I should probably just pass through
> all of grep's command-line options.)
>
>
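> <grëp>=
> #!/usr/local/plan9/bin/rc
>
> # first argument is the pattern; the rest are the files to search
> regex=$1
> shift
>
> # turn each class in charclass into a sed command (a line [xyz]
> # becomes s/x/[xyz]/g), skipping lines that contain a range (-)
> classes=`{cptmp classes}
> sed '/-/d;s,^\[(.),s/\1/\[\1,;s,$,/g,' charclass > $classes
>
> # expand the pattern, search, then remove the temporary sed script
> grep -n `{echo $regex | sed -f $classes} $*
> st=$status
> rm -f $classes
> exit $st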
>
>
> I translate each ordinary Latin character in the input pattern (e.g.
> [0-9A-Za-z]) into a character class (from the attached charclass file,
> which doesn't cut-and-paste well), and then call grep with the updated
> pattern.  The first sed command in grëp turns the character classes in
> charclass into s commands for sed.  The charclass file contains the
> square brackets because I also cut-and-paste from it when I
> need a character class for a sed script.
>
> The script cptmp creates a temporary copy of an existing file, or a
> temporary new file.
>
>
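> <cptmp>=
> #!/usr/local/plan9/bin/rc
> # exit as soon as any command fails
> flag e +
>
> # default to /tmp when TMPDIR is unset
> if(~ $#TMPDIR 0)
>        TMPDIR=/tmp
> # name the copy after the original file, the user, and this process
> base=`{basename $1}
> tmp=$TMPDIR/$base.$USER.$pid
>
> # copy the file if it exists; otherwise start with an empty one
> if (test -f $1) {
>        cp -pr $1 $tmp
> }
> if not {
>        touch $tmp
> }
> # make sure the copy is writable and executable, then report its name
> chmod +wx $tmp
> echo $tmp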
>
>
> Jason Catena
>
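The pattern expansion that grëp performs can be restated as a short sketch. It is written in Python rather than rc so that it runs without plan9port, and it is not the script from the post. The charclass file is not included in the message, so build_classes() is a hypothetical stand-in that derives the classes from Unicode decompositions over U+00C0..U+017F; the real file may list different characters. The names build_classes, expand, and grep are likewise made up for illustration.

```python
import re
import unicodedata


def build_classes():
    """Map each ASCII letter to itself plus its accented forms.

    Assumed stand-in for the charclass file: scan U+00C0..U+017F and
    group every character whose canonical decomposition starts with
    an ASCII letter under that letter.
    """
    classes = {}
    for cp in range(0x00C0, 0x0180):
        ch = chr(cp)
        base = unicodedata.normalize("NFD", ch)[0]
        if base != ch and base.isascii() and base.isalpha():
            classes[base] = classes.get(base, base) + ch
    return classes


CLASSES = build_classes()


def expand(pattern):
    """Replace each unaccented letter with a bracket class; keep the rest."""
    return "".join(
        "[" + CLASSES[ch] + "]" if ch in CLASSES else ch for ch in pattern
    )


def grep(pattern, lines):
    """Return (line number, line) pairs that match the expanded pattern."""
    rx = re.compile(expand(pattern))
    return [(n, line) for n, line in enumerate(lines, 1) if rx.search(line)]


if __name__ == "__main__":
    sample = ["French façade, from Italian", "a museum's east facade"]
    print(grep("facade", sample))  # plain c: accented and plain forms qualify
    print(grep("façade", sample))  # accented ç is left literal, so it is exact
```

An accented character in the pattern is never a key in the table, so it passes through unchanged and only matches itself. Like a per-letter sed substitution, this sketch rewrites every letter it sees, so a pattern that already contains bracket expressions or backslash escapes would likely be mangled.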
>
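For readers without rc, the behaviour of cptmp can be approximated in Python as follows. This is a sketch under assumptions, not a port anyone on the list posted: the fallback user name "user" is invented for the case where USER is unset, and the chmod step only sets the owner's write and execute bits, whereas chmod +wx also honours the umask for group and others.

```python
import os
import shutil
import stat


def cptmp(path):
    """Copy path to a per-user, per-process temporary file and return its name.

    If path is not an existing regular file, an empty temporary file is
    created instead.
    """
    tmpdir = os.environ.get("TMPDIR") or "/tmp"
    user = os.environ.get("USER", "user")  # "user" is an assumed fallback
    name = "{}.{}.{}".format(os.path.basename(path), user, os.getpid())
    tmp = os.path.join(tmpdir, name)

    if os.path.isfile(path):
        shutil.copy2(path, tmp)  # keeps timestamps and mode, like cp -p
    else:
        open(tmp, "a").close()  # like touch

    # Approximates chmod +wx for the owner only.
    mode = os.stat(tmp).st_mode
    os.chmod(tmp, mode | stat.S_IWUSR | stat.S_IXUSR)
    return tmp
```

A name built from the user and process id is predictable; where that matters, Python's tempfile.mkstemp creates the file atomically under an unpredictable name, at the cost of losing the readable file name.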


