9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] grëp (rhymes with creep) and cptmp
From: Jason Catena @ 2009-11-29 19:01 UTC
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 2222 bytes --]

I wrote a wrapper around grep to search for words regardless of
accents.  I didn't want to worry about whether I used accents on
characters (I sometimes use them inconsistently, and others decidedly
do), but I still wanted to limit the results to exact matches if I
supplied an accent.  Here's an example run.


$ grep facade word
treatment <a museum's east facade>.  A false, superficial, or artificial

$ grëp facade word
89: to bow to man. façade. circa 1681.  French façade, from Italian
92: treatment <a museum's east facade>.  A false, superficial, or artificial

$ grëp façade *
style:21: crucial difference to pronunciation: cliché, soupçon, façade, café,
wabisabi:51: or the crumbling stone façade of an old building.   Transience,
word:89: to bow to man. façade. circa 1681.  French façade, from Italian


Note that line word:92 (output by the second command) is not output by
the third command, since I supplied an accent on that particular
character (ç) in my input pattern.  I chose the umlaut or diæresis to
remind me that grëp provides the -n option by default, so I'll get a
line number and : in the output.  (I should probably just pass through
all of grep's command-line options.)
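
The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not the script itself: the VARIANTS table is a hypothetical, heavily abbreviated stand-in for the charclass file, and expand and grep_n are names made up for the sketch. A plain letter is widened to a bracketed class of its accented forms, while an accented letter in the pattern is left alone and so matches only itself.

```python
import re

# Hypothetical, abbreviated variant table; the real lists live in charclass.
VARIANTS = {"a": "aáàâä", "c": "cç", "e": "eéèêë"}

def expand(pattern):
    """Replace each plain letter that has variants with a bracketed class."""
    return "".join(
        "[" + VARIANTS[ch] + "]" if ch in VARIANTS else ch
        for ch in pattern
    )

def grep_n(pattern, lines):
    """Return 'number: line' strings for lines matching the expanded pattern."""
    rx = re.compile(expand(pattern))
    return [f"{n}: {line}" for n, line in enumerate(lines, 1) if rx.search(line)]

lines = [
    "French façade, from Italian",
    "a museum's east facade",
]

print(grep_n("facade", lines))   # unaccented pattern: both spellings match
print(grep_n("façade", lines))   # accented ç: only the exact spelling matches
```

Because ç has no entry of its own in the table, it passes through unchanged, which is what restricts the second search to the accented spelling.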


<grëp>=
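#!/usr/local/plan9/bin/rc
# usage: grëp regex [file ...]

regex=$1
shift

# Build a sed script from charclass (read from the current directory):
# delete the range classes, then turn each class [xyz] into s/x/[xyz]/g.
classes=`{cptmp classes}
sed '/-/d;s,^\[(.),s/\1/\[\1,;s,$,/g,' charclass > $classes

# Expand the pattern with that script and search with line numbers.
grep -n `{echo $regex | sed -f $classes} $*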


I translate each ordinary Latin character in the input pattern (e.g.,
[0-9A-Za-z]) into a character class taken from the attached charclass
file (which doesn't cut-and-paste well), and then call grep with the
updated pattern.  The first sed command in grëp deletes the range
classes (the lines containing a -) and turns each remaining character
class in charclass into an s command for sed, keyed on the first
character of the class.  The charclass file contains the square
brackets because I also cut-and-paste from it when I need a character
class for a sed script.
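
To make the generated sed script concrete, here is a small Python sketch that mimics what the first sed command does to lines of charclass. The function name sed_commands is invented for the sketch, and Python stands in for sed purely for illustration. One simplification is an assumption of mine: lines that do not start with [ are skipped here, whereas sed would pass them through with /g appended.

```python
def sed_commands(charclass_lines):
    """Mimic the first sed command in grëp on lines of a charclass file."""
    commands = []
    for line in charclass_lines:
        line = line.rstrip("\n")
        # /-/d : drop range classes such as [0-9] and [a-z].
        if "-" in line:
            continue
        # s,^\[(.),s/\1/\[\1,  then  s,$,/g, : the first character inside
        # the brackets becomes the search text and the whole class becomes
        # the replacement. Lines not starting with [ are skipped here.
        if len(line) >= 2 and line[0] == "[":
            commands.append(f"s/{line[1]}/{line}/g")
    return commands

# Three lines copied from the attached charclass file.
sample = ["[a-z]", "[cćĉčċçƈɕ]", "[fƒʩ]"]
for command in sed_commands(sample):
    print(command)
```

Each printed line is one sed command of the form s/c/[c...]/g, so running the resulting script over a pattern replaces every plain c with its whole class and leaves characters without a class untouched.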

The script cptmp creates a temporary copy of an existing file, or a
new empty temporary file if the named file does not exist, and prints
the temporary file's name.


<cptmp>=
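#!/usr/local/plan9/bin/rc
# usage: cptmp file
# Print the name of a temporary copy of file, or of a new empty
# temporary file if file does not exist.
flag e +	# exit as soon as a command fails

if(~ $#TMPDIR 0)
	TMPDIR=/tmp
base=`{basename $1}
tmp=$TMPDIR/$base.$USER.$pid	# user and shell pid keep names apart

if (test -f $1) {
	cp -pr $1 $tmp
}
if not {
	touch $tmp
}
chmod +wx $tmp
echo $tmp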
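
For readers who do not use rc, the same steps can be sketched in Python. This is a rough equivalent under stated assumptions, not a port: the tmpdir parameter and the demo directory are additions for the sketch, the fallback user name "unknown" is invented, and the final chmod only adds owner write and execute bits rather than reproducing chmod +wx exactly.

```python
import os
import shutil
import tempfile

def cptmp(path, tmpdir=None):
    """Create a temporary copy of path, or an empty file if path is absent."""
    tmpdir = tmpdir or os.environ.get("TMPDIR", "/tmp")
    user = os.environ.get("USER", "unknown")   # fallback name is an assumption
    name = f"{os.path.basename(path)}.{user}.{os.getpid()}"
    tmp = os.path.join(tmpdir, name)
    if os.path.isfile(path):
        shutil.copy2(path, tmp)      # like cp -p: keep mode and timestamps
    else:
        open(tmp, "a").close()       # like touch: create the file if missing
    # Roughly chmod +wx, but limited to the owner bits (an assumption).
    os.chmod(tmp, os.stat(tmp).st_mode | 0o300)
    return tmp

# Demo in a scratch directory so nothing is left behind in /tmp.
demo_dir = tempfile.mkdtemp()
print(cptmp("classes", tmpdir=demo_dir))
```

As in the rc script, the name is predictable from the base name, the user, and the process id. Where that matters, Python's tempfile.mkstemp creates a uniquely named file atomically.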


Jason Catena

[-- Attachment #2: charclass --]
[-- Type: application/octet-stream, Size: 1126 bytes --]

[ 	]
[0-9]
[0⁰₀]
[1¹₁]
[2²₂]
[3³₃]
[4⁴₄]
[5⁵₅]
[6⁶₆]
[7⁷₇]
[8⁸₈]
[9⁹₉]
[A-Z]
[AÁÀĂÂǍÅǺÄǞÃȦǠĄĀȀȂª]
[BƁʙɞʚ]
[CĆĈČĊÇƇ]
[DĎĐÐƉƊ]
[EÉÈĔÊĚËĖȨĘĒȄȆɝƎƐɛɜ]
[FƑℲ]
[GǴĞĜǦĠĢǤƓɢʛ]
[HĤȞHĦʜǶ]
[IÍÌĬÎǏÏĨİĮĪȈȊIƗɪ]
[JĴJ]
[KǨĶƘKĸ]
[LĹĽĻŁŁĿʟ]
[M]
[NŃǸŇÑŅƝNɴŊ]
[OÓÒŎÔǑÖȪŐÕȬȮȰØǾǪǬŌȌȎƠƟ]
[PƤP]
[Q]
[RŔŘŖȐȒƦʀʁ]
[SŚŜŠŞȘ]
[TŤTŢȚŦƬƮ]
[UÚÙŬÛǓŮÜǗǛǙǕŰŨŲŪȔȖƯ]
[VƲ]
[WŴW]
[X]
[YÝŶYŸȲʏƳ]
[ZŹŽŻƵȤʐǮ]
[a-z]
[aáàăâǎåǻäǟãȧǡąāȁȃɐɑɒ]
[bƀɓƂƃ]
[cćĉčċçƈɕ]
[dďđðɖɗƋƌȡ]
[eéèĕêěëėȩęēȅȇɚǝƏəɘ]
[fƒʩ]
[gǵğĝǧġģǥɠɡ]
[hĥȟħƕɦɧ]
[iíìĭîǐïĩiįīȉȋıɨƖɩ]
[jĵǰʝɟʄ]
[kǩķƙʞ]
[lĺľļłłŀƚɫɬɭȴ]
[mɱ]
[nńǹňñņɲȠƞɳȵnŋ]
[oóòŏôǒöȫőõȭȯȱøǿǫǭōȍȏơɵ]
[pƥp]
[qʠ]
[rŕřŗȑȓɼɽɾɹɺɻɿ]
[sśŝšşșʂ]
[tťţțƫƭʈȶ]
[uúùŭûǔůüǘǜǚǖűũųūȕȗưʉ]
[vʋ]
[wŵ]
[x]
[yýŷÿȳƴ]
[zźžżƶȥʑǯƺ]
[ÆǼǢ]
[æǽǣ]
[Œɶ]
[œ]
[ɮ]
