From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
Date: Sun, 29 Nov 2009 13:01:33 -0600
Message-ID:
From: Jason Catena
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/mixed; boundary=0015175cf9c05039d40479872748
Subject: [9fans] grëp (rhymes with creep) and cptmp
Topicbox-Message-UUID: a4d77a1c-ead5-11e9-9d60-3106f5b1d025

--0015175cf9c05039d40479872748
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I wrote a wrapper around grep to search for words regardless of
accents. I didn't want to worry about whether I used accents on
characters (I sometimes use them inconsistently, and others decidedly
do), but I still wanted to limit the results to exact matches if I
supplied an accent. Here's an example run.

$ grep facade word
treatment . A false, superficial, or artificial
$ grëp facade word
89: to bow to man. façade. circa 1681. French façade, from Italian
92: treatment . A false, superficial, or artificial
$ grëp façade *
style:21: crucial difference to pronunciation: cliché, soupçon, façade, café,
wabisabi:51: or the crumbling stone façade of an old building. Transience,
word:89: to bow to man. façade. circa 1681. French façade, from Italian

Note that line word:92 (output by the second command) is not output by
the third command, since I supplied an accent on that particular
character (ç) in my input pattern.

I chose the umlaut or diæresis to remind me that grëp provides the -n
option by default, so I'll get a line number and : in the output. (I
should probably just pass through all of grep's command-line options.)

=
#!/usr/local/plan9/bin/rc
# usage: grëp pattern [file ...]
regex=$1
shift
# temporary file that will hold the generated sed script
classes=`{cptmp classes}
# drop the range lines, then turn each [xyz...] line into s/x/[xyz...]/g
sed '/-/d;s,^\[(.),s/\1/\[\1,;s,$,/g,' charclass > $classes
# expand the pattern, then search with line numbers
grep -n `{echo $regex | sed -f $classes} $*

I translate each ordinary Latin character in the input pattern (e.g.
[0-9A-Za-z]) into a character class taken from the attached charclass
file (which doesn't cut-and-paste well), and then call grep with the
updated pattern. The first sed command in grëp turns the character
classes in charclass into s commands for sed, skipping the range lines
such as [0-9]. The charclass file contains the square brackets because
I also cut-and-paste from it when I need a character class for a sed
script.

The script cptmp creates a temporary copy of an existing file, or a new
empty temporary file if the named file doesn't exist, and prints the
temporary file's name.

=
#!/usr/local/plan9/bin/rc
flag e +
# default to /tmp when TMPDIR is unset
if(~ $#TMPDIR 0) TMPDIR=/tmp
base=`{basename $1}
tmp=$TMPDIR/$base.$USER.$pid
# copy the file if it exists, otherwise start with an empty one
if (test -f $1) {
	cp -pr $1 $tmp
}
if not {
	touch $tmp
}
chmod +wx $tmp
# print the name so the caller can capture it
echo $tmp

Jason Catena

--0015175cf9c05039d40479872748
Content-Type: application/octet-stream; name=charclass
Content-Disposition: attachment; filename=charclass
Content-Transfer-Encoding: base64
X-Attachment-Id: f_g2m5dksh0

WyAJXQpbMC05XQpbMOKBsOKCgF0KWzHCueKCgV0KWzLCsuKCgl0KWzPCs+KCg10KWzTigbTigoRd Cls14oG14oKFXQpbNuKBtuKChl0KWzfigbfigoddCls44oG44oKIXQpbOeKBueKCiV0KW0EtWl0K W0HDgcOAxILDgseNw4XHusOEx57Dg8imx6DEhMSAyIDIgsKqXQpbQsaBypnJnsqaXQpbQ8SGxIjE jMSKw4fGh10KW0TEjsSQw5DGicaKXQpbRcOJw4jElMOKxJrDi8SWyKjEmMSSyITIhsmdxo7GkMmb yZxdCltGxpHihLJdCltHx7TEnsScx6bEoMSix6TGk8miyptdCltIxKTInkjEpsqcx7ZdCltJw43D jMSsw47Hj8OPxKjEsMSuxKrIiMiKScaXyapdCltKxLRKXQpbS8eoxLbGmEvEuF0KW0zEucS9xLvF gcWBxL/Kn10KW01dCltOxYPHuMWHw5HFhcadTsm0xYpdCltPw5PDksWOw5THkcOWyKrFkMOVyKzI rsiww5jHvseqx6zFjMiMyI7GoMafXQpbUMakUF0KW1FdCltSxZTFmMWWyJDIksamyoDKgV0KW1PF msWcxaDFnsiYXQpbVMWkVMWiyJrFpsasxq5dCltVw5rDmcWsw5vHk8Wuw5zHl8ebx5nHlcWwxajF ssWqyJTIlsavXQpbVsayXQpbV8W0V10KW1hdCltZw53FtlnFuMiyyo/Gs10KW1rFucW9xbvGtcik ypDHrl0KW2Etel0KW2HDocOgxIPDoseOw6XHu8Okx5/Do8inx6HEhcSByIHIg8mQyZHJkl0KW2LG
gMmTxoLGg10KW2PEh8SJxI3Ei8OnxojJlV0KW2TEj8SRw7DJlsmXxovGjMihXQpbZcOpw6jElcOq xJvDq8SXyKnEmcSTyIXIh8max53Gj8mZyZhdCltmxpLKqV0KW2fHtcSfxJ3Hp8ShxKPHpcmgyaFd CltoxKXIn8SnxpXJpsmnXQpbacOtw6zErcOux5DDr8SpacSvxKvIiciLxLHJqMaWyaldCltqxLXH sMqdyZ/KhF0KW2vHqcS3xpnKnl0KW2zEusS+xLzFgsWCxYDGmsmryazJrci0XQpbbcmxXQpbbsWE x7nFiMOxxYbJssigxp7Js8i1bsWLXQpbb8Ozw7LFj8O0x5LDtsirxZHDtcityK/IscO4x7/Hq8et xY3IjciPxqHJtV0KW3DGpXBdCltxyqBdCltyxZXFmcWXyJHIk8m8yb3Jvsm5ybrJu8m/XQpbc8Wb xZ3FocWfyJnKgl0KW3TFpcWjyJvGq8atyojItl0KW3XDusO5xa3Du8eUxa/DvMeYx5zHmseWxbHF qcWzxavIlciXxrDKiV0KW3bKi10KW3fFtV0KW3hdClt5w73Ft8O/yLPGtF0KW3rFusW+xbzGtsil ypHHr8a6XQpbw4bHvMeiXQpbw6bHvcejXQpbxZLJtl0KW8WTXQpbya5dCg== --0015175cf9c05039d40479872748--
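
The pattern rewrite described in the message can be sketched in a few lines of Python. This is an illustration only, not the rc script: the CHARCLASS sample below is a hypothetical four-line excerpt standing in for the attached charclass file, and load_classes and expand are names invented for this sketch. It also expands the pattern in a single pass, whereas the script applies one sed s command per class in sequence.

```python
import re

# Hypothetical excerpt standing in for the attached charclass file:
# one bracketed class per line, with the plain character first.
CHARCLASS = """\
[a-z]
[aáàâä]
[cç]
[eéèêë]
"""


def load_classes(text):
    """Map each leading plain character to its whole bracketed class."""
    classes = {}
    for line in text.splitlines():
        # Skip range lines such as [a-z], as the sed command /-/d does.
        if "-" in line or len(line) < 3 or not line.startswith("["):
            continue
        classes[line[1]] = line
    return classes


def expand(pattern, classes):
    """Replace every character that heads a class with that class."""
    return "".join(classes.get(ch, ch) for ch in pattern)


classes = load_classes(CHARCLASS)
loose = expand("facade", classes)   # f[aáàâä][cç][aáàâä]d[eéèêë]
strict = expand("façade", classes)  # f[aáàâä]ç[aáàâä]d[eéèêë]

print(loose)
print(strict)
print(bool(re.search(loose, "stone façade")))  # True: plain c matches ç
print(bool(re.search(strict, "a facade")))     # False: ç was supplied
```

Because ç never appears as the first character of a class, it passes through unchanged, which is why supplying an accent narrows the match while leaving it out widens it.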