From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
To: 9fans@9fans.net
From: erik quanstrom
Date: Fri, 24 Jul 2009 13:16:35 -0400
In-Reply-To: <4A69DDF6.9020108@proweb.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: [9fans] (no subject)
Topicbox-Message-UUID: 2c8251c2-ead5-11e9-9d60-3106f5b1d025

> #!/bin/rc
>
> hget http://dictionary.com/browse/$1 | htmlfmt | awk ' /dictionary
> results/, /Cite This Source/ {print } '
> EOF
>
> chmod 755 $home/bin/rc/odict
>
> odict simple

unfortunately htmlfmt's default charset is 8859-1, while
dictionary.com sets the charset to utf-8 in its http headers.
hget does not pass that along, so this works a little better
for me:

hget http://dictionary.com/browse/illustrate|htmlfmt -c UTF-8 |sed -n '
	/dictionary results/,/Cite This Source/p'

also, one would really want to urlencode the parameter.
this would allow a lookup of "en masse".

fortunately this little bit of contorted awk

#!/bin/awk -f
# urlencode
# when in doubt, use brute force

# return the character code of ch; i is a local variable
function chr(ch,	i)
{
	for(i = 0; i < 127; i++)
		if(utf(i) == ch)
			return i;
	return 32;
}

{
	o=""
	for(i=1; i<=length($0); i++){
		c = substr($0, i, 1)
		if(match(c, /[a-zA-Z0-9]/) == 0)
			c = sprintf("%%%.2x", chr(c))
		o = o c;
	}
	print o
}

gives us

#!/bin/rc
# urlencode reads its input from stdin
word=`{echo $"* | urlencode}
hget http://dictionary.com/browse/$word | htmlfmt -c UTF-8 | sed -n '
	/dictionary results/,/Cite This Source/p'
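
for anyone trying this outside plan 9: as far as i know utf() is specific to plan 9 awk, so the brute-force chr() above will not run elsewhere. a minimal sketch of the same idea for a posix sh and awk follows. it is an assumption-laden variant, not the script above: the shell function wrapper, the ord[] lookup table built with sprintf("%c"), and the LC_ALL=C setting are all mine, chosen so that awk works a byte at a time.

```shell
#!/bin/sh
# urlencode: percent-encode everything except ascii letters and digits.
# hypothetical portable variant: a lookup table built with sprintf("%c")
# stands in for the utf() search in the plan 9 version.
urlencode() {
	printf '%s\n' "$*" | LC_ALL=C awk '
	BEGIN {
		# ord[c] is the byte value of the one-character string c
		for(i = 1; i < 256; i++)
			ord[sprintf("%c", i)] = i
	}
	{
		o = ""
		for(i = 1; i <= length($0); i++){
			c = substr($0, i, 1)
			if(c !~ /[a-zA-Z0-9]/)
				c = sprintf("%%%.2x", ord[c])
			o = o c
		}
		print o
	}'
}

urlencode "en masse"	# prints en%20masse
```

the table trades the per-character loop for a single array lookup. under LC_ALL=C the sketch should encode bytes above 127 one at a time rather than folding them to a space, which is a behaviour change from the original.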