From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <6e08851011f045080e1945623fb62232@plan9.ucalgary.ca>
To: 9fans@cse.psu.edu
From: mirtchov@cpsc.ucalgary.ca
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="upas-tuzeuckkkjtmiipkbjcrqmmxxi"
Subject: [9fans] web-based bulgarian-english and english-bulgarian dictionary
Date: Thu,  6 Nov 2003 16:29:21 -0700
Topicbox-Message-UUID: 80a9e2d2-eacc-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.
--upas-tuzeuckkkjtmiipkbjcrqmmxxi
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

Inspired by the little thesaurus script today's "pull" brought home,
this little script queries a relatively complete en/bg and bg/en
dictionary on the http://sa.dir.bg web site.

I've sent them an email to tell them about the wonders of UTF-8, but I
doubt they'll switch from the venerable Windows-1251 Cyrillic
encoding, so there's a key in the script for transliterating Cyrillic
queries by hand, or if you wish you can type in lowercase Cyrillic and
it'll do the transliteration for you.
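
For anyone who wants the same mapping outside rc, here is a rough
Python sketch of the key. The names to_query and from_cp1251 are made
up for illustration and are not part of the script; it assumes the
query is lowercase Cyrillic and that the site replies in Windows-1251
bytes.

```python
# Bulgarian lowercase letters in alphabetical order. U+044B and U+044D
# sit in the same Unicode range but are not used in Bulgarian, so skip them.
CYRILLIC = "".join(chr(c) for c in range(0x0430, 0x0450)
                   if c not in (0x044B, 0x044D))

# Phonetic-keyboard equivalents, in the same order as the key in the script.
LATIN = "ABWGDEVZIJKLMNOPRSTUFHC`[]YX|Q"

KEY = str.maketrans(CYRILLIC, LATIN)


def to_query(word):
    """Transliterate lowercase Cyrillic and escape the backquote as %60."""
    return word.translate(KEY).replace("`", "%60")


def from_cp1251(raw):
    """Decode Windows-1251 bytes (assumed site output) into a str."""
    return raw.decode("cp1251")


if __name__ == "__main__":
    word = "\u0447\u0430\u0439"   # Bulgarian for "tea"
    print(to_query(word))         # prints %60AJ
```

Latin input passes through to_query unchanged, so an English query
such as "test" is sent as typed.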

Also available at:

	http://pages.cpsc.ucalgary.ca/~mirtchov/p9/sadict/

I don't think this should make it in the default Plan 9 distribution,
unless Plan 9 becomes the OS of choice in Bulgaria in the next 30
minutes.

andrey

PS: I used it just now to check the meaning of 'venerable', it works :)

--upas-tuzeuckkkjtmiipkbjcrqmmxxi
Content-Disposition: attachment; filename=sadict
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

#!/bin/rc

# english-bulgarian translation:
#	sadict test

# bulgarian-english translation:
#	sadict TEST

# the web site switches between bulgarian and english translations
# based on whether it receives uppercase or lowercase letters.
# to translate from bulgarian into english, either type the word in
# lowercase cyrillic or spell it in capital latin letters using the
# key below, which matches what you would type on a phonetic
# english-american keyboard.
#=20
#	=D0=B0	=3D	A
#	=D0=B1	=3D	B
#	=D0=B2	=3D	W
#	=D0=B3	=3D	G
#	=D0=B4	=3D	D
#	=D0=B5	=3D	E
#	=D0=B6	=3D	V
#	=D0=B7	=3D	Z
#	=D0=B8	=3D	I
#	=D0=B9	=3D	J
#	=D0=BA	=3D	K
#	=D0=BB	=3D	L
#	=D0=BC	=3D	M
#	=D0=BD	=3D	N
#	=D0=BE	=3D	O
#	=D0=BF	=3D	P
#	=D1=80	=3D	R
#	=D1=81	=3D	S
#	=D1=82	=3D	T
#	=D1=83	=3D	U
#	=D1=84	=3D	F
#	=D1=85	=3D	H
#	=D1=86	=3D	C
#	=D1=87	=3D	` (or %60)
#	=D1=88	=3D	[
#	=D1=89	=3D	]
#	=D1=8A	=3D	Y
#	=D1=8C	=3D	X
#	=D1=8E	=3D	|
#	=D1=8F	=3D	Q

# sa.dir.bg outputs everything in windows-1251 encoding, so we
# use tr to substitute proper unicode cyrillic

# sa.dir.bg gives 20 (sometimes fewer) similar words in a separate
# frame. I've left it there because I often find it useful and it just
# scrolls off the top of the screen when it isn't.

# don't try capital cyrillic letters, they aren't transliterated
# and won't work!

query=3D`{echo $1 |=20
	tr '=D0=B0=D0=B1=D0=B2=D0=B3=D0=B4=D0=B5=D0=B6=D0=B7=D0=B8=D0=B9=D0=BA=
=D0=BB=D0=BC=D0=BD=D0=BE=D0=BF=D1=80=D1=81=D1=82=D1=83=D1=84=D1=85=D1=86=D1=
=88=D1=89=D1=8A=D1=8C=D1=8E=D1=8F' 'ABWGDEVZIJKLMNOPRSTUFHC\[\]YX\|Q' |
	sed 's/=D1=87/%60/g'
}



hget 'http://sa.dir.bg/cgi-bin/sabig.cgi?word=3D'^$query |
	htmlfmt -l 1000 |
	tr '=C3=A0-=C3=BF=C3=80-=C3=9F' '=D0=B0-=D1=8F=D0=90-=D0=AF'


--upas-tuzeuckkkjtmiipkbjcrqmmxxi--