9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Brian Kernighan?
@ 1999-11-11 10:56 Elliott
  0 siblings, 0 replies; 5+ messages in thread
From: Elliott @ 1999-11-11 10:56 UTC (permalink / raw)


forsyth wrote:
> the Cornell PL/1 compiler used a similar approach to
> do spelling correction on keywords as part of a broader
> attempt to repair all obvious errors in the given program; the results
> were amusing if not enlightening, as they so often are with AI.

IBM's jikes Java compiler also tries spelling correction, but its
ideas of proximity have nothing to do with any human's. it's
particularly unfortunate that it doesn't even know about the
Java naming conventions. (not that i necessarily think it should,
it's just that if it's going to try to guess what you meant to type,
it would be better off making educated guesses.)

there's a big difference between correcting simple typos and
more complicated "wrong identifier" errors.

anyway, back to the point: the original questioner might be
interested in "Finding Approximate Matches in Large Lexicons"
by Justin Zobel (jz@cs.rmit.oz.au) and Philip Dart (philip@cs.mu.oz.au),
which was the best paper i found when trying to come up with
decent guesses in a "dict"-like program.

btw, has anyone had better luck than i at getting information
about the CD-ROM version of the OED, with a view to having
a Plan 9/Unix OED "dict"? ever since i read the acme paper
with its "futtock" example, i've been jealous.

--
"As the Chinese say, 1001 words is worth more than a picture."
	-- John McCarthy






* [9fans] Brian Kernighan?
@ 1999-11-11 11:16 forsyth
From: forsyth @ 1999-11-11 11:16 UTC (permalink / raw)


>>looking to match full mail names against potential spelling
>>errors.  The dictionary is consequently not very large, and the

although the spdist technique might be good enough in practice,
there are other techniques for matching names; i'd try
searching the online bibliographic databases for references.
some of them aren't much more effort to implement.
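
for readers without the book to hand, here is a minimal sketch of a
distance in the spirit of spdist. the scoring (0 identical, 1 adjacent
transposition, 2 one character wrong, added or deleted, 3 otherwise) is
an assumption about the book's scheme, not a transcription of its code,
and `closest` is a hypothetical helper added only to show how a small
list of mail names might be searched.

```python
def spdist(s: str, t: str) -> int:
    """Very rough spelling distance between two names.

    Assumed scoring: 0 identical, 1 two adjacent characters transposed,
    2 one character wrong, added or deleted, 3 anything else.
    """
    # Skip the common prefix; only the differing tails matter.
    i = 0
    while i < len(s) and i < len(t) and s[i] == t[i]:
        i += 1
    s, t = s[i:], t[i:]

    if not s and not t:
        return 0  # identical strings
    # Adjacent transposition: "eh" vs "he", with the rest equal.
    if (len(s) >= 2 and len(t) >= 2
            and s[0] == t[1] and s[1] == t[0] and s[2:] == t[2:]):
        return 1
    # One character wrong: tails agree after the mismatched character.
    if s and t and s[1:] == t[1:]:
        return 2
    # One extra character in t, or one extra character in s.
    if s == t[1:] or s[1:] == t:
        return 2
    return 3


def closest(name, candidates):
    """Hypothetical helper: return the nearest candidate, or None if
    every candidate is at distance 3 (that is, no plausible match)."""
    scored = [(spdist(name, c), c) for c in candidates]
    best = min(scored, default=(3, None))
    return best[1] if best[0] < 3 else None


print(closest("kernihgan", ["kernighan", "pike", "ritchie"]))
```

a linear scan like this should be fine for a short list of names; for
a large lexicon a smarter index would be needed.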





* [9fans] Brian Kernighan?
@ 1999-11-11 11:10 Lucio
From: Lucio @ 1999-11-11 11:10 UTC (permalink / raw)


On Thu, Nov 11, 1999 at 11:56:05AM +0100, Elliott Hughes wrote:
>
> anyway, back to the point: the original questioner might be
> interested in "Finding Approximate Matches in Large Lexicons"
> by Justin Zobel (jz@cs.rmit.oz.au) and Philip Dart (philip@cs.mu.oz.au),
> which was the best paper i found when trying to come up with
> decent guesses in a "dict"-like program.
>
Thank you for the additional information.  As it happens, I'm
looking to match full mail names against potential spelling
errors.  The dictionary is consequently not very large, and the
problem should be tractable.  It's just that regular expressions
don't quite cut it :-(

++L





* [9fans] Brian Kernighan?
@ 1999-11-11 10:16 forsyth
From: forsyth @ 1999-11-11 10:16 UTC (permalink / raw)


see pp 208-13 of The UNIX Programming Environment by B W Kernighan and Rob Pike, and the end of the chapter for credits and references.

the Cornell PL/1 compiler used a similar approach to
do spelling correction on keywords as part of a broader
attempt to repair all obvious errors in the given program; the results
were amusing if not enlightening, as they so often are with AI.





* [9fans] Brian Kernighan?
@ 1999-11-11  9:43 Lucio
From: Lucio @ 1999-11-11  9:43 UTC (permalink / raw)


The Bell-Labs web site seems unable to serve BWK's details, and
I seem to recall that he was responsible for an algorithm to
identify single-character spelling errors and character
transposition.

Can somebody point me in the right direction on this score?
Sorry to abuse this mailing list, but it's the closest I have
to the subject I'm interested in :-(

The URL I can't seem to get access to, by the way, is

	<http://cm.bell-labs.com/cm/cs/who/bwk/>

(I hope I got that right :-)  I seem to get a "connection refused".

++L





