The Unix Heritage Society mailing list
* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-23  1:05 Doug McIlroy
  2017-11-23  3:11 ` Lyndon Nerenberg
  2017-11-25  3:25 ` Bakul Shah
  0 siblings, 2 replies; 14+ messages in thread
From: Doug McIlroy @ 2017-11-23  1:05 UTC


Repeat, slightly modified, of a previous post that got
shunted to the attachment heap.

>   I am curious if anyone on the list remembers much
> about the development of the first spell checkers in Unix?

Yes, intimately. They had no relationship to the PDP 10.

The first one was a fantastic tour de force by Bob Morris,
called "typo". Aside from the file "eign" of the very most common
English words, it had no vocabulary. Instead it evaluated the
likelihood that any particular word came from a source with the
same letter-trigram frequencies as the document as a whole. The
words were then printed in increasing order of likelihood. Typos
tended to come early in the list.
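
To make the idea concrete, here is a minimal Python sketch of
trigram-based ranking. It is not Morris's code: the scoring formula
(average log relative frequency of a word's trigrams), the boundary
padding, and the names trigrams, rank_by_peculiarity and common_words
(standing in for the "eign" file) are all assumptions chosen for
illustration.

```python
import math
import re
from collections import Counter


def trigrams(word):
    """Return the letter trigrams of a word, padded with boundary spaces."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def rank_by_peculiarity(text, common_words=frozenset()):
    """Return the document's distinct words, least likely first.

    common_words is a hypothetical stand-in for the "eign" list;
    words in it are never reported.
    """
    words = re.findall(r"[a-z]+", text.lower())
    # Trigram statistics come from the document itself, not a dictionary.
    counts = Counter(t for w in words for t in trigrams(w))
    total = sum(counts.values())

    def avg_log_likelihood(word):
        grams = trigrams(word)
        return sum(math.log(counts[g] / total) for g in grams) / len(grams)

    candidates = set(words) - set(common_words)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(candidates, key=lambda w: (avg_log_likelihood(w), w))
```

Words built from trigrams that are rare in the document sort to the
front, which is where a reader would look for typos.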

Typo, introduced in v3, was very popular until Steve Johnson wrote
"spell", a remarkably short shell script that (efficiently) looks
up a document's words in the wordlist of Webster's Collegiate
Dictionary, which we had on line. The only "real" coding he did
was to write a simple affix-stripping program to make it possible
to look up plurals, past tenses, etc. If memory serves, Steve's
program is described in Kernighan and Pike. It appeared in v5.
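
The lookup-plus-affix-stripping approach can be sketched roughly as
follows. The original was a shell script; this Python version is a
swapped-in illustration, and the suffix rules, the function names
stems and misspelled, and the wordlist passed in are hypothetical.

```python
import re

# Hypothetical suffix rules for illustration: (suffix, replacement).
SUFFIX_RULES = [
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
    ("ed", ""),
    ("ed", "e"),
    ("ing", ""),
    ("ing", "e"),
]


def stems(word):
    """Yield the word itself, then candidate stems with one suffix removed."""
    yield word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            yield word[: -len(suffix)] + replacement


def misspelled(text, wordlist):
    """Return the sorted distinct words of text with no stem in wordlist."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(w for w in words if not any(s in wordlist for s in stems(w)))
```

For example, with a wordlist containing "walk" and "bake", both
"walked" and "baking" are accepted without being listed themselves.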

Steve's program was good, but the dictionary isn't an ideal source
for real text, which abounds in proper names and terms of art.
It also has a lot of rare words that don't pull their weight in
a spell checker, and some attractive nuisances, especially obscure
short words from Scots, botany, etc, which are more likely to
arise in everyday text as typos than by intent. Given the basic
success of Steve's program, I undertook to make a more useful
spelling list, along with more vigorous affix stripping (and a
stop list to avert associated traps, e.g. "presenation" =
"pre+senate+ion"). That has been described in Bentley's "Programming
Pearls" and in http://www.cs.dartmouth.edu/~doug/spell.pdf.
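
The trap mentioned above is easy to reproduce. In the sketch below,
the prefix and suffix rules, the names derivable and accepted, and
the choice to consult the stop list before any stripping are all
assumptions for illustration; the post does not say how the real
stop list was applied.

```python
PREFIXES = ["pre", "un", "re"]                   # illustrative only
SUFFIXES = [("ion", "e"), ("ed", ""), ("s", "")]  # (suffix, replacement)


def derivable(word, wordlist):
    """True if word reduces to a wordlist entry by stripping affixes.

    Every rule makes the word strictly shorter, so the recursion ends.
    """
    if word in wordlist:
        return True
    for prefix in PREFIXES:
        if word.startswith(prefix) and derivable(word[len(prefix):], wordlist):
            return True
    for suffix, replacement in SUFFIXES:
        if word.endswith(suffix):
            if derivable(word[: -len(suffix)] + replacement, wordlist):
                return True
    return False


def accepted(word, wordlist, stoplist):
    """Reject stop-listed words even when affix stripping would pass them."""
    return word not in stoplist and derivable(word, wordlist)
```

With "senate" in the wordlist, derivable("presenation", ...) succeeds
via pre + senate + ion, so only a stop-list entry keeps the typo from
being accepted.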

Morris's program and mine labored under space constraints, so
have some pretty ingenious coding tricks. In fact Morris has
a patent on the way he counted frequencies of the 26^3 trigrams
in 26^3 bytes, even though the counts could exceed 255. I did
some heroic (and probabilistic) encoding to squeeze a 30,000
word dictionary into a 64K data space, without severely 
affecting lookup time.
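
Two hedged Python sketches of this kind of trick follow.
ApproxCounter is the probabilistic counter usually attributed to
Morris, which keeps only an exponent small enough for one byte; the
post does not say whether this is exactly what typo did. build_table
and probably_in show a dictionary stored as hashes only, which makes
lookups probabilistic. The 27-bit width, the use of SHA-256, and all
names are assumptions, and the compression needed to actually fit
such a table in 64K is omitted.

```python
import hashlib
import random


class ApproxCounter:
    """Morris-style counter: stores an exponent that fits in one byte."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Bump the stored exponent with probability 2**-exponent.
        if self.exponent < 255 and self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # Unbiased estimate of how many times increment() was called.
        return 2 ** self.exponent - 1


HASH_BITS = 27  # assumption: width chosen for illustration


def word_hash(word):
    """Map a word to a HASH_BITS-wide integer (SHA-256 is a stand-in)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - HASH_BITS)


def build_table(words):
    """Keep only the hashes; the words themselves are discarded."""
    return frozenset(word_hash(w) for w in words)


def probably_in(word, table):
    """No false negatives; a non-word is wrongly accepted with
    probability of roughly len(table) / 2**HASH_BITS."""
    return word_hash(word) in table
```

A Python frozenset is of course far larger than 64K; the point is
only that membership can be answered from hashes alone, at the cost
of a small, tunable rate of false acceptances.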

Doug


* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-24 15:13 Noel Chiappa
  2017-11-24 17:24 ` Will Senn
  0 siblings, 1 reply; 14+ messages in thread
From: Noel Chiappa @ 2017-11-24 15:13 UTC


    > From: "Nelson H. F. Beebe"

    > The PDF URLs for bstj.bell-labs.com no longer work, and the ones for
    > www.alcatel-lucent.com ... now redirect to an HTML page.

With any luck, someone scraped them before they went.

I've gotten in the habit of scraping all the Web content I look at, since it
has (as above) a distressing tendency to vapourize.

	Noel


* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-24 15:05 Nelson H. F. Beebe
  2017-11-24 17:01 ` Ralph Corderoy
  0 siblings, 1 reply; 14+ messages in thread
From: Nelson H. F. Beebe @ 2017-11-24 15:05 UTC


BibTeX entries for the complete contents of the Bell System Technical
Journal family are in the TeX User Group archives at

	http://www.math.utah.edu/pub/tex/bib/bstj1970.bib

[change 1970 to other decades, and .bib to .html for live hyperlinks].

The PDF URLs for bstj.bell-labs.com no longer work, and the ones for
www.alcatel-lucent.com, such as 

	http://www.alcatel-lucent.com/bstj/vol57-1978/articles/bstj57-6-2155.pdf

now redirect to an HTML page.

Otherwise, articles are available from the Wiley site at

	http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1538-7305/issues/

but are behind a paywall.  There are also copies in the IEEE eXplore
database at

	http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?reload=true&punumber=6731002

I tried to find the URLs at https://web.archive.org/, but it does
not appear to have them.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe at math.utah.edu  -
- 155 S 1400 E RM 233                       beebe at acm.org  beebe at computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------


* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-22  2:34 Doug McIlroy
  0 siblings, 0 replies; 14+ messages in thread
From: Doug McIlroy @ 2017-11-22  2:34 UTC


A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 2099 bytes
Desc: not available
URL: <http://minnie.tuhs.org/pipermail/tuhs/attachments/20171121/0f7829c0/attachment.obj>



