The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Steve Mynott <steve.mynott@gmail.com>
To: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: [TUHS] Re: A second Unix Patent
Date: Fri, 3 Mar 2023 13:29:13 +0000	[thread overview]
Message-ID: <20230303132913.ab5mqzuwlut64saw@risey.fivesnap.com> (raw)
In-Reply-To: <CAKH6PiVCXC9Kn3PQD28KfBkBD0at_yAKTMvHOAiB-LmXP=vRPg@mail.gmail.com>

While poking around I discovered a modern go rewrite by Rob Pike

https://github.com/robpike/typo


On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:

> Typo, in v3 through v6, may be the most creative Unix program to have come
> out of Bell Labs. It served as a spell checker before spell(1), though it
> knew nothing about spelling beyond a list of the most common words in the
> language. This brainchild of Bob Morris would, in his words, work just as
> well in Urdu as in English.
> 
> The beautiful trick: gather trigram frequencies in the document, then print
> out a list of the individual words in increasing order of the likelihood
> that they came from the statistical source that those frequencies
> characterize. Typos (as distinct from phonetic misspellings) generally
> floated toward the beginning of the list and so were easy to spot.
> 
> But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
> in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
> invented the trick of "counting large integers in small registers". Roughly
> speaking, when you see a word whose current count is in the range 2^(n-1)
> to 2^n-1, you increment the count with probability 1/2^n, thus getting an
> approximation to lg n, which serves in estimating the entropy of each word.
> 
> This counting method merited a patent and is now recognized as the first of
> what is now an active subfield of theoretical computer
> science--memory-bounded streaming algorithms.


-- 
Steve Mynott <steve.mynott@gmail.com>
rsa3072/629FBB91565E591955B5876A79CEFAA4450EBD50

  parent reply	other threads:[~2023-03-03 13:29 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-01 22:59 [TUHS] " Warner Losh
2023-03-02  4:41 ` [TUHS] " Douglas McIlroy
2023-03-02  5:16   ` David Arnold
2023-03-03 13:29   ` Steve Mynott [this message]
2023-03-03 19:19     ` James Frew
2023-03-04  1:57 Noel Chiappa
2023-03-04  9:22 ` Ralph Corderoy
2023-03-04 15:08 ` Clem Cole

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230303132913.ab5mqzuwlut64saw@risey.fivesnap.com \
    --to=steve.mynott@gmail.com \
    --cc=douglas.mcilroy@dartmouth.edu \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).