The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
To: Warner Losh <imp@bsdimp.com>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: [TUHS] Re: A second Unix Patent
Date: Wed, 1 Mar 2023 23:41:17 -0500	[thread overview]
Message-ID: <CAKH6PiVCXC9Kn3PQD28KfBkBD0at_yAKTMvHOAiB-LmXP=vRPg@mail.gmail.com> (raw)
In-Reply-To: <CANCZdfpbJHtiS0paM9yjNdywwKpnbK8k5w2Q2roAc8QdwsZ+9g@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1730 bytes --]

Typo, in v3 through v6, may be the most creative Unix program to have come
out of Bell Labs. It served as a spell checker before spell(1), though it
knew nothing about spelling beyond a list of the most common words in the
language. This brainchild of Bob Morris would, in his words, work just as
well in Urdu as in English.

The beautiful trick: gather trigram frequencies in the document, then print
out a list of the individual words in increasing order of the likelihood
that they came from the statistical source that those frequencies
characterize. Typos (as distinct from phonetic misspellings) generally
floated toward the beginning of the list and so were easy to spot.

But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
invented the trick of "counting large integers in small registers". Roughly
speaking, when you see a word whose current count is in the range 2^(n-1)
to 2^n-1, you increment the count with probability 1/2^n, thus getting an
approximation to lg n, which serves in estimating the entropy of each word.

This counting method merited a patent and is now recognized as the first of
what is now an active subfield of theoretical computer
science--memory-bounded streaming algorithms.

Doug

On Wed, Mar 1, 2023 at 6:00 PM Warner Losh <imp@bsdimp.com> wrote:

> In looking at the first AUUGN today, I noticed the following at the end of
> a letter John Lions sent home when he spent a sabbatical at Bell Labs
>
> [image: image.png]
> I've seen the first patent, but not the second one... That's got to be a
> joke or inside joke, right? Anybody know anything else about it?
>

[-- Attachment #1.2: Type: text/html, Size: 2127 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 46129 bytes --]

  reply	other threads:[~2023-03-02  4:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-01 22:59 [TUHS] " Warner Losh
2023-03-02  4:41 ` Douglas McIlroy [this message]
2023-03-02  5:16   ` [TUHS] " David Arnold
2023-03-03 13:29   ` Steve Mynott
2023-03-03 19:19     ` James Frew
2023-03-04  1:57 Noel Chiappa
2023-03-04  9:22 ` Ralph Corderoy
2023-03-04 15:08 ` Clem Cole

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKH6PiVCXC9Kn3PQD28KfBkBD0at_yAKTMvHOAiB-LmXP=vRPg@mail.gmail.com' \
    --to=douglas.mcilroy@dartmouth.edu \
    --cc=imp@bsdimp.com \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).