The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
* [TUHS] A second Unix Patent
@ 2023-03-01 22:59 Warner Losh
  2023-03-02  4:41 ` [TUHS] " Douglas McIlroy
  0 siblings, 1 reply; 8+ messages in thread
From: Warner Losh @ 2023-03-01 22:59 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society


[-- Attachment #1.1: Type: text/plain, Size: 303 bytes --]

In looking at the first AUUGN today, I noticed the following at the end of
a letter John Lions sent home when he spent a sabbatical at Bell Labs

[image: image.png]
I've seen the first patent, but not the second one... That's got to be a
joke or inside joke, right? Anybody know anything else about it?

[-- Attachment #1.2: Type: text/html, Size: 422 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 46129 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
  2023-03-01 22:59 [TUHS] A second Unix Patent Warner Losh
@ 2023-03-02  4:41 ` Douglas McIlroy
  2023-03-02  5:16   ` David Arnold
  2023-03-03 13:29   ` Steve Mynott
  0 siblings, 2 replies; 8+ messages in thread
From: Douglas McIlroy @ 2023-03-02  4:41 UTC (permalink / raw)
  To: Warner Losh; +Cc: The Eunuchs Hysterical Society


[-- Attachment #1.1: Type: text/plain, Size: 1730 bytes --]

Typo, in v3 through v6, may be the most creative Unix program to have come
out of Bell Labs. It served as a spell checker before spell(1), though it
knew nothing about spelling beyond a list of the most common words in the
language. This brainchild of Bob Morris would, in his words, work just as
well in Urdu as in English.

The beautiful trick: gather trigram frequencies in the document, then print
out a list of the individual words in increasing order of the likelihood
that they came from the statistical source that those frequencies
characterize. Typos (as distinct from phonetic misspellings) generally
floated toward the beginning of the list and so were easy to spot.

But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
invented the trick of "counting large integers in small registers". Roughly
speaking, when you see a word whose current count is in the range 2^(n-1)
to 2^n-1, you increment the count with probability 1/2^n, thus getting an
approximation to lg n, which serves in estimating the entropy of each word.

This counting method merited a patent and is now recognized as the first of
what is now an active subfield of theoretical computer
science--memory-bounded streaming algorithms.

Doug

On Wed, Mar 1, 2023 at 6:00 PM Warner Losh <imp@bsdimp.com> wrote:

> In looking at the first AUUGN today, I noticed the following at the end of
> a letter John Lions sent home when he spent a sabbatical at Bell Labs
>
> [image: image.png]
> I've seen the first patent, but not the second one... That's got to be a
> joke or inside joke, right? Anybody know anything else about it?
>

[-- Attachment #1.2: Type: text/html, Size: 2127 bytes --]

[-- Attachment #2: image.png --]
[-- Type: image/png, Size: 46129 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
  2023-03-02  4:41 ` [TUHS] " Douglas McIlroy
@ 2023-03-02  5:16   ` David Arnold
  2023-03-03 13:29   ` Steve Mynott
  1 sibling, 0 replies; 8+ messages in thread
From: David Arnold @ 2023-03-02  5:16 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: The Eunuchs Hysterical Society

[-- Attachment #1: Type: text/html, Size: 2714 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
  2023-03-02  4:41 ` [TUHS] " Douglas McIlroy
  2023-03-02  5:16   ` David Arnold
@ 2023-03-03 13:29   ` Steve Mynott
  2023-03-03 19:19     ` James Frew
  1 sibling, 1 reply; 8+ messages in thread
From: Steve Mynott @ 2023-03-03 13:29 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: The Eunuchs Hysterical Society

While poking around I discovered a modern go rewrite by Rob Pike

https://github.com/robpike/typo


On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:

> Typo, in v3 through v6, may be the most creative Unix program to have come
> out of Bell Labs. It served as a spell checker before spell(1), though it
> knew nothing about spelling beyond a list of the most common words in the
> language. This brainchild of Bob Morris would, in his words, work just as
> well in Urdu as in English.
> 
> The beautiful trick: gather trigram frequencies in the document, then print
> out a list of the individual words in increasing order of the likelihood
> that they came from the statistical source that those frequencies
> characterize. Typos (as distinct from phonetic misspellings) generally
> floated toward the beginning of the list and so were easy to spot.
> 
> But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
> in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
> invented the trick of "counting large integers in small registers". Roughly
> speaking, when you see a word whose current count is in the range 2^(n-1)
> to 2^n-1, you increment the count with probability 1/2^n, thus getting an
> approximation to lg n, which serves in estimating the entropy of each word.
> 
> This counting method merited a patent and is now recognized as the first of
> what is now an active subfield of theoretical computer
> science--memory-bounded streaming algorithms.


-- 
Steve Mynott <steve.mynott@gmail.com>
rsa3072/629FBB91565E591955B5876A79CEFAA4450EBD50

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
  2023-03-03 13:29   ` Steve Mynott
@ 2023-03-03 19:19     ` James Frew
  0 siblings, 0 replies; 8+ messages in thread
From: James Frew @ 2023-03-03 19:19 UTC (permalink / raw)
  To: tuhs

This qualifies as a true Service to the Profession™.

If you doubt that, take a look at the original C version, completely 
free of pesky comments. (Would've made a great CS PhD exam question: 
"What does this code do?")

/Frew

P.S.: All you cats waiting on the books I promised: they're all boxed up 
on the spare bed behind me, shipping Real Soon Now™.

On 2023-03-03 05:29, Steve Mynott wrote:
> While poking around I discovered a modern go rewrite by Rob Pike
>
> https://github.com/robpike/typo
>
> On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:
>> Typo, in v3 through v6, may be the most creative Unix program to have come
>> out of Bell Labs.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
  2023-03-04  1:57 Noel Chiappa
  2023-03-04  9:22 ` Ralph Corderoy
@ 2023-03-04 15:08 ` Clem Cole
  1 sibling, 0 replies; 8+ messages in thread
From: Clem Cole @ 2023-03-04 15:08 UTC (permalink / raw)
  To: Noel Chiappa; +Cc: tuhs

[-- Attachment #1: Type: text/plain, Size: 1428 bytes --]

On Fri, Mar 3, 2023 at 8:58 PM Noel Chiappa <jnc@mercury.lcs.mit.edu> wrote:

> The other possible explanation is that it was perfectly possible to run
> UNIXes
> of that era (V4 on) on machines without enough main memory to hold the
> kernel
> and a 'full-sized' process simultaneously. (Our original machine, an
> -11/40,
> started out without a lot of memory; I don't recall exactly how much,
> though.
> It had, I'm pretty sure, 3 banks of core; I was thinking it was 3 MM11-L
> core
> units, which would be 3x16KB, or only 48KB, but my memory must be wrong;
> that's not really enough.)
>
It's interesting - we did the same thing at CMU - almost all of the
11/40's, 40e's and /34's we had in the 1970s running UNIX.   We would order
them with very minimalist - i.e. 48K and run UNIX swapping to RK05's
(slowly) until we had the money to buy more memory and rotating storage (
which was almost always aftermarket - not from DEC) and then those systems
would be maxed to 256K.   Mellon actually got an original RK07 cheap, so
its disks were all Digital (tape was aftermarket), but most of the Unix
boxes had dual RK05's and then some storage that we could get cheap.
  A couple
of years later, after I graduated and moved on,  I think many groups put
Enable's in a couple of them so they could break the 256K barrier - as I
sent that code back to Ted and Mike after I wrote it at Tektronix.
ᐧ

[-- Attachment #2: Type: text/html, Size: 2771 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
  2023-03-04  1:57 Noel Chiappa
@ 2023-03-04  9:22 ` Ralph Corderoy
  2023-03-04 15:08 ` Clem Cole
  1 sibling, 0 replies; 8+ messages in thread
From: Ralph Corderoy @ 2023-03-04  9:22 UTC (permalink / raw)
  To: tuhs

Hi,

Noel wrote:
> I fed '26 3 ^p' into 'dc' to see just how big it was - and got
> "17576", a 16-bit word array of which would fit into a PDP-11 64KB
> address space.

I'm a fan of dc(1), but also of units(1) and since both seem underused
here's an alternative method.

    $ units -1v '26^3 16 bit' 64KiB
            26^3 16 bit = 0.53637695 * 64KiB

-- 
Cheers, Ralph.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [TUHS] Re: A second Unix Patent
@ 2023-03-04  1:57 Noel Chiappa
  2023-03-04  9:22 ` Ralph Corderoy
  2023-03-04 15:08 ` Clem Cole
  0 siblings, 2 replies; 8+ messages in thread
From: Noel Chiappa @ 2023-03-04  1:57 UTC (permalink / raw)
  To: tuhs

    > From: Douglas McIlroy

    > Typo, in v3 through v6 ...
    > 26^3 16-bit trigram counts didn't fit in the PDP-11 memory

Being mildly curious, I fed '26 3 ^p' into 'dc' to see just how big it was -
and got "17576", a 16-bit word array of which would fit into a PDP-11 64KB
address space.

I think the answer is in the first line - V3 didn't use the PDP-11 memory
management, so the kernel _and_ the application had to fit into 56KB. So
there may well have not been 36KB available to hold a 26^3 array of 16-bit
words.

The other possible explanation is that it was perfectly possible to run UNIXes
of that era (V4 on) on machines without enough main memory to hold the kernel
and a 'full-sized' process simultaneously. (Our original machine, an -11/40,
started out without a lot of memory; I don't recall exactly how much, though.
It had, I'm pretty sure, 3 banks of core; I was thinking it was 3 MM11-L core
units, which would be 3x16KB, or only 48KB, but my memory must be wrong;
that's not really enough.)

	Noel

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-03-04 15:09 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-01 22:59 [TUHS] A second Unix Patent Warner Losh
2023-03-02  4:41 ` [TUHS] " Douglas McIlroy
2023-03-02  5:16   ` David Arnold
2023-03-03 13:29   ` Steve Mynott
2023-03-03 19:19     ` James Frew
2023-03-04  1:57 Noel Chiappa
2023-03-04  9:22 ` Ralph Corderoy
2023-03-04 15:08 ` Clem Cole

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).