Date: Fri, 3 Mar 2023 13:29:13 +0000
From: Steve Mynott
To: Douglas McIlroy
CC: The Eunuchs Hysterical Society
Subject: [TUHS] Re: A second Unix Patent

While poking around I discovered a modern Go rewrite by Rob Pike:

https://github.com/robpike/typo

On Wed, Mar 01, 2023 at 11:41:17PM -0500, Douglas McIlroy typed:
> Typo, in v3 through v6, may be the most creative Unix program to have come
> out of Bell Labs. It served as a spell checker before spell(1), though it
> knew nothing about spelling beyond a list of the most common words in the
> language. This brainchild of Bob Morris would, in his words, work just as
> well in Urdu as in English.
>
> The beautiful trick: gather trigram frequencies in the document, then print
> out a list of the individual words in increasing order of the likelihood
> that they came from the statistical source that those frequencies
> characterize. Typos (as distinct from phonetic misspellings) generally
> floated toward the beginning of the list and so were easy to spot.
>
> But that's not all that Bob invented. 26^3 16-bit trigram counts didn't fit
> in the PDP-11 memory, so he counted them in 8-bit bytes. To do so he
> invented the trick of "counting large integers in small registers". Roughly
> speaking, when you see a word whose current count is in the range 2^(n-1)
> to 2^n-1, you increment the count with probability 1/2^n, thus getting an
> approximation to lg n, which serves in estimating the entropy of each word.
>
> This counting method merited a patent and is now recognized as the first of
> what is now an active subfield of theoretical computer
> science--memory-bounded streaming algorithms.

--
Steve Mynott
rsa3072/629FBB91565E591955B5876A79CEFAA4450EBD50
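The trigram-ranking trick McIlroy describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not typo's actual method: the scoring rule (the mean log count of a word's letter trigrams, with each word padded by one space at each end) is a hypothetical stand-in, since the message does not say exactly which statistic typo computed. The names trigrams and rank_words are likewise hypothetical, and splitting on ASCII letters is a simplification of the "works in Urdu too" claim.

```python
import math
import re
from collections import Counter


def trigrams(word):
    # Pad with a space on each side so word starts and ends form their own trigrams.
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def rank_words(text):
    # ASCII-only tokenization: a simplification for this sketch.
    words = re.findall(r"[a-z]+", text.lower())
    # Gather trigram frequencies over the whole document.
    counts = Counter(t for w in words for t in trigrams(w))

    def score(word):
        # Hypothetical likelihood proxy: mean log frequency of the word's trigrams.
        grams = trigrams(word)
        return sum(math.log(counts[g]) for g in grams) / len(grams)

    # Least likely words first; ties broken alphabetically.
    return sorted(set(words), key=lambda w: (score(w), w))


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat sat on the mat teh"
    print(rank_words(sample))
```

Words made of trigrams that are rare in the document sort to the front, which is where McIlroy says typos tended to float.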
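The "counting large integers in small registers" idea can be sketched as the textbook Morris counter. Note that this swaps in the standard formulation (increment a small register c with probability 2^-c, report 2^c - 1 as the count) rather than the range-based rule paraphrased in the quoted message, and it is not claimed to be what typo actually shipped. The class name MorrisCounter and its methods are hypothetical.

```python
import random


class MorrisCounter:
    """Approximate event counter whose register holds roughly log2(count)."""

    def __init__(self, bits=8, rng=None):
        self.c = 0                       # the small register (8 bits by default)
        self.max_c = (1 << bits) - 1     # saturate instead of overflowing
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the register with probability 2**-c, so each step up
        # stands for roughly twice as many events as the previous one.
        if self.c < self.max_c and self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        # E[2**c] grows by exactly 1 per event (ignoring saturation),
        # so 2**c - 1 is an unbiased estimate of the number of events.
        return (1 << self.c) - 1


if __name__ == "__main__":
    counter = MorrisCounter(rng=random.Random(1))
    for _ in range(100_000):
        counter.increment()
    print("register:", counter.c, "estimate:", counter.estimate())
```

A single counter's estimate is noisy (its standard deviation is on the order of the count itself), but the register only needs about lg lg N bits to count to N, which is the space saving that made byte-sized trigram counts workable.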