The Unix Heritage Society mailing list
* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-22  2:34 Doug McIlroy
  0 siblings, 0 replies; 14+ messages in thread
From: Doug McIlroy @ 2017-11-22  2:34 UTC (permalink / raw)


A non-text attachment was scrubbed...
Name: not available
Type: application/octet-stream
Size: 2099 bytes
Desc: not available
URL: <http://minnie.tuhs.org/pipermail/tuhs/attachments/20171121/0f7829c0/attachment.obj>



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-23  1:05 [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes' Doug McIlroy
  2017-11-23  3:11 ` Lyndon Nerenberg
@ 2017-11-25  3:25 ` Bakul Shah
  1 sibling, 0 replies; 14+ messages in thread
From: Bakul Shah @ 2017-11-25  3:25 UTC (permalink / raw)


On Nov 22, 2017, at 5:05 PM, Doug McIlroy <doug at cs.dartmouth.edu> wrote:
> 
> Steve's program was good, but the dictionary isn't an ideal source
> for real text, which abounds in proper names and terms of art.
> It also has a lot of rare words that don't pull their weight in
> a spell checker, and some attractive nuisances, especially obscure
> short words from Scots, botany, etc, which are more likely to
> arise in everyday text as typos than by intent. Given the basic
> success of Steve's program, I undertook to make a more useful
> spelling list, along with more vigorous affix stripping (and a
> stop list to avert associated traps, e.g. "presenation" =
> "pre+senate+ion"). That has been described in Bentley's "Programming
> Pearls" and in http://www.cs.dartmouth.edu/~doug/spell.pdf.

This is quite interesting to me. A while ago I looked into building a spell
checker for Gujarati (a Sanskrit-based language) and found it to be a
complicated affair -- words can have multiple suffixes, since the Gujarati
equivalents of prepositions such as from/to/in are tacked on at the end of
a word. But the same endings can also appear in normal words. And
there are other complications.... Even though the language is phonetic,
mistakes of using the wrong form of long/short vowel signs are common.
After reading your paper I am tempted to revive the effort.



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 17:01 ` Ralph Corderoy
  2017-11-24 18:06   ` Nelson H. F. Beebe
  2017-11-24 22:46   ` Dave Horsfall
@ 2017-11-24 22:57   ` Arthur Krewat
  2 siblings, 0 replies; 14+ messages in thread
From: Arthur Krewat @ 2017-11-24 22:57 UTC (permalink / raw)




On 11/24/2017 12:01 PM, Ralph Corderoy wrote:
> I too find interesting pages have disappeared in later years so visit
> archive.org and have them take a copy for me, and everyone else.
>
Don't rely on outside entities to archive content. Mirror it, and put it 
up yourself.

If anyone needs free hosting, let me know.

art k.




* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 17:01 ` Ralph Corderoy
  2017-11-24 18:06   ` Nelson H. F. Beebe
@ 2017-11-24 22:46   ` Dave Horsfall
  2017-11-24 22:57   ` Arthur Krewat
  2 siblings, 0 replies; 14+ messages in thread
From: Dave Horsfall @ 2017-11-24 22:46 UTC (permalink / raw)


On Fri, 24 Nov 2017, Ralph Corderoy wrote:

> I too find interesting pages have disappeared in later years so visit 
> archive.org and have them take a copy for me, and everyone else.

And there'll be hell to pay if Dennis' page is ever removed.

-- 
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will suffer."



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 18:17     ` Henry Bent
@ 2017-11-24 20:18       ` Ron Natalie
  0 siblings, 0 replies; 14+ messages in thread
From: Ron Natalie @ 2017-11-24 20:18 UTC (permalink / raw)



I remember in 1990 we got our first 1 GB drive; I paid $1000 for it ($1/MB).

One of the sales guys I worked with had a unit of storage called the “Costco Terabyte”: how much one terabyte of storage costs at Costco.

When we started tracking it, it was around $5000.  It was down to about $40 last I checked.
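
To put a number on the question quoted below, here is a back-of-the-envelope sketch in Python. It uses only the $1000-per-gigabyte price mentioned above and the 100GB to 170GB range from the quoted message; decimal units (1 GB = 1000 MB) and the function name are assumptions made for illustration.

```python
# Price quoted in this message; everything else is plain arithmetic.
DOLLARS_PER_MB_1990 = 1000 / 1000  # $1000 for a 1 GB (1000 MB) drive


def cost_1990(gigabytes):
    """Cost in 1990 dollars at the $1/MB price point (decimal units assumed)."""
    return gigabytes * 1000 * DOLLARS_PER_MB_1990


low, high = cost_1990(100), cost_1990(170)
print(f"100 GB: ${low:,.0f}   170 GB: ${high:,.0f}")
```

At that price point the thousand tapes would have needed on the order of $100,000 to $170,000 of disk, before any mirroring or backup.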

 

From: TUHS [mailto:tuhs-bounces@minnie.tuhs.org] On Behalf Of Henry Bent
Sent: Friday, November 24, 2017 1:17 PM
To: Nelson H. F. Beebe
Cc: TUHS main list
Subject: Re: [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'

On 24 November 2017 at 13:06, Nelson H. F. Beebe <beebe at math.utah.edu> wrote:

> P.S. In 1990, we filled a dumpster with 9-track tapes that we had to
> abandon because of our move to new hardware that lacked such a drive,
> and because our new disk system had insufficient disk space to preserve
> their contents.
>
> I have since regretted that decision many times, because a lot of
> stuff was lost forever.
>
> The maximum capacity of 6250-bpi 9-track tapes was about 100MB to
> 170MB.  A thousand such tapes would have needed just 100GB to 170GB,
> an amount of space that I can now buy in Utah for about US$4 (based on
> a local store offering of $94 for a 4TB USB-3 attached disk about the
> size of a paperback thriller).

Sure, but how much would 170GB of storage have cost in 1990?  And what would have been the cost to mirror it, or to back it up on to a more modern tape format?  Was that data really worth tens of thousands of dollars?

-Henry

 


* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 18:06   ` Nelson H. F. Beebe
@ 2017-11-24 18:17     ` Henry Bent
  2017-11-24 20:18       ` Ron Natalie
  0 siblings, 1 reply; 14+ messages in thread
From: Henry Bent @ 2017-11-24 18:17 UTC (permalink / raw)


On 24 November 2017 at 13:06, Nelson H. F. Beebe <beebe at math.utah.edu>
wrote:

> P.S. In 1990, we filled a dumpster with 9-track tapes that we had to
> abandon because of our move to new hardware that lacked such a drive,
> and because our new disk system had insufficient disk space to preserve
> their contents.
>
> I have since regretted that decision many times, because a lot of
> stuff was lost forever.
>
> The maximum capacity of 6250-bpi 9-track tapes was about 100MB to
> 170MB.  A thousand such tapes would have needed just 100GB to 170GB,
> an amount of space that I can now buy in Utah for about US$4 (based on
> a local store offering of $94 for a 4TB USB-3 attached disk about the
> size of a paperback thriller).
>

Sure, but how much would 170GB of storage have cost in 1990?  And what
would have been the cost to mirror it, or to back it up on to a more modern
tape format?  Was that data really worth tens of thousands of dollars?

-Henry

* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 17:01 ` Ralph Corderoy
@ 2017-11-24 18:06   ` Nelson H. F. Beebe
  2017-11-24 18:17     ` Henry Bent
  2017-11-24 22:46   ` Dave Horsfall
  2017-11-24 22:57   ` Arthur Krewat
  2 siblings, 1 reply; 14+ messages in thread
From: Nelson H. F. Beebe @ 2017-11-24 18:06 UTC (permalink / raw)


Ralph Corderoy <ralph at inputplus.co.uk> writes today:

>> Is https://archive.org/details/bstj-archives what you're after?

Thanks for uncovering that!  I'll add links to it shortly in all of
the Bell Labs journal family bibliography files.

I prepared the family's BibTeX bibliographies in December 2010,
according to my revision history logs, and e-mail archives of
exchanges with a Bell Labs researcher.  At that time, I had access to
the full collection of PDFs, but out of concern for local disk space,
and the (now mistaken) belief that they would continue to be available
at Bell Labs/Lucent, I did not mirror them to Utah.

I made the same mistake with the two IBM journals whose archives
disappeared behind the IEEE pay wall.  Sigh...

----------------------------------------

P.S. In 1990, we filled a dumpster with 9-track tapes that we had to
abandon because of our move to new hardware that lacked such a drive,
and because our new disk system had insufficient disk space to preserve
their contents.

I have since regretted that decision many times, because a lot of
stuff was lost forever.

The maximum capacity of 6250-bpi 9-track tapes was about 100MB to
170MB.  A thousand such tapes would have needed just 100GB to 170GB,
an amount of space that I can now buy in Utah for about US$4 (based on
a local store offering of $94 for a 4TB USB-3 attached disk about the
size of a paperback thriller).
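
The arithmetic above can be checked with a few lines of Python. This is only a sketch: decimal units are assumed (1 GB = 1000 MB, 1 TB = 1000 GB), and the variable names are illustrative.

```python
TAPES = 1000
MB_PER_TAPE = (100, 170)       # rough capacity range of a 6250-bpi 9-track tape
PRICE_PER_GB = 94 / 4000       # $94 for a 4 TB disk, in dollars per GB

# Total space needed for all tapes, low and high estimate, in GB.
gb_needed = [TAPES * mb / 1000 for mb in MB_PER_TAPE]

# What that much space costs at the quoted 2017 price.
cost_2017 = [gb * PRICE_PER_GB for gb in gb_needed]

for gb, cost in zip(gb_needed, cost_2017):
    print(f"{gb:.0f} GB at 2017 prices: ${cost:.2f}")
```

The high estimate works out to just under four dollars, which matches the "about US$4" figure.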

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe at math.utah.edu  -
- 155 S 1400 E RM 233                       beebe at acm.org  beebe at computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 15:13 Noel Chiappa
@ 2017-11-24 17:24 ` Will Senn
  0 siblings, 0 replies; 14+ messages in thread
From: Will Senn @ 2017-11-24 17:24 UTC (permalink / raw)


On 11/24/17 9:13 AM, Noel Chiappa wrote:
>      > From: "Nelson H. F. Beebe"
>
>      > The PDF URLs for bstj.bell-labs.com no longer work, and the ones for
>      > www.alcatel-lucent.com ... now redirect to an HTML page.
>
> With any luck, someone scraped them before they went.
>
> I've gotten in the habit of scraping all the Web content I look at, since it
> has (as above) a distressing tendency to vapourize.
>
> 	Noel

A lot of the unix related articles are collected in these two volumes on 
bitsavers:

http://bitsavers.org/pdf/att/unix/UNIX_System_Readings_and_Applications_Volume_1_1987.pdf
http://bitsavers.org/pdf/att/unix/UNIX_System_Readings_and_Applications_Volume_2_1987.pdf

Will

-- 
GPG Fingerprint: 68F4 B3BD 1730 555A 4462  7D45 3EAA 5B6D A982 BAAF




* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-24 15:05 Nelson H. F. Beebe
@ 2017-11-24 17:01 ` Ralph Corderoy
  2017-11-24 18:06   ` Nelson H. F. Beebe
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Ralph Corderoy @ 2017-11-24 17:01 UTC (permalink / raw)


Hi Nelson,

> I tried to find the URLs at https://web.archive.org/, but it does
> not appear to have them.

Is https://archive.org/details/bstj-archives what you're after?

I too find interesting pages have disappeared in later years so visit
archive.org and have them take a copy for me, and everyone else.

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-24 15:13 Noel Chiappa
  2017-11-24 17:24 ` Will Senn
  0 siblings, 1 reply; 14+ messages in thread
From: Noel Chiappa @ 2017-11-24 15:13 UTC (permalink / raw)


    > From: "Nelson H. F. Beebe"

    > The PDF URLs for bstj.bell-labs.com no longer work, and the ones for
    > www.alcatel-lucent.com ... now redirect to an HTML page.

With any luck, someone scraped them before they went.

I've gotten in the habit of scraping all the Web content I look at, since it
has (as above) a distressing tendency to vapourize.

	Noel



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-24 15:05 Nelson H. F. Beebe
  2017-11-24 17:01 ` Ralph Corderoy
  0 siblings, 1 reply; 14+ messages in thread
From: Nelson H. F. Beebe @ 2017-11-24 15:05 UTC (permalink / raw)


BibTeX entries for the complete contents of the Bell System Technical
Journal family are in the TeX User Group archives at

	http://www.math.utah.edu/pub/tex/bib/bstj1970.bib

[change 1970 to other decades, and .bib to .html for live hyperlinks].
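
The substitution rule in brackets can be written as a small helper. This is a sketch only: the function name is made up, and which decades actually have files on the server is not stated here, so callers should not assume every decade exists.

```python
BASE = "http://www.math.utah.edu/pub/tex/bib/"


def bstj_url(decade, ext="bib"):
    """Build the bibliography URL for one decade.

    decade: first year of the decade, e.g. 1970 (availability is an assumption).
    ext:    "bib" for the BibTeX file, "html" for the version with live links.
    """
    if ext not in ("bib", "html"):
        raise ValueError("ext must be 'bib' or 'html'")
    return f"{BASE}bstj{decade}.{ext}"


print(bstj_url(1970))
print(bstj_url(1970, "html"))
```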

The PDF URLs for bstj.bell-labs.com no longer work, and the ones for
www.alcatel-lucent.com, such as 

	http://www.alcatel-lucent.com/bstj/vol57-1978/articles/bstj57-6-2155.pdf

now redirect to an HTML page.

Otherwise, articles are available from the Wiley site at

	http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1538-7305/issues/

but are behind a paywall.  There are also copies in the IEEE eXplore
database at

	http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?reload=true&punumber=6731002

I tried to find the URLs at https://web.archive.org/, but it does
not appear to have them.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe at math.utah.edu  -
- 155 S 1400 E RM 233                       beebe at acm.org  beebe at computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-23  3:11 ` Lyndon Nerenberg
@ 2017-11-24  9:44   ` Tim Bradshaw
  0 siblings, 0 replies; 14+ messages in thread
From: Tim Bradshaw @ 2017-11-24  9:44 UTC (permalink / raw)



> On 23 Nov 2017, at 03:11, Lyndon Nerenberg <lyndon at orthanc.ca> wrote:
> 
> This was written up in the same BSTJ number that talked about many of the troff pre-processors and other DWB tools, IIRC.  Was that the "big" UNIX edition? Either way, the paper is well worth a read if you can find it (and I'm sorry I can't recall the title right now).

I think it's 'Language development tools' by Johnson & Lesk in vol 57 number 6 part 2, p2155 (which I'm sure I should cite in some more proper way).


* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
  2017-11-23  1:05 [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes' Doug McIlroy
@ 2017-11-23  3:11 ` Lyndon Nerenberg
  2017-11-24  9:44   ` Tim Bradshaw
  2017-11-25  3:25 ` Bakul Shah
  1 sibling, 1 reply; 14+ messages in thread
From: Lyndon Nerenberg @ 2017-11-23  3:11 UTC (permalink / raw)



> On Nov 22, 2017, at 5:05 PM, Doug McIlroy <doug at cs.dartmouth.edu> wrote:
> 
> The first one was a fantastic tour de force by Bob Morris,
> called "typo". Aside from the file "eign" of the very most common
> English words, it had no vocabulary. Instead it evaluated the
> likelihood that any particular word came from a source with the
> same letter-trigram frequencies as the document as a whole. The
> words were then printed in increasing order of likelihood. Typos
> tended to come early in the list.

This was written up in the same BSTJ number that talked about many of the troff pre-processors and other DWB tools, IIRC.  Was that the "big" UNIX edition?  Either way, the paper is well worth a read if you can find it (and I'm sorry I can't recall the title right now).



* [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes'
@ 2017-11-23  1:05 Doug McIlroy
  2017-11-23  3:11 ` Lyndon Nerenberg
  2017-11-25  3:25 ` Bakul Shah
  0 siblings, 2 replies; 14+ messages in thread
From: Doug McIlroy @ 2017-11-23  1:05 UTC (permalink / raw)


Repeat, slightly modified, of a previous post that got
shunted to the attachment heap.

>   I am curious if anyone on the list remembers much
> about the development of the first spell checkers in Unix?

Yes, intimately. They had no relationship to the PDP 10.

The first one was a fantastic tour de force by Bob Morris,
called "typo". Aside from the file "eign" of the very most common
English words, it had no vocabulary. Instead it evaluated the
likelihood that any particular word came from a source with the
same letter-trigram frequencies as the document as a whole. The
words were then printed in increasing order of likelihood. Typos
tended to come early in the list.
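
For readers who want to see the trigram idea in code, here is a minimal Python sketch. It is not a reconstruction of typo: the scoring formula (mean log-frequency of a word's trigrams within the document), the space padding, and all names are assumptions chosen for brevity, and the `common` parameter merely stands in for a list such as "eign".

```python
import math
import re
from collections import Counter


def trigrams(word):
    """Letter trigrams of a word, padded with spaces to mark its boundaries."""
    padded = f" {word} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def rank_by_oddness(text, common=frozenset()):
    """Return the document's distinct words, least likely first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for w in words for t in trigrams(w))
    total = sum(counts.values())
    scores = {}
    for w in set(words) - set(common):
        tris = trigrams(w)
        # Mean log-probability of the word's trigrams under the document's
        # own trigram statistics; low values mean "unlike the rest".
        scores[w] = sum(math.log(counts[t] / total) for t in tris) / len(tris)
    return sorted(scores, key=scores.get)


sample = "banana bandana banana bandana banana xqz"
print(rank_by_oddness(sample))
```

On the toy sample the nonsense word "xqz" sorts first, because none of its trigrams occur anywhere else in the text. A more careful version would exclude a word's own contribution from the counts before scoring it.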

Typo, introduced in v3, was very popular until Steve Johnson wrote
"spell", a remarkably short shell script that (efficiently) looks
up a document's words in the wordlist of Webster's Collegiate
Dictionary, which we had on line. The only "real" coding he did
was to write a simple affix-stripping program to make it possible
to look up plurals, past tenses, etc. If memory serves, Steve's
program is described in Kernighan and Pike. It appeared in v5.
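
The shape of that approach (split the document into words, strip simple affixes, look each word up in a word list) can be sketched in Python. The suffix rules and names below are illustrative assumptions, not the rules of the original program, which was a shell script.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")  # illustrative only


def stems(word):
    """Yield the word itself plus candidate stems after suffix stripping."""
    yield word
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            base = word[: -len(suf)]
            yield base          # e.g. "walked" -> "walk"
            yield base + "e"    # e.g. "baked"  -> "bake"


def misspelled(text, wordlist):
    """Distinct words of text, sorted, that no candidate stem finds in wordlist."""
    words = sorted(set(re.findall(r"[a-z]+", text.lower())))
    return [w for w in words if not any(s in wordlist for s in stems(w))]


wordlist = {"and", "the", "cat", "walk", "bake", "dish"}
print(misspelled("The cats walked, baked dishes, and teh cat", wordlist))
```

Only "teh" is reported for the sample sentence; the plurals and past tenses are accepted through their stems.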

Steve's program was good, but the dictionary isn't an ideal source
for real text, which abounds in proper names and terms of art.
It also has a lot of rare words that don't pull their weight in
a spell checker, and some attractive nuisances, especially obscure
short words from Scots, botany, etc, which are more likely to
arise in everyday text as typos than by intent. Given the basic
success of Steve's program, I undertook to make a more useful
spelling list, along with more vigorous affix stripping (and a
stop list to avert associated traps, e.g. "presenation" =
"pre+senate+ion"). That has been described in Bentley's "Programming
Pearls" and in http://www.cs.dartmouth.edu/~doug/spell.pdf.
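
The trap and the remedy can be shown in a few lines of Python. This is a toy sketch: the prefix and suffix tables are invented for the example, and treating the stop list as a set of strings rejected before any stripping is an assumption about its form, not a description of the real program.

```python
PREFIXES = ("pre", "un", "re")                      # illustrative
SUFFIX_RULES = (("ion", "e"), ("ion", ""), ("s", ""))  # (suffix, replacement)


def derivable(word, wordlist):
    """True if word, minus an optional prefix and suffix, is in wordlist."""
    candidates = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for c in candidates:
        if c in wordlist:
            return True
        for suf, rep in SUFFIX_RULES:
            if c.endswith(suf) and c[: -len(suf)] + rep in wordlist:
                return True
    return False


def accepted(word, wordlist, stoplist):
    """Stop-listed words are rejected even when affix stripping would pass them."""
    return word not in stoplist and derivable(word, wordlist)


wordlist = {"senate", "present"}
print(accepted("presenation", wordlist, stoplist=set()))            # trap
print(accepted("presenation", wordlist, stoplist={"presenation"}))  # averted
```

Without the stop list the misspelling passes, because stripping "pre" and rewriting "ion" as "e" leaves "senate".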

Morris's program and mine labored under space constraints, so
have some pretty ingenious coding tricks. In fact Morris has
a patent on the way he counted frequencies of the 26^3 trigrams
in 26^3 bytes, even though the counts could exceed 255. I did
some heroic (and probabilistic) encoding to squeeze a 30,000
word dictionary into a 64K data space, without severely 
affecting lookup time.
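
One well-known way to keep counts larger than 255 in one-byte registers is probabilistic counting: store roughly the logarithm of the count and increment it with decreasing probability. The Python sketch below illustrates that general technique over a 26^3-byte table. Whether it matches the patented scheme mentioned above is an assumption, and all names are illustrative.

```python
import random

N = 26 ** 3  # 17,576 one-byte registers, one per letter trigram


def slot(tri):
    """Index of a lowercase three-letter trigram in the table."""
    a, b, c = (ord(ch) - ord("a") for ch in tri)
    return (a * 26 + b) * 26 + c


def bump(table, tri, rng=random):
    """Record one occurrence: raise the register with probability 2**-value."""
    i = slot(tri)
    c = table[i]
    if c < 255 and rng.random() < 2.0 ** -c:
        table[i] = c + 1


def estimate(table, tri):
    """Approximate number of occurrences recorded for this trigram."""
    return 2 ** table[slot(tri)] - 1


table = bytearray(N)
for _ in range(10_000):
    bump(table, "the")
print(table[slot("the")], estimate(table, "the"))
```

The register grows like the base-2 logarithm of the true count, so a byte is far more than enough; the price is that the estimate is only accurate to within a modest factor.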

Doug



end of thread, other threads:[~2017-11-25  3:25 UTC | newest]

Thread overview: 14+ messages
2017-11-22  2:34 [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes' Doug McIlroy
2017-11-23  1:05 [TUHS] Spell - was tmac: Move macro diagnostics away from `quotes' Doug McIlroy
2017-11-23  3:11 ` Lyndon Nerenberg
2017-11-24  9:44   ` Tim Bradshaw
2017-11-25  3:25 ` Bakul Shah
2017-11-24 15:05 Nelson H. F. Beebe
2017-11-24 17:01 ` Ralph Corderoy
2017-11-24 18:06   ` Nelson H. F. Beebe
2017-11-24 18:17     ` Henry Bent
2017-11-24 20:18       ` Ron Natalie
2017-11-24 22:46   ` Dave Horsfall
2017-11-24 22:57   ` Arthur Krewat
2017-11-24 15:13 Noel Chiappa
2017-11-24 17:24 ` Will Senn
