From mboxrd@z Thu Jan  1 00:00:00 1970
From: "erik quanstrom" <quanstro@quanstro.net>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>,
	"Russ Cox" <rsc@swtch.com>
References: <57471c9f2e6b9a4c77886fffb87d244d@terzarima.net>
	<Pine.LNX.4.60.0511090910050.15638@athena>
	<20051110012431.6F23F10F89@dexter-peak.quanstro.net>
	<ee9e417a0511091830i7bdff76dp872e1128f34e5e91@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To: <ee9e417a0511091830i7bdff76dp872e1128f34e5e91@mail.gmail.com>
Subject: Re: [9fans] Scaleable mail repositories.
Message-Id: <20051110115514.DFCD522DE@dexter-peak.quanstro.net>
Date: Thu, 10 Nov 2005 05:55:14 -0600
Cc: 
Topicbox-Message-UUID: aa92810e-ead0-11e9-9d60-3106f5b1d025

yes, and they've developed some interesting high-performance algorithms, which i've skimmed
but still need to take a good look at.

the computational bio guys love suffix arrays because they have long strings of base pairs that
they want to index, and suffix arrays are the ticket for that. the reason is that a genome has
no natural "word", so there is no obvious unit to index other than every position.
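
to make that concrete, here is a minimal sketch of a suffix array over a dna-like string. the
function names, the sample string and the naive construction are my own assumptions for
illustration; this is not one of the high-performance algorithms mentioned above.

```python
def build_suffix_array(text):
    # naive construction: sort every suffix start offset by the suffix it
    # begins. fine for a demo, far too slow and memory-hungry for real data.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    # binary search for the first suffix whose leading characters are
    # not less than the pattern
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # every suffix that starts with the pattern sits in one contiguous run
    hits = []
    while lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


dna = "GATTACAGATTACA"          # hypothetical sample, not real sequence data
sa = build_suffix_array(dna)
print(find_all(dna, sa, "TTA"))
```

note that the array holds one offset per base, so any substring can be found without deciding
in advance where a "word" starts, at the price of an index with as many entries as the data
has characters.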

text searching would be the opposite. words are the natural unit (in speech there are no letters)
and words are often repeated.
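
by contrast, a rough sketch of a word-level index, assuming words can be split out with a
simple regular expression. the function name and the sample sentence are made up for
illustration.

```python
import re
from collections import defaultdict


def build_word_index(text):
    # map each distinct word to the character offsets where it starts.
    # a repeated word adds another offset, not another key.
    index = defaultdict(list)
    for m in re.finditer(r"\w+", text.lower()):
        index[m.group()].append(m.start())
    return index


text = "the cat saw the dog and the dog saw the cat"   # hypothetical sample
index = build_word_index(text)
print(len(text), "characters,", len(index), "distinct words")
print(index["the"])
```

here the index is keyed by distinct words rather than by every character position, which is
the sense in which words are the natural unit for text.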

- erik

Russ Cox <rsc@swtch.com> writes

| 
| > suffix arrays create an index that is bigger than the
| > original data. regardless of the theoretical O(1) mumble,
| > the size of the index is a major drawback.
| 
| That's true, but it depends a lot on the app.
| The computational biology guys seem to love them
| for indexing large amounts of DNA.
| 
| Russ