From mboxrd@z Thu Jan 1 00:00:00 1970
From: "erik quanstrom"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>, "Russ Cox"
References: <57471c9f2e6b9a4c77886fffb87d244d@terzarima.net> <20051110012431.6F23F10F89@dexter-peak.quanstro.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To:
Subject: Re: [9fans] Scaleable mail repositories.
Message-Id: <20051110115514.DFCD522DE@dexter-peak.quanstro.net>
Date: Thu, 10 Nov 2005 05:55:14 -0600
Cc:
Topicbox-Message-UUID: aa92810e-ead0-11e9-9d60-3106f5b1d025

yes, and they've developed some interesting high-performance
algorithms, which i've scanned, but need to take a good look at.
the computational bio guys love it because they have long strings
of base pairs that they want to index. and suffix arrays are the
ticket for that.

the reason they love suffix arrays is that there is no natural "word".
text searching would be the opposite. words are the natural unit
(in speech there are no letters) and words are often repeated.

- erik

Russ Cox writes
|
| > suffix arrays create an index that is bigger than the
| > original data. regardless of the theoretical O(1) mumble,
| > the size of the index is a major drawback.
|
| That's true, but it depends a lot on the app.
| The computational biology guys seem to love them
| for indexing large amounts of DNA.
|
| Russ
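A minimal sketch (not from the thread) contrasting the two kinds of index discussed in this message. It assumes Python 3, and the helper names build_suffix_array, find_occurrences and build_word_index are hypothetical. The suffix array here is built naively by sorting suffix start positions (O(n^2 log n) worst case), not with any of the high-performance construction algorithms mentioned above; a lookup is two binary searches over the sorted suffixes. The array holds one integer per input character, so with 4-byte integers it is four times the size of one-byte-per-character text, which is the size drawback raised in the quoted message. The word index instead stores one position per word occurrence.

```python
from collections import defaultdict
from typing import Dict, List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort every suffix start position by its suffix.

    One integer is stored per input character, so the index outgrows the
    text itself as soon as integers are wider than one byte.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every start position of pattern in text using the suffix array."""
    m = len(pattern)

    def prefix(k: int) -> str:
        # first m characters of the k-th smallest suffix
        return text[sa[k]:sa[k] + m]

    # lower bound: first suffix whose m-character prefix is >= pattern
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # upper bound: first suffix whose m-character prefix is > pattern
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # every suffix in sa[start:lo] begins with pattern; report in text order
    return sorted(sa[start:lo])


def build_word_index(text: str) -> Dict[str, List[int]]:
    """Inverted index: map each lowercased word to the word offsets where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, word in enumerate(text.lower().split()):
        index[word].append(pos)
    return dict(index)


if __name__ == "__main__":
    dna = "ACGTACGTTACG"
    sa = build_suffix_array(dna)
    print(len(sa), "suffix array entries for", len(dna), "bases")
    print(find_occurrences(dna, sa, "ACG"))  # [0, 4, 9]

    words = build_word_index("the cat sat on the mat")
    print(words["the"])  # [0, 4]
```

Because any substring can be found this way, nothing has to be split into words first, which suits a string of base pairs. The word index can only answer whole-word queries, but for natural-language text it has far fewer entries than the text has characters, since repeated words share one key.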