From mboxrd@z Thu Jan 1 00:00:00 1970
From: "erik quanstrom"
To: 9fans@cse.psu.edu, Sam
References: <57471c9f2e6b9a4c77886fffb87d244d@terzarima.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To:
Subject: Re: [9fans] Scaleable mail repositories.
Message-Id: <20051110012431.6F23F10F89@dexter-peak.quanstro.net>
Date: Wed, 9 Nov 2005 19:24:31 -0600
Cc:
Topicbox-Message-UUID: a9df6c2c-ead0-11e9-9d60-3106f5b1d025

suffix arrays create an index that is bigger than the original data.
regardless of the theoretical O(1) mumble, the size of the index is
a major drawback.

erik

Sam writes
|
| In the not-so-distant past I was part of a three man
| effort to write a web site indexer / search engine
| generator. My job was to take the indexed files / urls
| (they sucked them down with java) and create a suffix
| tree database that could be searched upon via cgi. I
| don't have any specific numbers, but it was quite fast.
|
| This was when google was just becoming known and once
| we realized we could point google at a website the
| project was abandoned.
|
| The whole point of using suffix trees is linear time
| search wrt the size of the search string (note: not
| the size of the searched text). Seems like it's
| a good candidate for this task.
|
| Sam
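
To put a rough shape on the size-versus-speed trade-off discussed above, here is a minimal sketch in Python. It is not anyone's actual indexer: the names build_suffix_array and find are made up for illustration, the construction is the naive sort-all-suffixes approach, and it assumes the whole text fits in memory. It stores one integer offset per byte of text, so with the usual 4-byte offsets the index comes out at several times the size of the data. Lookup is a binary search, which costs roughly O(m log n) for a pattern of length m in a text of length n, rather than the O(m) a suffix tree gives.

```python
from array import array


def build_suffix_array(text: bytes) -> array:
    """Return the start offsets of all suffixes of text, in sorted order.

    Naive construction: sorts the offsets by the suffix each one begins.
    Fine for a demonstration, far too slow for a real mail store.
    """
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # "I" is an unsigned int, typically 4 bytes per entry.
    return array("I", order)


def find(text: bytes, sa: array, pattern: bytes) -> list:
    """Return every offset in text where pattern occurs, in ascending order."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Binary search for the first suffix whose m-byte prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # All matches are adjacent in the suffix array from that point on.
    hits = []
    while lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


if __name__ == "__main__":
    text = b"banana"
    sa = build_suffix_array(text)
    print("matches for b'ana':", find(text, sa, b"ana"))
    print("text bytes: ", len(text))
    print("index bytes:", len(sa) * sa.itemsize)
```

The last two lines printed are the point of the complaint: the index needs one offset per byte of text, so it cannot be smaller than the text itself unless the offsets are compressed or only some positions (word starts, say) are indexed.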