9fans - fans of the OS Plan 9 from Bell Labs
From: rog@vitanuova.com
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Scaleable mail repositories.
Date: Tue,  8 Nov 2005 19:56:27 +0000
Message-ID: <2df4e3af3782344adbb24aae570efef9@vitanuova.com>
In-Reply-To: <8ccc8ba40511011429t47bf84a0y293ee9e578d311f8@mail.gmail.com>

> Why search just mail? If you store your mail as files and put in place
> a search engine, the views and searches you want to make will work
> for it all.

that would be nice, but i think it's a bit ambitious for what i'm
looking at currently.  the search engine would have to be quite
intelligent:

1) it would have to be triggered on the arrival of new mail (otherwise
newly arrived messages would not be held in the index)
2) it would have to know which parts of the file system contained
mail messages and MIME parse them (assuming the mail files
were stored in raw format, which seems necessary for digital
signature verification, not to mention efficiency of delivery
and storage).
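
as a rough illustration of point 2, here is a minimal sketch of
pulling indexable text out of a raw message.  it uses python's
standard email package; the function name indexable_text and the
choice to index only the subject and text/plain parts are assumptions
made for the example, not anything decided in this thread.

```python
from email import message_from_bytes
from email.policy import default


def indexable_text(raw: bytes) -> str:
    """Return the subject plus every text/plain part of a raw message.

    The raw bytes are left untouched (so signatures still verify);
    only the text handed to the indexer is decoded.
    """
    msg = message_from_bytes(raw, policy=default)
    chunks = [str(msg.get("Subject", ""))]
    # walk() visits the message itself and every nested MIME part
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            chunks.append(part.get_content())
    return "\n".join(chunks)
```

a delivery hook could call this once per new message and hand the
result to the indexer, which would also cover point 1.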

having just had a brief glance at the description of Google Desktop,
it appears that it probably does all these things.  in fact, given the
special parsing necessary to index different kinds of data, it's
probably irrelevant what format the mailbox is in - it's dealable
with.

i have to say that some kind of "google desktop for plan 9" would be
lovely, but going for mail first is perhaps a more immediately
realisable target.

the first step, anyway, in both cases, is writing the code to do the
inverted index.
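
for concreteness, a minimal in-memory sketch of an inverted index:
each term maps to the set of message ids that contain it, and a
multi-word query intersects those sets.  the class name, the
tokenising regexp and the use of integer message ids are all
assumptions for illustration; a real one would keep the postings on
disk.

```python
import re
from collections import defaultdict

TERM = re.compile(r"[a-z0-9]+")   # assumed tokeniser: runs of ascii letters/digits


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of message ids

    def add(self, msgid, text):
        """Record every term of text as occurring in message msgid."""
        for term in TERM.findall(text.lower()):
            self.postings[term].add(msgid)

    def search(self, query):
        """Return the ids of messages containing all terms in query."""
        terms = TERM.findall(query.lower())
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result
```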

i thought i'd write an external search algorithm - i'm most of the way
through an extendible hash implementation.  it seems simple and quick
for insertion, but things get more complex when dealing with large
values, and on deletion, and i'm not sure of the best way to deal with
block allocation.  more seriously, maybe it's essential to have an
algorithm that can do range (e.g.  prefix) lookups.  any elegant (read
*small*!), nicely implemented, open source libraries out there that
might fit the bill?  a good description of an appropriate algorithm
would do just as well...
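
to make the hashing scheme concrete, here is a small in-memory sketch
of the insertion path only: a directory of 2**gdepth slots indexed by
the low bits of the hash, with a full bucket split in two and the
directory doubled when needed.  it is not the implementation
mentioned above; the names, the per-bucket capacity standing in for a
disk block, and the use of python's built-in hash() are assumptions,
and deletion, large values and block allocation are left out.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth    # local depth: how many hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity    # max keys per bucket (stand-in for one disk block)
        self.gdepth = 0             # global depth: directory has 2**gdepth slots
        self.dir = [Bucket(0)]

    def _slot(self, key):
        # the low gdepth bits of the hash select the directory slot
        return hash(key) & ((1 << self.gdepth) - 1)

    def get(self, key, default=None):
        return self.dir[self._slot(key)].items.get(key, default)

    def put(self, key, value):
        # caveat: loops forever if more than `capacity` keys share one hash value
        while True:
            b = self.dir[self._slot(key)]
            if key in b.items or len(b.items) < self.capacity:
                b.items[key] = value
                return
            self._split(b)

    def _split(self, b):
        if b.depth == self.gdepth:
            # no spare bit left to tell the halves apart: double the directory
            self.dir = self.dir + self.dir
            self.gdepth += 1
        bit = 1 << b.depth
        b.depth += 1
        new = Bucket(b.depth)
        # keys whose newly significant hash bit is set move to the new bucket
        for k in [k for k in b.items if hash(k) & bit]:
            new.items[k] = b.items.pop(k)
        # repoint the directory slots that now belong to the new bucket
        for i, x in enumerate(self.dir):
            if x is b and i & bit:
                self.dir[i] = new
```

hashing throws away key order, so a prefix lookup against a structure
like this means scanning every bucket; an ordered structure such as a
b-tree keeps neighbouring keys together and can answer range queries
directly.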
