From: Joseph Barillari <jbarilla@princeton.edu>
Cc: ding@gnus.org
Subject: Re: Performance with large mailboxes and nnsql.el
Date: Fri, 03 May 2002 20:37:40 -0400
Message-ID: <m3bsbw7nor.fsf@washer.barillari.org>
In-Reply-To: <vzad6wlxlk4.fsf@false.linpro.no> (Kristoffer Gleditsch's message of "Sat, 27 Apr 2002 16:28:43 +0200")
>>>>> "KG" == Kristoffer Gleditsch <toffer@ping.uio.no> writes:
> [ Joseph Barillari ]
>> Out of curiosity, is anyone working on nnsql.el? Or are there
>> any fragments of a half-finished nnsql.el lying around with
>> which I could begin work?
> I did some work on such a backend some time ago, using Eric
> Marsden's PostgreSQL interface library
> (http://www.chez.com/emarsden/downloads/pg.el). It almost gave
> me a working summary buffer, but I never got around to doing the
> database insertion stuff, so it has been standing still for some
> time. If anyone wants to take a look at it, it can be checked
> out of anonymous CVS following the instructions at
> http://www.ping.uio.no/anoncvs.shtml (just use 'nnsql' instead
> of 'frotz'.)
I noticed in your code:
/* We store each line as a separate database entry. I don't know if
that is a good idea or not. Probably not. */
I'd agree. Stored that way, one cannot easily run a regular-expression
search over the message body, since a match may span several lines
(Postgres supports regular-expression matching).
If splitting the bodies into individual lines is a bad idea,
presumably each body should be kept intact in a TEXT field.
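
To make the searching point concrete, here is a minimal sketch. It
assumes Python's standard sqlite3 module as a stand-in for Postgres,
purely so the example is self-contained; SQLite has no built-in
regular-expression operator, so one is registered by hand. The table
and column names (bodies, body_lines) are hypothetical and not taken
from the nnsql code.

```python
import re
import sqlite3

def regexp(pattern, text):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    if text is None:
        return 0
    return 1 if re.search(pattern, text) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# One layout keeps the body intact, the other stores one row per line.
conn.execute("CREATE TABLE bodies (message INTEGER, body TEXT)")
conn.execute("CREATE TABLE body_lines (message INTEGER, lineno INTEGER, line TEXT)")

body = "The backend stores every\nline separately, which is slow."
conn.execute("INSERT INTO bodies VALUES (1, ?)", (body,))
for lineno, line in enumerate(body.split("\n")):
    conn.execute("INSERT INTO body_lines VALUES (1, ?, ?)", (lineno, line))

# A phrase that happens to wrap across a line break.
pattern = r"every\s+line"
whole_hits = conn.execute(
    "SELECT message FROM bodies WHERE body REGEXP ?", (pattern,)
).fetchall()
line_hits = conn.execute(
    "SELECT message FROM body_lines WHERE line REGEXP ?", (pattern,)
).fetchall()
```

With the intact body the query finds the message; with one row per
line the same pattern finds nothing, because no single row contains
both words.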
What do you think of splitting headers? I assume splitting the
most-often-searched-upon headers (To:, Cc:, Subject:, etc.) into
separate fields would be a good idea -- it would aid searching, make
it easier to write other clients (web-based clients, for instance),
and it would let the underlying database build indexes.
Most headers (except From:) can and often do consist of a
comma-delimited list of entries. Splitting these further at the
commas and inserting the entries into a separate table is probably a
good idea as well, since it can improve indexing performance.
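
A rough sketch of that splitting, again assuming Python with sqlite3
standing in for Postgres. The recipients table, its columns, and the
store_recipients helper are hypothetical names invented for the
example. email.utils.getaddresses is used rather than a plain split on
commas because a quoted display name may itself contain a comma.

```python
import sqlite3
from email.utils import getaddresses

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipients (message INTEGER, header TEXT, address TEXT)"
)
# One row per address means the database can index individual addresses.
conn.execute("CREATE INDEX recipients_address ON recipients (address)")

def store_recipients(conn, serial, header, value):
    """Insert one row for each address found in a To:/Cc:-style header."""
    for _name, addr in getaddresses([value]):
        if addr:
            conn.execute(
                "INSERT INTO recipients VALUES (?, ?, ?)",
                (serial, header, addr.lower()),
            )

store_recipients(conn, 1, "To", '"Doe, Jane" <jane@example.org>, ding@gnus.org')
store_recipients(conn, 1, "Cc", "someone@example.com")

total = conn.execute("SELECT COUNT(*) FROM recipients").fetchone()[0]
rows = conn.execute(
    "SELECT header FROM recipients WHERE address = ?", ("ding@gnus.org",)
).fetchall()
```

The lookup at the end is an equality test on an indexed column rather
than a substring scan over a whole To: line.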
What to do with the message body is another matter.
If one stores the headers separately, should one also remove them from
the body text? Leaving them in the body would be a Bad Idea in the
relational-database frame of thinking -- data should never be
duplicated. But stripping them out means that the database will not be
storing exact copies of the messages. Reconstructing the headers in
the exact order in which they appeared may not be possible unless some
information is left in the database.
This would be a non-issue if the message headers were static. But
headers often need to be changed -- the drafts mailbox is a perfect
example. If the headers are left in the body, and Gnus needs to change
them, then the entire message has to be rewritten -- not just the
header in question.
The only way around that re-writing would be to remove the
stored-separately headers and replace them with some sort of tags that
Gnus will replace with the actual headers when the message is fed into
a buffer. This would speed writes, as the body would not need to be
touched, but make reads ugly and replete with string replacements, to
say nothing of the additional difficulty in writing read-only clients
(like web-based readers).
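
For what the tag approach might look like on the read side, here is a
small sketch in Python. The {{header:Name}} tag syntax and the render
function are made up for illustration; nothing like them exists in
Gnus, and a real design would need tags that cannot collide with
ordinary message text.

```python
import re

# Hypothetical placeholder syntax: {{header:Subject}}
TAG = re.compile(r"\{\{header:([A-Za-z-]+)\}\}")

def render(stored_text, headers):
    """Rebuild the message by replacing each tag with 'Name: value'."""
    def substitute(match):
        name = match.group(1)
        return f"{name}: {headers[name]}"
    return TAG.sub(substitute, stored_text)

# Stored form: separately-kept headers are tags, everything else is verbatim.
stored = "{{header:From}}\n{{header:Subject}}\nX-Mailer: Gnus\n\nBody text.\n"
headers = {"From": "joe@example.org", "Subject": "draft"}

# Editing a header changes only the header row; the stored text is untouched.
headers["Subject"] = "final"
message = render(stored, headers)
```

The write is cheap, but every reader, including any read-only client,
has to carry an equivalent of render.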
What do you think of this pseudo-model, using the
leave-headers-in-body option?
_Messages_                  _Headers_  /* but only interesting headers */
Body     TEXT               Header    TEXT
Serial   INTEGER ----|      Contents  TEXT
From     TEXT        |----  Message   INTEGER
MsgID    TEXT
Rec'd    TIMESTAMP
Date     TIMESTAMP
Mailbox  TEXT  /* all of the nnsql: mailboxes could be
                  accommodated in a single table */
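
The model above could be written out roughly as follows. This is only
a sketch under assumptions: it uses Python's sqlite3 in place of
Postgres so that it runs standalone, renames From to sender and Rec'd
to received because the originals are awkward as SQL identifiers, and
adds a primary key and an index that the model does not spell out. The
sample message and addresses are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    serial   INTEGER PRIMARY KEY,
    body     TEXT,        -- full message text, headers left in
    sender   TEXT,        -- "From" in the model
    msgid    TEXT,
    received TIMESTAMP,   -- "Rec'd" in the model
    date     TIMESTAMP,
    mailbox  TEXT         -- all nnsql: mailboxes share this one table
);
CREATE TABLE headers (
    message  INTEGER REFERENCES messages (serial),
    header   TEXT,
    contents TEXT
);
CREATE INDEX headers_lookup ON headers (header, contents);
""")

raw = "From: joe@example.org\nCc: ding@gnus.org\n\nHello.\n"
cursor = conn.execute(
    "INSERT INTO messages (body, sender, msgid, mailbox) VALUES (?, ?, ?, ?)",
    (raw, "joe@example.org", "<example@example.org>", "inbox"),
)
serial = cursor.lastrowid
conn.execute("INSERT INTO headers VALUES (?, ?, ?)", (serial, "Cc", "ding@gnus.org"))

# Find messages in a mailbox by one of the "interesting" headers.
found = conn.execute(
    "SELECT m.msgid FROM messages m JOIN headers h ON h.message = m.serial "
    "WHERE m.mailbox = ? AND h.header = ? AND h.contents = ?",
    ("inbox", "Cc", "ding@gnus.org"),
).fetchall()
```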
--Joe