Gnus development mailing list
From: Joseph Barillari <jbarilla@princeton.edu>
Cc: ding@gnus.org
Subject: Re: Performance with large mailboxes and nnsql.el
Date: Fri, 03 May 2002 20:37:40 -0400
Message-ID: <m3bsbw7nor.fsf@washer.barillari.org>
In-Reply-To: <vzad6wlxlk4.fsf@false.linpro.no> (Kristoffer Gleditsch's message of "Sat, 27 Apr 2002 16:28:43 +0200")


>>>>> "KG" == Kristoffer Gleditsch <toffer@ping.uio.no> writes:

    > [ Joseph Barillari ]
    >> Out of curiosity, is anyone working on nnsql.el? Or are there
    >> any fragments of a half-finished nnsql.el lying around with
    >> which I could begin work?

    > I did some work on such a backend some time ago, using Eric
    > Marsden's PostgreSQL interface library
    > (http://www.chez.com/emarsden/downloads/pg.el).  It almost gave
    > me a working summary buffer, but I never got around to doing the
    > database insertion stuff, so it has been standing still for some
    > time.  If anyone wants to take a look at it, it can be checked
    > out of anonymous CVS following the instructions at
    > http://www.ping.uio.no/anoncvs.shtml (just use 'nnsql' instead
    > of 'frotz'.)

I noticed in your code:

/* We store each line as a separate database entry.  I don't know if
   that is a good idea or not.  Probably not.  */

I'd agree. Postgres supports regular-expression matching, but a
pattern cannot easily be applied to a message body that is split
across one row per line.

If splitting the bodies into individual lines is a bad idea,
presumably each body should be kept intact in a TEXT field.
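
A minimal sketch of the difference, using Python's sqlite3 module as a
stand-in for Postgres (SQLite has no built-in regex operator, so a
REGEXP function is registered by hand). The table and column names
here are hypothetical and are not taken from the existing nnsql code.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Two hypothetical layouts: one row per body, and one row per body line.
conn.execute("CREATE TABLE whole_bodies (serial INTEGER, body TEXT)")
conn.execute("CREATE TABLE body_lines (serial INTEGER, lineno INTEGER, line TEXT)")

body = "I did some work on such a\nbackend some time ago."
conn.execute("INSERT INTO whole_bodies VALUES (1, ?)", (body,))
conn.executemany(
    "INSERT INTO body_lines VALUES (1, ?, ?)",
    list(enumerate(body.split("\n"))),
)

# A phrase that happens to wrap across a line break in the message.
pattern = r"such a\s+backend"

whole_hits = conn.execute(
    "SELECT serial FROM whole_bodies WHERE body REGEXP ?", (pattern,)
).fetchall()
line_hits = conn.execute(
    "SELECT serial FROM body_lines WHERE line REGEXP ?", (pattern,)
).fetchall()

print(whole_hits)  # the intact body matches
print(line_hits)   # no single line contains the whole phrase
```

With one row per line, finding the phrase would require joining the
lines back together before matching, which is the work the intact TEXT
field avoids.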

What do you think of splitting headers? I assume splitting the
most-often-searched-upon headers (To:, Cc:, Subject:, etc.) into
separate fields would be a good idea -- it would aid searching, make
it easier to write other clients (web-based clients, for instance),
and it would let the underlying database build indexes. 

Address headers such as To: and Cc: (From: is the usual exception)
can, and often do, consist of a comma-delimited list of entries.
Splitting these further at the commas and inserting the entries into
a separate table is probably a good idea as well, since the database
can then index each address individually.
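
As a rough illustration of that split, here is a sketch in Python with
sqlite3 standing in for the real database. The recipients table, its
columns, and store_recipients are hypothetical names chosen for the
example. It uses email.utils.getaddresses rather than a plain split on
commas, because a quoted display name can itself contain a comma.

```python
import sqlite3
from email.utils import getaddresses

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per address found in a header.
conn.execute(
    "CREATE TABLE recipients (message INTEGER, header TEXT, address TEXT)"
)
conn.execute("CREATE INDEX recipients_address ON recipients (address)")


def store_recipients(conn, serial, header, value):
    """Insert one row per address in a comma-delimited header value."""
    for _name, address in getaddresses([value]):
        conn.execute(
            "INSERT INTO recipients VALUES (?, ?, ?)",
            (serial, header, address),
        )


store_recipients(conn, 1, "To", 'ding@gnus.org, "Doe, Jane" <jane@example.org>')
store_recipients(conn, 1, "Cc", "bob@example.org")

# An exact-match lookup like this one can use the index on address.
rows = conn.execute(
    "SELECT header FROM recipients WHERE address = ?",
    ("jane@example.org",),
).fetchall()
count = conn.execute("SELECT COUNT(*) FROM recipients").fetchone()[0]
```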

What to do with the message body is another matter.

If one stores the headers separately, should one also remove them from
the body text? Leaving them in the body would be a Bad Idea in the
relational-database frame of thinking -- data should never be
duplicated. But stripping them out means that the database will not be
storing exact copies of the messages. Reconstructing the headers in
the exact order in which they appeared may not be possible unless some
information is left in the database.

This would be a non-issue if the message headers were static. But
headers often need to be changed -- the drafts mailbox is a perfect
example. If the headers are left in the body, and Gnus needs to change
them, then the entire message has to be rewritten -- not just the
header in question.

The only way to avoid that rewrite would be to remove the
stored-separately headers and replace them with some sort of tags that
Gnus will replace with the actual headers when the message is fed into
a buffer. This would speed writes, as the body would not need to be
touched, but make reads ugly and replete with string-replacements, to
say nothing of the additional difficulty in writing read-only clients
(like web-based readers).
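
To make the tag idea concrete, here is a small Python sketch. The tag
name X-NNSQL-Header-Ref and both function names are invented for the
example, and it assumes unfolded (single-line) headers with a single
space after the colon; real messages would need more care.

```python
TAG = "X-NNSQL-Header-Ref"  # hypothetical placeholder header name


def split_message(raw, interesting):
    """Pull the interesting headers out, leaving a tag line in their place."""
    head, sep, body = raw.partition("\n\n")
    stored = {}
    kept = []
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        if name in interesting:
            stored[name] = value.strip()
            kept.append(f"{TAG}: {name}")
        else:
            kept.append(line)
    return stored, "\n".join(kept) + sep + body


def rebuild_message(stored, text):
    """Expand tag lines back into real headers, in their original positions."""
    head, sep, body = text.partition("\n\n")
    lines = []
    for line in head.split("\n"):
        if line.startswith(TAG + ": "):
            name = line[len(TAG) + 2:]
            line = f"{name}: {stored[name]}"
        lines.append(line)
    return "\n".join(lines) + sep + body


raw = "From: a@example.org\nSubject: draft\nX-Mailer: Gnus\n\nHello\n"
stored, text = split_message(raw, {"From", "Subject"})

# Changing a header touches only the small stored value, not the text.
stored["Subject"] = "final draft"
rebuilt = rebuild_message(stored, text)
```

Because the tags stay where the headers were, the original header
order survives, but every reader, including a read-only web client,
has to perform the expansion step.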

What do you think of this pseudo-model, using the
leave-headers-in-body option?

_Messages_           _Headers_  /* but only interesting headers */
 Body TEXT            Header TEXT                                
 Serial INTEGER--|    Contents TEXT                              
 From TEXT       |----Message INTEGER                            
 MsgID TEXT
 Rec'd TIMESTAMP   
 Date TIMESTAMP
 Mailbox TEXT /*all of the nnsql: mailboxes could be 
                   accommodated in a single table*/
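
For concreteness, the pseudo-model above could be written out roughly
as follows. This is only a sketch in Python with sqlite3 standing in
for Postgres; the column names sender and received (for From and
Rec'd), the two indexes, and the sample rows are assumptions made for
the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE messages (
    serial   INTEGER PRIMARY KEY,
    body     TEXT NOT NULL,   -- full message text, headers left in
    sender   TEXT,            -- From: (renamed, FROM is an SQL keyword)
    msgid    TEXT,
    received TIMESTAMP,       -- Rec'd
    date     TIMESTAMP,
    mailbox  TEXT             -- all nnsql: mailboxes share this table
);
CREATE TABLE headers (        -- only the interesting headers
    message  INTEGER NOT NULL REFERENCES messages (serial),
    header   TEXT NOT NULL,
    contents TEXT NOT NULL
);
CREATE INDEX headers_lookup ON headers (header, contents);
CREATE INDEX messages_mailbox ON messages (mailbox);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO messages (serial, body, sender, msgid, mailbox) "
    "VALUES (?, ?, ?, ?, ?)",
    (
        1,
        "From: a@example.org\nSubject: Re: nnsql.el\n\nHello\n",
        "a@example.org",
        "<1@example.org>",
        "nnsql:drafts",
    ),
)
conn.executemany(
    "INSERT INTO headers VALUES (?, ?, ?)",
    [(1, "Subject", "Re: nnsql.el"), (1, "To", "ding@gnus.org")],
)

# Subjects of everything in one mailbox, without touching any body text.
subjects = conn.execute(
    "SELECT h.contents FROM headers h "
    "JOIN messages m ON m.serial = h.message "
    "WHERE m.mailbox = ? AND h.header = 'Subject'",
    ("nnsql:drafts",),
).fetchall()
```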

--Joe

Thread overview: 5+ messages
2002-04-27  1:19 Joseph Barillari
2002-04-27 14:28 ` Kristoffer Gleditsch
2002-05-04  0:37   ` Joseph Barillari [this message]
2002-05-13 23:34     ` Kristoffer Gleditsch
2002-05-27 19:58       ` Joseph Barillari
