From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/44651 Path: main.gmane.org!not-for-mail From: Joseph Barillari Newsgroups: gmane.emacs.gnus.general Subject: Re: Performance with large mailboxes and nnsql.el Date: Fri, 03 May 2002 20:37:40 -0400 Sender: owner-ding@hpc.uh.edu Message-ID: References: NNTP-Posting-Host: localhost.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" X-Trace: main.gmane.org 1020472751 8932 127.0.0.1 (4 May 2002 00:39:11 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Sat, 4 May 2002 00:39:11 +0000 (UTC) Cc: ding@gnus.org Return-path: Original-Received: from malifon.math.uh.edu ([129.7.128.13]) by main.gmane.org with esmtp (Exim 3.33 #1 (Debian)) id 173nZv-0002Jq-00 for ; Sat, 04 May 2002 02:39:11 +0200 Original-Received: from sina.hpc.uh.edu ([129.7.128.10] ident=lists) by malifon.math.uh.edu with esmtp (Exim 3.20 #1) id 173nYv-0006xw-00; Fri, 03 May 2002 19:38:09 -0500 Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Fri, 03 May 2002 19:38:24 -0500 (CDT) Original-Received: from sclp3.sclp.com (qmailr@sclp3.sclp.com [209.196.61.66]) by sina.hpc.uh.edu (8.9.3/8.9.3) with SMTP id TAA23187 for ; Fri, 3 May 2002 19:38:10 -0500 (CDT) Original-Received: (qmail 13227 invoked by alias); 4 May 2002 00:37:50 -0000 Original-Received: (qmail 13222 invoked from network); 4 May 2002 00:37:49 -0000 Original-Received: from jbarilla.student.princeton.edu (@140.180.137.120) by gnus.org with SMTP; 4 May 2002 00:37:49 -0000 Original-Received: (from jbarilla@localhost) by jbarilla.student.princeton.edu (8.11.6/8.11.6) id g440bjo14760; Fri, 3 May 2002 20:37:45 -0400 Original-To: Kristoffer Gleditsch X-Public-Key: finger -l jbarilla@phoenix.princeton.edu X-Public-Key-Fingerprint: 99C7 4F49 AF41 AD0F A4FC 529C 215E 1BD2 F6A1 FA37 X-URL: http://www.princeton.edu/~jbarilla In-Reply-To: (Kristoffer Gleditsch's message of "Sat, 27 Apr 2002 16:28:43 +0200") Original-Lines: 87 User-Agent: Gnus/5.090007 (Oort Gnus v0.07) Emacs/21.2.50 (i686-pc-linux-gnu) Precedence: list X-Majordomo: 1.94.jlt7 Xref: main.gmane.org gmane.emacs.gnus.general:44651 X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:44651 --=-=-= Content-Transfer-Encoding: quoted-printable >>>>> "KG" =3D=3D Kristoffer Gleditsch writes: > [ Joseph Barillari ] >> Out of curiosity, is anyone working on nnsql.el? Or are there >> any fragments of a half-finished nnsql.el lying around with >> which I could begin work? > I did some work on such a backend some time ago, using Eric > Marsden's PostgreSQL interface library > (http://www.chez.com/emarsden/downloads/pg.el). It almost gave > me a working summary buffer, but I never got around to doing the > database insertion stuff, so it has been standing still for some > time. If anyone wants to take a look at it, it can be checked > out of anonymous CVS following the instructions at > http://www.ping.uio.no/anoncvs.shtml (just use 'nnsql' instead > of 'frotz'.) I noticed in your code: /* We store each line as a separate database entry. I don't know if that is a good idea or not. Probably not. */ I'd agree. One cannot perform a regular expression search on the message body quite so easily that way (Postgres has such a feature).=20 If splitting the bodies into individual lines is a bad idea, presumably each body should be kept intact in a TEXT field. What do you think of splitting headers? I assume splitting the most-often-searched-upon headers (To:, Cc:, Subject:, etc.) into separate fields would be a good idea -- it would aid searching, make it easier to write other clients (web-based clients, for instance), and it would let the underlying database build indexes.=20 Most headers (except From:) can and often do consist of a comma-delimited list of entries. Splitting these up further along the comma-delimitations, and inserting them into a separate table is probably a good idea, once again, as it can improve indexing performance. What to do with the message body is another matter. If one stores the headers separately, should one also remove them from the body text? Leaving them in the body would be a Bad Idea in the relational-database frame of thinking -- data should never be duplicated. But stripping them out means that the database will not be storing exact copies of the messages. Reconstructing the headers in the exact order in which they appeared may not be possible unless some information is left in the database. This would be a non-issue if the message headers were static. But headers often need to be changed -- the drafts mailbox is a perfect example. If the headers are left in the body, and Gnus needs to change them, then the entire message has to be rewritten -- not just the header in question. The only way around that re-writing would be to remove the stored-separately headers and replace them with some sort of tags that Gnus will replace with the actual headers when the message is fed into a buffer. This would speed writes, as the body would not need to be touched, but make reads ugly and replete with string-replacements, to say nothing of the additional difficulty in writing read-only clients (like web-based readers.) What do you think of this pseudo-model, using the leave-headers-in-body option? _Messages_ _Headers_ /* but only interesting headers */ Body TEXT Header TEXT=20=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 Serial INTEGER--| Contents TEXT=20=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 From TEXT |----Message INTEGER=20=20=20=20=20=20=20=20=20=20=20=20= =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 MsgID TEXT Rec'd TIMESTAMP=20=20=20 Date TIMESTAMP Mailbox TEXT /*all of the nnsql: mailboxes could be=20 accommodated in a single table*/ =2D-Joe --=-=-= Content-Type: application/pgp-signature -----BEGIN PGP MESSAGE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: Processed by Mailcrypt 3.5.6 and Gnu Privacy Guard iD8DBQE80y1XIV4b0vah+jcRAgZZAJwLdZQwTJ+dxyQ6bmTQfJZVEQjeegCeITzE xFFwy+f6JbibR5sgJmq2vOY= =vsaz -----END PGP MESSAGE----- --=-=-=--