Gnus development mailing list
From: James Leifer <James.Leifer@inria.fr>
Subject: improving nnmaildir performance for constant messages (long)?
Date: Sat, 06 Dec 2003 19:31:17 +0100
Message-ID: <r77smjxer2i.fsf@muscadet.inria.fr>

I've been doing some experiments with nnmaildir on a test group with
10,000 messages.  Current setup: latest CVS Gnus, Emacs 21.2.5, 2.5
GHz CPU, 512 MB RAM, Debian woody, ext3.

Everything is working smoothly, though a bit slowly in comparison to
an equivalent 10,000-message nnmh group.  Since adding this maildir,
Gnus takes about 17 seconds longer to start, and rescanning (M-g) is
substantially slower than with nnmh: 5 seconds vs 1 second for the
last 200 messages in a group, and 13 seconds vs 8 seconds for the last
5000.  (All my figures were timed by wristwatch, so they are not very
accurate; I repeated each measurement a few times to try to avoid
cache vs non-cache distortions.)

To try to understand what was going on, I ran Emacs under strace.  It
seems that nnmaildir is stat-ing a whole lot of files in order to be
really safe.  For example, when Gnus starts up:

* for every uniq (the unique part of a maildir file name) there is a
  stat of foo/.nnmaildir/marks/read/uniq

* every foo/cur/uniq:2,flags is stat-ed

* every foo/.nnmaildir/nov/uniq is stat-ed twice and then opened

When Gnus enters an nnmaildir group:

* for every uniq that needs to be displayed, foo/cur/uniq:2,flags is
  stat-ed and foo/.nnmaildir/nov/uniq is stat-ed
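
The counts above can be tallied mechanically from a saved strace log
rather than by eye.  The following is a minimal sketch, assuming the
log contains lines in the usual strace form (for example
stat64("foo/cur/...", {...}) = 0); the function name
count_stats_by_dir is made up for illustration and is not part of Gnus
or strace.

```python
import re
from collections import Counter

# Matches path-based stat calls in strace output, e.g.
#   stat64("foo/cur/123.host:2,S", {st_mode=...}) = 0
# An optional leading PID (as written by "strace -f -o file") is allowed.
# fstat/fstat64 are deliberately not matched: they take a file
# descriptor, not a path.
STAT_CALL = re.compile(r'^(?:\d+\s+)?(l?stat(?:64)?)\("([^"]*)"')


def count_stats_by_dir(lines):
    """Return a Counter mapping directory -> number of stat calls in it."""
    counts = Counter()
    for line in lines:
        match = STAT_CALL.match(line)
        if match:
            path = match.group(2)
            # Everything before the last "/" is the directory; a bare
            # file name is counted under ".".
            directory = path.rpartition("/")[0] or "."
            counts[directory] += 1
    return counts
```

Feeding it the lines of the log (for instance
count_stats_by_dir(open("trace.log")), where trace.log is a
hypothetical file name) gives one total per directory, which makes it
easy to see whether cur/, nov/ or marks/read/ dominates.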

It seems that stat-ing many files takes much longer than simply
getting the list of file names (since the latter just reads the
directory as a file, while the former has to look up each file's
inode, I believe).
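
This is easy to check outside of Emacs.  Below is a rough sketch, not
a careful benchmark, that times one directory listing against one stat
per entry.  The function name time_listing_vs_stat is invented for
this example, and on a warm cache the gap will likely look smaller
than it does on a freshly booted machine.

```python
import os
import tempfile
import time


def time_listing_vs_stat(directory):
    """Return (entry count, seconds to list, seconds to stat every entry)."""
    t0 = time.perf_counter()
    names = os.listdir(directory)          # one pass over the directory
    t1 = time.perf_counter()
    for name in names:
        os.stat(os.path.join(directory, name))  # one inode lookup per file
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1


if __name__ == "__main__":
    # Build a throwaway directory with 10,000 empty files to mimic the
    # size of the test group; real message files would be larger.
    with tempfile.TemporaryDirectory() as d:
        for i in range(10000):
            open(os.path.join(d, f"msg{i}"), "w").close()
        n, list_s, stat_s = time_listing_vs_stat(d)
        print(f"{n} entries: listing {list_s:.4f}s, stat-ing {stat_s:.4f}s")
```

Pointing time_listing_vs_stat at a real foo/cur directory instead of
the temporary one should give numbers closer to what Gnus sees.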

So why all this stat-ing?  Well, I assume that Paul Jarc designed
nnmaildir to be robust in the face of messages or nov data getting
modified.  I'm wondering, therefore, how much faster we could make
nnmaildir if we were willing to adopt the following principle:

* Once a message is added (foo/cur/uniq:2,flags) its contents are
  _constant_, i.e. never modified.

(If one wants to ``modify'' it then one has to rename it to a new
unique name.)

As a result, foo/.nnmaildir/nov/uniq will always be correct as long as
it too is unmodified.

Given that I never knowingly modify any message, I would be happy to
accept this principle in return for faster performance: I believe that
most of the stat-ing could then be eliminated or replaced by simply
listing directories.
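
To make the idea concrete, here is a minimal sketch of how two
directory listings could stand in for the per-file stats, assuming the
constant-message principle holds.  It is written in Python purely for
illustration and is not how nnmaildir is implemented; the names
uniq_of and plan_nov_updates are hypothetical, and new/ and tmp/ are
ignored.

```python
import os


def uniq_of(filename):
    """Strip the ":2,flags" info suffix from a maildir file name."""
    return filename.split(":", 1)[0]


def plan_nov_updates(maildir):
    """Compare cur/ against .nnmaildir/nov/ using only directory listings.

    Returns (missing, stale): uniqs that have a message but no nov file,
    and uniqs that have a nov file but no message.  No file is stat-ed;
    a nov file that exists is trusted, which is only safe if messages
    and nov files are never modified in place.
    """
    cur = {uniq_of(f) for f in os.listdir(os.path.join(maildir, "cur"))}
    nov = set(os.listdir(os.path.join(maildir, ".nnmaildir", "nov")))
    return sorted(cur - nov), sorted(nov - cur)
```

Only the uniqs in the first list would need their messages opened to
generate nov data, and the second list could simply be deleted.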

Thoughts?

-James


