From: James Leifer
Newsgroups: gmane.emacs.gnus.general
Subject: improving nnmaildir performance for constant messages (long)?
Date: Sat, 06 Dec 2003 19:31:17 +0100
Sender: ding-owner@lists.math.uh.edu
Original-To: ding@gnus.org
User-Agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.2 (gnu/linux)
Xref: main.gmane.org gmane.emacs.gnus.general:55114

I've
been doing some experiments with nnmaildir on a test group with
10,000 messages. Current setup: latest CVS Gnus, Emacs 21.2.5, 2.5 GHz
CPU, 512 MB, Debian woody, ext3.

Everything is working smoothly, though a bit slowly in comparison to
an equivalent 10,000-message nnmh group. Since adding this maildir,
Gnus takes about 17 seconds longer to start, and rescanning (M-g) a
group is substantially slower than with nnmh: 5 seconds vs 1 second
for the last 200 messages, and 13 seconds vs 8 seconds for the last
5000. (All my figures were timed by wristwatch, so they are not very
accurate; I repeated each a few times to try to avoid cache vs
non-cache distortions.)

In order to try to understand what was going on, I ran Emacs under
strace. It seems that nnmaildir is stat-ing a whole lot of files in
order to be really safe. E.g. when Gnus starts up,

* for every uniq (maildir file name) there is a stat of
  foo/.nnmaildir/marks/read/uniq

* every foo/cur/uniq:2, is stat-ed

* every foo/.nnmaildir/nov/uniq is stat-ed twice and then opened

When Gnus enters an nnmaildir group,

* for every uniq that needs to be displayed, foo/cur/uniq:2, is
  stat-ed and foo/.nnmaildir/nov/uniq is stat-ed

It seems that stat-ing many files takes much longer than simply
getting the list of file names (since the latter just reads the
directory as a file, while the former has to look up each file's
inode, I believe).

So why all this stat-ing? Well, I assume that Paul Jarc designed
nnmaildir to be robust in the face of messages or NOV data getting
modified. I'm wondering, therefore, how much faster we could make
nnmaildir if we were willing to adopt the following principle:

* Once a message is added (foo/cur/uniq:2,flags) its contents are
  _constant_, i.e. never modified. (If one wants to ``modify'' it,
  then one has to rename it to a new unique name.) As a result,
  foo/.nnmaildir/nov/uniq will always be correct as long as it too
  is unmodified.
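To make the idea concrete, here is a rough sketch of how a scan could work under that principle. It is written in Python rather than Emacs Lisp, it is not taken from nnmaildir, and the function names (uniq_of, plan_nov_updates) are made up for illustration. It assumes the layout described above (foo/cur/uniq:2,flags and foo/.nnmaildir/nov/uniq) and ignores foo/new. The point is only that the set of messages needing fresh NOV data falls out of two directory listings, with no stat of any individual file.

```python
import os


def uniq_of(filename):
    """Return the unique part of a maildir file name (the text before ':')."""
    return filename.split(":", 1)[0]


def plan_nov_updates(maildir):
    """Compare cur/ with .nnmaildir/nov/ using directory listings only.

    Assumption: message contents never change once delivered, so an
    existing NOV entry is trusted without stat-ing the message file
    or the NOV file.
    """
    cur_dir = os.path.join(maildir, "cur")
    nov_dir = os.path.join(maildir, ".nnmaildir", "nov")

    cur = {uniq_of(name) for name in os.listdir(cur_dir)}
    nov = set(os.listdir(nov_dir))

    missing = cur - nov    # new messages: NOV data has to be generated
    orphaned = nov - cur   # messages that are gone: NOV data can be dropped
    return missing, orphaned
```

Calling plan_nov_updates("foo") would return the two sets; only the messages in the first set would need to be opened at all.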

Given that I never knowingly modify any message, I would be happy to
accept this principle in return for faster performance: I believe
that most of the stat-ing could then be eliminated or replaced by
simply listing directories.

Thoughts?

-James