From: Ted Zlatanov
Newsgroups: gmane.emacs.devel,gmane.emacs.gnus.general
Subject: Re: [PATCH 2/2] nnmaildir: Use a 'num' file, instead of a directory
Date: Thu, 12 Aug 2010 19:15:24 -0500
Message-ID: <87mxsrfjvn.fsf@lifelogs.com>
To: emacs-devel@gnu.org
Cc: ding@gnus.org

On Sat, 26 Jun 2010 13:33:42 -0400 prj@po.cwru.edu (Paul Jarc) wrote:

>> I have recently experienced slow nnmaildir performance

PJ> I suspect the performance problem is with the nov/ directory, rather
PJ> than num/. With the current nov/ structure, it's quick and easy to
PJ> map from filenames to article numbers, but I think we usually need the
PJ> inverse operation. As it is, to map from an article number to a
PJ> filename, we need to read the contents of all the nov/ files. That's
PJ> done just once and the results are cached, so it's not too horrific,
PJ> but we still need to check timestamps to see if the files have
PJ> changed, and it takes a lot of memory for large groups.

PJ> It's been a while since I looked at it, though--there may be some
PJ> operations where we do need to go from the filename to the number. So
PJ> then it might be useful to add hard links so each nov/ file could be
PJ> accessed by either its article number or filename. The filename would
PJ> also have to be added to the contents of those files somehow.

Thanks for explaining, Paul. I wanted to respond to you and John
carefully so it took me a while. Sorry about that.

I looked at the nnmaildir code. Keeping in mind the majority of Gnus
users don't need concurrent access to their Maildirs, I have a
proposal.

Regarding John's patch, I think it's good to avoid creating many extra
files. Inodes can be expensive and many filesystems are not good
about indexing many files. But it should be a user option called
'make-concurrent for instance (on the nnmaildir backend), not a
complete switchover as it is now. But using a `num' file seems
superfluous since, if we know concurrent access is not an issue, we
can keep a single database. We also don't have to worry about users
going back and forth between concurrent and non-concurrent access. If
they do, we can complain loudly and maybe provide a slow bidirectional
switchover function.

Regarding the NOV database in .nnmaildir/nov/MESSAGE-ID, the goal is
to map it to the number N that's currently inside that file. Links
would also burden the filesystem and are IMO not a good improvement
since scanning the directory repeatedly is expensive. I think the
current strategy should be kept as is and turned on only if the user
asks for concurrency (as above).

The non-concurrent alternative should be to keep a single NOV and num
database in memory for the active group and flush it to disk as
needed. The database can be as simple as one line at the beginning
for the version and then just the NOV vectors in order, one per line.
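
To make the one-file layout concrete, here is a minimal sketch. It is written in Python rather than Emacs Lisp purely for illustration, and it is not nnmaildir code. The header text, the tab-separated line layout with the article number as the first field, and all function names are assumptions made up for the example.

```python
import os

# Hypothetical version header; not an existing nnmaildir format.
VERSION_LINE = "nov-db-version 1"


def create_db(path):
    """Create an empty database containing only the version line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(VERSION_LINE + "\n")


def last_number(path):
    """Return the article number stored on the last line, or 0 if none.

    For brevity this reads the whole file; a real implementation could
    seek backwards from the end instead.
    """
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    if len(lines) <= 1:  # only the version line is present
        return 0
    return int(lines[-1].split("\t", 1)[0])


def append_nov(path, fields):
    """Append one NOV line and return the article number assigned to it."""
    number = last_number(path) + 1
    with open(path, "a", encoding="utf-8") as f:
        f.write("\t".join([str(number)] + list(fields)) + "\n")
    return number


def delete_article(path, number):
    """Rewrite the file without the line for NUMBER (the only full rewrite)."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    kept = [lines[0]] + [
        line for line in lines[1:]
        if int(line.split("\t", 1)[0]) != number
    ]
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write("\n".join(kept) + "\n")
    os.replace(tmp, path)  # swap the rewritten file into place
```

One thing the sketch makes visible: if the highest-numbered article is deleted, taking the next number from the last line hands the same number out again. Keeping a high-water mark in the header line would be one way around that, at the cost of rewriting the first line on append.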
Appending is trivial (read last line to get max number, append line)
and rewriting the NOV is only necessary when deleting an article. I
think this would speed up nnmaildir operations significantly.

I'd like to know your opinion since you wrote so much of nnmaildir.el
and have experience supporting it. I am certain that for the majority
of Gnus users today concurrent access is not an issue based on what
I've heard in the Gnus mailing lists over the last 8 years. But do
you think the current concurrent system can be improved significantly
rather than doing what I propose? Should we look at different
storage, maybe SQLite or Berkeley DB or sparse files, for those
databases? Is there anything in the Emacs core that can help us (thus
the CC to emacs-devel) or can anything be added to Emacs to that end?

Thanks for your help
Ted