From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Apr 2007 11:22:51 -0700
From: Lyndon Nerenberg
To: Fans of the OS Plan 9 from Bell Labs <9fans@cse.psu.edu>
Subject: Re: [9fans] Re: [sources] 20070410: % cat
In-Reply-To:
Message-ID: <20070413110748.B32425@orthanc.ca>
References:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Topicbox-Message-UUID: 470d158e-ead2-11e9-9d60-3106f5b1d025

> for 250k messages, either solution is poor. deleting the
> first of 250k messages requires rewriting the whole mbox.
> dealing with 250k directory entries can be painful, too,
> as many fs keep directories as arrays.
>
> if you have that many messages, you might want an index. ;-)

I've said it before, and I'll keep saying it until I'm proved wrong:
nothing runs faster or scales better than the Cyrus IMAP mail store.
It uses an MH-style filesystem layout with a per-folder index and
header cache. I have deployed mail servers with literally millions of
user accounts using this layout, and it just works.

These days I don't find large directories to be a problem. A few
months ago I did some performance tests on large directory operations
(create, unlink, readdir) on Linux and FreeBSD. On both OSes,
directory enumeration times were negligible for the tests I ran (in
the vicinity of 200K entries). And with everyone caching directory
entries these days, the only real difference between directory I/O
and file I/O is the locking overhead for directory modifications.

(This all assumes local disk. Throw in NFS and everything implodes.)

But scalability aside, the real evilness in mbox is the quoting of
'^From '.

--lyndon
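
For readers unfamiliar with the '^From ' issue, here is a minimal Python sketch of what the quoting involves. It is an illustration, not anything taken from the message above. The function names (mboxo_quote, mboxrd_quote, mboxrd_unquote) are made up for this example, and the rules shown are the commonly described "mboxo" and "mboxrd" conventions. Because an mbox file marks the start of each message with a line beginning "From ", any body line that looks like one has to be altered on write.

```python
import re

# Assumed conventions (hypothetical helper names, not from the original mail):
#   mboxo : prefix '>' only to body lines that start with "From "
#   mboxrd: prefix '>' to body lines that match ^>*From , so it can be undone
_RD_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_RD_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxo_quote(body: str) -> str:
    """Quote a message body the mboxo way (lossy)."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


def mboxrd_quote(body: str) -> str:
    """Quote a message body the mboxrd way (reversible)."""
    return _RD_QUOTE.sub(r">\1", body)


def mboxrd_unquote(stored: str) -> str:
    """Strip exactly one level of mboxrd quoting."""
    return _RD_UNQUOTE.sub(r"\1", stored)


if __name__ == "__main__":
    body = "From here on it gets ugly.\n>From a quoted reply\nplain line\n"
    print(mboxo_quote(body))   # both lines now start with ">From "
    print(mboxrd_quote(body))  # second line gains a second '>'
    print(mboxrd_unquote(mboxrd_quote(body)) == body)
```

Under the mboxo rule the two distinct input lines become indistinguishable once stored, so the original body cannot be recovered. The mboxrd rule round-trips, but only if every reader and writer agrees on it. A one-file-per-message layout of the kind described above never has to rewrite the message body at all.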