From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/74505 Path: news.gmane.org!not-for-mail From: Dan Christensen Newsgroups: gmane.emacs.gnus.general Subject: Re: Improving Gnus speed Date: Sun, 28 Nov 2010 19:29:33 -0500 Message-ID: <87vd3h55w2.fsf@uwo.ca> References: <87zktemkwl.fsf@uwo.ca> <87vd42mdci.fsf@uwo.ca> <87y68vhj3q.fsf@uwo.ca> <87sjz3h9zo.fsf@uwo.ca> <871v6el86z.fsf@uwo.ca> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: dough.gmane.org 1290990655 17811 80.91.229.12 (29 Nov 2010 00:30:55 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Mon, 29 Nov 2010 00:30:55 +0000 (UTC) To: ding@gnus.org Original-X-From: ding-owner+M22864@lists.math.uh.edu Mon Nov 29 01:30:45 2010 Return-path: Envelope-to: ding-account@gmane.org Original-Received: from util0.math.uh.edu ([129.7.128.18]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1PMrdk-0002IJ-PD for ding-account@gmane.org; Mon, 29 Nov 2010 01:30:45 +0100 Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu) by util0.math.uh.edu with smtp (Exim 4.63) (envelope-from ) id 1PMrct-0004s2-OZ; Sun, 28 Nov 2010 18:29:51 -0600 Original-Received: from mx1.math.uh.edu ([129.7.128.32]) by util0.math.uh.edu with esmtps (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1PMrcr-0004rr-Se for ding@lists.math.uh.edu; Sun, 28 Nov 2010 18:29:49 -0600 Original-Received: from quimby.gnus.org ([80.91.231.51]) by mx1.math.uh.edu with esmtp (Exim 4.72) (envelope-from ) id 1PMrcq-0006pH-Fl for ding@lists.math.uh.edu; Sun, 28 Nov 2010 18:29:49 -0600 Original-Received: from lo.gmane.org ([80.91.229.12]) by quimby.gnus.org with esmtp (Exim 3.36 #1 (Debian)) id 1PMrco-0002Ip-00 for ; Mon, 29 Nov 2010 01:29:46 +0100 Original-Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1PMrcm-000246-V8 for ding@gnus.org; Mon, 29 Nov 2010 01:29:44 +0100 Original-Received: from bas3-london14-1096779322.dsl.bell.ca ([65.95.134.58]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 29 Nov 2010 01:29:44 +0100 Original-Received: from jdc by bas3-london14-1096779322.dsl.bell.ca with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 29 Nov 2010 01:29:44 +0100 X-Injected-Via-Gmane: http://gmane.org/ Original-Lines: 75 Original-X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: bas3-london14-1096779322.dsl.bell.ca User-Agent: Gnus/5.110011 (No Gnus v0.11) Emacs/23.1.50 (gnu/linux) Cancel-Lock: sha1:HzgeeqfbA++sLFpDj6pmGggMtBc= X-Spam-Score: -1.9 (-) List-ID: Precedence: bulk Xref: news.gmane.org gmane.emacs.gnus.general:74505 Archived-At: Dan Christensen writes: > I think caching mainly makes sense for the functions which require > traversing the thread tree: gnus-thread-sort-by-total-score and > gnus-thread-sort-by-most-recent-number and -date. Are there > any others? If these tucked away their result when called, > I imagine there would be a big savings. Depending on the shape > of the tree, this would reduce sort key computation time from > somewhere between O(n log n) and O(n^2) to O(n). I've done some tests on the threads from a group I have with about 6500 articles. It takes about 1.6s to do the sorting step using gnus-thread-sort-by-most-recent-date, and it turns out that the majority of the time is taken by converting the dates from the string stored in the header to an emacs time, i.e. the function gnus-date-get-time (and what it calls). Even though that function caches its result, I thought that it might be worth caching the most-recent-date results too, since they currently involve traversing all sub-threads of all threads. But it turns out that that would barely make any difference, at least for groups of this size. Here's how I determined this. First, I saved the thread tree in a file, so I could do isolated tests of just the sorting. Here's the total time required: (benchmark-run 1 (gnus-sort-threads threads)) (1.638083 8 0.38992400000000016) [The first number is the time in seconds.] Now I get a fresh copy of threads, and do the latest-date computation on all of the articles, using a maptree function I wrote: (benchmark-run 1 (maptree (lambda (header) (gnus-date-get-time (mail-header-date header))) threads)) (1.31202 4 0.2062689999999998) That's 1.3 of the total 1.6 seconds right there. If I do that again, without resetting threads, the time goes to near zero, which shows how effective the caching of the date-conversion results is: (0.056382999999999996 0 0.0) And now if we do the sort after having pre-computed the date-conversion, it only takes about 0.3s! (benchmark-run 1 (gnus-sort-threads threads)) (0.299614 1 0.0529029999999997) The upshot is that about 1.3s is date conversion for each article, the rest has to do with the latest-date traversal and the actual sorting. In fact, further tests show that the latest-date sub-thread traversal barely adds anything. So there doesn't seem to be much point in adding caching for these recursively defined sort functions. To improve the speed here, what needs to be done is to improve the speed of the date conversion. Moreover, for me, other stages of preparing the summary buffer take significantly longer, so it would be worth investigating these. I should also note that I first did the timings yesterday, and got a total time of 2.5s (reproducibly). Then, 24 hours later, in the same emacs, the total time was almost double that (reproducibly). So I started a fresh emacs, and got the times above (1.6s total). Maybe this has to do with emacs memory management? In any case, the effect means that the time varies by a factor of 3 or more, so tracking this down would be another good way to speed up Gnus. Dan