From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/66848 Path: news.gmane.org!not-for-mail From: Reiner Steib Newsgroups: gmane.emacs.gnus.general Subject: Re: Faster NOV braiding for large newsgroups with many cached articles Date: Sat, 19 Apr 2008 16:22:10 +0200 Message-ID: References: <200803302125.02240.gareth.mccaughan@pobox.com> Reply-To: Reiner Steib NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Trace: ger.gmane.org 1208627523 18585 80.91.229.12 (19 Apr 2008 17:52:03 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Sat, 19 Apr 2008 17:52:03 +0000 (UTC) Cc: ding@gnus.org To: Gareth McCaughan Original-X-From: ding-owner+M15330@lists.math.uh.edu Sat Apr 19 19:52:36 2008 connect(): Connection refused Return-path: Envelope-to: ding-account@gmane.org Original-Received: from util0.math.uh.edu ([129.7.128.18]) by lo.gmane.org with esmtp (Exim 4.50) id 1JnDz5-0002SC-1v for ding-account@gmane.org; Sat, 19 Apr 2008 16:24:07 +0200 Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu) by util0.math.uh.edu with smtp (Exim 4.63) (envelope-from ) id 1JnDxo-00060L-9k; Sat, 19 Apr 2008 09:22:48 -0500 Original-Received: from mx2.math.uh.edu ([129.7.128.33]) by util0.math.uh.edu with esmtps (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1JnDxm-000605-0p for ding@lists.math.uh.edu; Sat, 19 Apr 2008 09:22:46 -0500 Original-Received: from quimby.gnus.org ([80.91.231.51]) by mx2.math.uh.edu with esmtp (Exim 4.67) (envelope-from ) id 1JnDxf-0006RG-37 for ding@lists.math.uh.edu; Sat, 19 Apr 2008 09:22:45 -0500 Original-Received: from mail.uni-ulm.de ([134.60.1.11]) by quimby.gnus.org with esmtp (Exim 3.35 #1 (Debian)) id 1JnDxq-0007ww-00 for ; Sat, 19 Apr 2008 16:22:50 +0200 Original-Received: from bridgekeeper.physik.uni-ulm.de (bridgekeeper.physik.uni-ulm.de [134.60.41.37]) by mail.uni-ulm.de (8.14.2/8.14.2) with ESMTP id m3JEMZT4001504; Sat, 19 Apr 2008 16:22:35 +0200 (MEST) Original-Received: from localhost (bridgekeeper.physik.uni-ulm.de [134.60.41.37]) by bridgekeeper.physik.uni-ulm.de (Postfix) with ESMTP id 3726712FF0; Sat, 19 Apr 2008 16:22:35 +0200 (CEST) X-Face: 3Phac&+dw=IZHjhua]bp}LH<*p{qzj8u+ Precedence: bulk Xref: news.gmane.org gmane.emacs.gnus.general:66848 Archived-At: --=-=-= On Sun, Mar 30 2008, Gareth McCaughan wrote: > I read one newsgroup for which my (local, leafnode) server has approximately > 170k articles and my Gnus cache contains approximately 20k articles. > It turns out that in this mildly pathological situation Gnus behaves > mildly pathologically. > > Specifically, gnus-cache-braid-nov takes several minutes to run, > and much of this appears to be because all the insertions in the > nntp-server-buffer are kinda slow. > > By building up the new buffer contents in a list of strings, > assembling them into a single string, and then dumping that into > the buffer where it belongs in a single operation, I can (on my > machine, on one occasion -- I haven't tested this scientifically) > speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds > instead of 10 minutes. > > (Note: measured under conditions of moderate load; > don't take the numbers too seriously.) With which (X)Emacs and Gnus versions? Did you try other versions as well? > In principle this is more wasteful of memory than the old > g-c-b-n, because there may be three copies of the new data > sitting around (the possibly-short strings, the single > concatenated string, and the new buffer contents). On the > other hand, growing a large buffer in small steps probably > incurs some wastage due to fragmentation, and for me at least > the tradeoff is a (very) clear win. > > In non-pathological situations, the original g-c-b-n is faster than > my version, but it doesn't matter because both are fast enough for > the user not to care. > > Here is my version of g-c-b-n. I've given no thought at all > to multibyte issues; it may be that I should be counting bytes > rather than characters, or something. Perhaps the final > concatenation could be done with (apply 'concatenate (nreverse new-records)) > but I worry about hitting implementation limits on the number > of arguments to CONCATENATE. It is easier for us if you don't post the modified function. Instead, produce a diff (unified diff preferred: "-u") against the version you use (preferably HEAD revision of the CVS trunk, else tell us the version). [1] | --- gnus-cache.el 01 Mar 2008 22:54:54 +0100 6.26.2.13 | +++ gnus-cache.el 19 Apr 2008 16:01:26 +0200 | @@ -501,10 +501,14 @@ | (setq gnus-cache-active-altered t))) | articles))) | | + | (defun gnus-cache-braid-nov (group cached &optional file) | + (message "Merging cached articles with ones on server...") Better use `gnus-message' here. | + ;; reverse chunks and concatenate | + (let ((n 0) (records new-records)) | + (while records | + (incf n (length (car records))) | + (setq records (cdr records))) | + (let ((new-content (make-string n ?.))) | + (setq n 0) | + (setq records (nreverse new-records)) | + (setf new-records nil) ; help the GC a little Please explain why you use `setf' and why GC need help. | + (while records | + (store-substring new-content n (car records)) | + (incf n (length (car records))) | + (setq records (cdr records))) | + (set-buffer nntp-server-buffer) | + (erase-buffer) | + (insert new-content))) )) `---- > It's possible that gnus-cache-braid-heads could benefit from > some similar sort of treatment; I haven't looked. > > I also tried a version of this that accumulated the new buffer contents > in a new buffer (so that insertions were always at the end). That was > (in my pathological case) 2-3 times faster than the old version of g-c-b-n > and therefore on the order of 10 times slower than the one above. On Fri, Apr 18 2008, Gareth McCaughan wrote on bugs@gnus.org: [...] > I posted a version of gnus-cache-braid-nov that works that way > to ding@gnus.org. (No replies; fair enough.) Sorry, I didn't have time to look at your code. It would be better to remind us by following-up to the original message instead of starting a new thread on a different list. > It might be better to look at the size of the group and of the cache > and choose heuristically between the two implementations, so as not > to pay the memory cost for large groups with few cached articles > (where I think the speed should be OK with the old implementation, > though I haven't measured it). Sounds useful. > If there's any interest in improving this and my code is useful, > I am happy to sign whatever papers are necessary. I'll send you the form off-list. Bye, Reiner. -- [1] Inserting your defun into gnus-cache.el on the v5-10 branch (Gnus 5.10.10) and produding a diff: --=-=-= Content-Type: text/x-patch Content-Disposition: inline; filename=gnus-cache-McCaughan.patch --- gnus-cache.el 01 Mar 2008 22:54:54 +0100 6.26.2.13 +++ gnus-cache.el 19 Apr 2008 16:01:26 +0200 @@ -501,10 +501,14 @@ (setq gnus-cache-active-altered t))) articles))) + (defun gnus-cache-braid-nov (group cached &optional file) + (message "Merging cached articles with ones on server...") (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*")) - beg end) + (new-records nil) + beg end server-cursor) (gnus-cache-save-buffers) + ;; create new buffer for reading cache overview (save-excursion (set-buffer cache-buf) (erase-buffer) @@ -513,27 +517,58 @@ (insert-file-contents (or file (gnus-cache-file-name group ".overview")))) (goto-char (point-min)) - (insert "\n") + (insert "\n") ; so we can search for, e.g., \n123\t (goto-char (point-min))) (set-buffer nntp-server-buffer) (goto-char (point-min)) + (setq server-cursor (point)) (while cached + (set-buffer nntp-server-buffer) + ;; skip server records preceding first cached article (while (and (not (eobp)) (< (read (current-buffer)) (car cached))) (forward-line 1)) (beginning-of-line) + ;; grab those records for the new buffer + (let ((new-server-cursor (point))) + (when (> new-server-cursor server-cursor) + (push (buffer-substring server-cursor new-server-cursor) new-records) + (setq server-cursor new-server-cursor))) + ;; grab first cached article, if present (set-buffer cache-buf) (if (search-forward (concat "\n" (int-to-string (car cached)) "\t") nil t) (setq beg (gnus-point-at-bol) end (progn (end-of-line) (point))) (setq beg nil)) - (set-buffer nntp-server-buffer) + ;; grab that article's data for new buffer (when beg - (insert-buffer-substring cache-buf beg end) - (insert "\n")) + (push (buffer-substring beg end) new-records) + (push "\n" new-records)) (setq cached (cdr cached))) - (kill-buffer cache-buf))) + ;; we're finished with the cache overview now + (kill-buffer cache-buf) + ;; grab any remaining stuff from old server buffer for new one + (set-buffer nntp-server-buffer) + (let ((new-server-cursor (point-max))) + (when (> new-server-cursor server-cursor) + (push (buffer-substring server-cursor new-server-cursor) new-records))) + ;; reverse chunks and concatenate + (let ((n 0) (records new-records)) + (while records + (incf n (length (car records))) + (setq records (cdr records))) + (let ((new-content (make-string n ?.))) + (setq n 0) + (setq records (nreverse new-records)) + (setf new-records nil) ; help the GC a little + (while records + (store-substring new-content n (car records)) + (incf n (length (car records))) + (setq records (cdr records))) + (set-buffer nntp-server-buffer) + (erase-buffer) + (insert new-content))) )) (defun gnus-cache-braid-heads (group cached) (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))) --=-=-=--