From: Gareth McCaughan
Newsgroups: gmane.emacs.gnus.general
Subject: Faster NOV braiding for large newsgroups with many cached articles
Date: Sun, 30 Mar 2008 21:25:01 +0100
Organization: International Pedant Conspiracy
Message-ID: <200803302125.02240.gareth.mccaughan@pobox.com>
To: ding@gnus.org

(My apologies if this arrives twice. It looks like ding@gnus.org
silently drops messages from non-subscribers on the floor. Fair
enough in these spammy days, but might I suggest adding a note
on the Resources page of gnus.org saying so?)

I read one newsgroup for which my (local, leafnode) server has
approximately 170k articles and my Gnus cache contains approximately
20k articles. It turns out that in this mildly pathological situation
Gnus behaves mildly pathologically. Specifically, gnus-cache-braid-nov
takes several minutes to run, and much of this appears to be because
all the insertions in the nntp-server-buffer are kinda slow.

By building up the new buffer contents in a list of strings,
assembling them into a single string, and then dumping that into
the buffer where it belongs in a single operation, I can (on my
machine, on one occasion -- I haven't tested this scientifically)
speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds
instead of 10 minutes. (Note: measured under conditions of moderate
load; don't take the numbers too seriously.)

In principle this is more wasteful of memory than the old
gnus-cache-braid-nov (g-c-b-n below), because there may be three
copies of the new data sitting around (the possibly-short strings,
the single concatenated string, and the new buffer contents).
On the other hand, growing a large buffer in small steps probably
incurs some wastage due to fragmentation, and for me at least the
tradeoff is a (very) clear win.

In non-pathological situations, the original g-c-b-n is faster
than my version, but it doesn't matter because both are fast
enough for the user not to care.

Here is my version of g-c-b-n. I've given no thought at all to
multibyte issues; it may be that I should be counting bytes rather
than characters, or something. Perhaps the final concatenation
could be done with (apply 'concat (nreverse new-records)) but
I worry about hitting implementation limits on the number of
arguments to concat.

(defun gnus-cache-braid-nov (group cached &optional file)
  (message "Merging cached articles with ones on server...")
  (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))
        (new-records nil)
        beg end server-cursor)
    (gnus-cache-save-buffers)
    ;; create new buffer for reading cache overview
    (save-excursion
      (set-buffer cache-buf)
      (erase-buffer)
      (let ((coding-system-for-read gnus-cache-overview-coding-system))
        (insert-file-contents
         (or file (gnus-cache-file-name group ".overview"))))
      (goto-char (point-min))
      (insert "\n") ; so we can search for, e.g., \n123\t
      (goto-char (point-min)))
    (set-buffer nntp-server-buffer)
    (goto-char (point-min))
    (setq server-cursor (point))
    (while cached
      (set-buffer nntp-server-buffer)
      ;; skip server records preceding first cached article
      (while (and (not (eobp))
                  (< (read (current-buffer)) (car cached)))
        (forward-line 1))
      (beginning-of-line)
      ;; grab those records for the new buffer
      (let ((new-server-cursor (point)))
        (when (> new-server-cursor server-cursor)
          (push (buffer-substring server-cursor new-server-cursor)
                new-records)
          (setq server-cursor new-server-cursor)))
      ;; grab first cached article, if present
      (set-buffer cache-buf)
      (if (search-forward (concat "\n" (int-to-string (car cached)) "\t")
                          nil t)
          (setq beg (gnus-point-at-bol)
                end (progn (end-of-line) (point)))
        (setq beg nil))
      ;; grab that article's data for new buffer
      (when beg
        (push (buffer-substring beg end) new-records)
        (push "\n" new-records))
      (setq cached (cdr cached)))
    ;; we're finished with the cache overview now
    (kill-buffer cache-buf)
    ;; grab any remaining stuff from old server buffer for new one
    (set-buffer nntp-server-buffer)
    (let ((new-server-cursor (point-max)))
      (when (> new-server-cursor server-cursor)
        (push (buffer-substring server-cursor new-server-cursor)
              new-records)))
    ;; reverse chunks and concatenate
    (let ((n 0) (records new-records))
      ;; first pass: total length of all chunks
      (while records
        (incf n (length (car records)))
        (setq records (cdr records)))
      (let ((new-content (make-string n ?.)))
        (setq n 0)
        (setq records (nreverse new-records))
        (setf new-records nil) ; help the GC a little
        ;; second pass: copy each chunk into its place
        (while records
          (store-substring new-content n (car records))
          (incf n (length (car records)))
          (setq records (cdr records)))
        (set-buffer nntp-server-buffer)
        (erase-buffer)
        (insert new-content)))))

It's possible that gnus-cache-braid-heads could benefit from
some similar sort of treatment; I haven't looked.

I also tried a version of this that accumulated the new buffer
contents in a new buffer (so that insertions were always at
the end). That was (in my pathological case) 2-3 times faster
than the old version of g-c-b-n and therefore on the order of
10 times slower than the one above.

--
g
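
For comparison, here is a minimal sketch of the simpler apply-based
concatenation mentioned above. The helper name my-join-chunks is
hypothetical and not part of Gnus; the sketch only assumes that the
chunks were collected with push, so the list is in reverse order.

```elisp
;; Hypothetical helper for illustration only; `my-join-chunks' is not
;; part of Gnus.
(defun my-join-chunks (reversed-chunks)
  "Concatenate REVERSED-CHUNKS, a list of strings built with `push'."
  ;; `reverse' copies the list, so the caller's list is left intact.
  (apply #'concat (reverse reversed-chunks)))

;; Usage: collect two records with `push', then join them once.
(let ((chunks nil))
  (push "1\tfirst\n" chunks)
  (push "2\tsecond\n" chunks)
  (my-join-chunks chunks))   ; => "1\tfirst\n2\tsecond\n"
```

The posted function preallocates with make-string and fills the
result with store-substring instead, which avoids passing a very
long argument list to a single call.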