Gnus development mailing list
From: Reiner Steib <reinersteib+gmane@imap.cc>
To: Gareth McCaughan <gareth.mccaughan@pobox.com>
Cc: ding@gnus.org
Subject: Re: Faster NOV braiding for large newsgroups with many cached articles
Date: Sat, 19 Apr 2008 16:22:10 +0200
Message-ID: <v97ietrkkt.fsf@marauder.physik.uni-ulm.de> (raw)
In-Reply-To: <200803302125.02240.gareth.mccaughan@pobox.com> (Gareth McCaughan's message of "Sun, 30 Mar 2008 21:25:01 +0100")


On Sun, Mar 30 2008, Gareth McCaughan wrote:

> I read one newsgroup for which my (local, leafnode) server has approximately
> 170k articles and my Gnus cache contains approximately 20k articles.
> It turns out that in this mildly pathological situation Gnus behaves
> mildly pathologically.
>
> Specifically, gnus-cache-braid-nov takes several minutes to run,
> and much of this appears to be because all the insertions in the
> nntp-server-buffer are kinda slow.
>
> By building up the new buffer contents in a list of strings,
> assembling them into a single string, and then dumping that into
> the buffer where it belongs in a single operation, I can (on my
> machine, on one occasion -- I haven't tested this scientifically)
> speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds
> instead of 10 minutes.
>
>     (Note: measured under conditions of moderate load;
>     don't take the numbers too seriously.)

With which (X)Emacs and Gnus versions?  Did you try other versions as
well?
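
The pattern described above -- push each chunk onto a list, join once,
insert once -- can be sketched as follows.  The function name and call
shape here are illustrative only, not the actual patch:

```elisp
;; Illustrative sketch (not the actual patch): build the new buffer
;; contents as a list of strings and touch the buffer only once.
;; Repeatedly inserting into a large buffer is slow, whereas consing
;; strings onto a list is cheap.
(defun my-fill-buffer-once (buffer chunks)
  "Replace BUFFER's contents with the strings in CHUNKS in one operation.
CHUNKS is assumed to be in reverse order, as produced by `push'."
  (let ((content (mapconcat #'identity (nreverse chunks) "")))
    (with-current-buffer buffer
      (erase-buffer)
      (insert content))))
```

The single `insert' at the end is what replaces the many small
insertions that made the original `gnus-cache-braid-nov' slow in the
170k-article case.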

> In principle this is more wasteful of memory than the old
> g-c-b-n, because there may be three copies of the new data
> sitting around (the possibly-short strings, the single
> concatenated string, and the new buffer contents). On the
> other hand, growing a large buffer in small steps probably
> incurs some wastage due to fragmentation, and for me at least
> the tradeoff is a (very) clear win.
>
> In non-pathological situations, the original g-c-b-n is faster than
> my version, but it doesn't matter because both are fast enough for
> the user not to care.
>
> Here is my version of g-c-b-n. I've given no thought at all
> to multibyte issues; it may be that I should be counting bytes
> rather than characters, or something. Perhaps the final
> concatenation could be done with (apply 'concatenate (nreverse new-records))
> but I worry about hitting implementation limits on the number
> of arguments to CONCATENATE.
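
For what it's worth, `mapconcat' sidesteps the argument-count worry:
the list is passed as a single sequence argument rather than spread
into individual arguments, so a limit like Common Lisp's
`call-arguments-limit' never comes into play.  (In Emacs Lisp the
spreading variant would be `(apply #'concat (nreverse new-records))',
which has the same concern.)  A minimal illustration:

```elisp
;; `apply'-spreading a very long list of strings may hit
;; implementation limits on argument counts; `mapconcat' takes the
;; whole list as one argument, so no such limit applies.
(mapconcat #'identity (list "170000\tSubject\t" "...\n" "170001\t") "")
;; => "170000\tSubject\t...\n170001\t"
```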

It is easier for us if you don't post the modified function.  Instead,
produce a diff (unified diff preferred: "-u") against the version you
use (preferably HEAD revision of the CVS trunk, else tell us the
version). [1]

,----
| --- gnus-cache.el	01 Mar 2008 22:54:54 +0100	6.26.2.13
| +++ gnus-cache.el	19 Apr 2008 16:01:26 +0200	
| @@ -501,10 +501,14 @@
|  	  (setq gnus-cache-active-altered t)))
|        articles)))
|  
| +
|  (defun gnus-cache-braid-nov (group cached &optional file)
| +  (message "Merging cached articles with ones on server...")

Better use `gnus-message' here.

| +    ;; reverse chunks and concatenate
| +    (let ((n 0) (records new-records))
| +      (while records
| +        (incf n (length (car records)))
| +        (setq records (cdr records)))
| +      (let ((new-content (make-string n ?.)))
| +        (setq n 0)
| +        (setq records (nreverse new-records))
| +        (setf new-records nil) ; help the GC a little

Please explain why you use `setf' here and why the GC needs help.

| +        (while records
| +          (store-substring new-content n (car records))
| +          (incf n (length (car records)))
| +          (setq records (cdr records)))
| +        (set-buffer nntp-server-buffer)
| +        (erase-buffer)
| +        (insert new-content))) ))
`----

> It's possible that gnus-cache-braid-heads could benefit from
> some similar sort of treatment; I haven't looked.
>
> I also tried a version of this that accumulated the new buffer contents
> in a new buffer (so that insertions were always at the end). That was
> (in my pathological case) 2-3 times faster than the old version of g-c-b-n
> and therefore on the order of 10 times slower than the one above.

On Fri, Apr 18 2008, Gareth McCaughan wrote on bugs@gnus.org:
[...]
> I posted a version of gnus-cache-braid-nov that works that way
> to ding@gnus.org. (No replies; fair enough.) 

Sorry, I didn't have time to look at your code.  It would be better to
remind us by following-up to the original message instead of starting
a new thread on a different list.

> It might be better to look at the size of the group and of the cache
> and choose heuristically between the two implementations, so as not
> to pay the memory cost for large groups with few cached articles
> (where I think the speed should be OK with the old implementation,
> though I haven't measured it).

Sounds useful.
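
Such a heuristic might look roughly like this; the fast-variant
function name and the threshold are both hypothetical:

```elisp
;; Hypothetical dispatcher: use the original in-buffer braiding when
;; few articles are cached, and the string-accumulating variant only
;; when its memory cost is likely to pay off.  The threshold and the
;; name `gnus-cache-braid-nov-fast' are illustrative, not real.
(defun gnus-cache-braid-nov-dispatch (group cached &optional file)
  (if (> (length cached) 1000)          ; threshold is a guess
      (gnus-cache-braid-nov-fast group cached file)
    (gnus-cache-braid-nov group cached file)))
```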

> If there's any interest in improving this and my code is useful,
> I am happy to sign whatever papers are necessary. 

I'll send you the form off-list.

Bye, Reiner.
-- 
[1] Inserting your defun into gnus-cache.el on the v5-10 branch (Gnus
    5.10.10) and producing a diff:

[-- Attachment #2: gnus-cache-McCaughan.patch --]
[-- Type: text/x-patch, Size: 3119 bytes --]

--- gnus-cache.el	01 Mar 2008 22:54:54 +0100	6.26.2.13
+++ gnus-cache.el	19 Apr 2008 16:01:26 +0200	
@@ -501,10 +501,14 @@
 	  (setq gnus-cache-active-altered t)))
       articles)))
 
+
 (defun gnus-cache-braid-nov (group cached &optional file)
+  (message "Merging cached articles with ones on server...")
   (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))
-	beg end)
+        (new-records nil)
+	beg end server-cursor)
     (gnus-cache-save-buffers)
+    ;; create new buffer for reading cache overview
     (save-excursion
       (set-buffer cache-buf)
       (erase-buffer)
@@ -513,27 +517,58 @@
 	(insert-file-contents
 	 (or file (gnus-cache-file-name group ".overview"))))
       (goto-char (point-min))
-      (insert "\n")
+      (insert "\n") ; so we can search for, e.g., \n123\t
       (goto-char (point-min)))
     (set-buffer nntp-server-buffer)
     (goto-char (point-min))
+    (setq server-cursor (point))
     (while cached
+      (set-buffer nntp-server-buffer)
+      ;; skip server records preceding first cached article
       (while (and (not (eobp))
 		  (< (read (current-buffer)) (car cached)))
 	(forward-line 1))
       (beginning-of-line)
+      ;; grab those records for the new buffer
+      (let ((new-server-cursor (point)))
+        (when (> new-server-cursor server-cursor)
+          (push (buffer-substring server-cursor new-server-cursor) new-records)
+          (setq server-cursor new-server-cursor)))
+      ;; grab first cached article, if present
       (set-buffer cache-buf)
       (if (search-forward (concat "\n" (int-to-string (car cached)) "\t")
 			  nil t)
 	  (setq beg (gnus-point-at-bol)
 		end (progn (end-of-line) (point)))
 	(setq beg nil))
-      (set-buffer nntp-server-buffer)
+      ;; grab that article's data for new buffer
       (when beg
-	(insert-buffer-substring cache-buf beg end)
-	(insert "\n"))
+        (push (buffer-substring beg end) new-records)
+        (push "\n" new-records))
       (setq cached (cdr cached)))
-    (kill-buffer cache-buf)))
+    ;; we're finished with the cache overview now
+    (kill-buffer cache-buf)
+    ;; grab any remaining stuff from old server buffer for new one
+    (set-buffer nntp-server-buffer)
+    (let ((new-server-cursor (point-max)))
+      (when (> new-server-cursor server-cursor)
+        (push (buffer-substring server-cursor new-server-cursor) new-records)))
+    ;; reverse chunks and concatenate
+    (let ((n 0) (records new-records))
+      (while records
+        (incf n (length (car records)))
+        (setq records (cdr records)))
+      (let ((new-content (make-string n ?.)))
+        (setq n 0)
+        (setq records (nreverse new-records))
+        (setf new-records nil) ; help the GC a little
+        (while records
+          (store-substring new-content n (car records))
+          (incf n (length (car records)))
+          (setq records (cdr records)))
+        (set-buffer nntp-server-buffer)
+        (erase-buffer)
+        (insert new-content))) ))
 
 (defun gnus-cache-braid-heads (group cached)
   (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*")))


Thread overview: 8+ messages
2008-03-30 20:25 Gareth McCaughan
2008-03-31 13:31 ` Jason L Tibbitts III
2008-03-31 17:15   ` Gareth McCaughan
2008-04-12  9:03   ` Gaute Strokkenes
2008-04-12 21:25     ` Gareth McCaughan
2008-04-19 14:22 ` Reiner Steib [this message]
2008-04-19 20:33   ` Gareth McCaughan
  -- strict thread matches above, loose matches on Subject: below --
2008-03-30  2:21 Gareth McCaughan
