From: Reiner Steib <reinersteib+gmane@imap.cc>
To: Gareth McCaughan <gareth.mccaughan@pobox.com>
Cc: ding@gnus.org
Subject: Re: Faster NOV braiding for large newsgroups with many cached articles
Date: Sat, 19 Apr 2008 16:22:10 +0200 [thread overview]
Message-ID: <v97ietrkkt.fsf@marauder.physik.uni-ulm.de> (raw)
In-Reply-To: <200803302125.02240.gareth.mccaughan@pobox.com> (Gareth McCaughan's message of "Sun, 30 Mar 2008 21:25:01 +0100")
On Sun, Mar 30 2008, Gareth McCaughan wrote:
> I read one newsgroup for which my (local, leafnode) server has approximately
> 170k articles and my Gnus cache contains approximately 20k articles.
> It turns out that in this mildly pathological situation Gnus behaves
> mildly pathologically.
>
> Specifically, gnus-cache-braid-nov takes several minutes to run,
> and much of this appears to be because all the insertions in the
> nntp-server-buffer are kinda slow.
>
> By building up the new buffer contents in a list of strings,
> assembling them into a single string, and then dumping that into
> the buffer where it belongs in a single operation, I can (on my
> machine, on one occasion -- I haven't tested this scientifically)
> speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds
> instead of 10 minutes.
>
> (Note: measured under conditions of moderate load;
> don't take the numbers too seriously.)
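The accumulate-then-insert technique described above can be sketched like this (a minimal illustration of the idea only, not the posted patch; the record strings and buffer name are made up):

```elisp
;; Sketch of the accumulate-then-insert idea (illustrative only).
(let ((chunks nil))
  ;; `push' prepends in O(1), so chunks are collected in reverse order...
  (dolist (rec '("170001\tSubject A\t...\n" "170002\tSubject B\t...\n"))
    (push rec chunks))
  ;; ...and ordering, concatenation and insertion are each paid for once.
  (with-current-buffer (get-buffer-create " *braid-example*")
    (erase-buffer)
    (insert (apply #'concat (nreverse chunks)))))
```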
With which (X)Emacs and Gnus versions? Did you try other versions as
well?
> In principle this is more wasteful of memory than the old
> g-c-b-n, because there may be three copies of the new data
> sitting around (the possibly-short strings, the single
> concatenated string, and the new buffer contents). On the
> other hand, growing a large buffer in small steps probably
> incurs some wastage due to fragmentation, and for me at least
> the tradeoff is a (very) clear win.
>
> In non-pathological situations, the original g-c-b-n is faster than
> my version, but it doesn't matter because both are fast enough for
> the user not to care.
>
> Here is my version of g-c-b-n. I've given no thought at all
> to multibyte issues; it may be that I should be counting bytes
> rather than characters, or something. Perhaps the final
> concatenation could be done with (apply 'concatenate (nreverse new-records))
> but I worry about hitting implementation limits on the number
> of arguments to CONCATENATE.
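One way to sidestep any per-call limit on the number of arguments is not to spread the list through `apply' at all, e.g. (a sketch):

```elisp
;; `mapconcat' walks the list itself, so the number of chunks never
;; becomes a function-call argument count.
(mapconcat #'identity (list "chunk1" "chunk2" "chunk3") "")
;; => "chunk1chunk2chunk3"
```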
It is easier for us if you don't post the modified function. Instead,
produce a diff (unified diff preferred: "-u") against the version you
use (preferably the HEAD revision of the CVS trunk; otherwise tell us
which version). [1]
,----
| --- gnus-cache.el	01 Mar 2008 22:54:54 +0100	6.26.2.13
| +++ gnus-cache.el	19 Apr 2008 16:01:26 +0200
| @@ -501,10 +501,14 @@
|        (setq gnus-cache-active-altered t)))
|      articles)))
|  
| +
|  (defun gnus-cache-braid-nov (group cached &optional file)
| +  (message "Merging cached articles with ones on server...")
Better use `gnus-message' here.
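Something like the following (level 6 chosen as a plausible verbosity level for routine progress messages):

```elisp
;; `gnus-message' only emits when LEVEL <= `gnus-verbose', so users can
;; silence routine progress output; plain `message' cannot be suppressed.
(gnus-message 6 "Merging cached articles with ones on server...")
```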
| +    ;; reverse chunks and concatenate
| +    (let ((n 0) (records new-records))
| +      (while records
| +        (incf n (length (car records)))
| +        (setq records (cdr records)))
| +      (let ((new-content (make-string n ?.)))
| +        (setq n 0)
| +        (setq records (nreverse new-records))
| +        (setf new-records nil) ; help the GC a little
Please explain why you use `setf' and why the GC needs help.
| +        (while records
| +          (store-substring new-content n (car records))
| +          (incf n (length (car records)))
| +          (setq records (cdr records)))
| +        (set-buffer nntp-server-buffer)
| +        (erase-buffer)
| +        (insert new-content))) ))
`----
> It's possible that gnus-cache-braid-heads could benefit from
> some similar sort of treatment; I haven't looked.
>
> I also tried a version of this that accumulated the new buffer contents
> in a new buffer (so that insertions were always at the end). That was
> (in my pathological case) 2-3 times faster than the old version of g-c-b-n
> and therefore on the order of 10 times slower than the one above.
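That intermediate variant -- always appending at the end of a scratch buffer, then moving the result over in one operation -- would look roughly like this (a sketch; the record strings and destination buffer name are invented):

```elisp
;; Sketch of the buffer-accumulation variant: insertions always at
;; point-max (cheap), followed by a single copy into the destination.
(with-temp-buffer
  (dolist (rec '("170001\t...\n" "170002\t...\n"))  ; invented records
    (goto-char (point-max))
    (insert rec))
  (let ((merged (buffer-string)))
    (with-current-buffer (get-buffer-create " *dest-example*")
      (erase-buffer)
      (insert merged))))
```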
On Fri, Apr 18 2008, Gareth McCaughan wrote on bugs@gnus.org:
[...]
> I posted a version of gnus-cache-braid-nov that works that way
> to ding@gnus.org. (No replies; fair enough.)
Sorry, I didn't have time to look at your code. It would be better to
remind us by following up on the original message instead of starting
a new thread on a different list.
> It might be better to look at the size of the group and of the cache
> and choose heuristically between the two implementations, so as not
> to pay the memory cost for large groups with few cached articles
> (where I think the speed should be OK with the old implementation,
> though I haven't measured it).
Sounds useful.
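A sketch of such a dispatch (the function names and the threshold are hypothetical):

```elisp
;; Hypothetical dispatch: only pay the extra memory cost when enough
;; articles are cached for the list-of-strings version to pay off.
(defun my-braid-nov-dispatch (group cached &optional file)
  (if (> (length cached) 1000)                        ; threshold is a guess
      (gnus-cache-braid-nov-fast group cached file)   ; hypothetical name
    (gnus-cache-braid-nov-plain group cached file)))  ; hypothetical name
```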
> If there's any interest in improving this and my code is useful,
> I am happy to sign whatever papers are necessary.
I'll send you the form off-list.
Bye, Reiner.
--
[1] Inserting your defun into gnus-cache.el on the v5-10 branch (Gnus
5.10.10) and producing a diff:
[-- Attachment #2: gnus-cache-McCaughan.patch --]
[-- Type: text/x-patch, Size: 3119 bytes --]
--- gnus-cache.el	01 Mar 2008 22:54:54 +0100	6.26.2.13
+++ gnus-cache.el	19 Apr 2008 16:01:26 +0200
@@ -501,10 +501,14 @@
       (setq gnus-cache-active-altered t)))
     articles)))
 
+
 (defun gnus-cache-braid-nov (group cached &optional file)
+  (message "Merging cached articles with ones on server...")
   (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))
-	beg end)
+	(new-records nil)
+	beg end server-cursor)
     (gnus-cache-save-buffers)
+    ;; create new buffer for reading cache overview
     (save-excursion
       (set-buffer cache-buf)
       (erase-buffer)
@@ -513,27 +517,58 @@
       (insert-file-contents
        (or file (gnus-cache-file-name group ".overview"))))
       (goto-char (point-min))
-      (insert "\n")
+      (insert "\n") ; so we can search for, e.g., \n123\t
       (goto-char (point-min)))
     (set-buffer nntp-server-buffer)
     (goto-char (point-min))
+    (setq server-cursor (point))
     (while cached
+      (set-buffer nntp-server-buffer)
+      ;; skip server records preceding first cached article
       (while (and (not (eobp))
                   (< (read (current-buffer)) (car cached)))
        (forward-line 1))
       (beginning-of-line)
+      ;; grab those records for the new buffer
+      (let ((new-server-cursor (point)))
+        (when (> new-server-cursor server-cursor)
+          (push (buffer-substring server-cursor new-server-cursor) new-records)
+          (setq server-cursor new-server-cursor)))
+      ;; grab first cached article, if present
       (set-buffer cache-buf)
       (if (search-forward (concat "\n" (int-to-string (car cached)) "\t")
                           nil t)
           (setq beg (gnus-point-at-bol)
                 end (progn (end-of-line) (point)))
         (setq beg nil))
-      (set-buffer nntp-server-buffer)
+      ;; grab that article's data for new buffer
       (when beg
-        (insert-buffer-substring cache-buf beg end)
-        (insert "\n"))
+        (push (buffer-substring beg end) new-records)
+        (push "\n" new-records))
       (setq cached (cdr cached)))
-    (kill-buffer cache-buf)))
+    ;; we're finished with the cache overview now
+    (kill-buffer cache-buf)
+    ;; grab any remaining stuff from old server buffer for new one
+    (set-buffer nntp-server-buffer)
+    (let ((new-server-cursor (point-max)))
+      (when (> new-server-cursor server-cursor)
+        (push (buffer-substring server-cursor new-server-cursor) new-records)))
+    ;; reverse chunks and concatenate
+    (let ((n 0) (records new-records))
+      (while records
+        (incf n (length (car records)))
+        (setq records (cdr records)))
+      (let ((new-content (make-string n ?.)))
+        (setq n 0)
+        (setq records (nreverse new-records))
+        (setf new-records nil) ; help the GC a little
+        (while records
+          (store-substring new-content n (car records))
+          (incf n (length (car records)))
+          (setq records (cdr records)))
+        (set-buffer nntp-server-buffer)
+        (erase-buffer)
+        (insert new-content))) ))
 
 (defun gnus-cache-braid-heads (group cached)
   (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*")))