Gnus development mailing list
* Faster NOV braiding for large newsgroups with many cached articles
@ 2008-03-30  2:21 Gareth McCaughan
  0 siblings, 0 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-03-30  2:21 UTC
  To: ding

(Note: I am not subscribed. If that's considered bad etiquette here,
I hope someone will let me know. I'll be watching the gmane archive
for a little while, so it's not a disaster if replies are sent only
to the list.)

I read one newsgroup for which my (local, leafnode) server has approximately
170k articles and my Gnus cache contains approximately 20k articles.
It turns out that in this mildly pathological situation Gnus behaves
mildly pathologically.

Specifically, gnus-cache-braid-nov takes several minutes to run,
and much of that time appears to go on the many small insertions
into the nntp-server-buffer, which get slow once the buffer is large.

By building up the new buffer contents in a list of strings,
assembling them into a single string, and then dumping that into
the buffer where it belongs in a single operation, I can (on my
machine, on one occasion -- I haven't tested this scientifically)
speed up gnus-cache-braid-nov by a factor of about 20: 30 seconds
instead of 10 minutes.

    (Note: measured under conditions of moderate load;
    don't take the numbers too seriously.)
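
To make that concrete, here is a toy sketch of the two patterns
(hypothetical helper functions, nothing from the actual patch):

(defun toy-fill-piecewise (buffer chunks)
  "Insert each of CHUNKS into BUFFER separately.
Each insertion into an already-large buffer costs extra."
  (with-current-buffer buffer
    (dolist (chunk chunks)
      (insert chunk))))

(defun toy-fill-at-once (buffer chunks)
  "Join CHUNKS into a single string first, then insert it once."
  (with-current-buffer buffer
    (insert (mapconcat 'identity chunks ""))))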

In principle this is more wasteful of memory than the old
g-c-b-n, because there may be three copies of the new data
sitting around (the possibly-short strings, the single
concatenated string, and the new buffer contents). On the
other hand, growing a large buffer in small steps probably
incurs some wastage due to fragmentation, and for me at least
the tradeoff is a (very) clear win.

In non-pathological situations, the original g-c-b-n is faster than
my version, but it doesn't matter because both are fast enough for
the user not to care.

Here is my version of g-c-b-n. I've given no thought at all
to multibyte issues; it may be that I should be counting bytes
rather than characters, or something. Perhaps the final
concatenation could be done with (apply 'concat (nreverse new-records)),
but I worry about hitting implementation limits on the number
of arguments that can be passed through apply.
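
For what it's worth, mapconcat would sidestep that worry, since it
walks the list itself rather than receiving every string as a separate
function argument. A hypothetical helper, untested:

(defun toy-join-records (records)
  "Concatenate the list of strings RECORDS into one string.
mapconcat iterates over the list, so no argument-count limit applies."
  (mapconcat 'identity records ""))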

(defun gnus-cache-braid-nov (group cached &optional file)
  (message "Merging cached articles with ones on server...")
  (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))
        (new-records nil)
	beg end server-cursor)
    (gnus-cache-save-buffers)
    ;; create new buffer for reading cache overview
    (save-excursion
      (set-buffer cache-buf)
      (erase-buffer)
      (let ((coding-system-for-read
	     gnus-cache-overview-coding-system))
	(insert-file-contents
	 (or file (gnus-cache-file-name group ".overview"))))
      (goto-char (point-min))
      (insert "\n") ; so we can search for, e.g., \n123\t
      (goto-char (point-min)))
    (set-buffer nntp-server-buffer)
    (goto-char (point-min))
    (setq server-cursor (point))
    (while cached
      (set-buffer nntp-server-buffer)
      ;; skip server records preceding first cached article
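      ;; (each NOV line begins with the article number, so `read' parses it)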
      (while (and (not (eobp))
		  (< (read (current-buffer)) (car cached)))
	(forward-line 1))
      (beginning-of-line)
      ;; grab those records for the new buffer
      (let ((new-server-cursor (point)))
        (when (> new-server-cursor server-cursor)
          (push (buffer-substring server-cursor new-server-cursor) new-records)
          (setq server-cursor new-server-cursor)))
      ;; grab first cached article, if present
      (set-buffer cache-buf)
      (if (search-forward (concat "\n" (int-to-string (car cached)) "\t")
			  nil t)
	  (setq beg (gnus-point-at-bol)
		end (progn (end-of-line) (point)))
	(setq beg nil))
      ;; grab that article's data for new buffer
      (when beg
        (push (buffer-substring beg end) new-records)
        (push "\n" new-records))
      (setq cached (cdr cached)))
    ;; we're finished with the cache overview now
    (kill-buffer cache-buf)
    ;; grab any remaining stuff from old server buffer for new one
    (set-buffer nntp-server-buffer)
    (let ((new-server-cursor (point-max)))
      (when (> new-server-cursor server-cursor)
        (push (buffer-substring server-cursor new-server-cursor) new-records)))
    ;; reverse chunks and concatenate
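    ;; (pass 1 totals the lengths; pass 2 copies each chunk into place)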
    (let ((n 0) (records new-records))
      (while records
        (incf n (length (car records)))
        (setq records (cdr records)))
      (let ((new-content (make-string n ?.)))
        (setq n 0)
        (setq records (nreverse new-records))
        (setf new-records nil) ; help the GC a little
        (while records
          (store-substring new-content n (car records))
          (incf n (length (car records)))
          (setq records (cdr records)))
        (set-buffer nntp-server-buffer)
        (erase-buffer)
        (insert new-content)))))

It's possible that gnus-cache-braid-heads could benefit from
some similar sort of treatment; I haven't looked.

I also tried a version of this that accumulated the new contents
in a separate temporary buffer (so that insertions were always at the end). That was
(in my pathological case) 2-3 times faster than the old version of g-c-b-n
and therefore on the order of 10 times slower than the one above.
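
That variant was shaped roughly like this (a from-memory sketch with
a hypothetical name, not the exact code I timed):

(defun toy-braid-via-temp-buffer (records)
  "Accumulate RECORDS in a temporary buffer, where every insertion
lands at the end, then copy the result into `nntp-server-buffer'
in a single operation."
  (with-temp-buffer
    (dolist (record records)
      (insert record)) ; point stays at the end, so each insert appends
    (let ((work-buf (current-buffer)))
      (with-current-buffer nntp-server-buffer
        (erase-buffer)
        (insert-buffer-substring work-buf)))))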

-- 
g



* Faster NOV braiding for large newsgroups with many cached articles
@ 2008-03-30 20:25 Gareth McCaughan
  2008-03-31 13:31 ` Jason L Tibbitts III
  2008-04-19 14:22 ` Reiner Steib
  0 siblings, 2 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-03-30 20:25 UTC
  To: ding

(My apologies if this arrives twice. It looks like ding@gnus.org
silently drops messages from non-subscribers on the floor. Fair
enough in these spammy days, but might I suggest adding a note
on the Resources page of gnus.org saying so?)

[The remainder of this message is identical to the post above.]





Thread overview: 8+ messages
2008-03-30  2:21 Faster NOV braiding for large newsgroups with many cached articles Gareth McCaughan
2008-03-30 20:25 Gareth McCaughan
2008-03-31 13:31 ` Jason L Tibbitts III
2008-03-31 17:15   ` Gareth McCaughan
2008-04-12  9:03   ` Gaute Strokkenes
2008-04-12 21:25     ` Gareth McCaughan
2008-04-19 14:22 ` Reiner Steib
2008-04-19 20:33   ` Gareth McCaughan
