Gnus development mailing list
* Faster NOV braiding for large newsgroups with many cached articles
@ 2008-03-30 20:25 Gareth McCaughan
  2008-03-31 13:31 ` Jason L Tibbitts III
  2008-04-19 14:22 ` Reiner Steib
  0 siblings, 2 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-03-30 20:25 UTC
  To: ding

(My apologies if this arrives twice. It looks like ding@gnus.org
silently drops messages from non-subscribers on the floor. Fair
enough in these spammy days, but might I suggest adding a note
on the Resources page of gnus.org saying so?)

I read one newsgroup for which my (local, leafnode) server has approximately
170k articles and my Gnus cache contains approximately 20k articles.
It turns out that in this mildly pathological situation Gnus behaves
mildly pathologically.

Specifically, gnus-cache-braid-nov takes several minutes to run,
and much of this appears to be because all the insertions in the
nntp-server-buffer are kinda slow.

By building up the new buffer contents in a list of strings,
assembling them into a single string, and then dumping that into
the buffer where it belongs in a single operation, I can (on my
machine, on one occasion -- I haven't tested this scientifically)
speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds
instead of 10 minutes.

    (Note: measured under conditions of moderate load;
    don't take the numbers too seriously.)
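
The approach just described (collect chunks in a list, join them once,
insert once) can be sketched in isolation as below. This is only an
illustration under assumptions: `my-insert-chunks-at-once' is a
hypothetical helper, not a Gnus function, and it joins with `concat',
whereas the function posted further down pre-sizes a string and fills
it with `store-substring'.

```elisp
;; Hypothetical helper for illustration; not part of Gnus.
(defun my-insert-chunks-at-once (reversed-chunks buffer)
  "Replace BUFFER's contents with REVERSED-CHUNKS, joined in original order.
REVERSED-CHUNKS is a list of strings built with `push', so the newest
chunk comes first.  Return the number of characters inserted."
  (let ((content (apply #'concat (reverse reversed-chunks))))
    (with-current-buffer buffer
      (erase-buffer)
      ;; One insertion for the whole result instead of one per record.
      (insert content))
    (length content)))

;; Usage: accumulate with `push' while scanning, then insert once.
(let ((chunks nil))
  (push "1\tfirst\n" chunks)
  (push "2\tsecond\n" chunks)
  (with-temp-buffer
    (my-insert-chunks-at-once chunks (current-buffer))
    (buffer-string)))
```

Pushing onto the front of a list and reversing once at the end keeps
each accumulation step cheap; the buffer is only touched at the end.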

In principle this is more wasteful of memory than the old
g-c-b-n, because there may be three copies of the new data
sitting around (the possibly-short strings, the single
concatenated string, and the new buffer contents). On the
other hand, growing a large buffer in small steps probably
incurs some wastage due to fragmentation, and for me at least
the tradeoff is a (very) clear win.

In non-pathological situations, the original g-c-b-n is faster than
my version, but it doesn't matter because both are fast enough for
the user not to care.

Here is my version of g-c-b-n. I've given no thought at all
to multibyte issues; it may be that I should be counting bytes
rather than characters, or something. Perhaps the final
concatenation could be done with (apply 'concatenate (nreverse new-records))
but I worry about hitting implementation limits on the number
of arguments to CONCATENATE.

(defun gnus-cache-braid-nov (group cached &optional file)
  (message "Merging cached articles with ones on server...")
  (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))
        (new-records nil)
	beg end server-cursor)
    (gnus-cache-save-buffers)
    ;; create new buffer for reading cache overview
    (save-excursion
      (set-buffer cache-buf)
      (erase-buffer)
      (let ((coding-system-for-read
	     gnus-cache-overview-coding-system))
	(insert-file-contents
	 (or file (gnus-cache-file-name group ".overview"))))
      (goto-char (point-min))
      (insert "\n") ; so we can search for, e.g., \n123\t
      (goto-char (point-min)))
    (set-buffer nntp-server-buffer)
    (goto-char (point-min))
    (setq server-cursor (point))
    (while cached
      (set-buffer nntp-server-buffer)
      ;; skip server records preceding first cached article
      (while (and (not (eobp))
		  (< (read (current-buffer)) (car cached)))
	(forward-line 1))
      (beginning-of-line)
      ;; grab those records for the new buffer
      (let ((new-server-cursor (point)))
        (when (> new-server-cursor server-cursor)
          (push (buffer-substring server-cursor new-server-cursor) new-records)
          (setq server-cursor new-server-cursor)))
      ;; grab first cached article, if present
      (set-buffer cache-buf)
      (if (search-forward (concat "\n" (int-to-string (car cached)) "\t")
			  nil t)
	  (setq beg (gnus-point-at-bol)
		end (progn (end-of-line) (point)))
	(setq beg nil))
      ;; grab that article's data for new buffer
      (when beg
        (push (buffer-substring beg end) new-records)
        (push "\n" new-records))
      (setq cached (cdr cached)))
    ;; we're finished with the cache overview now
    (kill-buffer cache-buf)
    ;; grab any remaining stuff from old server buffer for new one
    (set-buffer nntp-server-buffer)
    (let ((new-server-cursor (point-max)))
      (when (> new-server-cursor server-cursor)
        (push (buffer-substring server-cursor new-server-cursor) new-records)))
    ;; reverse chunks and concatenate
    (let ((n 0) (records new-records))
      (while records
        (incf n (length (car records)))
        (setq records (cdr records)))
      (let ((new-content (make-string n ?.)))
        (setq n 0)
        (setq records (nreverse new-records))
        (setf new-records nil) ; help the GC a little
        (while records
          (store-substring new-content n (car records))
          (incf n (length (car records)))
          (setq records (cdr records)))
        (set-buffer nntp-server-buffer)
        (erase-buffer)
        (insert new-content))) ))
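
On the concatenation question raised before the code: in Emacs Lisp
the built-in string joiner is `concat' (CONCATENATE is the Common Lisp
name, and it takes a result type as its first argument). A minimal
sketch of an alternative to the hand-rolled `make-string' and
`store-substring' loop follows. It has not been timed against the code
above, and `my-join-records' is a hypothetical name. `mapconcat'
receives the list as a single argument, so the number of chunks never
becomes an argument count, and it should also handle multibyte strings
without any manual length bookkeeping.

```elisp
;; Hypothetical helper for illustration; not part of Gnus.
(defun my-join-records (reversed-records)
  "Join REVERSED-RECORDS (a list of strings, newest first) into one string."
  ;; `mapconcat' takes the whole list as one argument, so the length of
  ;; the list is not limited by how many arguments a call may carry.
  (mapconcat #'identity (reverse reversed-records) ""))

;; The `apply' form gives the same result for this small input.
(equal (my-join-records '("b\n" "a\n"))
       (apply #'concat (reverse '("b\n" "a\n"))))
```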

It's possible that gnus-cache-braid-heads could benefit from
some similar sort of treatment; I haven't looked.

I also tried a version of this that accumulated the new buffer contents
in a new buffer (so that insertions were always at the end). That was
(in my pathological case) 2-3 times faster than the old version of g-c-b-n
and therefore on the order of 10 times slower than the one above.
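
For comparison, the buffer-based variant mentioned here might look
roughly like the sketch below. It is a reconstruction under
assumptions, not the code that was actually timed:
`my-braid-via-scratch-buffer' is a hypothetical name, and the chunks
are passed in as a ready-made list rather than gathered from the
server and cache buffers.

```elisp
;; Hypothetical helper for illustration; not part of Gnus.
(defun my-braid-via-scratch-buffer (chunks target)
  "Append CHUNKS (strings, in order) to a scratch buffer, then copy to TARGET.
TARGET's previous contents are discarded."
  (with-temp-buffer
    (dolist (chunk chunks)
      ;; Every insertion happens at the end of the scratch buffer.
      (goto-char (point-max))
      (insert chunk))
    (let ((scratch (current-buffer)))
      (with-current-buffer target
        (erase-buffer)
        (insert-buffer-substring scratch)))))

;; Usage with a throwaway target buffer.
(let ((target (generate-new-buffer " *my-braid-demo*")))
  (unwind-protect
      (progn
        (my-braid-via-scratch-buffer '("1\ta\n" "2\tb\n") target)
        (with-current-buffer target (buffer-string)))
    (kill-buffer target)))
```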

-- 
g




* Re: Faster NOV braiding for large newsgroups with many cached articles
  2008-03-30 20:25 Faster NOV braiding for large newsgroups with many cached articles Gareth McCaughan
@ 2008-03-31 13:31 ` Jason L Tibbitts III
  2008-03-31 17:15   ` Gareth McCaughan
  2008-04-12  9:03   ` Gaute Strokkenes
  2008-04-19 14:22 ` Reiner Steib
  1 sibling, 2 replies; 8+ messages in thread
From: Jason L Tibbitts III @ 2008-03-31 13:31 UTC
  To: Gareth McCaughan; +Cc: ding

>>>>> "GM" == Gareth McCaughan <gareth.mccaughan@pobox.com> writes:

GM> (My apologies if this arrives twice. It looks like ding@gnus.org
GM> silently drops messages from non-subscribers on the floor. Fair
GM> enough in these spammy days, but might I suggest adding a note on
GM> the Resources page of gnus.org saying so?)

For the record, that is untrue.  It is possible I missed your article
in the flood of spam, or it's possible that I simply wasn't able to
get to it in a time frame which satisfies you.  In either case, your
comment is not reflective of reality and a simple query to the list
owner would have satisfied your curiosity.

 - J<




* Re: Faster NOV braiding for large newsgroups with many cached articles
  2008-03-31 13:31 ` Jason L Tibbitts III
@ 2008-03-31 17:15   ` Gareth McCaughan
  2008-04-12  9:03   ` Gaute Strokkenes
  1 sibling, 0 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-03-31 17:15 UTC
  To: Jason L Tibbitts III; +Cc: ding

On Monday 31 March 2008, Jason L Tibbitts III wrote:

> >>>>> "GM" == Gareth McCaughan <gareth.mccaughan@pobox.com> writes:
> 
> GM> (My apologies if this arrives twice. It looks like ding@gnus.org
> GM> silently drops messages from non-subscribers on the floor. Fair
> GM> enough in these spammy days, but might I suggest adding a note on
> GM> the Resources page of gnus.org saying so?)
> 
> For the record, that is untrue.

Oops. My apologies to anyone I misled.

>                                  It is possible I missed your article 
> in the flood of spam, or it's possible that I simply wasn't able to
> get to it in a time frame which satisfies you.  In either case, your
> comment is not reflective of reality and a simple query to the list
> owner would have satisfied your curiosity.

It sounds like I gave the impression I was annoyed, for which I
apologize; the above wasn't a complaint, and I certainly wasn't
expressing dissatisfaction at the speed with which my message
was handled.

Anyway, I've subscribed now.

-- 
g




* Re: Faster NOV braiding for large newsgroups with many cached articles
  2008-03-31 13:31 ` Jason L Tibbitts III
  2008-03-31 17:15   ` Gareth McCaughan
@ 2008-04-12  9:03   ` Gaute Strokkenes
  2008-04-12 21:25     ` Gareth McCaughan
  1 sibling, 1 reply; 8+ messages in thread
From: Gaute Strokkenes @ 2008-04-12  9:03 UTC
  To: ding

On 31 March 2008, tibbs@math.uh.edu wrote:

>>>>>> "GM" == Gareth McCaughan <gareth.mccaughan@pobox.com> writes:
>
> GM> (My apologies if this arrives twice. It looks like ding@gnus.org
> GM> silently drops messages from non-subscribers on the floor. Fair
> GM> enough in these spammy days, but might I suggest adding a note on
> GM> the Resources page of gnus.org saying so?)
>
> For the record, that is untrue.  It is possible I missed your article
> in the flood of spam, or it's possible that I simply wasn't able to
> get to it in a time frame which satisfies you.  In either case, your
> comment is not reflective of reality and a simple query to the list
> owner would have satisfied your curiosity.

I believe a message of mine also got dropped.  The subject was "Dodgy
ranges in .marks files, and: Return of the disappearing unread marks."
I sent it on 20 March, and the from address I used was
"gs234-ding@srcf.ucam.org".

When I realised that the message wasn't getting through, I went to
<http://gnus.org/resources.html> (the only helpful google hit for the
search term "ding mailing list") to look for a list owner, but there's
no one listed, so I pretty much gave up and resolved to just post
without the "-ding" suffix in the future.

I think it would be good if a note about moderation was added to that
page, along with a means to contact the list owner.

Thanks,

-- 
Gaute Strokkenes




* Re: Faster NOV braiding for large newsgroups with many cached articles
  2008-04-12  9:03   ` Gaute Strokkenes
@ 2008-04-12 21:25     ` Gareth McCaughan
  0 siblings, 0 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-04-12 21:25 UTC
  To: ding

Gaute Strokkenes wrote:

[me:]
> > GM> (My apologies if this arrives twice. It looks like ding@gnus.org
> > GM> silently drops messages from non-subscribers on the floor. Fair
> > GM> enough in these spammy days, but might I suggest adding a note on
> > GM> the Resources page of gnus.org saying so?)

[Jason Tibbitts, the moderator:]
> > For the record, that is untrue.  It is possible I missed your article
> > in the flood of spam, or it's possible that I simply wasn't able to
> > get to it in a time frame which satisfies you.  In either case, your
> > comment is not reflective of reality and a simple query to the list
> > owner would have satisfied your curiosity.

[Gaute:]
> I believe a message of mine also got dropped.  The subject was "Dodgy
> ranges in .marks files, and: Return of the disappearing unread marks."

For what it's worth, mine didn't in fact get dropped; it appeared
on the list eventually. (Later than the version I sent after subscribing,
though.)

> I think it would be good if a note about moderation was added to that
> page, along with a means to contact the list owner.

I agree that this would be nice. Where it should come in the
priorities of whoever maintains that page is another question,
of course :-).

-- 
g




* Re: Faster NOV braiding for large newsgroups with many cached articles
  2008-03-30 20:25 Faster NOV braiding for large newsgroups with many cached articles Gareth McCaughan
  2008-03-31 13:31 ` Jason L Tibbitts III
@ 2008-04-19 14:22 ` Reiner Steib
  2008-04-19 20:33   ` Gareth McCaughan
  1 sibling, 1 reply; 8+ messages in thread
From: Reiner Steib @ 2008-04-19 14:22 UTC
  To: Gareth McCaughan; +Cc: ding

On Sun, Mar 30 2008, Gareth McCaughan wrote:

> I read one newsgroup for which my (local, leafnode) server has approximately
> 170k articles and my Gnus cache contains approximately 20k articles.
> It turns out that in this mildly pathological situation Gnus behaves
> mildly pathologically.
>
> Specifically, gnus-cache-braid-nov takes several minutes to run,
> and much of this appears to be because all the insertions in the
> nntp-server-buffer are kinda slow.
>
> By building up the new buffer contents in a list of strings,
> assembling them into a single string, and then dumping that into
> the buffer where it belongs in a single operation, I can (on my
> machine, on one occasion -- I haven't tested this scientifically)
> speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds
> instead of 10 minutes.
>
>     (Note: measured under conditions of moderate load;
>     don't take the numbers too seriously.)

With which (X)Emacs and Gnus versions?  Did you try other versions as
well?

> In principle this is more wasteful of memory than the old
> g-c-b-n, because there may be three copies of the new data
> sitting around (the possibly-short strings, the single
> concatenated string, and the new buffer contents). On the
> other hand, growing a large buffer in small steps probably
> incurs some wastage due to fragmentation, and for me at least
> the tradeoff is a (very) clear win.
>
> In non-pathological situations, the original g-c-b-n is faster than
> my version, but it doesn't matter because both are fast enough for
> the user not to care.
>
> Here is my version of g-c-b-n. I've given no thought at all
> to multibyte issues; it may be that I should be counting bytes
> rather than characters, or something. Perhaps the final
> concatenation could be done with (apply 'concatenate (nreverse new-records))
> but I worry about hitting implementation limits on the number
> of arguments to CONCATENATE.

It is easier for us if you don't post the modified function.  Instead,
produce a diff (unified diff preferred: "-u") against the version you
use (preferably HEAD revision of the CVS trunk, else tell us the
version). [1]

| --- gnus-cache.el	01 Mar 2008 22:54:54 +0100	6.26.2.13
| +++ gnus-cache.el	19 Apr 2008 16:01:26 +0200	
| @@ -501,10 +501,14 @@
|  	  (setq gnus-cache-active-altered t)))
|        articles)))
|  
| +
|  (defun gnus-cache-braid-nov (group cached &optional file)
| +  (message "Merging cached articles with ones on server...")

Better use `gnus-message' here.

| +    ;; reverse chunks and concatenate
| +    (let ((n 0) (records new-records))
| +      (while records
| +        (incf n (length (car records)))
| +        (setq records (cdr records)))
| +      (let ((new-content (make-string n ?.)))
| +        (setq n 0)
| +        (setq records (nreverse new-records))
| +        (setf new-records nil) ; help the GC a little

Please explain why you use `setf' and why the GC needs help.

| +        (while records
| +          (store-substring new-content n (car records))
| +          (incf n (length (car records)))
| +          (setq records (cdr records)))
| +        (set-buffer nntp-server-buffer)
| +        (erase-buffer)
| +        (insert new-content))) ))
`----

> It's possible that gnus-cache-braid-heads could benefit from
> some similar sort of treatment; I haven't looked.
>
> I also tried a version of this that accumulated the new buffer contents
> in a new buffer (so that insertions were always at the end). That was
> (in my pathological case) 2-3 times faster than the old version of g-c-b-n
> and therefore on the order of 10 times slower than the one above.

On Fri, Apr 18 2008, Gareth McCaughan wrote on bugs@gnus.org:
[...]
> I posted a version of gnus-cache-braid-nov that works that way
> to ding@gnus.org. (No replies; fair enough.) 

Sorry, I didn't have time to look at your code.  It would be better to
remind us by following-up to the original message instead of starting
a new thread on a different list.

> It might be better to look at the size of the group and of the cache
> and choose heuristically between the two implementations, so as not
> to pay the memory cost for large groups with few cached articles
> (where I think the speed should be OK with the old implementation,
> though I haven't measured it).

Sounds useful.

> If there's any interest in improving this and my code is useful,
> I am happy to sign whatever papers are necessary. 

I'll send you the form off-list.

Bye, Reiner.
-- 
[1] Inserting your defun into gnus-cache.el on the v5-10 branch (Gnus
    5.10.10) and producing a diff:

[-- Attachment #2: gnus-cache-McCaughan.patch --]
[-- Type: text/x-patch, Size: 3119 bytes --]

--- gnus-cache.el	01 Mar 2008 22:54:54 +0100	6.26.2.13
+++ gnus-cache.el	19 Apr 2008 16:01:26 +0200	
@@ -501,10 +501,14 @@
 	  (setq gnus-cache-active-altered t)))
       articles)))
 
+
 (defun gnus-cache-braid-nov (group cached &optional file)
+  (message "Merging cached articles with ones on server...")
   (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*"))
-	beg end)
+        (new-records nil)
+	beg end server-cursor)
     (gnus-cache-save-buffers)
+    ;; create new buffer for reading cache overview
     (save-excursion
       (set-buffer cache-buf)
       (erase-buffer)
@@ -513,27 +517,58 @@
 	(insert-file-contents
 	 (or file (gnus-cache-file-name group ".overview"))))
       (goto-char (point-min))
-      (insert "\n")
+      (insert "\n") ; so we can search for, e.g., \n123\t
       (goto-char (point-min)))
     (set-buffer nntp-server-buffer)
     (goto-char (point-min))
+    (setq server-cursor (point))
     (while cached
+      (set-buffer nntp-server-buffer)
+      ;; skip server records preceding first cached article
       (while (and (not (eobp))
 		  (< (read (current-buffer)) (car cached)))
 	(forward-line 1))
       (beginning-of-line)
+      ;; grab those records for the new buffer
+      (let ((new-server-cursor (point)))
+        (when (> new-server-cursor server-cursor)
+          (push (buffer-substring server-cursor new-server-cursor) new-records)
+          (setq server-cursor new-server-cursor)))
+      ;; grab first cached article, if present
       (set-buffer cache-buf)
       (if (search-forward (concat "\n" (int-to-string (car cached)) "\t")
 			  nil t)
 	  (setq beg (gnus-point-at-bol)
 		end (progn (end-of-line) (point)))
 	(setq beg nil))
-      (set-buffer nntp-server-buffer)
+      ;; grab that article's data for new buffer
       (when beg
-	(insert-buffer-substring cache-buf beg end)
-	(insert "\n"))
+        (push (buffer-substring beg end) new-records)
+        (push "\n" new-records))
       (setq cached (cdr cached)))
-    (kill-buffer cache-buf)))
+    ;; we're finished with the cache overview now
+    (kill-buffer cache-buf)
+    ;; grab any remaining stuff from old server buffer for new one
+    (set-buffer nntp-server-buffer)
+    (let ((new-server-cursor (point-max)))
+      (when (> new-server-cursor server-cursor)
+        (push (buffer-substring server-cursor new-server-cursor) new-records)))
+    ;; reverse chunks and concatenate
+    (let ((n 0) (records new-records))
+      (while records
+        (incf n (length (car records)))
+        (setq records (cdr records)))
+      (let ((new-content (make-string n ?.)))
+        (setq n 0)
+        (setq records (nreverse new-records))
+        (setf new-records nil) ; help the GC a little
+        (while records
+          (store-substring new-content n (car records))
+          (incf n (length (car records)))
+          (setq records (cdr records)))
+        (set-buffer nntp-server-buffer)
+        (erase-buffer)
+        (insert new-content))) ))
 
 (defun gnus-cache-braid-heads (group cached)
   (let ((cache-buf (gnus-get-buffer-create " *gnus-cache*")))


* Re: Faster NOV braiding for large newsgroups with many cached articles
  2008-04-19 14:22 ` Reiner Steib
@ 2008-04-19 20:33   ` Gareth McCaughan
  0 siblings, 0 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-04-19 20:33 UTC
  To: Reiner Steib; +Cc: ding

On Saturday 19 April 2008, Reiner Steib wrote:

[me:]
> > Specifically, gnus-cache-braid-nov takes several minutes to run,
> > and much of this appears to be because all the insertions in the
> > nntp-server-buffer are kinda slow.
> >
> > By building up the new buffer contents in a list of strings,
> > assembling them into a single string, and then dumping that into
> > the buffer where it belongs in a single operation, I can (on my
> > machine, on one occasion -- I haven't tested this scientifically)
> > speed up gnus-cache-braid-nov by a factor of about 20; 30 seconds
> > instead of 10 minutes.
...
> With which (X)Emacs and Gnus versions?  Did you try other versions as
> well?

GNU Emacs (haven't tried XEmacs); several different versions
of Gnus, namely the ones distributed with many recent releases
of GNU Emacs. I'm afraid I don't have records of exactly which
ones, but what I have at the moment is:

    (emacs-version)
    "GNU Emacs 22.1.1 (i386-pc-freebsd, GTK+ Version 2.10.14)
     of 2007-07-25 on g.local"
    (gnus-version)
    "Gnus v5.11"

> It is easier for us if you don't post the modified function.  Instead,
> produce a diff (unified diff preferred: "-u") against the version you
> use (preferably HEAD revision of the CVS trunk, else tell us the
> version). [1]

Looks like you've made the diff yourself in this case; thanks.
(I posted the new code rather than the diff because I was
basically replacing an entire function and it seemed friendlier
to casual ding@-readers to have the code as readily readable
as possible.)

> Better use `gnus-message' here.

Oops, yes. Didn't even notice its existence.
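
For reference, `gnus-message' takes a verbosity level as its first
argument, followed by the same format arguments as `message'; the text
is displayed only when the level does not exceed `gnus-verbose'. A
minimal sketch of the suggested replacement follows. The level 7 used
here is an assumption made for illustration, not something settled in
this thread.

```elisp
(require 'gnus-util)                    ; provides `gnus-message'

;; Level 7 is an assumed value; pick whatever level suits the
;; importance of the message relative to `gnus-verbose'.
(gnus-message 7 "Merging cached articles with ones on server...")
```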

> | +    ;; reverse chunks and concatenate
> | +    (let ((n 0) (records new-records))
> | +      (while records
> | +        (incf n (length (car records)))
> | +        (setq records (cdr records)))
> | +      (let ((new-content (make-string n ?.)))
> | +        (setq n 0)
> | +        (setq records (nreverse new-records))
> | +        (setf new-records nil) ; help the GC a little
> 
> Please explain why you use `setf' and why the GC needs help.

I use "setf" because I write much more Common Lisp than
elisp and occasionally forget myself. setq would be just
as good, of course.

I don't know whether the GC does in fact *need* help,
but it can't do any harm to give it more opportunity
to collect the garbage early if it wants.

> On Fri, Apr 18 2008, Gareth McCaughan wrote on bugs@gnus.org:
> [...]
> > I posted a version of gnus-cache-braid-nov that works that way
> > to ding@gnus.org. (No replies; fair enough.) 
> 
> Sorry, I didn't have time to look at your code.  It would be better to
> remind us by following-up to the original message instead of starting
> a new thread on a different list.

It occurred to me that perhaps I'd posted it to the wrong
place originally and bugs@ might have been preferable. I'm
sorry if that was less than maximally convenient.

> > It might be better to look at the size of the group and of the cache
> > and choose heuristically between the two implementations, so as not
> > to pay the memory cost for large groups with few cached articles
> > (where I think the speed should be OK with the old implementation,
> > though I haven't measured it).
> 
> Sounds useful.

OK; I'll have a play and put together something that
avoids gratuitously large time and memory costs in as
wide a range of situations as I can.
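
One possible shape for such a heuristic is sketched below. Everything
in it is hypothetical: the function and variable names are invented
for illustration, and the two thresholds are placeholders that have
not been measured on any real group.

```elisp
;; All names and numbers here are hypothetical placeholders.
(defvar my-braid-min-server-lines 50000
  "Assumed server overview size above which in-place insertion may be slow.")

(defvar my-braid-min-cached 1000
  "Assumed number of cached articles above which collecting may pay off.")

(defun my-braid-choose-strategy (server-lines cached-count)
  "Return `collect' or `in-place' for a group of the given sizes.
SERVER-LINES is the number of overview lines fetched from the server;
CACHED-COUNT is the number of cached articles to merge in."
  (if (and (>= server-lines my-braid-min-server-lines)
           (>= cached-count my-braid-min-cached))
      'collect      ; pay the extra memory, avoid many slow insertions
    'in-place))     ; few insertions, so the old code should be fine

;; The case from the original report, and a large group with few
;; cached articles.
(list (my-braid-choose-strategy 170000 20000)
      (my-braid-choose-strategy 170000 10))
```

Requiring both sizes to be large means the memory cost of the
collecting version is only paid when many insertions into a big
buffer are expected.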

> > If there's any interest in improving this and my code is useful,
> > I am happy to sign whatever papers are necessary. 
> 
> I'll send you the form off-list.

Thanks.

-- 
g




* Faster NOV braiding for large newsgroups with many cached articles
@ 2008-03-30  2:21 Gareth McCaughan
  0 siblings, 0 replies; 8+ messages in thread
From: Gareth McCaughan @ 2008-03-30  2:21 UTC
  To: ding

(Note: I am not subscribed. If that's considered bad etiquette here,
I hope someone will let me know. I'll be watching the gmane archive
for a little while, so it's not a disaster if replies are sent only
to the list.)

-- 
g




Thread overview: 8+ messages
2008-03-30 20:25 Faster NOV braiding for large newsgroups with many cached articles Gareth McCaughan
2008-03-31 13:31 ` Jason L Tibbitts III
2008-03-31 17:15   ` Gareth McCaughan
2008-04-12  9:03   ` Gaute Strokkenes
2008-04-12 21:25     ` Gareth McCaughan
2008-04-19 14:22 ` Reiner Steib
2008-04-19 20:33   ` Gareth McCaughan
  -- strict thread matches above, loose matches on Subject: below --
2008-03-30  2:21 Gareth McCaughan
