Gnus development mailing list
* performance of fuzzy subject handling and killing threads
@ 1996-05-24 10:24 Ken Raeburn
  1996-05-24 15:41 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 6+ messages in thread
From: Ken Raeburn @ 1996-05-24 10:24 UTC



I noticed that killing a thread in a large newsgroup was taking a long
time.  (I'm trying to catch up on some 8000 messages from the Linux
kernel developers' list, so I'm hitting 'k' a lot.)  A little poking
around showed me that Gnus was spending much of its time switching
buffers and trimming subjects to see if they might match the one I'm
killing, and doing so for every remaining article.  Then, the next
time I hit 'k', it does it all over again.  It also spent a lot of
time in garbage collection, because each of those comparisons produces
a new string to compare with, then discards it.

Since the simplified strings come out the same each time they're
generated, I figured, why not cache them?  It'll slow down the first
pass through the article list a little, but following passes should be
much faster.  And in newsgroups with lots of unread articles -- enough
for the additional cost of building the cache to be noticeable -- I'm
guessing it's probably not the most common case that 'k' gets used
exactly once.

And while I'm at it, why not intern them in a group-local obarray?
That way emacs itself would automatically keep them unique, so
comparisons would be even cheaper, and the extra storage required
would be reduced somewhat in most cases.
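
A minimal sketch of why interning helps, assuming nothing beyond stock
Emacs Lisp: two equal strings interned in the same private obarray
yield the same symbol, so a match can be tested with eq instead of a
character-by-character string comparison.  The helper name
my-same-subject-p and the sample subjects are made up for illustration
and are not part of the patch.

```elisp
(defun my-same-subject-p (s1 s2 ob)
  "Return non-nil if strings S1 and S2 intern to the same symbol in obarray OB."
  (eq (intern s1 ob) (intern s2 ob)))

(let ((ob (make-vector 101 0)))         ; private obarray, as in the patch
  (list
   ;; Distinct but equal strings map to one symbol.
   (my-same-subject-p "kernel patch" (concat "kernel " "patch") ob)
   ;; Different strings map to different symbols.
   (my-same-subject-p "kernel patch" "other thread" ob)))
;; → (t nil)
```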

Well, I threw together a quick hack to try this.  In this 8000-message
nnml group, hitting 'k' 12 times and killing fifty or so articles
originally took me over 10 minutes.  Now it's down to about 1 minute,
most of that spent in doing the first kill.  So the first one is now a
little slower, but later ones are much faster.

When I worked on this, I was looking to optimize this specific case,
using 'k' in a big group.  I haven't really looked at the various ways
subject comparisons can be done for searching, killing, threading,
etc., so I don't know how much this can be generalized.  I don't have
time to take it further right now, and I don't have the original 0.83
code at the moment to make diffs from, but I'll give the basics in
case someone else wants to give it a shot.  (And if anyone wants my
full sources they can have them.)

I created a new variable gnus-simple-subject-obarray, and added it to
gnus-summary-local-variables.  I changed mail-header to add a
simple-subject field, and changed each place a header was created to
adjust the size or provide a nil initializer.

These functions build up the cache:

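(defsubst find-simple-subj-string (str)
  "Return the simplified form of STR, interned in the group-local obarray."
  ;; Create the obarray lazily, the first time it is needed.
  (unless gnus-simple-subject-obarray
    (setq gnus-simple-subject-obarray (make-vector 100 0)))
  (intern (gnus-simplify-subject-fully str)
          gnus-simple-subject-obarray))

(defsubst find-simple-subj (header)
  "Return the interned simplified subject for HEADER, caching it in HEADER."
  (let ((h (mail-header-simple-subject header)))
    (unless h
      ;; Not cached yet: compute it once and store it in the header.
      (setq h (find-simple-subj-string (mail-header-subject header)))
      (mail-header-set-simple-subject header h))
    h))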

(Possibly I should've made the obarray larger?)

Calling find-simple-subj will return the interned simplified subject
for the header, actually computing it only once per article.
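
The compute-once behaviour can be tried outside Gnus with a small
stand-alone sketch.  Everything here is a hypothetical stand-in: the
two-slot vector plays the role of the extended mail-header, my-simplify
is a crude substitute for gnus-simplify-subject-fully, and the call
counter exists only to show that the simplifier runs once per header.

```elisp
(defvar my-subject-obarray nil
  "Private obarray of simplified subjects (stand-in for the group-local one).")

(defvar my-simplify-calls 0
  "Number of times `my-simplify' has run; for demonstration only.")

(defun my-simplify (subject)
  "Downcase SUBJECT and strip leading \"re:\" prefixes (crude stand-in)."
  (setq my-simplify-calls (1+ my-simplify-calls))
  (let ((s (downcase subject)))
    (while (string-match "\\`re:[ \t]*" s)
      (setq s (substring s (match-end 0))))
    s))

(defun my-simple-subject (header)
  "Return the interned simplified subject for HEADER, computing it once.
HEADER is a two-slot vector: [SUBJECT CACHED-SYMBOL]."
  (or (aref header 1)
      (aset header 1
            (intern (my-simplify (aref header 0))
                    (or my-subject-obarray
                        (setq my-subject-obarray (make-vector 1009 0)))))))

(let ((h1 (vector "Re: 2.0 patch" nil))
      (h2 (vector "re: Re: 2.0 PATCH" nil)))
  (setq my-simplify-calls 0)
  (my-simple-subject h1)                ; computes and caches
  (my-simple-subject h1)                ; served from the cache
  (list (eq (my-simple-subject h1) (my-simple-subject h2))
        my-simplify-calls))
;; → (t 2)
```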

Then, gnus-summary-find-subject needs to be updated.  It can call
find-simple-subj-string with the string it gets passed, or the calling
interface can be changed to pass it the already-interned symbol.  That
makes more work for the caller, but in the case of repeated calls, as
from gnus-summary-mark-same-subject, can mean a little more savings;
how much in this case is dependent on the number of find-subject calls
made, not the number of articles checked, so it's probably not quite
as important.

I did it by adding a new function with the new interface:

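(defun gnus-summary-find-subject-sym (simp-subject
                                      &optional unread backward article)
  "Find an article whose interned simplified subject is SIMP-SUBJECT.
SIMP-SUBJECT is a symbol as returned by `find-simple-subj-string'.
If UNREAD is non-nil, consider only unread articles.  If BACKWARD is
non-nil, search backward.  Start from ARTICLE, or from the current
article if ARTICLE is nil.  Move point to the match and return its
article number, or return nil if there is none."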
  (let* ((article (or article (gnus-summary-article-number)))
	 (articles (gnus-data-list backward))
	 (arts (gnus-data-find-list article articles))
	 result)
    (when (or (not gnus-summary-check-current)
	      (not unread)
	      (not (gnus-data-unread-p (car arts))))
      (setq arts (cdr arts)))
    (while arts
      (and (or (not unread)
	       (gnus-data-unread-p (car arts)))
	   (vectorp (gnus-data-header (car arts)))
	   (eq simp-subject (find-simple-subj (gnus-data-header (car arts))))
	   (setq result (car arts)
		 arts nil))
      (setq arts (cdr arts)))
    (and result
	 (goto-char (gnus-data-pos result))
	 (gnus-data-number result))))

(defun gnus-summary-mark-same-subject (subject &optional unmark)
  "Mark articles with same SUBJECT as read, and return marked number.
If optional argument UNMARK is positive, mark articles as ticked instead.
If optional argument UNMARK is negative, mark articles as unread instead."
  (setq subject (find-simple-subj-string subject))
  (let ((count 1))
    (save-excursion
      (cond
       ((null unmark)			; Mark as read.
	(while (and
		(progn
		  (gnus-summary-mark-article-as-read gnus-killed-mark)
		  (gnus-summary-show-thread) t)
		(gnus-summary-find-subject-sym subject))
	  (setq count (1+ count))))
       ((> unmark 0)			; Tick.
	(while (and
		(progn
		  (gnus-summary-mark-article-as-unread gnus-ticked-mark)
		  (gnus-summary-show-thread) t)
		(gnus-summary-find-subject-sym subject))
	  (setq count (1+ count))))
       (t				; Mark as unread.
	(while (and
		(progn
		  (gnus-summary-mark-article-as-unread gnus-unread-mark)
		  (gnus-summary-show-thread) t)
		(gnus-summary-find-subject-sym subject))
	  (setq count (1+ count)))))
      (gnus-set-mode-line 'summary)
      ;; Return the number of marked articles.
      count)))

There may be other routines besides find-subject that could be
adapted; I haven't looked.


These changes do increase memory use in some ways.  An extra element
is added to the mail-header vector always, even if it isn't used.
Maintaining symbols in an obarray does involve some extra storage
beyond the strings themselves, though that's per unique simplified
subject.  This storage is retained until you leave the newsgroup,
unlike the short-lived strings generated currently.
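
To make the per-unique-subject cost concrete, here is a small sketch
that counts the symbols held in a private obarray with mapatoms.  The
helper name my-obarray-count and the sample subjects are invented for
illustration.  Interning the same simplified subject twice adds only
one entry.

```elisp
(defun my-obarray-count (ob)
  "Return the number of symbols interned in obarray OB."
  (let ((n 0))
    (mapatoms (lambda (_sym) (setq n (1+ n))) ob)
    n))

(let ((ob (make-vector 101 0)))
  ;; Three articles, but only two distinct simplified subjects.
  (dolist (s '("kernel patch" "scsi driver" "kernel patch"))
    (intern s ob))
  (my-obarray-count ob))
;; → 2
```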

Hope this is of interest to someone.  If not, maybe I'll get back to
it myself sometime....

Ken



* Re: performance of fuzzy subject handling and killing threads
  1996-05-24 10:24 performance of fuzzy subject handling and killing threads Ken Raeburn
@ 1996-05-24 15:41 ` Lars Magne Ingebrigtsen
  1996-05-24 18:17   ` Jack Vinson
  1996-05-24 20:58   ` Ken Raeburn
  0 siblings, 2 replies; 6+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-05-24 15:41 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> Since the simplified strings come out the same each time they're
> generated, I figured, why not cache them?  It'll slow down the first
> pass through the article list a little, but following passes should be
> much faster.  And in newsgroups with lots of unread articles -- enough
> for the additional cost of building the cache to be noticeable -- I'm
> guessing it's probably not the most common case that 'k' gets used
> exactly once.

Yes, I never use `k' myself.  (Well, I just use `M-C-l' to kill
articles; what with the threading and gathering, it kills exactly the
same articles that `k' does in a fraction of the time.)

> And while I'm at it, why not intern them in a group-local obarray?
> That way emacs itself would automatically keep them unique, so
> comparisons would be even cheaper, and the extra storage required
> would be reduced somewhat in most cases.

Nice technique.  I'll be adding your stuff to Red Gnus.

-- 
  "Yes.  The journey through the human heart 
     would have to wait until some other time."



* Re: performance of fuzzy subject handling and killing threads
  1996-05-24 15:41 ` Lars Magne Ingebrigtsen
@ 1996-05-24 18:17   ` Jack Vinson
  1996-05-24 18:44     ` Lars Magne Ingebrigtsen
  1996-05-24 20:58   ` Ken Raeburn
  1 sibling, 1 reply; 6+ messages in thread
From: Jack Vinson @ 1996-05-24 18:17 UTC


>>>>> "LMI" == Lars Magne Ingebrigtsen <larsi@ifi.uio.no> writes:

LMI> Yes, I never use `k' myself.  (Well, I just use `M-C-l' to kill
LMI> articles; what with the threading and gathering, it kills exactly the
LMI> same articles that `k' does in a fraction of the time.)

This is Meta-C-l?  Meta-c is bound to capitalize-word in my *Summary*
buffer.  "M C" is, of course, gnus-summary-catchup.

-- 
   (      "Mmmmmm - coffee"               (   
   ))     Jack Vinson                     ))  
 C|~~|    jvinson@cheux.ecs.umass.edu   C|~~| 
  `--'                                   `--' 



* Re: performance of fuzzy subject handling and killing threads
  1996-05-24 18:17   ` Jack Vinson
@ 1996-05-24 18:44     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 6+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-05-24 18:44 UTC


Jack Vinson <jvinson@cheux.ecs.umass.edu> writes:

> LMI> Yes, I never use `k' myself.  (Well, I just use `M-C-l' to kill
> LMI> articles; what with the threading and gathering, it kills exactly the
> LMI> same articles that `k' does in a fraction of the time.)
> 
> This is Meta-C-l?  Meta-c is bound to capitalize-word in my *Summary*
> buffer.  "M C" is, of course, gnus-summary-catchup.

No, this is `M-C-l'.  Meta-control-ell.

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@ifi.uio.no * Lars Ingebrigtsen



* Re: performance of fuzzy subject handling and killing threads
  1996-05-24 15:41 ` Lars Magne Ingebrigtsen
  1996-05-24 18:17   ` Jack Vinson
@ 1996-05-24 20:58   ` Ken Raeburn
  1996-05-24 21:21     ` Lars Magne Ingebrigtsen
  1 sibling, 1 reply; 6+ messages in thread
From: Ken Raeburn @ 1996-05-24 20:58 UTC
  Cc: ding


Lars Magne Ingebrigtsen <larsi@ifi.uio.no> writes:

> Yes, I never use `k' myself.  (Well, I just use `M-C-l' to kill
> articles; what with the threading and gathering, it kills exactly the
> same articles that `k' does in a fraction of the time.)

Sounds good, but it's dependent on how gathering and score handling
are configured.  I'm not using scores at all, nor particularly
aggressive gathering.  For me, M-C-l put a "-" on the summary lines
for the thread, and moved me to the next article of that thread.



* Re: performance of fuzzy subject handling and killing threads
  1996-05-24 20:58   ` Ken Raeburn
@ 1996-05-24 21:21     ` Lars Magne Ingebrigtsen
  0 siblings, 0 replies; 6+ messages in thread
From: Lars Magne Ingebrigtsen @ 1996-05-24 21:21 UTC


Ken Raeburn <raeburn@cygnus.com> writes:

> Sounds good, but it's dependent on how gathering and score handling
> are configured.  I'm not using scores at all, nor particularly
> aggressive gathering.  For me, M-C-l put a "-" on the summary lines
> for the thread, and moved me to the next article of that thread.

Hm.  Whoops.  `gnus-summary-mark-below' defaults to nil.  I've now
changed it to 0, which will make `M-C-l' mark the low-scored articles
as read. 
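
On a Gnus version where the variable still defaults to nil, the same
effect could presumably be had from one's own init file.  This is an
assumption based on the description above, not something checked
against that release:

```elisp
;; Assumed workaround: mark articles scoring below 0 as read.
(setq gnus-summary-mark-below 0)
```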

-- 
(domestic pets only, the antidote for overdose, milk.)
  larsi@ifi.uio.no * Lars Ingebrigtsen


