Gnus development mailing list
From: Ken Raeburn <raeburn@cygnus.com>
Subject: performance of fuzzy subject handling and killing threads
Date: Fri, 24 May 1996 06:24:51 -0400
Message-ID: <9605241024.AA24364@cujo.cygnus.com>


I noticed that killing a thread in a large newsgroup was taking a long
time.  (I'm trying to catch up on some 8000 messages from the Linux
kernel developers' list, so I'm hitting 'k' a lot.)  A little poking
around showed me that Gnus was spending much of its time switching
buffers and trimming subjects to see if they might match the one I'm
killing, and doing so for every remaining article.  And then, next
time I hit 'k', it does it all over again.  It also spent a lot of
time in garbage collection, since each of those comparisons produces
a new string to compare with, then discards it.

Since the simplified strings come out the same each time they're
generated, I figured, why not cache them?  It'll slow down the first
pass through the article list a little, but following passes should be
much faster.  And in newsgroups with lots of unread articles -- enough
for the additional cost of building the cache to be noticeable -- I'm
guessing 'k' rarely gets used exactly once.

And while I'm at it, why not intern them in a group-local obarray?
That way Emacs itself would automatically keep them unique, so
comparisons could use eq instead of a string compare, and the extra
storage required would be reduced somewhat in most cases.
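
To make the interning idea concrete, here is a minimal sketch.  It is
not the Gnus code: my-simplify is a hypothetical stand-in for
gnus-simplify-subject-fully that only strips "Re:" prefixes and
trailing whitespace, and my-subject-obarray stands in for the
group-local obarray.

```elisp
;; Sketch only: `my-simplify' is a hypothetical stand-in for
;; `gnus-simplify-subject-fully', and `my-subject-obarray' for the
;; group-local obarray.
(defvar my-subject-obarray (make-vector 127 0)
  "Private obarray holding one symbol per simplified subject.")

(defun my-simplify (subject)
  "Strip leading \"Re:\" prefixes and trailing whitespace from SUBJECT."
  (let ((s subject))
    (while (string-match "\\`[ \t]*[Rr][Ee]:[ \t]*" s)
      (setq s (substring s (match-end 0))))
    (if (string-match "[ \t]+\\'" s)
        (substring s 0 (match-beginning 0))
      s)))

(defun my-intern-subject (subject)
  "Return the unique symbol for the simplified form of SUBJECT."
  (intern (my-simplify subject) my-subject-obarray))

;; Two different strings simplify to the same text, so they intern to
;; the same symbol and can be compared with `eq' instead of `string='.
(eq (my-intern-subject "Re: Re: kernel oops")
    (my-intern-subject "kernel oops  "))
```

The final form evaluates to t.  Using a private obarray rather than
the global one keeps these symbols from colliding with real Lisp
symbols and lets the whole cache be dropped at once.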

Well, I threw together a quick hack to try this.  In this 8000-message
nnml group, hitting 'k' 12 times and killing fifty or so articles
originally took me over 10 minutes.  Now it's down to about 1 minute,
most of that spent in doing the first kill.  So the first one is now a
little slower, but later ones are much faster.
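
For anyone who wants numbers rather than wall-clock guesses, something
like the following sketch may help, assuming an Emacs that ships
benchmark.el.  Everything here is made up for illustration: the
subject list is synthetic, and downcase merely plays the role of the
expensive simplification that allocates a fresh string per comparison.

```elisp
(require 'benchmark)

;; Hypothetical stand-ins: `downcase' plays the role of the expensive
;; simplification, and the subject list is synthetic.
(defvar my-subjects
  (let (l)
    (dotimes (i 8000 l)
      (push (format "Subject Number %d" (% i 500)) l)))
  "8000 synthetic subjects, 500 of them distinct.")

(defun my-count-uncached (target)
  "Count subjects matching TARGET, simplifying each one on every call."
  (let ((n 0))
    (dolist (s my-subjects n)
      (when (string= (downcase s) target)
        (setq n (1+ n))))))

;; Returns (ELAPSED-SECONDS GC-RUNS GC-SECONDS) for 12 passes.
(benchmark-run 12 (my-count-uncached "subject number 7"))
```

The second and third elements of the result show how many garbage
collections the throwaway strings triggered and how long they took.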

When I worked on this, I was looking to optimize this specific case,
using 'k' in a big group.  I haven't really looked at the various ways
subject comparisons can be done for searching, killing, threading,
etc., so I don't know how much this can be generalized.  I don't have
time to take it further right now, and I don't have the original 0.83
code at the moment to make diffs from, but I'll give the basics in
case someone else wants to give it a shot.  (And if anyone wants my
full sources they can have them.)

I created a new variable gnus-simple-subject-obarray, and added it to
gnus-summary-local-variables.  I changed mail-header to add a
simple-subject field, and changed each place a header was created to
adjust the size or provide a nil initializer.

These functions build up the cache:

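(defsubst find-simple-subj-string (str)
  "Return the interned symbol for the fully simplified form of STR."
  ;; Create the group-local obarray lazily on first use.
  (unless gnus-simple-subject-obarray
    (setq gnus-simple-subject-obarray (make-vector 100 0)))
  (intern (gnus-simplify-subject-fully str)
          gnus-simple-subject-obarray))

(defsubst find-simple-subj (header)
  "Return the interned simplified subject for HEADER, caching it there."
  (let ((h (mail-header-simple-subject header)))
    ;; Compute and store the symbol only on the first request.
    (unless h
      (setq h (find-simple-subj-string (mail-header-subject header)))
      (mail-header-set-simple-subject header h))
    h))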

(Possibly I should've made the obarray larger?)

Calling find-simple-subj will return the interned simplified subject
for the header, actually computing it only once per article.

Then, gnus-summary-find-subject needs to be updated.  It can call
find-simple-subj-string with the string it gets passed, or the calling
interface can be changed to pass it the already-interned symbol.  That
makes more work for the caller, but for repeated calls, as from
gnus-summary-mark-same-subject, it saves a little more.  How much
depends on the number of find-subject calls made, not the number of
articles checked, so it's probably not quite as important.

I did it by adding a new function with the new interface:

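(defun gnus-summary-find-subject-sym (simp-subject
                                      &optional unread backward article)
  "Find the next article whose interned simplified subject is SIMP-SUBJECT.
If UNREAD is non-nil, consider only unread articles.
If BACKWARD is non-nil, search backward.
ARTICLE is the article to start from; it defaults to the current one.
Move point to the match and return its number, or return nil."
  (let* ((article (or article (gnus-summary-article-number)))
         (articles (gnus-data-list backward))
         (arts (gnus-data-find-list article articles))
         result)
    ;; Skip the starting article unless `gnus-summary-check-current'
    ;; says an unread current article should be checked too.
    (when (or (not gnus-summary-check-current)
              (not unread)
              (not (gnus-data-unread-p (car arts))))
      (setq arts (cdr arts)))
    (while arts
      ;; Symbols interned in the same obarray are unique, so `eq' suffices.
      (and (or (not unread)
               (gnus-data-unread-p (car arts)))
           (vectorp (gnus-data-header (car arts)))
           (eq simp-subject (find-simple-subj (gnus-data-header (car arts))))
           (setq result (car arts)
                 arts nil))
      (setq arts (cdr arts)))
    (and result
         (goto-char (gnus-data-pos result))
         (gnus-data-number result))))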

(defun gnus-summary-mark-same-subject (subject &optional unmark)
  "Mark articles with same SUBJECT as read, and return marked number.
If optional argument UNMARK is positive, remove any kinds of marks.
If optional argument UNMARK is negative, mark articles as unread instead."
  ;; From here on SUBJECT is the interned symbol, not the string.
  (setq subject (find-simple-subj-string subject))
  (let ((count 1))
    (save-excursion
      (cond
       ((null unmark)			; Mark as read.
	(while (and
		(progn
		  (gnus-summary-mark-article-as-read gnus-killed-mark)
		  (gnus-summary-show-thread) t)
		(gnus-summary-find-subject-sym subject))
	  (setq count (1+ count))))
       ((> unmark 0)			; Tick.
	(while (and
		(progn
		  (gnus-summary-mark-article-as-unread gnus-ticked-mark)
		  (gnus-summary-show-thread) t)
		(gnus-summary-find-subject-sym subject))
	  (setq count (1+ count))))
       (t				; Mark as unread.
	(while (and
		(progn
		  (gnus-summary-mark-article-as-unread gnus-unread-mark)
		  (gnus-summary-show-thread) t)
		(gnus-summary-find-subject-sym subject))
	  (setq count (1+ count)))))
      (gnus-set-mode-line 'summary)
      ;; Return the number of marked articles.
      count)))

There may be other routines besides find-subject that could be
adapted; I haven't looked.


These changes do increase memory use in some ways.  An extra element
is always added to every mail-header vector, even if it is never used.
Maintaining symbols in an obarray does involve some extra storage
beyond the strings themselves, though that's per unique simplified
subject.  This storage is retained until you leave the newsgroup,
unlike the short-lived strings generated currently.
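
To get a rough idea of how much the obarray is holding, one could
count its symbols with mapatoms.  This is only a sketch:
my-count-cached-subjects is a hypothetical helper, not part of my
changes, and the obarray below is a throwaway one built for the
example.

```elisp
(defun my-count-cached-subjects (ob)
  "Return the number of symbols interned in obarray OB."
  (let ((n 0))
    (mapatoms (lambda (_sym) (setq n (1+ n))) ob)
    n))

;; Interning the same name twice creates only one symbol, so the
;; count reflects unique simplified subjects, not articles.
(let ((ob (make-vector 127 0)))
  (intern "kernel oops" ob)
  (intern "kernel oops" ob)             ; already present, no new symbol
  (intern "scsi timeout" ob)
  (my-count-cached-subjects ob))
```

The last form evaluates to 2.  In a real group the argument would be
gnus-simple-subject-obarray, evaluated in the summary buffer.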

Hope this is of interest to someone.  If not, maybe I'll get back to
it myself sometime....

Ken

