From: Ken Raeburn
Newsgroups: gmane.emacs.gnus.general
Subject: performance of fuzzy subject handling and killing threads
Date: Fri, 24 May 1996 06:24:51 -0400
Message-ID: <9605241024.AA24364@cujo.cygnus.com>
Original-To: ding@ifi.uio.no

I noticed that killing a thread in a large newsgroup was taking a long time. (I'm trying to catch up on some 8000 messages from the Linux kernel developers' list, so I'm hitting 'k' a lot.) A little poking around showed me that gnus was spending much of its time switching buffers and trimming subjects to see if they might match the one I'm killing. For all the remaining articles. And then, next time I hit 'k', it does it all over again. Oh, and it spent a lot of time in garbage collection.

Each of those comparisons produces a new string to compare with, then discards it. Since the simplified strings come out the same each time they're generated, I figured, why not cache them? It'll slow down the first pass through the article list a little, but following passes should be much faster. And in newsgroups with lots of unread articles -- enough for the additional cost of building the cache to be noticeable -- I'm guessing it's probably not the most common case that 'k' gets used exactly once.

And while I'm at it, why not intern them in a group-local obarray? That way emacs itself would automatically keep them unique, so comparisons would be even cheaper, and the extra storage required would be reduced somewhat in most cases.

Well, I threw together a quick hack to try this. In this 8000-message nnml group, hitting 'k' 12 times and killing fifty or so articles originally took me over 10 minutes. Now it's down to about 1 minute, most of that spent in doing the first kill. So the first one is now a little slower, but later ones are much faster.

When I worked on this, I was looking to optimize this specific case, using 'k' in a big group. I haven't really looked at the various ways subject comparisons can be done for searching, killing, threading, etc., so I don't know how much this can be generalized. I don't have time to take it further right now, and I don't have the original 0.83 code at the moment to make diffs from, but I'll give the basics in case someone else wants to give it a shot. (And if anyone wants my full sources they can have them.)

I created a new variable gnus-simple-subject-obarray, and added it to gnus-summary-local-variables. I changed mail-header to add a simple-subject field, and changed each place a header was created to adjust the size or provide a nil initializer.

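
As a self-contained illustration of the caching-plus-interning idea (not the actual Gnus changes), a toy version might look like the sketch below. Every demo-* name is made up for this example: the two-slot header vector stands in for the real mail-header vector with its extra field, and demo-simplify is a crude placeholder for gnus-simplify-subject-fully, whose real behavior is not reproduced here.

```elisp
;; Hypothetical stand-ins; none of the demo-* names exist in Gnus.
(defvar demo-subject-obarray nil
  "Private obarray holding one symbol per unique simplified subject.")

(defvar demo-simplify-calls 0
  "Number of times the simplifier has actually run.")

(defun demo-simplify (subject)
  "Toy simplifier: strip leading \"Re:\" prefixes from SUBJECT and downcase it."
  (setq demo-simplify-calls (1+ demo-simplify-calls))
  (downcase (replace-regexp-in-string "\\`\\([Rr][Ee]: *\\)+" "" subject)))

(defun demo-make-header (subject)
  "Return a toy header vector [SUBJECT CACHED-SYMBOL], cache slot empty."
  (vector subject nil))

(defun demo-simple-subject (header)
  "Return the interned simplified subject of HEADER, computing it only once."
  (or (aref header 1)
      (progn
        ;; Create the private obarray lazily, on first use.
        (unless demo-subject-obarray
          (setq demo-subject-obarray (make-vector 101 0)))
        ;; Intern the simplified string and remember the symbol in the header.
        (aset header 1 (intern (demo-simplify (aref header 0))
                               demo-subject-obarray)))))

;; Usage: two headers whose subjects simplify to the same text share one
;; symbol, so they can be compared with `eq' instead of a string comparison.
(let ((a (demo-make-header "Re: Kernel panic"))
      (b (demo-make-header "kernel panic")))
  (eq (demo-simple-subject a) (demo-simple-subject b)))
```

Asking for the simplified subject of the same header again returns the cached symbol without running the simplifier, which is where the savings on the second and later 'k' would come from.
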
These functions build up the cache:

(defsubst find-simple-subj-string (str)
  (if (not gnus-simple-subject-obarray)
      (setq gnus-simple-subject-obarray (make-vector 100 0)))
  (intern (gnus-simplify-subject-fully str)
          gnus-simple-subject-obarray))

(defsubst find-simple-subj (header)
  (let ((h (mail-header-simple-subject header)))
    (if (not h)
        (progn
          (setq h (find-simple-subj-string (mail-header-subject header)))
          (mail-header-set-simple-subject header h)))
    h))

(Possibly I should've made the obarray larger?)

Calling find-simple-subj will return the interned simplified subject for the header, actually computing it only once per article.

Then, gnus-summary-find-subject needs to be updated. It can call find-simple-subj-string with the string it gets passed, or the calling interface can be changed to pass it the already-interned symbol. That makes more work for the caller, but in the case of repeated calls, as from gnus-summary-mark-same-subject, can mean a little more savings; how much in this case is dependent on the number of find-subject calls made, not the number of articles checked, so it's probably not quite as important. I did it by adding a new function with the new interface:

(defun gnus-summary-find-subject-sym (simp-subject &optional unread backward article)
  (let* ((article (or article (gnus-summary-article-number)))
         (articles (gnus-data-list backward))
         (arts (gnus-data-find-list article articles))
         result)
    (when (or (not gnus-summary-check-current)
              (not unread)
              (not (gnus-data-unread-p (car arts))))
      (setq arts (cdr arts)))
    (while arts
      (and (or (not unread)
               (gnus-data-unread-p (car arts)))
           (vectorp (gnus-data-header (car arts)))
           (eq simp-subject
               (find-simple-subj (gnus-data-header (car arts))))
           (setq result (car arts)
                 arts nil))
      (setq arts (cdr arts)))
    (and result
         (goto-char (gnus-data-pos result))
         (gnus-data-number result))))

(defun gnus-summary-mark-same-subject (subject &optional unmark)
  "Mark articles with same SUBJECT as read, and return marked number.
If optional argument UNMARK is positive, remove any kinds of marks.
If optional argument UNMARK is negative, mark articles as unread instead."
  (setq subject (find-simple-subj-string subject))
  (let ((count 1))
    (save-excursion
      (cond
       ((null unmark)                   ; Mark as read.
        (while (and (progn
                      (gnus-summary-mark-article-as-read gnus-killed-mark)
                      (gnus-summary-show-thread) t)
                    (gnus-summary-find-subject-sym subject))
          (setq count (1+ count))))
       ((> unmark 0)                    ; Tick.
        (while (and (progn
                      (gnus-summary-mark-article-as-unread gnus-ticked-mark)
                      (gnus-summary-show-thread) t)
                    (gnus-summary-find-subject-sym subject))
          (setq count (1+ count))))
       (t                               ; Mark as unread.
        (while (and (progn
                      (gnus-summary-mark-article-as-unread gnus-unread-mark)
                      (gnus-summary-show-thread) t)
                    (gnus-summary-find-subject-sym subject))
          (setq count (1+ count)))))
      (gnus-set-mode-line 'summary)
      ;; Return the number of marked articles.
      count)))

There may be other routines besides find-subject that could be adapted; I haven't looked.

These changes do increase memory use in some ways. An extra element is added to the mail-header vector always, even if it isn't used. Maintaining symbols in an obarray does involve some extra storage beyond the strings themselves, though that's per unique simplified subject. This storage is retained until you leave the newsgroup, unlike the short-lived strings generated currently.

Hope this is of interest to someone. If not, maybe I'll get back to it myself sometime....

Ken