Gnus development mailing list
* massive cpu time for scoring large imap folders?
@ 2009-10-08  0:18 Greg Troxel
  2009-10-08  6:51 ` Tassilo Horn
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Troxel @ 2009-10-08  0:18 UTC
  To: ding



I am running

  emacs22
  cvs gnus head
  dovecot 1.1.19 (via ssl on same machine)
  NetBSD/i386 5.0ish
  2.8GHz intel cpu, 2G RAM, etc. - not a super slow machine

All my mail is in IMAP on the local machine, accessed over imap/ssl from
gnus and other places.  Generally all is well.  But I have a group (svn
commit messages and trac updates from a project I've ignored for a
month) and it has 1000 unread messages.

I can enter the group with '200<space>' and see the recent messages, and
some of them even might get expired (auto-expire t).  But trying to
enter the whole group uses gobs of CPU time.  I just started
'gnus-batch-score' and so far it has used 9 minutes of cpu time.  I'll
let it run overnight.

Is there some n^3 algorithm lurking?  I'd expect scoring to be
per-message linear.






* Re: massive cpu time for scoring large imap folders?
  2009-10-08  0:18 massive cpu time for scoring large imap folders? Greg Troxel
@ 2009-10-08  6:51 ` Tassilo Horn
  2009-10-08 14:46   ` Greg Troxel
  0 siblings, 1 reply; 6+ messages in thread
From: Tassilo Horn @ 2009-10-08  6:51 UTC
  To: Greg Troxel; +Cc: ding

Greg Troxel <gdt@work.lexort.com> writes:

Hi Greg!

> All my mail is in IMAP on the local machine, accessed over imap/ssl
> from gnus and other places.  Generally all is well.  But I have a group
> (svn commit messages and trac updates from a project I've ignored for
> a month) and it has 1000 unread messages.

Quite similar config to mine.  But for me, scoring is quite fast,
although it doesn't happen often that there are more than a few hundred
new unread messages.  And I have to admit that I don't use very
sophisticated scoring rules.  It's basically only adaptive scoring and
scoring articles that reference my own messages...

> I can enter the group with '200<space>' and see the recent messages,
> and some of them even might get expired (auto-expire t).  But trying
> to enter the whole group uses gobs of CPU time.  I just started
> 'gnus-batch-score' and so far it has used 9 minutes of cpu time.  I'll
> let it run overnight.

9 minutes for scoring 1000 messages on such a fast machine is much too
long, AFAICT.  Or do you have some exorbitant scoring rules, for example
many rules that score on headers not in `gnus-extra-headers' /
`nnmail-extra-headers', or even worse on the body of messages?

Anyway, you might want to start debugging like this:

  M-x toggle-debug-on-quit
  Try to enter that group and wait for the scoring to start
  C-g

Then you get a backtrace buffer, and maybe that helps you.
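
For reference, the same steps can be sketched in Emacs Lisp.  This is
only an illustration: `debug-on-quit' is the variable that
`M-x toggle-debug-on-quit' flips, and setting it directly is assumed
to be equivalent for this purpose.

```elisp
;; Make C-g enter the debugger instead of just aborting the command.
(setq debug-on-quit t)

;; ... enter the slow group, wait for scoring to start, press C-g,
;; then read the *Backtrace* buffer ...

;; Restore the normal C-g behaviour afterwards.
(setq debug-on-quit nil)
```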

Bye,
Tassilo




* Re: massive cpu time for scoring large imap folders?
  2009-10-08  6:51 ` Tassilo Horn
@ 2009-10-08 14:46   ` Greg Troxel
  2009-10-08 17:52     ` Tassilo Horn
  2009-10-08 20:05     ` Ted Zlatanov
  0 siblings, 2 replies; 6+ messages in thread
From: Greg Troxel @ 2009-10-08 14:46 UTC
  To: ding


Tassilo Horn <tassilo@member.fsf.org> writes:

>> All my mail is in IMAP on the local machine, accessed over imap/ssl
>> from gnus and other places.  Generally all is well.  But I have a group
>> (svn commit messages and trac updates from a project I've ignored for
>> a month) and it has 1000 unread messages.
>
> Quite similar config to mine.  But for me, scoring is quite fast,
> although it doesn't happen often that there are more than a few hundred
> new unread messages.  And I have to admit that I don't use very
> sophisticated scoring rules.  It's basically only adaptive scoring and
> scoring articles that reference my own messages...
>
>> I can enter the group with '200<space>' and see the recent messages,
>> and some of them even might get expired (auto-expire t).  But trying
>> to enter the whole group uses gobs of CPU time.  I just started
>> 'gnus-batch-score' and so far it has used 9 minutes of cpu time.  I'll
>> let it run overnight.
>
> 9 minutes for scoring 1000 messages on such a fast machine is much too
> long, AFAICT.  Or do you have some exorbitant scoring rules, for example
> many rules that score on headers not in `gnus-extra-headers' /
> `nnmail-extra-headers', or even worse on the body of messages?
>
> Anyway, you might want to start debugging like this:
>
>   M-x toggle-debug-on-quit
>   Try to enter that group and wait for the scoring to start
>   C-g
>
> Then you get a backtrace buffer, and maybe that helps you.

Here's my score file, anonymized but structurally the same - nothing
fancy.

(("subject"
  ("Delivery Status Notification (Failure)" -1000 nil r)
  ("Undelivered Mail Returned to Sender" -1000 nil r)
  (".* foo-utils/\\(trunk\\|branches\\)/foo-controller" -1000 nil r)
  (".* \\(bar\\|baz\\|routing\\|foo-opnet\\|bam\\|xyz-.*\\|docs-team\\)/\\(trunk\\|branches\\)" -1000 nil r)
  (".* extern/\\(abc\\|def-reasoner\\)/\\(trunk\\|branches\\)" -1000 nil r)
  (".* framework/trunk/opnet" -1000 nil r)
  ("Flamebox Status Update" -1000 nil r)
  ("framework/trunk/build" nil nil r)
  (".*docs/trunk/Protocols/XYZ/papers" -1000 nil r)))


I turned on debug and gave it a few seconds and then:

Debugger entered--Lisp error: (quit)
  re-search-forward(".* foo-utils/\\(trunk\\|branches\\)/foo-controller" nil t)
  gnus-score-string((((touched nil) ("subject" ... ... ... ... ... ... ... ... ...))) "subject" 733688 733681 nil)
  gnus-score-headers(("/home/gdt/News/nnimap+ir.bbn.com:foo.svn.SCORE") nil)
  gnus-possibly-score-headers()
  gnus-summary-read-group-1("nnimap+ir.bbn.com:foo.svn" nil nil nil nil nil)
  gnus-summary-read-group("nnimap+ir.bbn.com:foo.svn" nil nil nil nil nil nil)
  gnus-group-read-group(nil nil nil)
  gnus-topic-read-group(nil)
  call-interactively(gnus-topic-read-group)

which looks sane.




* Re: massive cpu time for scoring large imap folders?
  2009-10-08 14:46   ` Greg Troxel
@ 2009-10-08 17:52     ` Tassilo Horn
  2009-10-08 18:16       ` Greg Troxel
  2009-10-08 20:05     ` Ted Zlatanov
  1 sibling, 1 reply; 6+ messages in thread
From: Tassilo Horn @ 2009-10-08 17:52 UTC
  To: Greg Troxel; +Cc: ding

Greg Troxel <gdt@work.lexort.com> writes:

Hi Greg,

> Here's my score file, anonymized but structurally the same - nothing
> fancy.
>
> (("subject"
>   ("Delivery Status Notification (Failure)" -1000 nil r)
>   ("Undelivered Mail Returned to Sender" -1000 nil r)
>   (".* foo-utils/\\(trunk\\|branches\\)/foo-controller" -1000 nil r)
>   (".* \\(bar\\|baz\\|routing\\|foo-opnet\\|bam\\|xyz-.*\\|docs-team\\)/\\(trunk\\|branches\\)" -1000 nil r)
>   (".* extern/\\(abc\\|def-reasoner\\)/\\(trunk\\|branches\\)" -1000 nil r)
>   (".* framework/trunk/opnet" -1000 nil r)
>   ("Flamebox Status Update" -1000 nil r)
>   ("framework/trunk/build" nil nil r)
>   (".*docs/trunk/Protocols/XYZ/papers" -1000 nil r)))

Just a wild guess, but does it speed up scoring to put Subject into
gnus-extra-headers and nnmail-extra-headers?

Bye,
Tassilo




* Re: massive cpu time for scoring large imap folders?
  2009-10-08 17:52     ` Tassilo Horn
@ 2009-10-08 18:16       ` Greg Troxel
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Troxel @ 2009-10-08 18:16 UTC
  To: ding


Tassilo Horn <tassilo@member.fsf.org> writes:

> Greg Troxel <gdt@work.lexort.com> writes:
>
> Hi Greg,
>
>> Here's my score file, anonymized but structurally the same - nothing
>> fancy.
>>
>> (("subject"
>>   ("Delivery Status Notification (Failure)" -1000 nil r)
>>   ("Undelivered Mail Returned to Sender" -1000 nil r)
>>   (".* foo-utils/\\(trunk\\|branches\\)/foo-controller" -1000 nil r)
>>   (".* \\(bar\\|baz\\|routing\\|foo-opnet\\|bam\\|xyz-.*\\|docs-team\\)/\\(trunk\\|branches\\)" -1000 nil r)
>>   (".* extern/\\(abc\\|def-reasoner\\)/\\(trunk\\|branches\\)" -1000 nil r)
>>   (".* framework/trunk/opnet" -1000 nil r)
>>   ("Flamebox Status Update" -1000 nil r)
>>   ("framework/trunk/build" nil nil r)
>>   (".*docs/trunk/Protocols/XYZ/papers" -1000 nil r)))
>
> Just a wild guess, but does it speed up scoring to put Subject into
> gnus-extra-headers and nnmail-extra-headers?

Good guess!  Putting Subject in gnus-extra-headers does wonders for
speed - now fast enough I wouldn't have even thought about asking about
it.
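
A minimal sketch of that change for a Gnus init file, assuming both
variables are otherwise at their defaults.  The `require' calls are
there only so that `add-to-list' finds the variables already defined.

```elisp
(require 'gnus-sum)   ; defines gnus-extra-headers
(require 'nnmail)     ; defines nnmail-extra-headers

;; The change reported above as making scoring fast.
(add-to-list 'gnus-extra-headers 'Subject)

;; Also named in the suggestion; whether it matters for this
;; nnimap group was not tested in this thread.
(add-to-list 'nnmail-extra-headers 'Subject)
```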

Should gnus be changed?  Scoring on subject seems very normal.





* Re: massive cpu time for scoring large imap folders?
  2009-10-08 14:46   ` Greg Troxel
  2009-10-08 17:52     ` Tassilo Horn
@ 2009-10-08 20:05     ` Ted Zlatanov
  1 sibling, 0 replies; 6+ messages in thread
From: Ted Zlatanov @ 2009-10-08 20:05 UTC
  To: ding

On Thu, 08 Oct 2009 10:46:18 -0400 Greg Troxel <gdt@work.lexort.com> wrote: 

GT> Here's my score file, anonymized but structurally the same - nothing
GT> fancy.

GT> (("subject"
GT>   ("Delivery Status Notification (Failure)" -1000 nil r)
GT>   ("Undelivered Mail Returned to Sender" -1000 nil r)
GT>   (".* foo-utils/\\(trunk\\|branches\\)/foo-controller" -1000 nil r)
GT>   (".* \\(bar\\|baz\\|routing\\|foo-opnet\\|bam\\|xyz-.*\\|docs-team\\)/\\(trunk\\|branches\\)" -1000 nil r)
GT>   (".* extern/\\(abc\\|def-reasoner\\)/\\(trunk\\|branches\\)" -1000 nil r)
GT>   (".* framework/trunk/opnet" -1000 nil r)
GT>   ("Flamebox Status Update" -1000 nil r)
GT>   ("framework/trunk/build" nil nil r)
GT>   (".*docs/trunk/Protocols/XYZ/papers" -1000 nil r)))

.* may be unnecessary.  Can you try without it?

I would cut out rules with a binary search until I either find the one
that's causing problems, or determine it's not any particular rule.
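
A rough way to test the first point outside Gnus is a timing sketch
like the one below.  It assumes the `re-search-forward' call in the
backtrace is representative of how a rule is applied; since that
search is unanchored, a leading ".*" should not change which subjects
match.  The subject lines are made up, and `benchmark-run' comes from
Emacs's benchmark library.

```elisp
(require 'benchmark)

(with-temp-buffer
  ;; Hypothetical stand-in for 1000 subject lines, none of which match.
  (dotimes (i 1000)
    (insert (format "r%d - in project/trunk/src: some commit\n" i)))
  (list
   ;; Rule as written in the score file, with the leading ".*".
   (benchmark-run 10
     (goto-char (point-min))
     (while (re-search-forward
             ".* foo-utils/\\(trunk\\|branches\\)/foo-controller" nil t)))
   ;; Same rule with the leading ".*" dropped.
   (benchmark-run 10
     (goto-char (point-min))
     (while (re-search-forward
             " foo-utils/\\(trunk\\|branches\\)/foo-controller" nil t)))))
```

Each element of the result is a list of elapsed seconds, number of
garbage collections, and time spent in them, so the two first numbers
can be compared directly.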

Ted




