Gnus development mailing list
* spam.el is a bit aggressive loading/saving spam-stat data
@ 2003-02-20 18:22 David Z Maze
  2003-02-21  8:16 ` Niklas Morberg
  2003-02-21 15:14 ` Ted Zlatanov
  0 siblings, 2 replies; 23+ messages in thread
From: David Z Maze @ 2003-02-20 18:22 UTC (permalink / raw)


spam.el with spam-stat seems to work fairly well for me (certainly, as
well as ifile ever did).  The one thing I was hoping for out of it,
though, was that since the scoring information lived with the Emacs
process, I wouldn't take a performance hit for reading and parsing a
500K text file for every incoming message.  The code in spam.el seems
to be on the aggressive side loading and saving, though: spam-split,
which is called for every message, calls (spam-stat-load), and both of
the spam-stat-register functions end by calling (spam-stat-save).

Is it straightforward to change this so that (spam-stat-load) happens
once when Gnus starts up, and then (spam-stat-save) is called, say,
along with everything else when I press 's' from the group buffer?
I'd think this would noticeably improve splitting performance for me.

Thanks,

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-20 18:22 spam.el is a bit aggressive loading/saving spam-stat data David Z Maze
@ 2003-02-21  8:16 ` Niklas Morberg
  2003-02-21 15:14 ` Ted Zlatanov
  1 sibling, 0 replies; 23+ messages in thread
From: Niklas Morberg @ 2003-02-21  8:16 UTC (permalink / raw)


David Z Maze <dmaze@MIT.EDU> writes:

> Is it straightforward to change this so that (spam-stat-load)
> happens once when Gnus starts up, and then (spam-stat-save)
> is called, say, along with everything else when I press 's'
> from the group buffer? I'd think this would noticeably improve
> splitting performance for me.

I've also noticed this excessive loading and saving. I never
press `s' in the group buffer though, so this solution wouldn't
be good for me.

Would it make sense to save the file when `g' is pressed in
the group buffer and when you quit gnus instead?

Niklas





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-20 18:22 spam.el is a bit aggressive loading/saving spam-stat data David Z Maze
  2003-02-21  8:16 ` Niklas Morberg
@ 2003-02-21 15:14 ` Ted Zlatanov
  2003-02-21 20:25   ` David Z Maze
  1 sibling, 1 reply; 23+ messages in thread
From: Ted Zlatanov @ 2003-02-21 15:14 UTC (permalink / raw)


On Thu, 20 Feb 2003, dmaze@MIT.EDU wrote:
> Is it straightforward to change this so that (spam-stat-load)
> happens once when Gnus starts up, and then (spam-stat-save) is
> called, say, along with everything else when I press 's' from the
> group buffer?  I'd think this would noticeably improve splitting
> performance for me.

How about spam-stat-load on summary entry, and spam-stat-save on
summary exit if spam-use-stat is on?  That seems like the right place
to put those hooks, and you won't have to hit 's' unnecessarily.

Ted




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-21 15:14 ` Ted Zlatanov
@ 2003-02-21 20:25   ` David Z Maze
  2003-02-21 20:49     ` Ted Zlatanov
  0 siblings, 1 reply; 23+ messages in thread
From: David Z Maze @ 2003-02-21 20:25 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:
> On Thu, 20 Feb 2003, dmaze@MIT.EDU wrote:
>> Is it straightforward to change this so that (spam-stat-load)
>> happens once when Gnus starts up, and then (spam-stat-save) is
>> called, say, along with everything else when I press 's' from the
>> group buffer?  I'd think this would noticeably improve splitting
>> performance for me.
>
> How about spam-stat-load on summary entry, and spam-stat-save on
> summary exit if spam-use-stat is on?

spam-stat-load needs to be called before splitting happens.  My
understanding is that, once it's loaded, it doesn't need to be
reloaded or saved until Emacs exits.

> That seems like the right place to put those hooks, and you won't
> have to hit 's' unnecessarily.

The same code gets called from gnus-group-save-newsrc and
gnus-group-exit, right?  If spam-stat-save really does need to be
called often, then calling it on summary exit makes sense, but
otherwise saving it at the same time as .newsrc.eld would minimize
(possibly slow) disk access.

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-21 20:25   ` David Z Maze
@ 2003-02-21 20:49     ` Ted Zlatanov
  2003-02-21 21:06       ` David Z Maze
  2003-02-21 23:58       ` Alex Schroeder
  0 siblings, 2 replies; 23+ messages in thread
From: Ted Zlatanov @ 2003-02-21 20:49 UTC (permalink / raw)


On Fri, 21 Feb 2003, dmaze@MIT.EDU wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:

>> How about spam-stat-load on summary entry, and spam-stat-save on
>> summary exit if spam-use-stat is on?
> 
> spam-stat-load needs to be called before splitting happens.  My
> understanding is that, once it's loaded, it doesn't need to be
> reloaded or saved until Emacs exits.

You're right, I was not thinking.

>> That seems like the right place to put those hooks, and you won't
>> have to hit 's' unnecessarily.
> 
> The same code gets called from gnus-group-save-newsrc and
> gnus-group-exit, right?  If spam-stat-save really does need to be
> called often, then calling it on summary exit makes sense, but
> otherwise saving it at the same time as .newsrc.eld would minimize
> (possibly slow) disk access.

Well, it doesn't need to be called often, and it could be added to the
save-newsrc hook.  I just thought most people would want their stats
up-to-date on disk, but I can see why it would be good to minimize
disk access.  Maybe we could allow both behaviors?

What do you think about a new variable spam-stat-save-frequency with
choices "often" and "with-newsrc"?  I guess we don't need "like a
maniac" as an option :)
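
A minimal sketch of what such an option might look like.  Note that
`spam-stat-save-frequency' is only the name floated above, not an
existing spam.el variable, and `my-spam-stat-save-after-register' is a
hypothetical helper; only `spam-stat-save' and `gnus-save-newsrc-hook'
come from the thread.

```elisp
(require 'spam-stat)

;; Hypothetical option: proposed in this thread, not part of spam.el.
(defcustom spam-stat-save-frequency 'with-newsrc
  "When to write the spam-stat table to disk.
`often' also saves after every ham/spam registration;
`with-newsrc' saves only when Gnus saves .newsrc.eld."
  :type '(choice (const :tag "After every registration" often)
                 (const :tag "Only with .newsrc.eld" with-newsrc))
  :group 'spam-stat)

(defun my-spam-stat-save-after-register ()
  "Hypothetical helper: call at the end of a registration routine."
  (when (eq spam-stat-save-frequency 'often)
    (spam-stat-save)))

;; Under either setting the table is written alongside .newsrc.eld.
(add-hook 'gnus-save-newsrc-hook 'spam-stat-save)
```

With this shape the cheap behavior is the default, and `often' only
adds extra saves on top of it.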

Ted




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-21 20:49     ` Ted Zlatanov
@ 2003-02-21 21:06       ` David Z Maze
  2003-02-21 23:58       ` Alex Schroeder
  1 sibling, 0 replies; 23+ messages in thread
From: David Z Maze @ 2003-02-21 21:06 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:
> What do you think about a new variable spam-stat-save-frequency with
> choices "often" and "with-newsrc"?  I guess we don't need "like a
> maniac" as an option :)

<shrug> Works for me.

(setq gnus-is-appallingly-slow nil) ;-)

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-21 20:49     ` Ted Zlatanov
  2003-02-21 21:06       ` David Z Maze
@ 2003-02-21 23:58       ` Alex Schroeder
  2003-02-24 21:53         ` Ted Zlatanov
  1 sibling, 1 reply; 23+ messages in thread
From: Alex Schroeder @ 2003-02-21 23:58 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> Well, it doesn't need to be called often, and it could be added to the
> save-newsrc hook.  I just thought most people would want their stats
> up-to-date on disk, but I can see why it would be good to minimize
> disk access.  Maybe we could allow both behaviors?
>
> What do you think about a new variable spam-stat-save-frequency with
> choices "often" and "with-newsrc"?  I guess we don't need "like a
> maniac" as an option :)

The only problem is that when you quit Gnus without saving, then the
backends might have reclassified an article from spam to non-spam (and
moved the article physically from one group to another), but the
scores are unchanged.  Depending on how you want to look at it, it
seems to me that the only good solution is this:

Only save spam-stat data when newsrc is saved, and add some elisp to
.newsrc-dribble whenever articles are processed.  Then the information
should be more or less in sync, as .newsrc-dribble gets autosaved...
Perhaps .newsrc-dribble gets saved whenever the backends save
something, I haven't investigated.  That would be the perfect
solution, then.

Alex.




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-21 23:58       ` Alex Schroeder
@ 2003-02-24 21:53         ` Ted Zlatanov
  2003-02-26  2:23           ` David Z Maze
  0 siblings, 1 reply; 23+ messages in thread
From: Ted Zlatanov @ 2003-02-24 21:53 UTC (permalink / raw)


On Sat, 22 Feb 2003, alex@emacswiki.org wrote:
> The only problem is that when you quit Gnus without saving, then the
> backends might have reclassified an article from spam to non-spam
> (and moved the article physically from one group to another), but
> the scores are unchanged.  Depending on how you want to look at it,
> it seems to me that the only good solution is this:
> 
> Only save spam-stat data when newsrc is saved, and add some elisp to
> .newsrc-dribble whenever articles are processed.  Then the
> information should be more or less in sync, as .newsrc-dribble gets
> autosaved...  Perhaps .newsrc-dribble gets saved whenever the
> backends save something, I haven't investigated.  That would be the
> perfect solution, then.

For now, I moved spam-stat-save out of the classification functions in
spam.el, and added spam-stat-save to the gnus-save-newsrc-hook.  The
add-hook is done iff spam-use-stat is set.

I am not sure how to handle a spam-stat .dribble.  If you want to
write that code (spam-stat-load-dribble, spam-stat-save-dribble), I'll
be glad to trigger it from spam.el.

Thanks
Ted




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-24 21:53         ` Ted Zlatanov
@ 2003-02-26  2:23           ` David Z Maze
  2003-02-26 21:15             ` Ted Zlatanov
  0 siblings, 1 reply; 23+ messages in thread
From: David Z Maze @ 2003-02-26  2:23 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:
> For now, I moved spam-stat-save out of the classification functions in
> spam.el, and added spam-stat-save to the gnus-save-newsrc-hook.  The
> add-hook is done iff spam-use-stat is set.

Could you also make the parallel change with spam-stat-load?  It could
be added to perhaps gnus-get-new-news-hook, and then removed from
spam-split.  (A little tricky; you want to make sure it's loaded when
Gnus is first started, but don't want to lose updated data when you
rescan groups.)
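
One way to handle the tricky part is a guard that reads the file only
the first time the hook fires.  This is a sketch under assumptions,
not what spam.el does: `my-spam-stat-loaded' and
`my-spam-stat-load-once' are hypothetical names.

```elisp
(require 'spam-stat)

(defvar my-spam-stat-loaded nil
  "Non-nil once the spam-stat table has been read in this session.
Hypothetical variable, used only by this sketch.")

(defun my-spam-stat-load-once ()
  "Load the spam-stat table unless it was already loaded.
Later rescans keep the in-memory table, so unsaved updates survive."
  (unless my-spam-stat-loaded
    (spam-stat-load)
    (setq my-spam-stat-loaded t)))

(add-hook 'gnus-get-new-news-hook 'my-spam-stat-load-once)
```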

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-26  2:23           ` David Z Maze
@ 2003-02-26 21:15             ` Ted Zlatanov
  2003-03-01 13:26               ` David Z Maze
  0 siblings, 1 reply; 23+ messages in thread
From: Ted Zlatanov @ 2003-02-26 21:15 UTC (permalink / raw)


On Tue, 25 Feb 2003, dmaze@MIT.EDU wrote:
> Could you also make the parallel change with spam-stat-load?  It
> could be added to perhaps gnus-get-new-news-hook, and then removed
> from spam-split.  (A little tricky; you want to make sure it's
> loaded when Gnus is first started, but don't want to lose updated
> data when you rescan groups.)

Oh, I see the problem.  Sorry!  Fixed, I moved spam-stat-load to
gnus-get-new-news-hook as you suggested.

Ted




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-02-26 21:15             ` Ted Zlatanov
@ 2003-03-01 13:26               ` David Z Maze
  2003-03-01 15:08                 ` Ted Zlatanov
  0 siblings, 1 reply; 23+ messages in thread
From: David Z Maze @ 2003-03-01 13:26 UTC (permalink / raw)


[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Tue, 25 Feb 2003, dmaze@MIT.EDU wrote:
>> Could you also make the parallel change with spam-stat-load?  It
>> could be added to perhaps gnus-get-new-news-hook, and then removed
>> from spam-split.  (A little tricky; you want to make sure it's
>> loaded when Gnus is first started, but don't want to lose updated
>> data when you rescan groups.)
>
> Oh, I see the problem.  Sorry!  Fixed, I moved spam-stat-load to
> gnus-get-new-news-hook as you suggested.

Well, the good news: Gnus no longer takes multiple seconds per message
to read mail.  :-)  The bad news, part 1, is that this only works if
spam-use-stat is defined before spam.el is loaded.  (So if you
(require 'spam) at the top of your .gnus, say, you lose.)  This patch
fixes that:


[-- Attachment #2: spam.el.diff --]
[-- Type: text/x-patch, Size: 940 bytes --]

Index: spam.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam.el,v
retrieving revision 6.77
diff -u -r6.77 spam.el
--- spam.el	28 Feb 2003 21:33:47 -0000	6.77
+++ spam.el	1 Mar 2003 13:40:08 -0000
@@ -810,10 +810,15 @@
 	       (insert article-string)
 	       (spam-stat-buffer-is-non-spam))))))
 
+      (defun spam-maybe-spam-stat-load ()
+	(if spam-use-stat (spam-stat-load)))
+
+      (defun spam-maybe-spam-stat-save ()
+	(if spam-use-stat (spam-stat-save)))
+
       ;; Add hooks for loading and saving the spam stats
-      (when spam-use-stat
-	(add-hook 'gnus-save-newsrc-hook 'spam-stat-save)
-	(add-hook 'gnus-get-new-news-hook 'spam-stat-load)))
+      (add-hook 'gnus-save-newsrc-hook 'spam-maybe-spam-stat-save)
+      (add-hook 'gnus-get-new-news-hook 'spam-maybe-spam-stat-load))
 
   (file-error (progn
 		(defalias 'spam-stat-register-ham-routine 'ignore)

[-- Attachment #3: Type: text/plain, Size: 574 bytes --]


Even with that, spam-stat-load doesn't seem to get called when Gnus is
loaded.  What in gnus-1 causes incoming mail to be read on startup?
The only call to gnus-group-get-new-news there is protected by a call
to gnus-alive-p, and the code seems to leap from reading in all of the
prerequisite files to building the group buffer.  Does something in
the startup sequence actually run gnus-get-new-news-hook?

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell


* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-01 13:26               ` David Z Maze
@ 2003-03-01 15:08                 ` Ted Zlatanov
  2003-03-02 23:49                   ` David Z Maze
  0 siblings, 1 reply; 23+ messages in thread
From: Ted Zlatanov @ 2003-03-01 15:08 UTC (permalink / raw)
  Cc: ding

On Sat, 01 Mar 2003, dmaze@MIT.EDU wrote:
> Well, the good news: Gnus no longer takes multiple seconds per
> message to read mail.  :-) The bad news, part 1, is that this only
> works if spam-use-stat is defined before spam.el is loaded.  (So if
> you (require 'spam) at the top of your .gnus, say, you lose.)  This
> patch fixes that:

I added the patch, but modified it for style (I prefer `when' to `if'
for single conditions); I hope you don't mind :)

> Even with that, spam-stat-load doesn't seem to get called when Gnus
> is loaded.  What in gnus-1 causes incoming mail to be read on
> startup?  The only call to gnus-group-get-new-news there is
> protected by a call to gnus-alive-p, and the code seems to leap from
> reading in all of the prerequisite files to building the group
> buffer.  Does something in the startup sequence actually run
> gnus-get-new-news-hook?

I added 

      (add-hook 'gnus-startup-hook 'spam-maybe-spam-stat-load))

to the list of hook modifications, since loading the stats twice
shouldn't hurt and it seems like the safest way to force the loading.
Let me know if it works.

I don't know why gnus-1 is not calling the stats loading, sorry.

Ted





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-01 15:08                 ` Ted Zlatanov
@ 2003-03-02 23:49                   ` David Z Maze
  2003-03-06 13:22                     ` Niklas Morberg
  0 siblings, 1 reply; 23+ messages in thread
From: David Z Maze @ 2003-03-02 23:49 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> I added 
>
>       (add-hook 'gnus-startup-hook 'spam-maybe-spam-stat-load))
>
> to the list of hook modifications, since loading the stats twice
> shouldn't hurt and it seems like the safest way to force the loading.
> Let me know if it works.

Seems to be fine now.  Thanks!

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-02 23:49                   ` David Z Maze
@ 2003-03-06 13:22                     ` Niklas Morberg
  2003-03-06 15:39                       ` Ted Zlatanov
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Morberg @ 2003-03-06 13:22 UTC (permalink / raw)


I just noticed that the spam-stat data is loaded when moving
messages, which seems a bit unnecessary. The reason is that
gnus-summary-move-article makes a call to
gnus-group-get-new-news-this-group which in turn loads the
spam-stat file.

Would it be possible to get rid of this as well?

Niklas





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-06 13:22                     ` Niklas Morberg
@ 2003-03-06 15:39                       ` Ted Zlatanov
  2003-03-07 14:27                         ` Bill White
  0 siblings, 1 reply; 23+ messages in thread
From: Ted Zlatanov @ 2003-03-06 15:39 UTC (permalink / raw)


On Thu, 06 Mar 2003, niklas.morberg@axis.com wrote:
> I just noticed that the spam-stat data is loaded when moving
> messages, which seems a bit unnecessary. The reason is that
> gnus-summary-move-article makes a call to
> gnus-group-get-new-news-this-group which in turn loads the
> spam-stat file.
> 
> Would it be possible to get rid of this as well?

I added a gnus-get-top-new-news-hook which is NOT invoked by
gnus-group-get-new-news-this-group to get around this, only by
gnus-group-get-new-news.  I also moved the spam.el stat loading hook to the
gnus-get-top-new-news-hook.  If anyone has a problem with that new
hook, let me know (but seriously, can Gnus ever have enough hooks? :)

Note I left spam-setup-widening in the gnus-get-new-news-hook
intentionally.

Thanks
Ted




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-06 15:39                       ` Ted Zlatanov
@ 2003-03-07 14:27                         ` Bill White
  2003-03-07 14:38                           ` Niklas Morberg
  2003-03-07 14:55                           ` Ted Zlatanov
  0 siblings, 2 replies; 23+ messages in thread
From: Bill White @ 2003-03-07 14:27 UTC (permalink / raw)


On Thu Mar 06 2003 at 09:39, Ted Zlatanov <tzz@lifelogs.com> said:

> On Thu, 06 Mar 2003, niklas.morberg@axis.com wrote:
>> I just noticed that the spam-stat data is loaded when moving
>> messages, which seems a bit unnecessary. The reason is that
>> gnus-summary-move-article makes a call to
>> gnus-group-get-new-news-this-group which in turn loads the
>> spam-stat file.
>> 
>> Would it be possible to get rid of this as well?
>
> I added a gnus-get-top-new-news-hook which is NOT invoked by
> gnus-group-get-new-news-this-group to get around this, only by
> gnus-group-get-new-news.  I also moved the spam.el stat loading hook to
> the gnus-get-top-new-news-hook.  If anyone has a problem with that
> new hook, let me know (but seriously, can Gnus ever have enough
> hooks? :)

It takes roughly 20-30 seconds to load my spam-stat file.  Is this a
reasonable duration, or am I doing something wrong?  It's far too long
to wait at each new mail retrieval.

Would it be reasonable to load the spam-stat file only when gnus first
starts, then save it only when quitting gnus or at some other rare
event during which I'm not waiting eagerly for something?

Cheers -

bw
-- 
Bill White . billw@wolfram.com . http://members.wri.com/billw
"No ma'am, we're musicians."





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-07 14:27                         ` Bill White
@ 2003-03-07 14:38                           ` Niklas Morberg
  2003-03-07 15:13                             ` Bill White
  2003-03-07 14:55                           ` Ted Zlatanov
  1 sibling, 1 reply; 23+ messages in thread
From: Niklas Morberg @ 2003-03-07 14:38 UTC (permalink / raw)


Bill White <billw@wolfram.com> writes:

> It takes roughly 20-30 seconds to load my spam-stat file.
> Is this a reasonable duration, or am I doing something
> wrong?

It takes ~5 seconds for me. But that does not necessarily
mean that there's anything wrong with your setup. I guess
the more mails you receive and use for training, the bigger
the spam-stat file gets.

Niklas





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-07 14:27                         ` Bill White
  2003-03-07 14:38                           ` Niklas Morberg
@ 2003-03-07 14:55                           ` Ted Zlatanov
  2003-03-07 22:55                             ` A.J. Rossini
  1 sibling, 1 reply; 23+ messages in thread
From: Ted Zlatanov @ 2003-03-07 14:55 UTC (permalink / raw)


On Fri, 07 Mar 2003, billw@wolfram.com wrote:
> It takes roughly 20-30 seconds to load my spam-stat file.  Is this a
> reasonable duration, or am I doing something wrong?  It's far too
> long to wait at each new mail retrieval.

Alex Schroeder, the spam-stat.el maintainer, might be able to answer.
Your stat file may be simply too large, and I don't think Alex has
written stats expiration code.

For large files like the stats database, even with expiration there
may be too much data, especially over NFS.  It would be nice if
spam-stat.el could use a persistent connection to a SQL database, or a
local database file (Berkeley DB, for instance).

As far as the SQL goes, I can write a simple Perl daemon that
maintains a connection to a database table and understands the basic
commands spam-stat.el needs (basically just read word stats/store word
stats).  I don't know if there's something within Emacs that could
help, so we don't have to have an external utility.

> Would it be reasonable to load the spam-stat file only when gnus
> first starts, then save it only when quitting gnus or at some other
> rare event during which I'm not waiting eagerly for something?

You could remove spam-maybe-spam-stat-load from
gnus-get-top-new-news-hook; it's already in gnus-startup-hook if you
have set spam-use-stat when loading spam.el.
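
A sketch of that removal for ~/.gnus, assuming the hook name from this
thread and the function name `spam-maybe-spam-stat-load' from the
patch earlier in it.  Because spam.el adds the function when it is
loaded, the removal is deferred until after that point.

```elisp
;; Assumes the hook and function names used by this thread's spam.el.
;; Runs once spam.el has been loaded, so its own add-hook comes first.
(eval-after-load 'spam
  '(remove-hook 'gnus-get-top-new-news-hook 'spam-maybe-spam-stat-load))
```

The table is then read only from gnus-startup-hook, and fetching new
mail no longer triggers a reload.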

Ted




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-07 14:38                           ` Niklas Morberg
@ 2003-03-07 15:13                             ` Bill White
  2003-03-10  8:05                               ` Niklas Morberg
  0 siblings, 1 reply; 23+ messages in thread
From: Bill White @ 2003-03-07 15:13 UTC (permalink / raw)


On Fri Mar 07 2003 at 08:38, Niklas Morberg <niklas.morberg@axis.com> said:

> Bill White <billw@wolfram.com> writes:
>
>> It takes roughly 20-30 seconds to load my spam-stat file.
>> Is this a reasonable duration, or am I doing something
>> wrong?
>
> It takes ~5 seconds for me. But that does not necessarily mean that
> there's anything wrong with your setup. I guess the more mails you
> receive and use for training, the bigger the spam-stat file gets.

I get around 100-200 spams a day, which, amazingly, is entirely
manageable with the new spam-aware gnus.

Meanwhile,

   File size: (nth 7 (file-attributes spam-stat-file)) => 928737
   Number of words: (hash-table-count spam-stat) => 60066

   Reduce table size: (spam-stat-reduce-size)
   Save table: (spam-stat-save)
   File size: (nth 7 (file-attributes spam-stat-file)) => 261781
   Number of words: (hash-table-count spam-stat) => 17144

1 - Perhaps the table size reduction could be automated in a hook at some
daily gnus event (I quit gnus each day when I leave the office, so
that would be a reasonable time for me).  I'll try this today:

(add-hook 'gnus-exit-gnus-hook 'spam-stat-reduce-size 'spam-stat-save)

2 - Would it help to byte-compile spam-stat-file?

Cheers -

bw
-- 
Bill White . billw@wolfram.com . http://members.wri.com/billw
"No ma'am, we're musicians."





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-07 14:55                           ` Ted Zlatanov
@ 2003-03-07 22:55                             ` A.J. Rossini
  2003-03-08  0:49                               ` Alex Schroeder
  0 siblings, 1 reply; 23+ messages in thread
From: A.J. Rossini @ 2003-03-07 22:55 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> Alex Schroeder, the spam-stat.el maintainer, might be able to answer.
> Your stat file may be simply too large, and I don't think Alex has
> written stats expiration code.

Actually, that generates an interesting statistical question -- how to
estimate the temporal window/down-weighting of scores to adaptively
optimize sensitivity/specificity in a time-heterogeneous setting,
within the context of a receiver operating characteristic (ROC)
curve... 

Guess I'll have to look at the code and hack... 

best,
-tony

-- 
A.J. Rossini				Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics		rossini@u.washington.edu	
FHCRC/SCHARP/HIV Vaccine Trials Net	rossini@scharp.org
-------------- http://software.biostat.washington.edu/ --------------------
FHCRC:Tu: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW:   Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(CHANGE: monday/wednesday/friday locations are completely unpredictable.)




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-07 22:55                             ` A.J. Rossini
@ 2003-03-08  0:49                               ` Alex Schroeder
  0 siblings, 0 replies; 23+ messages in thread
From: Alex Schroeder @ 2003-03-08  0:49 UTC (permalink / raw)


rossini@blindglobe.net (A.J. Rossini) writes:

> Actually, that generates an interesting statistical question -- how to
> estimate the temporal window/down-weighting of scores to adaptively
> optimize sensitivity/specificity in a time-heterogeneous setting,
> within the context of a receiver operating characteristic (ROC)
> curve... 

Heh.  Whatever.  :)

It works for me because I use only spam-stat.el -- no spam.el!  I have
a few thousand mails in mail.misc and mail.spam -- and should I ever
feel that I have too much (eg. more than a year worth of spam), then I
can just delete it manually.  Every now and then I delete my
dictionary and run an Emacs just to recompute the dictionary.  The
size of the ~/.spam-stat.el file is currently 447272 bytes, 28143
words, 5290 non-spam mails, and 466 spam mails.  It works very well.
At work I have over 4000 spam mails and a much smaller number of
non-spam mail.  A lot of spam is not caught.

Alex.




* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-07 15:13                             ` Bill White
@ 2003-03-10  8:05                               ` Niklas Morberg
  2003-03-11 12:53                                 ` Bill White
  0 siblings, 1 reply; 23+ messages in thread
From: Niklas Morberg @ 2003-03-10  8:05 UTC (permalink / raw)


Bill White <billw@wolfram.com> writes:

>    Reduce table size: (spam-stat-reduce-size)

Ah. I had forgotten about this. Thanks for pointing it out.

My file size went from 550572 to 109292 and the number of
words from 36239 to 7426. 

> 1 - Perhaps the table size reduction could be automated in
> a hook at some daily gnus event (I quit gnus each day when
> I leave the office, so that would be a reasonable time for
> me). I'll try this today:
>
> (add-hook 'gnus-exit-gnus-hook 'spam-stat-reduce-size 'spam-stat-save)

Nice. I'll use gnus-demon-add-handler instead since I always
keep gnus up and running.
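
A sketch of the demon variant, for a Gnus that is never shut down.
`my-spam-stat-reduce-and-save' is a hypothetical helper and the
interval is an arbitrary choice; `gnus-demon-add-handler' takes a
function, a time, and an idle setting.

```elisp
(require 'gnus-demon)
(require 'spam-stat)

(defun my-spam-stat-reduce-and-save ()
  "Shrink the spam-stat table, then write it to disk."
  (spam-stat-reduce-size)
  (spam-stat-save))

;; TIME is counted in units of `gnus-demon-timestep' (60 seconds by
;; default), so 720 is roughly every 12 hours.  IDLE t means the
;; handler only runs while Emacs is idle.
(gnus-demon-add-handler 'my-spam-stat-reduce-and-save 720 t)
```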

Niklas





* Re: spam.el is a bit aggressive loading/saving spam-stat data
  2003-03-10  8:05                               ` Niklas Morberg
@ 2003-03-11 12:53                                 ` Bill White
  0 siblings, 0 replies; 23+ messages in thread
From: Bill White @ 2003-03-11 12:53 UTC (permalink / raw)


On Monday of the First Week of Lent, A. D. 2003, at 02:05, Niklas Morberg <niklas.morberg@axis.com> said:

> Bill White <billw@wolfram.com> writes:
>
>>    Reduce table size: (spam-stat-reduce-size)
>
> Ah. I had forgotten about this. Thanks for pointing it out.
>
> My file size went from 550572 to 109292 and the number of words from
> 36239 to 7426.
>
>> 1 - Perhaps the table size reduction could be automated in a hook
>> at some daily gnus event (I quit gnus each day when I leave the
>> office, so that would be a reasonable time for me). I'll try this
>> today:
>>
>> (add-hook 'gnus-exit-gnus-hook 'spam-stat-reduce-size 'spam-stat-save)
>
> Nice. I'll use gnus-demon-add-handler instead since I always keep
> gnus up and running.

Oops - of course that should be

(add-hook 'gnus-exit-gnus-hook 'spam-stat-reduce-size)
(add-hook 'gnus-exit-gnus-hook 'spam-stat-save)
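
One caveat with the two-line form: by default `add-hook' puts each new
function at the front of the hook, so the function added last runs
first.  As written, spam-stat-save would therefore run before
spam-stat-reduce-size, and the reduced table would not reach the disk
on exit.  A sketch that keeps the intended order, assuming neither
function is already on the hook, passes a non-nil third argument
(APPEND) for the save:

```elisp
(require 'spam-stat)

;; Reduce first ...
(add-hook 'gnus-exit-gnus-hook 'spam-stat-reduce-size)
;; ... then save.  The third argument t appends to the end of the
;; hook, so this runs after the reduction.
(add-hook 'gnus-exit-gnus-hook 'spam-stat-save t)
```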

Cheers -

bw
-- 
Bill White . billw@wolfram.com . http://members.wri.com/billw
"No ma'am, we're musicians."




