spam.el: generic bayes interface?

Gnus development mailing list

* spam.el: generic bayes interface?
@ 2004-01-20 21:17 Reiner Steib
  2004-01-21  0:08 ` Ted Zlatanov
  0 siblings, 1 reply; 6+ messages in thread
From: Reiner Steib @ 2004-01-20 21:17 UTC


Hi,

in the German Gnus group someone asked how to use the
SpamAssassin/Bayes (see sa-learn(1)) thingie with Gnus.  I happily
pointed him to `spam.el' and the fine manual.  But it turned out that
there is no interface for SpamAssassin/Bayes in `spam.el' (or at least
I couldn't locate it).

I assume that SpamAssassin/Bayes works very similarly to bogofilter [1],
so it would probably work by abusing the `spam-bogofilter-*' [2]
variables.  But this is quite a dubious approach, IMHO.  Wouldn't it
make sense to add a generic Bayes interface with, say, `spam-bayes-...'
variables (similar to the `browse-url-generic*' variables) instead of
adding a set of variables for each (new) Bayesian filter?

Bye, Reiner.

[1] E.g., to train it on spam, he has to use "sa-learn --spam".

[2] M-x apropos-variable RET spam-bogofilter- RET
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/





* Re: spam.el: generic bayes interface?
  2004-01-20 21:17 spam.el: generic bayes interface? Reiner Steib
@ 2004-01-21  0:08 ` Ted Zlatanov
  2004-01-21  4:02   ` Hubert Chan
  0 siblings, 1 reply; 6+ messages in thread
From: Ted Zlatanov @ 2004-01-21  0:08 UTC
  Cc: Hubert Chan

On Tue, 20 Jan 2004, 4.uce.03.r.s@nurfuerspam.de wrote:

> in the German Gnus group someone asked how to use the
> SpamAssassin/Bayes (see sa-learn(1)) thingie with Gnus.  I happily
> pointed him to `spam.el' and the fine manual.  But it turned out
> that there is no interface for SpamAssassin/Bayes in `spam.el' (or
> at least I couldn't locate it).

Yes, spam-use-regex-headers will do the right thing for splitting
incoming mail, but there's no SA-specific backend.  Hubert Chan wrote
an SA backend, and I have been late in replying to his questions.
It's coming, though.

> I assume that SpamAssassin/Bayes works very similarly to bogofilter
> [1], so it would probably work by abusing the `spam-bogofilter-*'
> [2] variables.  But this is quite a dubious approach, IMHO.
> Wouldn't it make sense to add a generic Bayes interface with, say,
> `spam-bayes-...' variables (similar to the `browse-url-generic*'
> variables) instead of adding a set of variables for each (new)
> Bayesian filter?

The problem is that then you force people into just one Bayesian
approach (how would SA and bogofilter work together?), and I'm not
sure it's a good idea.  Granted, most people use just one Bayesian
filter, so it's probably nice to switch filters with just one thing.

But consider that the registry must track which Bayesian backend has
registered which message.  Let's say the registry knows that
spam-use-bayesian has registered message A, and that was Bogofilter at
the time, but the user switches to SA later.  Now the registry doesn't
know that SA has not registered message A, and spam.el will
not re-register message A.  It's just an example, but things will be
slightly harder to track in general.
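
A minimal sketch of the tracking problem described above, written in Python rather than Emacs Lisp so that it makes no claims about the real spam.el or registry code. The `Registry` class, its methods, the `train` helper, and the key names used for the per-backend case are all hypothetical; only the scenario (message A, Bogofilter first, SA later) comes from the message.

```python
class Registry:
    """Toy registry: remembers which key has registered which message."""

    def __init__(self):
        self._seen = {}  # key -> set of message ids

    def register(self, key, message_id):
        self._seen.setdefault(key, set()).add(message_id)

    def needs_registration(self, key, message_id):
        return message_id not in self._seen.get(key, set())


def train(registry, key, message_id, trained_log, backend):
    """Train `backend` on the message unless `key` has already registered it."""
    if registry.needs_registration(key, message_id):
        trained_log.append((backend, message_id))
        registry.register(key, message_id)


# Generic key: the registry cannot tell Bogofilter and SA apart.
generic = Registry()
generic_log = []
train(generic, "spam-use-bayesian", "A", generic_log, backend="bogofilter")
train(generic, "spam-use-bayesian", "A", generic_log, backend="spamassassin")

# Per-backend keys (illustrative names): each filter is tracked separately.
specific = Registry()
specific_log = []
train(specific, "spam-use-bogofilter", "A", specific_log, backend="bogofilter")
train(specific, "spam-use-spamassassin", "A", specific_log, backend="spamassassin")

print(generic_log)   # SA never gets trained on message A
print(specific_log)  # both filters get trained on message A
```

With the single generic key, the second call is skipped because the registry already has an entry for message A, which is the situation the paragraph above warns about. Separate keys per backend avoid it at the cost of one symbol per filter.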

Also, I can't drop the current Bayesian spam-use-* backends that users
are using.  So now we will have the general case of spam-use-bayesian
plus the specific backends.  Seems pretty confusing.

I would prefer to make adding new Bayesian backends easy, but give
them separate spam-use-BACKEND symbols.  Hubert's work will be
helpful here, because I've been too lazy/busy to write a good
example :)

Ted


* Re: spam.el: generic bayes interface?
  2004-01-21  0:08 ` Ted Zlatanov
@ 2004-01-21  4:02   ` Hubert Chan
  2004-01-21 18:47     ` Ted Zlatanov
  0 siblings, 1 reply; 6+ messages in thread
From: Hubert Chan @ 2004-01-21  4:02 UTC


>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:

[...]

Ted> Yes, spam-use-regex-headers will do the right thing for splitting
Ted> incoming mail, but there's no SA-specific backend.  Hubert Chan
Ted> wrote an SA backend, and I have been late in replying to his
Ted> questions.  It's coming, though.

I see it in CVS now. ;-)  I promised to write documentation too, but
that won't happen until sometime next week at the earliest.  In the
meantime, though, the variable documentation should probably suffice
for most people.

[...]

Ted> The problem is that then you force people into just one Bayesian
Ted> approach (how would SA and bogofilter work together?), and I'm not
Ted> sure it's a good idea.  Granted, most people use just one Bayesian
Ted> filter, so it's probably nice to switch filters with just one
Ted> thing.

Well, there are at least some good reasons that someone might want to
use multiple Bayesian filters.  For example, one might want to just try
out the effectiveness of one filter, while retaining their original
filter as a backup.  Also, if one wishes to switch Bayesian filters,
and does not have a corpus of spam/ham to train the filter, there would
have to be a transition time during which the new filter is trained,
while the old filter is still being used for splitting.  And, of
course, during this time, one would still want to keep training the old
filter at the same time.

This got me thinking, though, Ted, that the registration code for the
spam/ham processors is pretty similar.  They seem to mostly work in one
of two ways -- either register one at a time, or register multiple
articles at a time in a mbox-style format.  I think they all feed the
articles via standard input.  I would imagine that we would be able to
share a lot of common code.  Maybe write a function that feeds the
article(s) to the registration program, and pass the name of the
program and its arguments as arguments to that function.  Then the
registration functions just have to call that function with the
appropriate arguments.  Hmm.  I'll have to look at the code to see if
that would actually work...
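
A minimal sketch of the shared helper idea described above, in Python rather than Emacs Lisp so that it does not pretend to be spam.el code. The names `feed_articles` and `register_spam_sa` are hypothetical. The sketch assumes the filter reads articles on standard input, as suggested above, and that in batch mode each article already starts with its own mbox "From " line; both should be checked against the manual of the filter in question.

```python
import subprocess


def feed_articles(program, args, articles, batch=False):
    """Pipe articles to `program` on standard input.

    batch=False: run the program once per article.
    batch=True:  join all articles into one stream and run it once.
    Returns the list of exit codes, one per invocation.
    """
    payloads = ["".join(articles)] if batch else list(articles)
    codes = []
    for payload in payloads:
        result = subprocess.run(
            [program, *args],
            input=payload.encode("utf-8"),
            stdout=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
    return codes


# A per-backend registration function then reduces to a one-liner.
def register_spam_sa(articles):
    # "sa-learn --spam" is the command mentioned earlier in the thread.
    return feed_articles("sa-learn", ["--spam"], articles)
```

The program name and its arguments are the only things that vary between backends here, so each registration function only has to supply those two values and choose between one-at-a-time and batch mode.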

-- 
Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.





* Re: spam.el: generic bayes interface?
  2004-01-21  4:02   ` Hubert Chan
@ 2004-01-21 18:47     ` Ted Zlatanov
  2004-01-21 20:24       ` Hubert Chan
  0 siblings, 1 reply; 6+ messages in thread
From: Ted Zlatanov @ 2004-01-21 18:47 UTC
  Cc: Hubert Chan

On Tue, 20 Jan 2004, hubert@uhoreg.ca wrote:

> Well, there are at least some good reasons that someone might want
> to use multiple Bayesian filters.  For example, one might want to
> just try out the effectiveness of one filter, while retaining their
> original filter as a backup.  Also, if one wishes to switch Bayesian
> filters, and does not have a corpus of spam/ham to train the filter,
> there would have to be a transition time during which the new filter
> is trained, while the old filter is still being used for splitting.
> And, of course, during this time, one would still want to keep
> training the old filter at the same time.

I'm OK with that; we can add a spam-use-generic-bayesian if it's
necessary.  I just think customization, registry tracking, and other
things won't work so well when we generalize the interface too much.

If you or someone else produces that generic-bayesian backend, I
don't see a problem with putting it in.  We can't anticipate the new
Bayesian filters people might want to use, after all.

> This got me thinking, though, Ted, that the registration code for
> the spam/ham processors is pretty similar.  They seem to mostly work
> in one of two ways -- either register one at a time, or register
> multiple articles at a time in a mbox-style format.  

Yes, I've noticed that too after the 3rd time I wrote that code :)

> I think they all feed the articles via standard input.  I would
> imagine that we would be able to share a lot of common code.  Maybe
> write a function that feeds the article(s) to the registration
> program, and pass the name of the program and its arguments as
> arguments to that function.  Then the registration functions just
> have to call that function with the appropriate arguments.  Hmm.
> I'll have to look at the code to see if that would actually work...

It could work.  I've been trying to make the functions generic on the
API side; now it's time to make them generic on the backend side as
well.  I'm afraid it will make the code more complex, but adding new
backends should be significantly easier.

I'll work on gnus-encrypt.el first though, so feel free to start on
this if you have the interest :)

Thanks
Ted




* Re: spam.el: generic bayes interface?
  2004-01-21 18:47     ` Ted Zlatanov
@ 2004-01-21 20:24       ` Hubert Chan
  2004-01-22 18:23         ` Ted Zlatanov
  0 siblings, 1 reply; 6+ messages in thread
From: Hubert Chan @ 2004-01-21 20:24 UTC


>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:

[...]

Ted> If you or someone else produces that generic-bayesian backend, I
Ted> don't see a problem with putting it in.  We can't anticipate the
Ted> new Bayesian filters people might want to use, after all.

I don't plan on doing that.  I can't see any way to generalize the
interfaces that makes sense.

[...]

Ted> I'll work on gnus-encrypt.el first though, so feel free to start on
                  ^^^^^^^^^^^^^^^

Oooh.  Cool... ;-)  ... What would this do?

Ted> this if you have the interest :)

I won't have time to do anything until next week at the earliest,
possibly even later.  And then there are a few things that I want to
work on first: documentation and the email forwarding backend.  Maybe
we can have a race to see who's able to scrape up some free time
first. ;-)

Hubert





* Re: spam.el: generic bayes interface?
  2004-01-21 20:24       ` Hubert Chan
@ 2004-01-22 18:23         ` Ted Zlatanov
  0 siblings, 0 replies; 6+ messages in thread
From: Ted Zlatanov @ 2004-01-22 18:23 UTC
  Cc: Hubert Chan

On Wed, 21 Jan 2004, hubert@uhoreg.ca wrote:

>>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:
> 
> Ted> I'll work on gnus-encrypt.el
>                   ^^^^^^^^^^^^^^^
> Oooh.  Cool... ;-)  ... What would this do?

Encrypt and decrypt files such as .authinfo and .newsrc.eld.  I'll
have something ready for testing soon.

Ted


