* spam.el: generic bayes interface?
From: Reiner Steib @ 2004-01-20 21:17 UTC

Hi,

in the German Gnus group someone asked how to use the
SpamAssassin/Bayes (see sa-learn(1)) thingie with Gnus. I happily
pointed him to `spam.el' and the fine manual. But it turned out
that there is no interface for SpamAssassin/Bayes in `spam.el' (or
at least I couldn't locate it).

I assume that SpamAssassin/Bayes works very similar to bogofilter
[1], so it probably works by abusing the `spam-bogofilter-*' [2]
variables. But this is a quite dubious approach, IMHO. Wouldn't it
make sense to add a generic bayes interface with say
`spam-bayes-...' variables (similar to the `browse-url-generic*'
variables) instead of adding a set of variables for each (new)
Bayesian filter?

Bye, Reiner.

[1] E.g. for training spam, he has to use "sa-learn --spam".
[2] M-x apropos-variable RET spam-bogofilter- RET
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  PGP key available via WWW  http://rsteib.home.pages.de/

* Re: spam.el: generic bayes interface?
From: Ted Zlatanov @ 2004-01-21 0:08 UTC
Cc: Hubert Chan

On Tue, 20 Jan 2004, 4.uce.03.r.s@nurfuerspam.de wrote:

> in the German Gnus group someone asked how to use the
> SpamAssassin/Bayes (see sa-learn(1)) thingie with Gnus. I happily
> pointed him to `spam.el' and the fine manual. But it turned out
> that there is no interface for SpamAssassin/Bayes in `spam.el' (or
> at least I couldn't locate it).

Yes, spam-use-regex-headers will do the right thing for splitting
incoming mail, but there's no SA specific backend. Hubert Chan wrote
a SA backend, and I have been late in replying to his questions.
It's coming, though.

> I assume that SpamAssassin/Bayes works very similar to bogofilter
> [1], so it probably works by abusing the `spam-bogofilter-*' [2]
> variables. But this is a quite dubious approach, IMHO. Wouldn't it
> make sense to add a generic bayes interface with say
> `spam-bayes-...' variables (similar to the `browse-url-generic*'
> variables) instead of adding a set of variables for each (new)
> Bayesian filter?

The problem is that then you force people into just one Bayesian
approach (how would SA and bogofilter work together?), and I'm not
sure it's a good idea. Granted, most people use just one Bayesian
filter, so it's probably nice to switch filters with just one thing.

But consider that the registry must track which Bayesian backend has
registered which message. Let's say the registry knows that
spam-use-bayesian has registered message A, and that was Bogofilter
at the time, but the user switches to SA later. Now the registry
doesn't know that SA has not registered message A, and spam.el will
not re-register message A. It's just an example, but things will be
slightly harder to track in general.

Also, I can't drop the current Bayesian spam-use-* backends that
users are using. So now we will have the general case of
spam-use-bayesian plus the specific backends. Seems pretty
confusing.

I would prefer to make adding new Bayesian backends easy, but give
them separate spam-use-BACKEND symbols. Hubert's work will be
helpful here, because I've been too lazy/busy to write a good
example :)

Ted

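[Editorial sketch] The registry concern in the message above can be illustrated with a toy model. This is not spam.el or the Gnus registry; it is written in Python only to keep the example short and runnable, and the `Registry` class, its methods, and the backend strings `spam-use-bogofilter` and `spam-use-spamassassin` are assumptions made for illustration. Only `spam-use-bayesian` is a name taken from the thread.

```python
from typing import Dict, Set


class Registry:
    """Toy registry: remembers which backend symbols have trained on each message."""

    def __init__(self) -> None:
        # message id -> set of backend symbols that registered it
        self._trained: Dict[str, Set[str]] = {}

    def mark(self, message_id: str, backend: str) -> None:
        """Record that `backend` has registered `message_id`."""
        self._trained.setdefault(message_id, set()).add(backend)

    def needs_training(self, message_id: str, backend: str) -> bool:
        """True if `backend` has not yet registered `message_id`."""
        return backend not in self._trained.get(message_id, set())


def demo() -> Dict[str, bool]:
    # Generic symbol: message A was registered while Bogofilter sat behind it.
    generic = Registry()
    generic.mark("A", "spam-use-bayesian")

    # Separate symbols: the registry knows exactly which filter saw message A.
    specific = Registry()
    specific.mark("A", "spam-use-bogofilter")

    return {
        # After a switch to SA the symbol is unchanged, so A looks already done.
        "generic": generic.needs_training("A", "spam-use-bayesian"),
        # SA has its own symbol, so A is correctly flagged as untrained.
        "specific": specific.needs_training("A", "spam-use-spamassassin"),
    }
```

With one generic symbol the registry reports that message A needs no further registration even though the new filter never saw it. With one symbol per backend the gap is visible, which seems to be the argument for separate spam-use-BACKEND symbols.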
* Re: spam.el: generic bayes interface?
From: Hubert Chan @ 2004-01-21 4:02 UTC

>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:

[...]

Ted> Yes, spam-use-regex-headers will do the right thing for splitting
Ted> incoming mail, but there's no SA specific backend. Hubert Chan
Ted> wrote a SA backend, and I have been late in replying to his
Ted> questions. It's coming, though.

I see it in CVS now. ;-) I promised to write documentation too, but
that won't happen until at least next week some time. In the mean
time, though, the variable documentation should probably suffice for
most people.

[...]

Ted> The problem is that then you force people into just one Bayesian
Ted> approach (how would SA and bogofilter work together?), and I'm not
Ted> sure it's a good idea. Granted, most people use just one Bayesian
Ted> filter, so it's probably nice to switch filters with just one
Ted> thing.

Well, there are at least some good reasons that someone might want
to use multiple Bayesian filters. For example, one might want to
just try out the effectiveness of one filter, while retaining their
original filter as a backup. Also, if one wishes to switch Bayesian
filters, and does not have a corpus of spam/ham to train the filter,
there would have to be a transition time during which the new filter
is trained, while the old filter is still being used for splitting.
And, of course, during this time, one would still want to keep
training the old filter at the same time.

This got me thinking, though, Ted, that the registration code for
the spam/ham processors is pretty similar. They seem to mostly work
in one of two ways -- either register one at a time, or register
multiple articles at a time in a mbox-style format.

I think they all feed the articles via standard input. I would
imagine that we would be able to share a lot of common code. Maybe
write a function that feeds the article(s) to the registration
program, and pass the name of the program and its arguments as
arguments to that function. Then the registration functions just
have to call that function with the appropriate arguments. Hmm.
I'll have to look at the code to see if that would actually work...

-- 
Hubert Chan <hubert@uhoreg.ca> - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7 5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net. Encrypted e-mail preferred.

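[Editorial sketch] The shared helper proposed in the message above might look roughly like the following. It is a sketch under stated assumptions, not code from spam.el: it is written in Python rather than Emacs Lisp for brevity, and `feed_articles`, `to_mbox`, and the placeholder mbox separator line are hypothetical names and choices. It covers the two modes mentioned in the thread, one article per invocation or several articles in one mbox-style stream, both delivered on standard input.

```python
import subprocess
from typing import Iterable, List, Sequence

# Placeholder separator line; real mbox files carry the actual sender and date.
MBOX_SEPARATOR = "From MAILER-DAEMON Thu Jan  1 00:00:00 1970\n"


def to_mbox(articles: Iterable[str]) -> str:
    """Join raw articles into a single mbox-style stream."""
    chunks = []
    for article in articles:
        # Quote body lines starting with "From " so they are not read as separators.
        body = "\n".join(
            ">" + line if line.startswith("From ") else line
            for line in article.splitlines()
        )
        chunks.append(MBOX_SEPARATOR + body + "\n\n")
    return "".join(chunks)


def feed_articles(command: Sequence[str], articles: List[str], batch: bool = False) -> int:
    """Feed articles to `command` on standard input.

    With batch=False the program runs once per article; with batch=True it
    runs once and receives all articles as one mbox-style stream.
    Returns the number of times the program was invoked.
    """
    if not articles:
        return 0
    payloads = [to_mbox(articles)] if batch else list(articles)
    for payload in payloads:
        # check=True raises CalledProcessError if the program exits non-zero.
        subprocess.run(list(command), input=payload, text=True, check=True)
    return len(payloads)
```

A backend-specific registration function would then reduce to a single call, for example `feed_articles(["sa-learn", "--spam"], articles)`, using the command quoted earlier in the thread. Whether a given filter reads one message or an mbox from standard input, and with which flags, has to be checked against that filter's own documentation.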
* Re: spam.el: generic bayes interface?
From: Ted Zlatanov @ 2004-01-21 18:47 UTC
Cc: Hubert Chan

On Tue, 20 Jan 2004, hubert@uhoreg.ca wrote:

> Well, there are at least some good reasons that someone might want
> to use multiple Bayesian filters. For example, one might want to
> just try out the effectiveness of one filter, while retaining their
> original filter as a backup. Also, if one wishes to switch Bayesian
> filters, and does not have a corpus of spam/ham to train the filter,
> there would have to be a transition time during which the new filter
> is trained, while the old filter is still being used for splitting.
> And, of course, during this time, one would still want to keep
> training the old filter at the same time.

I'm OK with that, we can add a spam-use-generic-bayesian if it's
necessary. I just think customization, registry tracking, and other
things won't work so well when we generalize the interface too much.
If you or someone else produces that generic-bayesian backend, I
don't see a problem with putting it in. We can't anticipate the new
bayesian filters people might want to use, after all.

> This got me thinking, though, Ted, that the registration code for
> the spam/ham processors is pretty similar. They seem to mostly work
> in one of two ways -- either register one at a time, or register
> multiple articles at a time in a mbox-style format.

Yes, I've noticed that too after the 3rd time I wrote that code :)

> I think they all feed the articles via standard input. I would
> imagine that we would be able to share a lot of common code. Maybe
> write a function that feeds the article(s) to the registration
> program, and pass the name of the program and its arguments as
> arguments to that function. Then the registration functions just
> have to call that function with the appropriate arguments. Hmm.
> I'll have to look at the code to see if that would actually work...

It could work. I've been trying to make the functions generic on the
API side, now it's time to make them generic on the backend side as
well. I'm afraid it will make the code more complex, but adding new
backends should be significantly easier. I'll work on
gnus-encrypt.el first though, so feel free to start on this if you
have the interest :)

Thanks
Ted

* Re: spam.el: generic bayes interface?
From: Hubert Chan @ 2004-01-21 20:24 UTC

>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:

[...]

Ted> If you or someone else produces that generic-bayesian backend, I
Ted> don't see a problem with putting it in. We can't anticipate the
Ted> new bayesian filters people might want to use, after all.

I don't plan on doing that. I can't see any good way to generalize
the interfaces in any way that makes sense.

[...]

Ted> I'll work on gnus-encrypt.el first though, so feel free to start on
                  ^^^^^^^^^^^^^^^
Oooh. Cool... ;-) ... What would this do?

Ted> this if you have the interest :)

I won't have time to do anything until at least next week. Possibly
even later. And then there are a few things that I want to work on
first -- documentation, the email forwarding backend. Maybe we can
have a race to see who's able to scrape up some free time first. ;-)

Hubert

* Re: spam.el: generic bayes interface?
From: Ted Zlatanov @ 2004-01-22 18:23 UTC
Cc: Hubert Chan

On Wed, 21 Jan 2004, hubert@uhoreg.ca wrote:

>>>>>> "Ted" == Ted Zlatanov <tzz@lifelogs.com> writes:
>
> Ted> I'll work on gnus-encrypt.el
>                   ^^^^^^^^^^^^^^^
> Oooh. Cool... ;-) ... What would this do?

Encrypt and decrypt files such as .authinfo and .newsrc.eld. I'll
have something ready for testing soon.

Ted