* Re: Bogofilter
From: David Z Maze @ 2003-01-27 19:01 UTC

chris@grierwhite.com (Christopher J. White) writes:

> I'm intrigued...my spam load has reached critical levels, so I'm
> interested in trying out some spam filtering techniques. Are you
> using bogofilter with gnus? If so, what's your experience, does it
> ever filter out "good" email as spam (as it's statistical in nature)?
> If you don't have several hundred spam messages around to train it,
> how else do you train it, or do I have to wait til I get enough saved
> up spam?

I use ifile, not bogofilter, but the two packages seem to be similar
in nature. Yes, I get some false positives with ifile. It's also not
like I didn't have a big pile of spam sitting around before I started
using ifile (I hand-sorted it into its own group); if nothing else,
you can set up a Hotmail account and wait for a couple of weeks. :-)

> Finally, how do you integrate bogofilter with gnus for mail reading
> (read via POP3, stored as nnml).

You're using

> User-Agent: Gnus/5.090014 (Oort Gnus v0.14) Emacs/21.2 (powerpc-apple-darwin)

so you can just use the functions in spam.el. My .gnus file has

(setq gnus-spam-newsgroup-contents
      '(("nnml:mail.misc.spam" gnus-group-spam-classification-spam)
        ("nnml:.*" gnus-group-spam-classification-ham))
      gnus-spam-process-newsgroups
      '(("nnml:.*" (gnus-group-spam-exit-processor-ifile
                    gnus-group-ham-exit-processor-ifile)))
      gnus-spam-process-destinations
      '(("nnml:.*" "nnml:mail.misc.spam"))
      spam-junk-mailgroups '("mail.misc.spam")
      spam-split-group "mail.misc.spam"
      spam-use-ifile t
      )

and I call (: spam-split) in nnmail-split-fancy. I think all of these
should Just Work for you if you change group names appropriately and
substitute bogofilter for ifile.

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting. Politicking should be illegal."
        -- Abra Mitchell
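
The substitution David describes might look roughly like the sketch below. It assumes that spam.el names its bogofilter symbols the same way as the ifile ones (spam-use-bogofilter, gnus-group-spam-exit-processor-bogofilter, gnus-group-ham-exit-processor-bogofilter); the ham processor in particular may be missing from older spam.el versions, so check your copy first. The fallback group "mail.misc" is a made-up name for illustration.

```elisp
;; Sketch only: David's ifile settings with bogofilter substituted.
;; The bogofilter symbol names are assumed to mirror the ifile ones.
(setq gnus-spam-newsgroup-contents
      '(("nnml:mail.misc.spam" gnus-group-spam-classification-spam)
        ("nnml:.*" gnus-group-spam-classification-ham))
      gnus-spam-process-newsgroups
      '(("nnml:.*" (gnus-group-spam-exit-processor-bogofilter
                    gnus-group-ham-exit-processor-bogofilter)))
      gnus-spam-process-destinations
      '(("nnml:.*" "nnml:mail.misc.spam"))
      spam-junk-mailgroups '("mail.misc.spam")
      spam-split-group "mail.misc.spam"
      spam-use-bogofilter t)

;; Calling (: spam-split) from a fancy split.  "mail.misc" is a
;; hypothetical catch-all group for everything not classified as spam.
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy '(| (: spam-split) "mail.misc"))
```

With this in place, incoming mail that spam-split flags lands in spam-split-group and everything else falls through to the catch-all group.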
* Re: Bogofilter
From: Kai Großjohann @ 2003-01-29 7:10 UTC

Ian Soboroff <org@acm.isoboroff> writes:

> It's nice to see such a craze over naive-Bayes filtering techniques,
> but they can get overtrained pretty easily.

Yeah. I don't know much about automatic classification, but I seem
to recall that naive-Bayes is not the most effective method.

So are there better algorithms around and is there an implementation
that can be integrated into Gnus, similar to ifile?

-- 
Ambibibentists unite!
* Re: Bogofilter
From: Ian Soboroff @ 2003-01-29 19:59 UTC

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Ian Soboroff <org@acm.isoboroff> writes:
>
>> It's nice to see such a craze over naive-Bayes filtering techniques,
>> but they can get overtrained pretty easily.
>
> Yeah. I don't know much about automatic classification, but I seem
> to recall that naive-Bayes is not the most effective method.
>
> So are there better algorithms around and is there an implementation
> that can be integrated into Gnus, similar to ifile?

There are boatloads of text classification algorithms. Naive Bayes is
the canonical second best solution to any problem, and has the
advantage of being fast. Support Vector Machines are better but NB
can get quite close in some data. SVMs are hard to update, but to be
honest an email classifier could probably be just fine retraining
overnight.

My favorite classifier tool is Andrew McCallum's BOW toolkit. It does
NB, SVM, kNN, EM, and probably three other things I forgot about, and
has nice support for doing measurements and experiments. I was _this_
close to writing a scoring module for Gnus based on it, when I ran
across ifile.

The _right_ thing to do is something like nnir, that is, a classifier
framework that you can plug anything into underneath. ifile-gnus.el
is probably most of what's needed (plus a couple more functions to
easily move mail without triggering a reclassification).

Ian
* Re: Bogofilter
From: Kai Großjohann @ 2003-01-30 18:09 UTC

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Wed, 29 Jan 2003, org@acm.isoboroff wrote:
>> The _right_ thing to do is something like nnir, that is, a
>> classifier framework that you can plug anything into underneath.
>
> I'm sort of working on that right now, it will be a generic framework
> for spam.el.

Ian is talking about text classification which could be used for
general splitting (not just the spam/ham thing that spam.el does).

This doesn't mean that spam.el is bad, just that it solves a
different problem.

I think that the tracking part could be useful for using text
classification for splitting. But maybe it's enough to add the right
hooks to Gnus, and spam.el uses them in one way whereas the
use-a-classifier-for-splitting thing uses them in another way.

-- 
Ambibibentists unite!
* Re: Bogofilter
From: Ted Zlatanov @ 2003-01-31 16:46 UTC

On Thu, 30 Jan 2003, kai.grossjohann@uni-duisburg.de wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
>
>> On Wed, 29 Jan 2003, org@acm.isoboroff wrote:
>>> The _right_ thing to do is something like nnir, that is, a
>>> classifier framework that you can plug anything into underneath.
>>
>> I'm sort of working on that right now, it will be a generic
>> framework for spam.el.
>
> Ian is talking about text classification which could be used for
> general splitting (not just the spam/ham thing that spam.el does).
>
> This doesn't mean that spam.el is bad, just that it solves a
> different problem.
>
> I think that the tracking part could be useful for using text
> classification for splitting. But maybe it's enough to add the
> right hooks to Gnus, and spam.el uses them in one way whereas the
> use-a-classifier-for-splitting thing uses them in another way.

I should have said "I'm working on a generic framework, which will be
used by spam.el". I want to make it easy to track a message as it
exists in Gnus, and spam.el will use that but other packages can too.

The idea will be to add any data you want to a message ID, and have
that data persist as the message moves around. spam.el will attach
things like "processed as spam by bogofilter," "moved to group ABC,"
or "split by ifile into group XYZ."

Right now I'm in the "thinking about it" stage, no code yet...

Ted
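
To make the idea of attaching data to a message ID concrete, here is one way it could be sketched. This is purely an illustration and not Ted's design: every name below (my-msgtrack-table, my-msgtrack-add, my-msgtrack-notes) is hypothetical, and the table lives only in the running Emacs session, so the persistence Ted mentions is left out.

```elisp
;; Hypothetical sketch; none of these names exist in Gnus or spam.el.
(defvar my-msgtrack-table (make-hash-table :test 'equal)
  "Map Message-ID strings to a list of notes, oldest first.")

(defun my-msgtrack-add (message-id note)
  "Append NOTE to the notes recorded for MESSAGE-ID."
  (puthash message-id
           (append (gethash message-id my-msgtrack-table) (list note))
           my-msgtrack-table))

(defun my-msgtrack-notes (message-id)
  "Return the notes recorded for MESSAGE-ID, or nil if there are none."
  (gethash message-id my-msgtrack-table))

;; Usage: record two events for one message, then read them back.
(my-msgtrack-add "<id@example.invalid>" "processed as spam by bogofilter")
(my-msgtrack-add "<id@example.invalid>" "moved to group ABC")
(my-msgtrack-notes "<id@example.invalid>")
```

Keying on the Message-ID rather than on a group and article number is what lets the notes follow a message when it is moved or re-split.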
* Re: spam assassin filtering
From: Alain Picard @ 2003-01-29 11:06 UTC

deskpot@despammed.com (Vasily Korytov) writes:

> Yep, it works here. It's very simple here: I have procmail as MDA and I
> call spamc (spamd is run at startup) from my ~/.procmailrc. Then I have
> ("junk.spam" "^X-Spam-Status: Yes") entry in my nnmail-split-methods.

I was hoping for a procmail-free solution, as this is on a laptop
system, and I prefer to get the mail "on demand", rather than from a
procmail daemon.

But thanks for the tip, I may have to use it nonetheless.
* Re: spam assassin filtering
From: Michael Below @ 2003-01-29 15:35 UTC

Alain Picard <apicard+die-spammer-die@optushome.com.au> writes:

> deskpot@despammed.com (Vasily Korytov) writes:
>
>> Yep, it works here. It's very simple here: I have procmail as MDA
>> and I call spamc (spamd is run at startup) from my
>> ~/.procmailrc. Then I have ("junk.spam" "^X-Spam-Status: Yes")
>> entry in my nnmail-split-methods.
>
> I was hoping for a procmail-free solution, as this is on a laptop
> system, and I prefer to get the mail "on demand", rather than from a
> procmail daemon.

I didn't even know that procmail can be used as a daemon. Just start
fetchmail on each IP-up, and make fetchmail hand the mail over to
procmail (using procmail as an MDA). Then procmail pipes the mail
through spamassassin (you don't have to use spamd/spamc) and does some
sorting. No need for daemons.

Michael
-- 
_Agricultural activity_ is the management by an enterprise of the
biological transformation of biological assets for sale, into
agricultural produce, or into additional biological assets. IAS 41,5
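
On the Gnus side, the split entry Vasily quotes could sit in a .gnus file roughly as sketched below. It assumes SpamAssassin has already added the X-Spam-Status header before Gnus splits the mail; the catch-all group "mail.misc" is a made-up name.

```elisp
;; Sketch only: file anything SpamAssassin marked as spam into
;; "junk.spam".  The rule with the empty regexp matches every message
;; and must come last; "mail.misc" is a hypothetical group name.
(setq nnmail-split-methods
      '(("junk.spam" "^X-Spam-Status: Yes")
        ("mail.misc" "")))
```

Rules are tried in order, so the spam rule has to appear before the catch-all.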
* Re: spam assassin filtering
From: Vasily Korytov @ 2003-01-29 21:14 UTC

>>>>> "JK" == Jan Korger writes:

 JK> BUT if you use an MTA, i.e. deliver from fetchmail to port 25 or call
 JK> sendmail or similar, you will end up with 1 procmail + 1 spamassassin(perl)
 JK> process per message plus some MTA processes. This is very likely to
 JK> eat up all of your RAM and end in a disaster, i.e. you will lose your
 JK> mail. (This happened to me while downloading ~100 messages.)

Use the spamd/spamc pair. Anyway, my old modem takes care of not losing
my mail in such situations. =))

---Vas
* Re: spam assassin filtering
From: Tim Haynes @ 2003-01-29 21:52 UTC

deskpot@despammed.com (Vasily Korytov) writes:

> JK> process per message plus some MTA processes. This is very likely to
> JK> eat up all of your RAM and end in a disaster, i.e. you will lose
> JK> your mail. (This happened to me while downloading ~100 messages.)
>
> Use spamd/spamc pair. Anyway, my old modem takes care of not losing my
> mail in such situations. =))

Fetchmail will not flush a message off the upstream server if it gets a
failure code from the local delivery agent.

spamc/d do make it considerably quicker processing a mail, as the perl
interpreter is only invoked the once.

You can add locking to procmail rules (the fine manpage mentions
appending a `:' to the end of the intro line to a recipe triple).

Me, I have my colo-swerver handle the initial incoming mails, bogofilter
being invoked for anything questionable; the mails that pass are copied
on to the ISP at home and pulled down with fetchmail and re-bogofiltered
as well. Seems to work :)

~Tim
-- 
But mountains are holy places,          |piglet@stirfried.vegetable.org.uk
And beauty is free / We can still walk  |http://spodzone.org.uk/
Through the garden                      |
Our earth was once green                |