* Getting started with spam filtering @ 2003-01-07 13:51 Niklas Morberg 2003-01-07 14:44 ` Ted Zlatanov 0 siblings, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-07 13:51 UTC (permalink / raw) I've read the documentation on spam filtering in the latest CVS, and although it is very thorough, my head is spinning. I just don't understand what to do to get started... From the documentation it looks like spam.el is all I need, but is this really the case? It seems like I need a spam processor (whatever that is) as well. I'm guessing spam.el is not a spam processor, right? To further complicate things I'm using nnimap which seems to narrow down the options somewhat. The documentation would be more useful (to me) if it could explain the minimal stuff I need to get started. Is spam.el enough? Will spam.el and spam-stat.el suffice then? A recommendation of what spam processor to use would also be helpful. (Are these spam processors at all available for us Windows users, btw? Cygwin would work.) Do I have to use an external program? I'm sorry if I'm not making any sense, as I said I'm very confused at the moment... Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering
  2003-01-07 13:51 Getting started with spam filtering Niklas Morberg
@ 2003-01-07 14:44 ` Ted Zlatanov
  2003-01-07 16:43   ` Frank Schmitt
                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Ted Zlatanov @ 2003-01-07 14:44 UTC (permalink / raw)
  Cc: ding

On Tue, 07 Jan 2003, niklas.morberg@axis.com wrote:
> I've read the documentation on spam filtering in the latest
> CVS, and although it is very thorough, my head is spinning.
> I just don't understand what to do to get started...

Yes, there's no HOWTO yet.  I'm not done with the functionality yet :)

If anyone's willing to maintain a spam.el HOWTO meanwhile, feel free.
Below is what may be the first attempt at it.

> From the documentation it looks like spam.el is all I need,
> but is this really the case? It seems like I need a spam
> processor (whatever that is) as well. I'm guessing spam.el
> is not a spam processor, right? To further complicate
> things I'm using nnimap which seems to narrow down the
> options somewhat.

To start, set spam-split-group (the default of "spam" is sensible) to
where you want your spam to go.  Also (customize-variable
spam-junk-mailgroups) if you want to explicitly consider some groups
spam groups BY NAME.

A spam/ham processor is a "backend" that will take the spam (or ham)
articles in a group and process them.  There are internal ones like
whitelists/blacklists/BBDB, and external ones like ifile/bogofilter.
You set spam/ham processors either for a group/topic with `G c', the
spam-process parameter, or for a regex matching a group with
(customize-variable gnus-spam-process-newsgroups).

To split incoming mail, most spam/ham processors have a corresponding
spam-use-PROCESSOR variable that you can set.  You add (: spam-split)
to your split rules; the rest is available with (customize-group
"spam").

The incoming split and the spam/ham processor work together.  For
instance, if you use the bogofilter spam processor, you would probably
want to also set the spam-use-bogofilter variable to t so your
incoming mail gets classified into spam/non-spam according to what
bogofilter has learned from processing your spam.

The spam-use-blackholes is the only incoming spam-split variable that
does not have a corresponding spam/ham processor.

There are some rules as to how spam and ham articles are treated in
spam/ham/unclassified groups.  Basically, if it's spam, it will always
be processed by a spam processor you define for that group.
Otherwise, the group's content type matters.  You define the
spam/ham/unclassified content of a group either for a group/topic with
`G c', the spam-contents parameter, or for a regex matching a group
with (customize-variable gnus-spam-newsgroup-contents).

> The documentation would be more useful (to me) if it could
> explain the minimal stuff I need to get started. Is spam.el
> enough? Will spam.el and spam-stat.el suffice then?

spam-stat.el is not used by spam.el right now, and has a separate
manual section.  I'm considering adding it as a spam/ham processor to
spam.el, but I don't know if anyone needs that.

> A recommendation of what spam processor to use would also be
> helpful.

It depends on what you want.  There's whitelists, blacklists, BBDB,
ifile, bogofilter... spam.el is all about user choice.

I found the spam-use-blackholes almost essential.  I don't lose that
mail, it just goes to a "spam" folder.  I haven't had a false positive
yet.  It requires either a very recent Emacs, or the "dig" program in
your path.

> (Are these spam processors at all available for us Windows users,
> btw? Cygwin would work.) Do I have to use an external program?

BBDB, whitelists, and blacklists are internal.  Blackhole checks need
dig.el or dns.el to work.

Ifile and Bogofilter should work if executable-find can find the
program, and call-process-region works with that program.  I don't
have a Win32 machine to test on, so let me know if things break.

> I'm sorry if I'm not making any sense, as I said I'm very
> confused at the moment...

No problem, I'd like to help everyone interested in spam.el.  I just
can't write code, maintain the manual, and a HOWTO at the same time.

Ted

^ permalink raw reply [flat|nested] 33+ messages in thread
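A minimal sketch of the setup Ted describes, as it might look in a `~/.gnus' file.  The variables spam-split-group and spam-use-blackholes and the (: spam-split) rule come from his message; the use of nnmail-split-fancy and the catch-all group name "mail.misc" are assumptions made only for illustration, and any other spam-use-PROCESSOR variable could stand in for the blackhole check.

```elisp
;; Sketch only: "mail.misc" and the choice of nnmail-split-fancy are
;; assumptions for illustration, not something spam.el requires.
(require 'spam)

;; Group that spam-split returns for anything it classifies as spam.
(setq spam-split-group "spam")

;; Enable one incoming check; here the DNS blackhole lookup.
(setq spam-use-blackholes t)

;; Let spam-split look at each message first; mail it does not claim
;; falls through to the catch-all group.
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy '(| (: spam-split)
                             "mail.misc"))
```

With nnimap the same (: spam-split) rule would go into nnimap-split-fancy instead, as discussed later in the thread.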
* Re: Getting started with spam filtering 2003-01-07 14:44 ` Ted Zlatanov @ 2003-01-07 16:43 ` Frank Schmitt 2003-01-07 17:30 ` Ifile vs. bogofilter (was: Getting started with spam filtering) Frank Schmitt 2003-01-08 6:04 ` Getting started with spam filtering Kai Großjohann 2003-01-08 11:28 ` Niklas Morberg 2 siblings, 1 reply; 33+ messages in thread From: Frank Schmitt @ 2003-01-07 16:43 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: >> (Are these spam processors at all available for us Windows users, >> btw? Cygwin would work.) Do I have to use an external program? > > BBDB, whitelists, and blacklists are internal. Blackhole checks need > dig.el or dns.el to work. > > Ifile and Bogofilter should work if executable-find can find the > program, and call-process-region works with that program. I don't > have a Win32 machine to test on, so let me know if things break. It took me a whole weekend to get ifile up and running under cygwin (the most important things are to link against libiberty.a and install cygipc, but there was some more stuff I had to do; I can't remember everything). SpamAssassin works OK under cygwin; I haven't tried bogofilter. -- One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In the Land of Mordor where the Shadows lie. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Ifile vs. bogofilter (was: Getting started with spam filtering) 2003-01-07 16:43 ` Frank Schmitt @ 2003-01-07 17:30 ` Frank Schmitt 0 siblings, 0 replies; 33+ messages in thread From: Frank Schmitt @ 2003-01-07 17:30 UTC (permalink / raw) Frank Schmitt <usereplyto@Frank-Schmitt.net> writes: > It took me a whole weekend to get ifile up and running under cygwin (the > most important things are to link against libiberty.a and install cygipc, > but there was some more stuff I had to do; I can't remember everything). > > SpamAssassin works OK under cygwin; I haven't tried bogofilter. I just downloaded and installed bogofilter; it compiled flawlessly and seems to be much faster than ifile, so I'm wondering: Should I switch from ifile to bogofilter? Any experiences or links to comparisons? -- One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In the Land of Mordor where the Shadows lie. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-07 14:44 ` Ted Zlatanov 2003-01-07 16:43 ` Frank Schmitt @ 2003-01-08 6:04 ` Kai Großjohann 2003-01-08 9:18 ` Lars Magne Ingebrigtsen 2003-01-08 15:26 ` Ted Zlatanov 2003-01-08 11:28 ` Niklas Morberg 2 siblings, 2 replies; 33+ messages in thread From: Kai Großjohann @ 2003-01-08 6:04 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > spam-stat.el is not used by spam.el right now, and has a separate > manual section. I'm considering adding it as a spam/ham processor to > spam.el, but I don't know if anyone needs that. Do I remember correctly that spam-stat.el is an Emacs Lisp port of Paul Graham's CL implementation of Naive Bayes? In that case, I guess that some people might indeed be interested -- I think there are people who like to avoid calling external programs. -- Ambibibentists unite! ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-08 6:04 ` Getting started with spam filtering Kai Großjohann @ 2003-01-08 9:18 ` Lars Magne Ingebrigtsen 2003-01-08 15:26 ` Ted Zlatanov 1 sibling, 0 replies; 33+ messages in thread From: Lars Magne Ingebrigtsen @ 2003-01-08 9:18 UTC (permalink / raw) kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes: > In that case, I guess that some people might indeed be interested -- > I think there are people who like to avoid calling external > programs. Yup. Under both Windows and MacOS* it can be awkward, and even on other operating systems it's nice to reduce dependencies on external executables. -- (domestic pets only, the antidote for overdose, milk.) larsi@gnus.org * Lars Magne Ingebrigtsen ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-08 6:04 ` Getting started with spam filtering Kai Großjohann 2003-01-08 9:18 ` Lars Magne Ingebrigtsen @ 2003-01-08 15:26 ` Ted Zlatanov 2003-01-08 23:36 ` Alex Schroeder 2003-01-09 8:03 ` Niklas Morberg 1 sibling, 2 replies; 33+ messages in thread From: Ted Zlatanov @ 2003-01-08 15:26 UTC (permalink / raw) Cc: ding On Wed, 08 Jan 2003, kai.grossjohann@uni-duisburg.de wrote: > Ted Zlatanov <tzz@lifelogs.com> writes: > >> spam-stat.el is not used by spam.el right now, and has a separate >> manual section. I'm considering adding it as a spam/ham processor >> to spam.el, but I don't know if anyone needs that. > > Do I remember correctly that spam-stat.el is an Emacs Lisp port of > Paul Graham's CL implementation of Naive Bayes? In that case, I > guess that some people might indeed be interested -- I think there > are people who like to avoid calling external programs. Right, but I need someone to actually request that - it's at the bottom of the queue now, I'd rather write docs and fix existing code right now. So, if anyone wants spam-stat.el integration within spam.el sooner instead of later, please let me know. Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-08 15:26 ` Ted Zlatanov @ 2003-01-08 23:36 ` Alex Schroeder 2003-01-09 14:23 ` Jorge Godoy 2003-01-09 8:03 ` Niklas Morberg 1 sibling, 1 reply; 33+ messages in thread From: Alex Schroeder @ 2003-01-08 23:36 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > Right, but I need someone to actually request that - it's at the > bottom of the queue now, I'd rather write docs and fix existing code > right now. So, if anyone wants spam-stat.el integration within > spam.el sooner instead of later, please let me know. Just for the record -- I use spam-stat.el without spam.el at the moment, and it works just fine. So if anybody has questions about how to get started, just ask and I will do my best, much as Ted is doing for spam.el. :) Alex. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-08 23:36 ` Alex Schroeder @ 2003-01-09 14:23 ` Jorge Godoy 2003-01-09 15:11 ` Andreas Fuchs 2003-01-09 18:38 ` Alex Schroeder 0 siblings, 2 replies; 33+ messages in thread From: Jorge Godoy @ 2003-01-09 14:23 UTC (permalink / raw) Cc: ding Alex Schroeder <alex@emacswiki.org> writes: > Just for the record -- I use spam-stat.el without spam.el at the > moment, and it works just fine. So if anybody has questions about how > to get started, just ask and I will do my best, much as Ted is doing > for spam.el. :) It would be interesting if you could post a step-by-step on how to get it running. I can publish it on a website for you (or you can add it to Emacs Wiki :-)) See you, -- Godoy. <godoy@ieee.org> ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-09 14:23 ` Jorge Godoy @ 2003-01-09 15:11 ` Andreas Fuchs 2003-01-09 18:38 ` Alex Schroeder 1 sibling, 0 replies; 33+ messages in thread From: Andreas Fuchs @ 2003-01-09 15:11 UTC (permalink / raw) Today, Jorge Godoy <godoy@ieee.org> wrote: > It would be interesting if you could post a step-by-step on how to get > it running. I can publish it on a website for you (or you can add it > to Emacs Wiki :-)) Or to my.gnus.org. Would fit in there well, I think. -- Andreas Fuchs, <asf@acm.org>, asf@jabber.at, antifuchs ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering
  2003-01-09 14:23 ` Jorge Godoy
  2003-01-09 15:11   ` Andreas Fuchs
@ 2003-01-09 18:38   ` Alex Schroeder
  2003-01-10  7:44     ` Niklas Morberg
  1 sibling, 1 reply; 33+ messages in thread
From: Alex Schroeder @ 2003-01-09 18:38 UTC (permalink / raw)

Jorge Godoy <godoy@ieee.org> writes:

> Alex Schroeder <alex@emacswiki.org> writes:
>
>> Just for the record -- I use spam-stat.el without spam.el at the
>> moment, and it works just fine. So if anybody has questions about how
>> to get started, just ask and I will do my best, much as Ted is doing
>> for spam.el. :)
>
> It would be interesting if you could post a step-by-step on how to get
> it running. I can publish it on a website for you (or you can add it
> to Emacs Wiki :-))

You can start by looking in the Gnus manual node "Filtering Spam Using
Statistics (spam-stat.el)".  The node "Creating a spam-stat dictionary"
explains how to create your dictionary.  The node "Splitting mail using
spam-stat" explains how to change your mail splitting setup.

The short version is this:

You need two nnml directories, one with spam, one with non-spam mails
before you start!

1. Call `spam-stat-process-spam-directory' on `~/Mail/mail/spam'.

2. Call `spam-stat-process-non-spam-directory' on `~/Mail/mail/misc'.

3. Call `spam-stat-save' to save the dictionary.

4. Add this to your `~/.gnus' file:

   (require 'spam-stat)
   (spam-stat-load)

5. Change your mail splitting following one of these examples:

   (setq nnmail-split-fancy
         `(| (: spam-stat-split-fancy)
             "mail.misc"))

   (setq nnmail-split-fancy
         `(| ("Content-Type" "text/html" "mail.spam.filtered")
             (: spam-stat-split-fancy)
             ("Subject" "\\bspam-stat\\b" "mail.emacs")
             "mail.misc"))

I am grateful for any holes poked in the manual section on
spam-stat.el -- only then can I improve it.  I just noticed, for
example, that the manual section that tells you to call spam-stat-save
is not formatted correctly...

Alex.

^ permalink raw reply [flat|nested] 33+ messages in thread
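Steps 1 to 3 of Alex's short version can be run as a one-time training session, for example by pasting the forms below into the *scratch* buffer and evaluating each with C-x C-e.  This is only a sketch: the two directory names are the ones from his message and must be changed to wherever your own spam and non-spam nnml groups live.

```elisp
;; One-time training session for spam-stat.  The directories are the
;; examples from the message above; adjust them to your own groups.
(require 'spam-stat)

;; Step 1: learn what spam looks like.
(spam-stat-process-spam-directory "~/Mail/mail/spam")

;; Step 2: learn what legitimate mail looks like.
(spam-stat-process-non-spam-directory "~/Mail/mail/misc")

;; Step 3: write the dictionary to disk so that (spam-stat-load)
;; in ~/.gnus can read it back in later sessions.
(spam-stat-save)
```

Training on large groups can take a while, and the dictionary only reflects the mail present in those directories when the forms are run.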
* Re: Getting started with spam filtering 2003-01-09 18:38 ` Alex Schroeder @ 2003-01-10 7:44 ` Niklas Morberg 2003-01-10 12:12 ` Alex Schroeder 0 siblings, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-10 7:44 UTC (permalink / raw) Cc: ding Alex Schroeder <alex@emacswiki.org> writes: > I am grateful for any holes poking in the manual section > on spam-stat.el -- only then can I improve it. Here's something that _maybe_ should be changed/added. I only use nnimap, but I use the agent as a cache. Therefore it is possible for me (I guess?) to use spam-stat.el, but doing the training on my cached articles instead. I.e. you can use automated dictionary creation even if you have another backend than nnml. Also, splitting is possible using nnimap-split-fancy as well and not only nnmail-split-fancy, right? Btw, I really liked the "Here is how you would create your dictionary" part of the documentation. Perhaps something could be added to help beginners a bit more? Something along the lines of: 1. Copy this text to the *scratch* buffer 2. edit the strings to reflect your settings 3. go to the end of each statement and press C-x C-e to evaluate the function Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 7:44 ` Niklas Morberg @ 2003-01-10 12:12 ` Alex Schroeder 2003-01-10 12:51 ` Niklas Morberg 2003-01-10 13:43 ` Kai Großjohann 0 siblings, 2 replies; 33+ messages in thread From: Alex Schroeder @ 2003-01-10 12:12 UTC (permalink / raw) Niklas Morberg <niklas.morberg@axis.com> writes: > Here's something that _maybe_ should be changed/added. I > only use nnimap, but I use the agent as a cache. Therefore > it is possible for me (I guess?) to use spam-stat.el, but > doing the training on my cached articles instead. > > I.e. you can use automated dictionary creation even if you > have another backend than nnml. > > Also, splitting is possible using nnimap-split-fancy as well > and not only nnmail-split-fancy, right? Heh, I must confess that nnimap does not work with our company IMAP server (the article numbers are bigger than elisp allows) -- so I have never tried it. So -- if *you* have a working setup, I would be very happy if you could explain it to me, and then I will add it to the manual. ;) At the moment, it seems that I can just write that the same applies for IMAP, just use nnimap-split-fancy instead of nnmail-split-fancy? I did not understand the cache part, however. Does the above work *only* with cached articles, or does the above without caching just limit itself to the headers (which might also work, of course, or it might not be enough, that remains to be seen). > Btw, I really liked the "Here is how you would create your > dictionary" part of the documentation. Perhaps something > could be added to help beginners a bit more? Something along > the lines of: > > 1. Copy this text to the *scratch* buffer > 2. edit the strings to reflect your settings > 3. go to the end of each statement and press C-x C-e to > evaluate the function Hm. Are there other parts of the Gnus manual at this level of detail? Perhaps it would be more appropriate to write a little widget wizard. 
Shall I write one so you can take a look at what it would be like? Alex. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering
  2003-01-10 12:12 ` Alex Schroeder
@ 2003-01-10 12:51   ` Niklas Morberg
  2003-01-10 13:10     ` Ted Zlatanov
                       ` (2 more replies)
  2003-01-10 13:43   ` Kai Großjohann
  1 sibling, 3 replies; 33+ messages in thread
From: Niklas Morberg @ 2003-01-10 12:51 UTC (permalink / raw)

[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]

Alex Schroeder <alex@emacswiki.org> writes:

> So -- if *you* have a working setup, I would be very happy
> if you could explain it to me, and then I will add it to
> the manual. ;)

Actually, it was quite simple to set up with the new changes
made to spam.el. See attached file for notes I took while
setting it up.

> At the moment, it seems that I can just write that the
> same applies for IMAP, just use nnimap-split-fancy instead
> of nnmail-split-fancy?

That would be just fine.

> I did not understand the cache part, however. Does the
> above work *only* with cached articles, or does the above
> without caching just limit itself to the headers (which
> might also work, of course, or it might not be enough,
> that remains to be seen).

I do the training on the cached articles and since I use
fancy splitting, I guess gnus uses all of the article when
splitting. Doing the training without having the articles
available locally (via the agent or the cache) seems tricky.

>> 1. Copy this text to the *scratch* buffer
>> 2. edit the strings to reflect your settings
>> 3. go to the end of each statement and press C-x C-e to
>> evaluate the function
>
> Hm. Are there other parts of the Gnus manual at this level of detail?

Probably not, no. Maybe it's best to leave it as is.

Btw, it would be nice to be able to check the spam-stat score of
an article. Just as `S t' is bound to `(spam-bogofilter-score)'
I think it should be possible to run `(spam-stat-score-buffer)'
or something similar with a key combination.

Maybe `S t' could be used for that too since you are unlikely
to run multiple spam processors?

Niklas

[-- Attachment #2: getting_started_spam.txt --]
[-- Type: text/plain, Size: 1745 bytes --]

Using spam.el and spam-stat.el with an nnimap backend.

I previously had a group called "spam", which is the default
spam group for spam.el. Otherwise I guess you need to create
this.

All my mails start in INBOX, I then split mails with
nnimap-split-fancy to email list groups and to the "spam"
group based on a corporate spam filter. The emails that are
left are put in "incoming".

I did the following changes

in .emacs:
(load "spam")

in gnus.el add this rule to nnimap-split-fancy:
(: spam-split)

Customize:
'(spam-use-stat t)
'(gnus-spam-process-destinations (quote (("incoming" "spam"))))

Train spam-stat:

Start from scratch:
(spam-stat-reset)
Reset:
(setq spam-stat (make-hash-table :test 'equal))

Learn spam:
(spam-stat-process-spam-directory "~/News/agent/nnimap/mailse01.axis.se/spam")

Learn non-spam:
(spam-stat-process-non-spam-directory "~/News/agent/nnimap/mailse01.axis.se/incoming")
(spam-stat-process-non-spam-directory "~/News/agent/nnimap/mailse01.axis.se/INBOX_Archive")
(spam-stat-process-non-spam-directory "~/News/agent/nnimap/mailse01.axis.se/INBOX_Personal")
(spam-stat-process-non-spam-directory "~/News/agent/nnimap/mailse01.axis.se/lists_ding")

Reduce table size:
(spam-stat-reduce-size)

Save table:
(spam-stat-save)

Then I customized group parameters for the "incoming" group
containing ham mails as such:

(spam-contents gnus-group-spam-classification-ham)
(spam-process (gnus-group-ham-exit-processor-stat))

and for the spam group:

(spam-contents gnus-group-spam-classification-spam)
(spam-process (gnus-group-spam-exit-processor-stat))
(ham-process-destination . "incoming")

and that's it. When spam ends up in ham groups I just press M-d
and the stats are updated when exiting the group.

^ permalink raw reply [flat|nested] 33+ messages in thread
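The init-file part of Niklas's notes could be collected into one fragment along the following lines.  This is a sketch under assumptions: the group names "spam" and "incoming" and the gnus-spam-process-destinations value are taken from his notes, while the nnimap-split-inbox and nnimap-split-rule settings are guesses at how his fancy splitting is switched on, and his mailing-list and corporate-filter rules are left out because the notes do not show them.

```elisp
;; Sketch of the init-file settings from the notes above.  The
;; nnimap-split-inbox / nnimap-split-rule lines are assumptions about
;; how fancy splitting is enabled; the notes do not show them.
(require 'spam)

;; Use the spam-stat dictionary when spam-split classifies new mail.
(setq spam-use-stat t)

;; Split everything arriving in INBOX; spam-split gets the first look
;; and whatever it does not claim ends up in "incoming".
(setq nnimap-split-inbox "INBOX"
      nnimap-split-rule 'nnimap-split-fancy
      nnimap-split-fancy '(| (: spam-split)
                             "incoming"))

;; Articles marked as spam in "incoming" are moved to "spam" on exit.
(setq gnus-spam-process-destinations '(("incoming" "spam")))
```

The group parameters (spam-contents, spam-process, ham-process-destination) are still set per group with `G c', as in the notes.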
* Re: Getting started with spam filtering 2003-01-10 12:51 ` Niklas Morberg @ 2003-01-10 13:10 ` Ted Zlatanov 2003-01-10 13:43 ` Niklas Morberg 2003-01-10 14:44 ` Getting started with spam filtering Alex Schroeder 2003-01-15 1:32 ` Danny Siu 2 siblings, 1 reply; 33+ messages in thread From: Ted Zlatanov @ 2003-01-10 13:10 UTC (permalink / raw) Cc: ding On Fri, 10 Jan 2003, niklas.morberg@axis.com wrote: > Btw, it would be nice to be able to check the spam-stat score of > an article. Just as `S t' is bound to `(spam-bogofilter-score)' > I think it should be possible to run `(spam-stat-score-buffer)' > or something similar with a key combination. > > Maybe `S t' could be used for that too since you are unlikely > to run multiple spam processors? That would be sensible, but not too easy for the users - I'd rather have static commands in the manual, not "this will be invoked thus if A, but thus if B." How about `S t i' for ifile, `S t b' for bogofilter, and so on? Then the users can shortcut the functions to whatever keys they want. Does that make sense? Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 13:10 ` Ted Zlatanov @ 2003-01-10 13:43 ` Niklas Morberg 2003-01-10 16:39 ` Ted Zlatanov 0 siblings, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-10 13:43 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > On Fri, 10 Jan 2003, niklas.morberg@axis.com wrote: >> Btw, it would be nice to be able to check the spam-stat score of >> an article. Just as `S t' is bound to `(spam-bogofilter-score)' >> I think it should be possible to run `(spam-stat-score-buffer)' >> or something similar with a key combination. >> >> Maybe `S t' could be used for that too since you are unlikely >> to run multiple spam processors? > > That would be sensible, but not too easy for the users - > I'd rather have static commands in the manual, not "this > will be invoked thus if A, but thus if B." Couldn't `S t' be bound to a wrapper function (`spam-score' or something) that calls the appropriate spam processor's score mechanism? > How about `S t i' for ifile, `S t b' for bogofilter, and so > on? Then the users can shortcut the functions to whatever > keys they want. Does that make sense? That works too. I don't really have a strong preference about one solution over the other, go with what you think is best. Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 13:43 ` Niklas Morberg @ 2003-01-10 16:39 ` Ted Zlatanov 2003-01-24 13:45 ` Displaying spam score (Was: Re: Getting started with spam filtering) Niklas Morberg 0 siblings, 1 reply; 33+ messages in thread From: Ted Zlatanov @ 2003-01-10 16:39 UTC (permalink / raw) Cc: ding On Fri, 10 Jan 2003, niklas.morberg@axis.com wrote: > Couldn't `S t' be bound to a wrapper function (`spam-score' > or something) that calls the appropriate spam processor's > score mechanism? It's not a bad idea, I like it but I'm concerned for the novice user. There's several spam scores that could be active at once: - ifile - bogofilter - blacklist - spam-stat And then there's the ham scores (BBDB, whitelist, plus ifile/bogofilter/spam-stat that unify spam and ham scores). I guess spam.el could have an abstract spam score, but that gets complicated because we have to correlate the spam scores of all the spam splitters that are enabled. Any ideas? Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
* Displaying spam score (Was: Re: Getting started with spam filtering) 2003-01-10 16:39 ` Ted Zlatanov @ 2003-01-24 13:45 ` Niklas Morberg 0 siblings, 0 replies; 33+ messages in thread From: Niklas Morberg @ 2003-01-24 13:45 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > There's several spam scores that could be active at once: > > - ifile > - bogofilter > - blacklist > - spam-stat > > And then there's the ham scores (BBDB, whitelist, plus > ifile/bogofilter/spam-stat that unify spam and ham scores). > > I guess spam.el could have an abstract spam score, but that gets > complicated because we have to correlate the spam scores of all the > spam splitters that are enabled. Any ideas? Not any great ideas, no. But anything that lets me see the spam-stat score easily would help. I'm guessing that you only use one of ifile, bogofilter or spam-stat at the same time. Having `S t' display one of these three scores would be good (for me). Having additional functions for displaying the BBDB, whitelist and blacklist scores might be useful for some users, but I wouldn't use it. Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
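The wrapper idea from this subthread might look roughly like the sketch below.  It is not part of spam.el: the name my-spam-score is hypothetical, it only covers bogofilter and spam-stat, and it assumes that scoring the raw article held in gnus-original-article-buffer is what is wanted and that the spam-stat dictionary has already been loaded with (spam-stat-load).  `spam-bogofilter-score' is the command the thread says `S t' runs; `spam-stat-score-buffer' is the function Niklas names, used here on the assumption that it returns the score for the current buffer.

```elisp
;; Hypothetical wrapper, not part of spam.el.  It picks the first
;; enabled backend instead of trying to combine several scores.
(require 'gnus)
(require 'spam)
(require 'spam-stat)

(defun my-spam-score ()
  "Show a spam score for the current article from one enabled backend."
  (interactive)
  (cond
   ;; Bogofilter: reuse the existing command.
   ((bound-and-true-p spam-use-bogofilter)
    (spam-bogofilter-score))
   ;; spam-stat: score the raw article text, if an article is selected.
   ((and (bound-and-true-p spam-use-stat)
         (get-buffer gnus-original-article-buffer))
    (with-current-buffer gnus-original-article-buffer
      (message "spam-stat score: %s" (spam-stat-score-buffer))))
   (t
    (message "No scoring spam backend is enabled or no article is selected"))))
```

Choosing one backend sidesteps the correlation problem Ted raises, at the cost of hiding the other scores when several backends are enabled.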
* Re: Getting started with spam filtering 2003-01-10 12:51 ` Niklas Morberg 2003-01-10 13:10 ` Ted Zlatanov @ 2003-01-10 14:44 ` Alex Schroeder 2003-01-15 1:32 ` Danny Siu 2 siblings, 0 replies; 33+ messages in thread From: Alex Schroeder @ 2003-01-10 14:44 UTC (permalink / raw) Niklas Morberg <niklas.morberg@axis.com> writes: > I do the training on the cached articles and since I use > fancy splitting, I guess gnus uses all of the article when > splitting. The doc string of nnimap-split-fancy does not suggest that, so I am still unsure of how this will work. But I will read your notes now. ;) Alex. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 12:51 ` Niklas Morberg 2003-01-10 13:10 ` Ted Zlatanov 2003-01-10 14:44 ` Getting started with spam filtering Alex Schroeder @ 2003-01-15 1:32 ` Danny Siu 2 siblings, 0 replies; 33+ messages in thread From: Danny Siu @ 2003-01-15 1:32 UTC (permalink / raw) Thanks for the write up. Finally got spam-stat working with nnimap and it is VERY nice! Splitting just works as expected. -- Danny Siu ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 12:12 ` Alex Schroeder 2003-01-10 12:51 ` Niklas Morberg @ 2003-01-10 13:43 ` Kai Großjohann 2003-01-11 15:41 ` Simon Josefsson 1 sibling, 1 reply; 33+ messages in thread From: Kai Großjohann @ 2003-01-10 13:43 UTC (permalink / raw) Alex Schroeder <alex@emacswiki.org> writes: > Heh, I must confess that nnimap does not work with our company IMAP > server (the article numbers are bigger than elisp allows) -- so I have > never tried it. I take it that nnmaildir knows how to map between file names and article numbers. I wonder if that approach could be adapted to mapping between large and small article numbers for nnimap? -- Ambibibentists unite! ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 13:43 ` Kai Großjohann @ 2003-01-11 15:41 ` Simon Josefsson 0 siblings, 0 replies; 33+ messages in thread From: Simon Josefsson @ 2003-01-11 15:41 UTC (permalink / raw) Cc: ding kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes: > Alex Schroeder <alex@emacswiki.org> writes: > >> Heh, I must confess that nnimap does not work with our company IMAP >> server (the article numbers are bigger than elisp allows) -- so I have >> never tried it. > > I take it that nnmaildir knows how to map between file names and > article numbers. I wonder if that approach could be adapted to > mapping between large and small article numbers for nnimap? It could, but it will likely generate new problems and make debugging trickier. Doesn't nnmaildir use the filename as article number, btw? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-08 15:26 ` Ted Zlatanov 2003-01-08 23:36 ` Alex Schroeder @ 2003-01-09 8:03 ` Niklas Morberg 2003-01-09 16:24 ` Ted Zlatanov 1 sibling, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-09 8:03 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > Right, but I need someone to actually request that - > it's at the bottom of the queue now, I'd rather write > docs and fix existing code right now. So, if anyone > wants spam-stat.el integration within spam.el sooner > instead of later, please let me know. I couldn't get bogofilter up and running on my platform and ifile was apparently a pain to set up, so I would certainly be interested in this. However, there is no need to prioritize this over the other things you are doing with spam.el at the moment. I saw that someone else was currently using spam-stat.el, so I'll just bug him instead of you for a while :) Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-09 8:03 ` Niklas Morberg @ 2003-01-09 16:24 ` Ted Zlatanov 2003-01-09 23:23 ` Alex Schroeder 0 siblings, 1 reply; 33+ messages in thread From: Ted Zlatanov @ 2003-01-09 16:24 UTC (permalink / raw) Cc: ding, Alex Schroeder [-- Attachment #1: Type: text/plain, Size: 1568 bytes --] On Thu, 09 Jan 2003, niklas.morberg@axis.com wrote: > Ted Zlatanov <tzz@lifelogs.com> writes: > >> Right, but I need someone to actually request that - >> it's at the bottom of the queue now, I'd rather write >> docs and fix existing code right now. So, if anyone >> wants spam-stat.el integration within spam.el sooner >> instead of later, please let me know. > > However, there is no need to prioritize this over the other > things you are doing with spam.el at the moment. I saw that > someone else was currently using spam-stat.el, so I'll just > bug him instead of you for a while :) Because this was something people wanted, I created a patch for spam.el, spam-stat.el, and gnus.el to add spam-stat functionality. It's pretty straightforward, but I need Alex to look at it and tell me if I'm doing things correctly or badly. I especially don't want spam-stat.el to install its hooks into article retrieval, since spam.el passes articles around to spam/ham processors as strings. Also, I added some things to spam-stat.el that are necessary and/or nice such as the spam score threshhold. Alex, can you look at the attached patch (it's against today's Gnus) and see what you think? I don't want to commit or test the patch until I know it's sensible and OK with you. Like I said, avoiding the hook installation is especially important. For spam.el, this adds nothing unusual - just the spam-stat ham and spam processors, and the spam-use-stat variable for splitting incoming mail. You just customize variables as usual to add spam-stat support. 
Ted

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: spam-stat.patch --]
[-- Type: text/x-patch, Size: 4714 bytes --]

Index: spam.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam.el,v
retrieving revision 6.45
diff -r6.45 spam.el
122a123,127
> (defcustom spam-use-stat nil
>   "Whether spam-stat should be used by spam-split."
>   :type 'boolean
>   :group 'spam)
>
281a287,292
> (defun spam-group-spam-processor-stat-p (group)
>   (spam-group-processor-p group 'gnus-group-spam-exit-processor-stat))
>
> (defun spam-group-ham-processor-stat-p (group)
>   (spam-group-processor-p group 'gnus-group-ham-exit-processor-stat))
>
304a316,318
>     (when (spam-group-spam-processor-stat-p gnus-newsgroup-name)
>       (spam-stat-register-spam-routine))
>
323a338,339
>     (when (spam-group-ham-processor-stat-p gnus-newsgroup-name)
>       (spam-stat-register-ham-routine))
434a451
>     (spam-use-stat . spam-check-stat)
455a473,475
>
> ;; load the spam-stat tables if needed
> (when spam-use-stat (spam-stat-load))
580d599
<     ;; always accept the ifile category
610a630,675
> ;;;; spam-stat
>
> (condition-case nil
>     (progn
>       (let ((spam-stat-install-hooks nil))
>         (require 'spam-stat))
>
>       (defun spam-check-stat ()
>         "Check the spam-stat backend for the classification of this message"
>         (let ((spam-stat-split-fancy-spam-group spam-split-group) ; override
>               (spam-stat-buffer (buffer-name)) ; stat the current buffer
>               category return)
>           (spam-stat-split-fancy)))
>
>       (defun spam-stat-register-spam-routine ()
>         (spam-generic-register-routine
>          (lambda (article)
>            (let ((article-string (spam-get-article-as-string article)))
>              (with-temp-buffer
>                (insert-string article-string)
>                (spam-stat-buffer-is-spam))))
>          nil)
>         (spam-stat-save))
>
>       (defun spam-stat-register-ham-routine ()
>         (spam-generic-register-routine
>          nil
>          (lambda (article)
>            (let ((article-string (spam-get-article-as-string article)))
>              (with-temp-buffer
>                (insert-string article-string)
>                (spam-stat-buffer-is-non-spam)))))
>         (spam-stat-save)))
>
>   (file-error (progn
>                 (defalias 'spam-stat-register-ham-routine 'ignore)
>                 (defalias 'spam-stat-register-spam-routine 'ignore)
>                 (defalias 'spam-stat-buffer-is-spam 'ignore)
>                 (defalias 'spam-stat-buffer-is-non-spam 'ignore)
>                 (defalias 'spam-stat-split-fancy 'ignore)
>                 (defalias 'spam-stat-load 'ignore)
>                 (defalias 'spam-stat-save 'ignore)
>                 (defalias 'spam-check-stat 'ignore))))
>
> \f
>
Index: spam-stat.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam-stat.el,v
retrieving revision 6.7
diff -r6.7 spam-stat.el
130c130
< wether a buffer contains spam or not."
---
> whether a buffer contains spam or not."
138a139,144
> (defcustom spam-stat-install-hooks t
>   "Whether spam-stat should install its hooks in Gnus.
> This is set to nil if you use spam-stat through spam.el."
>   :type 'boolean
>   :group 'spam-stat)
>
158c164,165
< `spam-stat-split-fancy' is used in fancy splitting rules."
---
> `spam-stat-split-fancy' is used in fancy splitting rules. Has no
> effect when spam-stat is invoked through spam.el."
161a169,173
> (defcustom spam-stat-split-fancy-spam-threshhold 0.9
>   "Spam score threshhold in spam-stat-split-fancy."
>   :type 'number
>   :group 'spam-stat)
>
229,232c241,245
< (add-hook 'nnmail-prepare-incoming-message-hook
<           'spam-stat-store-current-buffer)
< (add-hook 'gnus-select-article-hook
<           'spam-stat-store-gnus-article-buffer)
---
> (when spam-stat-install-hooks
>   (add-hook 'nnmail-prepare-incoming-message-hook
>             'spam-stat-store-current-buffer)
>   (add-hook 'gnus-select-article-hook
>             'spam-stat-store-gnus-article-buffer))
474c487
<     (when (> (spam-stat-score-buffer) 0.9)
---
>     (when (> (spam-stat-score-buffer) spam-stat-split-fancy-spam-threshhold)
Index: gnus.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus.el,v
retrieving revision 6.141
diff -r6.141 gnus.el
1797a1798,1801
> (defvar gnus-group-spam-exit-processor-stat "stat"
>   "The spam-stat summary exit spam processor.
> Only applicable to spam groups.")
>
1809a1814,1817
> (defvar gnus-group-ham-exit-processor-stat "stat-ham"
>   "The spam-stat summary exit ham processor.
> Only applicable to non-spam (unclassified and ham) groups.")
>
1825a1834
>             (variable-item gnus-group-spam-exit-processor-stat)
1828a1838
>             (variable-item gnus-group-ham-exit-processor-stat)
1846a1857
>                        (variable-item gnus-group-spam-exit-processor-stat)
1849a1861
>                        (variable-item gnus-group-ham-exit-processor-stat)

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-09 16:24 ` Ted Zlatanov @ 2003-01-09 23:23 ` Alex Schroeder 2003-01-10 2:07 ` Ted Zlatanov 0 siblings, 1 reply; 33+ messages in thread From: Alex Schroeder @ 2003-01-09 23:23 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > I especially don't want spam-stat.el to install its hooks into article > retrieval, since spam.el passes articles around to spam/ham processors > as strings. Also, I added some things to spam-stat.el that are > necessary and/or nice such as the spam score threshold. > > Alex, can you look at the attached patch (it's against today's Gnus) > and see what you think? I don't want to commit or test the patch > until I know it's sensible and OK with you. Like I said, avoiding the > hook installation is especially important. Hm. While this will basically work, what happens if somebody does (require 'spam-stat) (require 'spam) etc. I think the hooks will be installed nevertheless. But then again -- what's so bad about the hooks? All they do is store a copy of the article text in another buffer. Sure, it takes some time. But so what? Alex. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-09 23:23 ` Alex Schroeder @ 2003-01-10 2:07 ` Ted Zlatanov 2003-01-10 4:55 ` Alex Schroeder 0 siblings, 1 reply; 33+ messages in thread From: Ted Zlatanov @ 2003-01-10 2:07 UTC (permalink / raw) Cc: ding On Fri, 10 Jan 2003, alex@emacswiki.org wrote: > Hm. While this will basically work, what happens if somebody does > > (require 'spam-stat) > (require 'spam) > etc. > > I think the hooks will be installed nevertheless. So users shouldn't do it :) The idea is you use either spam-stat.el or spam.el but not both normally (because spam.el can include spam-stat.el for you). > But then again -- what's so bad about the hooks? All they do is > store a copy of the article text in another buffer. Sure, it takes > some time. But so what? I would assume Gnus doesn't need to be any slower than it already is. If people *want* the extra slowness, they can load the two as you indicated above. But I see no reason, if we can help it, to add extra processing when spam-stat.el is loaded by spam.el. Are you OK with the other changes in the patch? Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
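If spam-stat.el was already loaded on its own before spam.el, the two hooks named in the patch could in principle be removed by hand afterwards. The following is only a sketch, not part of the patch; it assumes nothing else in the setup relies on the article copy that those hooks store.

```elisp
;; Sketch: undo spam-stat.el's default hook installation after the fact.
;; The hook and function names are the ones that appear in the patch
;; posted earlier in this thread; whether removing them is safe for a
;; given setup is an assumption.
(remove-hook 'nnmail-prepare-incoming-message-hook
             'spam-stat-store-current-buffer)
(remove-hook 'gnus-select-article-hook
             'spam-stat-store-gnus-article-buffer)
```

`remove-hook` does nothing if the function is not on the hook, so evaluating this when the hooks were never installed should be harmless.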
* Re: Getting started with spam filtering 2003-01-10 2:07 ` Ted Zlatanov @ 2003-01-10 4:55 ` Alex Schroeder 2003-01-10 5:54 ` Ted Zlatanov 0 siblings, 1 reply; 33+ messages in thread From: Alex Schroeder @ 2003-01-10 4:55 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > I would assume Gnus doesn't need to be any slower than it already is. > If people *want* the extra slowness, they can load the two as you > indicated above. But I see no reason, if we can help it, to add extra > processing when spam-stat.el is loaded by spam.el. > > Are you OK with the other changes in the patch? I am ok with all the changes. My only point was that it makes understanding and maintenance a bit more complicated for a relatively small gain in performance (and nobody has yet measured the effect of it). But that is all. Alex. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 4:55 ` Alex Schroeder @ 2003-01-10 5:54 ` Ted Zlatanov 2003-01-10 10:41 ` Niklas Morberg 0 siblings, 1 reply; 33+ messages in thread From: Ted Zlatanov @ 2003-01-10 5:54 UTC (permalink / raw) Cc: ding On Fri, 10 Jan 2003, alex@emacswiki.org wrote: > I am ok with all the changes. My only point was that it makes > understanding and maintenance a bit more complicated for a > relatively small gain in performance (and nobody has yet measured > the effect of it). But that is all. OK. I've committed the patch. spam.el now supports spam-stat.el as a splitter (spam-use-stat) and ham/spam processor at summary exit. Thanks for your help. Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
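A minimal configuration along the lines described in this thread might look like the sketch below. It is an illustration under stated assumptions rather than a tested setup: the group name "mail.misc" is hypothetical, the split variables shown are the nnmail ones (nnimap users would put the rule in nnimap's own split settings instead), and setting spam-use-stat before loading spam.el is a guess based on where the patch calls (spam-stat-load).

```elisp
;; Sketch of a minimal Gnus setup for spam-stat through spam.el.

;; Assumption: set this before spam.el is loaded, since the patch appears
;; to call (spam-stat-load) only when spam-use-stat is already non-nil.
(setq spam-use-stat t)
(require 'spam)

;; Group that receives mail classified as spam ("spam" is the default).
(setq spam-split-group "spam")

;; Fancy splitting: try spam-split first, otherwise fall through.
;; "mail.misc" is a hypothetical catch-all group name.
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy '(| (: spam-split)
                             "mail.misc"))
```

The spam and ham processors that run at summary exit are configured separately, through the spam-process group parameter or gnus-spam-process-newsgroups.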
* Re: Getting started with spam filtering 2003-01-10 5:54 ` Ted Zlatanov @ 2003-01-10 10:41 ` Niklas Morberg 2003-01-10 11:01 ` Niklas Morberg 0 siblings, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-10 10:41 UTC (permalink / raw) Ted Zlatanov <tzz@lifelogs.com> writes: > On Fri, 10 Jan 2003, alex@emacswiki.org wrote: >> I am ok with all the changes. My only point was that it makes >> understanding and maintenance a bit more complicated for a >> relatively small gain in performance (and nobody has yet measured >> the effect of it). But that is all. > > OK. I've committed the patch. spam.el now supports spam-stat.el as a > splitter (spam-use-stat) and ham/spam processor at summary exit. Great. I've started to use it with nnimap as backend, it seems to work just fine. I'll let you know if anything weird happens. Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 10:41 ` Niklas Morberg @ 2003-01-10 11:01 ` Niklas Morberg 2003-01-10 12:50 ` Ted Zlatanov 0 siblings, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-10 11:01 UTC (permalink / raw) Nothing weird to report, but I am a bit confused regarding the group parameters for a ham group: 1. I set the spam-contents parameter to `gnus-group-spam-classification-ham' The parameter documentation says: On summary exit, the specified ham processors will be invoked on ham-marked messages. Exercise caution, since the ham processor will see the same message more than once because there is no ham message registry. 2. I set spam-process to `gnus-group-ham-exit-processor-stat'. The thing I don't understand is why I have to do both? Wouldn't it be enough to only do (2)? What happens if I only do (1) and not (2)? Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-10 11:01 ` Niklas Morberg @ 2003-01-10 12:50 ` Ted Zlatanov 0 siblings, 0 replies; 33+ messages in thread From: Ted Zlatanov @ 2003-01-10 12:50 UTC (permalink / raw) Cc: ding On Fri, 10 Jan 2003, niklas.morberg@axis.com wrote: > Nothing weird to report, but I am a bit confused regarding > the group parameters for a ham group: > > 1. I set the spam-contents parameter to > `gnus-group-spam-classification-ham' > > The parameter documentation says: > > On summary exit, the specified ham processors will be invoked on > ham-marked messages. Exercise caution, since the ham processor > will see the same message more than once because there is no ham > message registry. > > 2. I set spam-process to > `gnus-group-ham-exit-processor-stat'. > > The thing I don't understand is why I have to do both? > Wouldn't it be enough to only do (2)? A ham group does more than just invoke ham processors - you could have a ham group just because you want to use the spam-process-destination parameter, and perhaps future functionality. > What happens if I only do (1) and not (2)? Nothing harmful. You are saying "I have a group with articles that are positively not spam. I don't want them processed, though." The manual has, I think, a good explanation of all the things that apply for a ham group - if it doesn't, let me know so I can improve it. Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-07 14:44 ` Ted Zlatanov 2003-01-07 16:43 ` Frank Schmitt 2003-01-08 6:04 ` Getting started with spam filtering Kai Großjohann @ 2003-01-08 11:28 ` Niklas Morberg 2003-01-08 15:23 ` Ted Zlatanov 2 siblings, 1 reply; 33+ messages in thread From: Niklas Morberg @ 2003-01-08 11:28 UTC (permalink / raw) I can't get the gnus-spam-process-destinations to work as expected. I have set this to: (("incoming" "spam")) and from the documentation I expect messages in incoming that are marked as spam (using e.g. `M-d') to be moved to the spam group when exiting. This does not happen. Is there something else that needs to be enabled too? Maybe it doesn't work for nnimap? I have not yet enabled any spam processors (I'm planning to use bogofilter). Niklas ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Getting started with spam filtering 2003-01-08 11:28 ` Niklas Morberg @ 2003-01-08 15:23 ` Ted Zlatanov 0 siblings, 0 replies; 33+ messages in thread From: Ted Zlatanov @ 2003-01-08 15:23 UTC (permalink / raw) Cc: ding On Wed, 08 Jan 2003, niklas.morberg@axis.com wrote: > I can't get the gnus-spam-process-destinations to work as > expected. I have set this to: > > (("incoming" "spam")) > > and from the documentation I expect messages in incoming > that are marked as spam (using e.g. `M-d') to be moved to > the spam group when exiting. This does not happen. Is there > something else that needs to be enabled too? Maybe it > doesn't work for nnimap? I was just replying to this in another thread. My logic is wrong, I do the move-or-expire only in spam groups, but it should be done either in *all* groups or in non-spam groups. I'll commit a patch a little later today and follow up in the other thread. Ted ^ permalink raw reply [flat|nested] 33+ messages in thread
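For reference, the setting Niklas describes would be written roughly as follows. The value is the one quoted in his message; reading the first element as a group-name regexp is an assumption. Per Ted's reply, at the time of this message the move was only performed in spam groups, so this alone would not have had the expected effect until the fix was committed.

```elisp
;; Value quoted from the message above: spam-marked articles in groups
;; matching "incoming" are meant to be moved to the "spam" group when
;; the summary buffer is exited.
(setq gnus-spam-process-destinations '(("incoming" "spam")))
```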