* ifile or similar
From: Adam Sjøgren @ 2002-08-18 19:33 UTC

Hi.

Has anyone integrated ifile <http://www.ai.mit.edu/~jrennie/ifile/> or
some similar system with Gnus?

Pointers much appreciated!

Best regards,

-- 
 "Fra én som sover for lidt,                          Adam Sjøgren
  som synes verden er stor"                      asjo@koldfront.dk
* Re: ifile or similar
From: Adam Sjøgren @ 2002-08-18 21:23 UTC

On 18 Aug 2002 20:47:18 GMT, Christopher Browne wrote:

>> I guess the easiest would just be to have procmail/something add an
>> X-header and have Gnus split on that.

[...]

> If you take that approach, I suggest that you have a _lot_ more than
> just one spam category. There is little reason to expect "phone
> sex" ads to be particularly similar to "Nigerian financial scams" or
> for either to strongly resemble ads about enlarging sexual organs.
> If you put them all together in one folder, that will muddy
> discrimination.

Really? Paul Graham's recent article "A Plan for Spam" seems to
indicate otherwise:

  http://www.paulgraham.com/spam.html

(which is where I found a pointer to ifile). I don't know if ifile
works exactly like Paul Graham's scheme, though. An elisp implementation
of that would be even more fun... :-)

I was thinking of making nonspam, spam and virus. Virus-emails seem to
me to be likely to have a different "pattern".

> You want better results? Set up several folders; nnml:pyramid,
> nnml:snakeoil, nnml:creditcards, nnml:gambling, nnml:porn, and such.

That would defeat the purpose of not spending time on spam (if I have
to sort my entire backlog of spam into categories first).

Best wishes,

-- 
 "Fra én som sover for lidt,                          Adam Sjøgren
  som synes verden er stor"                      asjo@koldfront.dk
* Re: ifile or similar
From: Bruce Stephens @ 2002-08-18 23:01 UTC

spamtrap@koldfront.dk (Adam Sjøgren) writes:

[...]

> I don't know if ifile works exactly as Paul Grahams scheme,
> though. An elisp implementation of that would be even more
> fun... :-)

<http://www.emacswiki.org/cgi-bin/wiki.pl?SpamStat>, which was posted
to gnu.emacs.sources a day or two ago.

[...]
* Re: ifile or similar
From: Christopher Browne @ 2002-08-19 15:59 UTC

spamtrap@koldfront.dk (Adam Sjøgren) wrote:
> On 18 Aug 2002 20:47:18 GMT, Christopher Browne wrote:
>
>>> I guess the easiest would just be to have procmail/something add an
>>> X-header and have Gnus split on that.
> [...]
>
>> If you take that approach, I suggest that you have a _lot_ more than
>> just one spam category. There is little reason to expect "phone
>> sex" ads to be particularly similar to "Nigerian financial scams" or
>> for either to strongly resemble ads about enlarging sexual organs.
>> If you put them all together in one folder, that will muddy
>> discrimination.
>
> Really? Paul Graham's recent article "A Plan for Spam" seems to
> indicate otherwise:
>
> http://www.paulgraham.com/spam.html
>
> (which is where I found a pointer to ifile). I don't know if ifile
> works exactly like Paul Graham's scheme, though. An elisp implementation
> of that would be even more fun... :-)

His "plan for spam" seems a rather new toy. I have been using Ifile
for about five years now.

> I was thinking of making nonspam, spam and virus. Virus-emails seem to
> me to be likely to have a different "pattern".
>
>> You want better results? Set up several folders; nnml:pyramid,
>> nnml:snakeoil, nnml:creditcards, nnml:gambling, nnml:porn, and such.
>
> That would defeat the purpose of not spending time on spam (if I have
> to sort my entire backlog of spam into categories first).

All the scheme is about is classifying messages.

If you make up one "pool" that is murky because it combines a lot of
quite different stuff (Nigerian pyramids versus porn versus credit
cards), you can't expect results as good as you would get with a few
more categories.

It should be pretty straightforward: Better quality corpus -> better
quality results.

Consider: If you spend an hour setting up a better corpus, and this
provides better results for the next five years, that's a pretty good
investment of your time, isn't it?

-- 
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lsf.html
"I doubt this language difference would confuse anybody unless you
were providing instructions on the insertion of a caffeine enema."
-- On alt.coffee
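
To make the "one murky pool versus several folders" argument concrete, here is a minimal sketch of a multi-folder word-count classifier. It is a plain multinomial naive Bayes with Laplace smoothing, written for illustration only; it is not ifile's actual implementation, and the class name, the tokenizer, and the tiny training messages are all made up for this example. Only the folder names come from the thread.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class FolderClassifier:
    """Tiny multinomial naive Bayes over mail folders (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> token counts
        self.doc_counts = Counter()              # folder -> messages seen
        self.vocabulary = set()

    def train(self, folder, text):
        """Add one message's tokens to the statistics of a folder."""
        tokens = tokenize(text)
        self.word_counts[folder].update(tokens)
        self.doc_counts[folder] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the folder with the highest log-probability for text."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_folder, best_score = None, float("-inf")
        for folder, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Prior: fraction of training messages filed in this folder.
            score = math.log(self.doc_counts[folder] / total_docs)
            for token in tokens:
                # Laplace smoothing keeps unseen tokens from zeroing the score.
                score += math.log(
                    (counts[token] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


# Hypothetical one-line "messages", one per folder.
clf = FolderClassifier()
clf.train("nnml:pyramid", "urgent transfer of funds from nigerian bank account")
clf.train("nnml:porn", "hot singles phone chat call now")
clf.train("nnml:creditcards", "low interest credit card approved apply now")
clf.train("nonspam", "gnus split rules for the mailing list patch attached")

result = clf.classify("please confirm the bank transfer of funds")
print(result)
```

Each folder keeps its own word statistics, so a message is scored against several narrow vocabularies rather than one mixed one. Whether that actually beats a single spam folder on real mail is the point under dispute in this thread; the sketch only shows the mechanism.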
* Re: ifile or similar
From: Clemens Fischer @ 2002-08-22 12:26 UTC

Anders Wegge Jakobsen <wegge@bakkelygaard.dk> writes:

> "Adam" == Adam Sjøgren <spamtrap@koldfront.dk> writes:
>
> If you're already using procmail for splitting mail, it is pretty
> straightforward to do:
>
> FOLDER=`ifile -gwc -Q |cut -d' ' -f1`.spool
> #
> :0:
> $FOLDER

i'm not sure about the options. the most recent version ifile-1.0.7
has (finally!) an option for easier scripting. the call should be
more like:

  # -ino: 2-20.08.02-20:08 new option "-c" with version 1.0.7:
  FOLDER=`/usr/local/bin/ifile -cQ`

the option "-c" or "--concise" works with "-q" (query database for
classification) and "-Q", which queries the database /and/ inserts the
statistics of the current document into it. this obsoletes:

  #FOLDER=`ifile -g -v0 -Q |head -1 |cut -d' ' -f1`

the "-c" option basically does what the extraneous "|cut ..." pipe is
(probably) supposed to do.

the "-w" option in your invocation would make ifile lex for white
space separated words. i think it is better to let it use the default
lexer, which considers punctuation as well, thus including statistics
on typical host name particles and header values.

clemens
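
For readers who would rather do the folder lookup outside procmail, the `head -1 | cut -d' ' -f1` step from the recipes above can be sketched in Python. This is a rough equivalent under stated assumptions, not a tested integration: the function names are hypothetical, the sample output line is invented (ifile's real output format is not shown in this thread), and the `-c -Q` flags are simply the ones quoted above for ifile 1.0.7. Unlike `head -1`, the parser skips leading blank lines.

```python
import subprocess


def spool_from_output(output, suffix=".spool"):
    """Return '<folder><suffix>' from the first field of the first non-empty line."""
    for line in output.splitlines():
        fields = line.split()
        if fields:
            return fields[0] + suffix
    raise ValueError("classifier produced no folder name")


def classify_message(message, command=("ifile", "-c", "-Q")):
    """Pipe a message to the classifier on stdin and return the spool name.

    The default command is an assumption taken from the thread; adjust it
    to match the locally installed ifile version.
    """
    completed = subprocess.run(
        command, input=message, capture_output=True, text=True, check=True
    )
    return spool_from_output(completed.stdout)


# Hypothetical classifier output: a folder name followed by a score.
example = spool_from_output("spam -1234.56\nnonspam -2345.67\n")
print(example)
```

With concise output that is already just a folder name, `spool_from_output` still works, since it only takes the first whitespace-separated field.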