From: Christopher Browne
Newsgroups: gmane.emacs.gnus.user
Subject: Re: ifile or similar
Date: 19 Aug 2002 15:59:58 GMT
Organization: cbbrowne Computing Inc
References: <878z34vu6q.fsf@virgil.koldfront.dk> <87n0rkudz3.fsf@virgil.koldfront.dk> <87r8gvuaii.fsf@virgil.koldfront.dk>
X-Home-Page: http://www.cbbrowne.com/info/
X-Emacs-Acronym: Edwardian Manifestation of All Colonial Sins
Microsoft: We've got the solution for the problem we sold you.
X-Shopping-List: (1) Scrumptious fads (2) Pink aluminum (3) Seismic retribution tape (4) Hazardous patriotic turkey confirmers (5) Aesthetic dissonant intrusion intrusions
X-Uboat-Death-Message: ATTACKED BY SALESMEN. SINKING. U-26.

spamtrap@koldfront.dk (Adam Sjøgren) wrote:
> On 18 Aug 2002 20:47:18 GMT, Christopher Browne wrote:
>
>>> I guess the easiest would just be to have procmail/something add an
>>> X-header and have Gnus split on that.
> [...]
>
>> If you take that approach, I suggest that you have a _lot_ more than
>> just one spam category.
>> There is little reason to expect "phone
>> sex" ads to be particularly similar to "Nigerian financial scams" or
>> for either to strongly resemble ads about enlarging sexual organs.
>> If you put them all together in one folder, that will muddy
>> discrimination.
>
> Really? Paul Grahams recent article "A Plan for Spam" seems to
> indicate otherwise:
>
>   http://www.paulgraham.com/spam.html
>
> (which is where I found a pointer to ifile). I don't know if ifile
> works exactly as Paul Grahams scheme, though. An elisp implementation
> of that would be even more fun... :-)

His "plan for spam" is a rather new toy; I have been using ifile for
about five years now.

> I was thinking of making nonspam, spam and virus. Virus-emails seem to
> me to be likely to have a different "pattern".
>
>> You want better results? Set up several folders; nnml:pyramid,
>> nnml:snakeoil, nnml:creditcards, nnml:gambling, nnml:porn, and such.
>
> That would defeat the purpose of not spending time on spam (if I have
> to sort my entire backlog of spam into categories first).

The whole scheme is about classifying messages. If you build one
"pool" that is murky because it lumps together quite different stuff
(Nigerian pyramid schemes versus porn versus credit cards), you can't
expect results as good as you would get with a few more categories.

It should be pretty straightforward:
better-quality corpus -> better-quality results.

Consider: if you spend an hour setting up a better corpus, and it
provides better results for the next five years, that's a pretty good
investment of your time, isn't it?
-- 
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lsf.html
"I doubt this language difference would confuse anybody unless you
were providing instructions on the insertion of a caffeine enema."
-- On alt.coffee
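To make the argument about separate spam categories concrete, here is a minimal sketch of a multinomial naive Bayes classifier in Python, trained on several spam folders rather than one pooled "spam" folder. It is not ifile's code and makes no claim about how ifile works internally; naive Bayes is simply a common choice for this sort of mail filing. The folder names echo the nnml: examples in the post, and the training strings are made-up toy messages. Each category keeps its own word counts, and a message counts as spam if the best-scoring category is any of the spam folders.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-case word tokens (deliberately naive)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> messages seen
        self.vocab = set()                       # all tokens seen in training

    def train(self, category, text):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        """Return the log-probability score of each category for text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = tokenize(text)
        result = {}
        for category, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)  # prior
            for tok in tokens:
                # Smoothed P(token | category); unseen tokens get a small, nonzero probability.
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            result[category] = score
        return result

    def classify(self, text):
        """Return the highest-scoring category."""
        s = self.scores(text)
        return max(s, key=s.get)


# Hypothetical folder names, echoing nnml:pyramid, nnml:snakeoil, ...
SPAM_FOLDERS = {"pyramid", "snakeoil", "creditcards"}

nb = NaiveBayes()
# Toy training messages, made up for illustration.
nb.train("pyramid", "urgent transfer of million dollars from nigerian bank account")
nb.train("snakeoil", "miracle pill herbal supplement guaranteed results")
nb.train("creditcards", "pre approved credit card low interest rate offer")
nb.train("nonspam", "gnus split methods and nnml folders in emacs")
nb.train("nonspam", "meeting notes for the release schedule")


def is_spam(text):
    """Spam if the winning category is any of the spam folders."""
    return nb.classify(text) in SPAM_FOLDERS


print(nb.classify("nigerian bank transfer of dollars"))  # pyramid
print(is_spam("gnus nnml folders"))                      # False
```

With a handful of toy messages this says nothing about real-world accuracy; it only shows the mechanics. Keeping pyramid, snakeoil and creditcards as separate categories gives each its own word distribution instead of one blended "spam" distribution, while the final spam/not-spam decision is still a single check against SPAM_FOLDERS.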