Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
From: Christopher Browne <cbbrowne@acm.org>
Subject: Re: ifile or similar
Date: 19 Aug 2002 15:59:58 GMT
Message-ID: <ajr4lt$1dblmm$1@ID-125932.news.dfncis.de>
In-Reply-To: <87r8gvuaii.fsf@virgil.koldfront.dk>

spamtrap@koldfront.dk (Adam Sjøgren) wrote:
> On 18 Aug 2002 20:47:18 GMT, Christopher Browne wrote:
>
>>> I guess the easiest would just be to have procmail/something add an
>>> X-header and have Gnus split on that.
> [...]
>
>> If you take that approach, I suggest that you have a _lot_ more than
>> just one spam category.  There is little reason to expect "phone
>> sex" ads to be particularly similar to "Nigerian financial scams" or
>> for either to strongly resemble ads about enlarging sexual organs.
>> If you put them all together in one folder, that will muddy
>> discrimination.
>
> Really? Paul Graham's recent article "A Plan for Spam" seems to
> indicate otherwise:
>
>  http://www.paulgraham.com/spam.html
>
> (which is where I found a pointer to ifile). I don't know if ifile
> works exactly like Paul Graham's scheme, though. An elisp implementation
> of that would be even more fun... :-)

His "plan for spam" seems a rather new toy; I have been using ifile
for about five years now.

> I was thinking of making nonspam, spam and virus. Virus-emails seem to
> me to be likely to have a different "pattern".
>
>> You want better results?  Set up several folders; nnml:pyramid,
>> nnml:snakeoil, nnml:creditcards, nnml:gambling, nnml:porn, and such.
>
> That would defeat the purpose of not spending time on spam (if I have
> to sort my entire backlog of spam into categories first).

The whole scheme is about classifying messages.

If you make up one "pool" that is murky because it combines a lot of
quite different stuff (Nigerian pyramid schemes versus porn versus
credit card offers), you can't expect results as good as you would get
with a few more categories.
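To make the "murky pool" point concrete, here is a minimal sketch of a multinomial naive Bayes classifier, one common technique for this kind of mail filtering; it is an illustration, not ifile's actual code. Each folder keeps its own word counts, so a message is scored against each category's distribution separately. The folder names echo the nnml groups suggested in this thread; nnml:inbox and the training sentences are hypothetical toy data, and real use would need proper tokenizing of headers and bodies.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over lowercased, whitespace-separated tokens.

    Illustrative sketch only; not ifile's implementation.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> token counts
        self.doc_counts = Counter()              # folder -> messages trained
        self.vocab = set()                       # every token seen in training

    def train(self, folder, text):
        tokens = text.lower().split()
        self.word_counts[folder].update(tokens)
        self.doc_counts[folder] += 1
        self.vocab.update(tokens)

    def score(self, folder, tokens):
        # log P(folder) + sum of log P(token | folder)
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[folder] / total_docs)
        counts = self.word_counts[folder]
        denom = sum(counts.values()) + len(self.vocab)
        for t in tokens:
            # Laplace (add-one) smoothing so an unseen token doesn't zero out a score
            log_p += math.log((counts[t] + 1) / denom)
        return log_p

    def classify(self, text):
        tokens = text.lower().split()
        return max(self.doc_counts, key=lambda f: self.score(f, tokens))


# Hypothetical toy corpus: one message per folder.
nb = NaiveBayes()
nb.train("nnml:pyramid", "send money to join our pyramid and earn money fast")
nb.train("nnml:gambling", "play poker and casino slots win big jackpot")
nb.train("nnml:inbox", "gnus split methods and procmail recipes for mail")

print(nb.classify("win the casino jackpot tonight"))  # nnml:gambling
```

If the pyramid and gambling messages were trained into a single spam folder, their counts would be pooled into one flatter distribution, which is exactly the loss of discrimination described above.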

It should be pretty straightforward:

  Better quality corpus -> better quality results.

Consider: If you spend an hour setting up a better corpus, and this
provides better results for the next five years, that's a pretty good
investment of your time, isn't it?
-- 
(reverse (concatenate 'string "gro.mca@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/lsf.html
"I  doubt this language  difference would  confuse anybody  unless you
were providing instructions on the insertion of a caffeine enema."
-- On alt.coffee
