9fans - fans of the OS Plan 9 from Bell Labs
From: "Russ Cox" <rsc@plan9.bell-labs.com>
To: 9fans@cse.psu.edu
Subject: [9fans] bayesian spam filtering
Date: Mon,  1 Sep 2003 10:53:53 -0400
Message-ID: <ea73544da4c80da1896bfa64585a90ee@plan9.bell-labs.com>

I've spent a few days here and there working on it,
and it's easy to get something half-decent but hard to
get something that's really quite good.

If you're interested in what I've done already, I just
packaged up all the pieces (I think) and put them in
/n/sources/extra/spam.tar.gz.  See /n/sources/extra/spam.notes
for the list of files I included as well as a note about
how to use it that I wrote but never circulated.

I think my stuff is close -- the biggest wart is that
it uses Berkeley DB to store the frequency tables.
I think that might be necessary, but I'm not really sure.
Tokenization is hard.  Msgcat does a reasonable job.
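
For readers unfamiliar with the approach, the following is a minimal
sketch of the kind of per-token frequency table a Bayesian filter keeps.
It is not the code in spam.tar.gz.  The names tokenize and FrequencyTable
are hypothetical, an in-memory Counter stands in for the on-disk
Berkeley DB tables, and the crude tokenizer and add-one smoothing are
assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

# Assumption: a token is a run of letters, digits, "$" or "'".
# Real mail needs far more care (headers, MIME parts, HTML, encodings).
TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text):
    """Lowercase the text and split it into crude word-like tokens."""
    return TOKEN_RE.findall(text.lower())


class FrequencyTable:
    """Per-class token counts, held in memory (hypothetical stand-in
    for an on-disk table)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, text):
        """Record one message under label ("spam" or "ham")."""
        self.messages[label] += 1
        # Count each token at most once per message.
        self.counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Naive Bayes estimate that text is spam, in [0, 1]."""
        n_spam = self.messages["spam"]
        n_ham = self.messages["ham"]
        # Start from the smoothed prior odds, then add one term per token.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in set(tokenize(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

Training calls train("spam", text) or train("ham", text) once per
message; spam_probability(text) then returns a score that a mail
filter could compare against a threshold.  Correcting a misfiled
message, as described for Thunderbird below, amounts to training on
it again under the right label.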

I've actually been using Mozilla Thunderbird to process
my mail mostly (though I'm typing this in nedmail).
The one feature it has that I really like is the
spam handling.  It could be better in some ways,
but the spam interface is much better than what I built.
When mail comes in, it's automatically flagged as
spam or not.  You can then correct its mistakes (if any)
by just clicking on lines in the message listing.
Then once you're happy you click "Delete Mail Marked as Junk"
and away they go.  If acme and nedmail had a
way to select disjoint sets of messages, I think
that would help a lot.  I've found that although
Thunderbird's filtering is not as accurate as what
I built, the interface makes up for it.

Notice I said process, not read.  I still read with
nedmail and Mail, at least when I'm in Plan 9, but
to do things like kill off the spam and manage lots
of different mail boxes, the external mail readers
seem like a big win.  Sad but true.

Russ

