9fans - fans of the OS Plan 9 from Bell Labs
From: steve.simon@snellwilcox.com
To: 9fans@cse.psu.edu
Subject: [9fans] spam filtering fs
Date: Mon,  1 Sep 2003 13:48:39 +0100
Message-ID: <b89f5f87b787fdcd291b8f80d2d87dd8@yourdomain.dom>

Hi,

I'm starting to think about spam filtering again.

I plan to use Paul Graham's ideas plus the changes
suggested by Gary Robinson; basically a naive Bayesian
classifier.
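The code below is a minimal sketch of the scoring side, written in Python for brevity. It assumes that per-token message counts are already available. It follows one common reading of Robinson's changes: his smoothed per-token probability f(w) = (s*x + n*p(w)) / (s + n), combined with his Fisher inverse chi-square combining. The function names, the defaults s = 1 and x = 0.5, and the 150-token cap (a stand-in for Graham's "most interesting tokens" idea) are illustrative assumptions, not details from this post.

```python
import math


def token_prob(spam_count, ham_count, nspam, nham, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability f(w) for one token.

    spam_count/ham_count: spam/ham messages containing the token.
    nspam/nham: total spam/ham messages trained on.
    s (strength) and x (prior for rare tokens) are assumed defaults.
    """
    b = spam_count / nspam if nspam else 0.0
    g = ham_count / nham if nham else 0.0
    if b + g == 0.0:
        return x  # no usable evidence: fall back to the prior
    p = b / (b + g)  # raw estimate p(w)
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


def chi2q(x2, v):
    """P(chi-square with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spam_score(probs, max_tokens=150):
    """Combine per-token f(w) values: ~1 spam, ~0 ham, ~0.5 unsure."""
    # Keep only the tokens farthest from 0.5 (the "interesting" ones).
    clues = sorted(probs, key=lambda f: abs(f - 0.5), reverse=True)[:max_tokens]
    n = len(clues)
    if n == 0:
        return 0.5
    # s: combined evidence for spam; h: combined evidence for ham.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in clues), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in clues), 2 * n)
    return (s - h + 1.0) / 2.0
```

A score near 1 suggests spam and a score near 0 suggests valid mail. A score near 0.5 means the evidence is mixed. Summing logarithms rather than multiplying probabilities avoids floating-point underflow on long messages.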

I think it's best to implement it as a filesystem that overlays
upas/fs and is transparent to valid email but opaque to spam.
This means the token frequency database stays in RAM rather
than having to be reloaded to test each new email.
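A resident filesystem lets the counts be built once and then consulted and updated in place. The sketch below shows a rough version of such an in-memory table. It assumes a Graham-style tokenizer that keeps letters, digits, dashes, apostrophes and dollar signs. The TokenDB class, the regular expression, the three-character minimum and the lowercasing are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, $, ' and -, at least 3 long.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


class TokenDB:
    """Hypothetical in-memory token counts, kept resident between messages."""

    def __init__(self):
        self.spam = Counter()  # token -> spam messages containing it
        self.ham = Counter()   # token -> ham messages containing it
        self.nspam = 0
        self.nham = 0

    @staticmethod
    def tokens(text):
        # Each distinct token counts once per message; case is folded.
        return {t.lower() for t in TOKEN_RE.findall(text)}

    def learn(self, text, is_spam):
        toks = self.tokens(text)
        if is_spam:
            self.spam.update(toks)
            self.nspam += 1
        else:
            self.ham.update(toks)
            self.nham += 1

    def counts(self, token):
        # (spam count, ham count, spam messages, ham messages); 0 if unseen.
        return self.spam[token], self.ham[token], self.nspam, self.nham
```

A long-running overlay would load a table like this once at startup, from whatever on-disk format is chosen, and keep it across messages. Each new email then costs only a tokenize and a few lookups.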

In order for the filter to learn, the user must be able to classify
the few spams that slip through, so I propose that the user wstat()
a message to zero length before deleting it if it is spam, and simply
delete it if it is valid.
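The sketch below shows one way the overlay could turn the proposed wstat()/remove sequence into training data. The callbacks seen, wstat_length and removed are hypothetical hooks that an overlay might invoke from its own 9P handlers; they are not part of upas/fs. The filesystem has to remember a message's tokens before the truncation, because after a wstat() to zero length there is nothing left to tokenize.

```python
from collections import Counter


class DeletionTrainer:
    """Hypothetical training hooks for an overlay on upas/fs.

    Convention from the proposal: a message truncated to length 0
    before removal is spam; a message removed untouched is valid mail.
    """

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0
        self.tokens = {}        # msgid -> tokens captured while the body existed
        self.truncated = set()  # msgids wstat()ed to length 0

    def seen(self, msgid, tokens):
        # Capture tokens up front: after truncation the body is gone.
        self.tokens[msgid] = set(tokens)

    def wstat_length(self, msgid, length):
        if length == 0:
            self.truncated.add(msgid)

    def removed(self, msgid):
        toks = self.tokens.pop(msgid, None)
        was_spam = msgid in self.truncated
        self.truncated.discard(msgid)
        if toks is None:
            return None  # never tokenized, so nothing to learn from
        if was_spam:
            self.spam.update(toks)
            self.nspam += 1
            return "spam"
        self.ham.update(toks)
        self.nham += 1
        return "ham"
```

Under this reading, every deletion becomes a training example. This includes the ordinary deletion of valid mail, so the ham counts grow without any extra effort from the user.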

In the longer term I have some ideas about fingerprinting attached
images, adding tokens for "is html email", "has MS screen-saver attached"
etc.
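Synthetic tokens like these could be produced by walking the MIME parts of each message. The sketch below uses Python's standard email module. The token spellings, the .scr filename check for screen-savers and the use of a truncated SHA-1 digest as an image fingerprint are all assumptions made for illustration.

```python
import email
import hashlib


def extra_tokens(raw):
    """Hypothetical pseudo-tokens describing a message's MIME structure.

    The asterisks keep these apart from ordinary word tokens,
    assuming the word tokenizer never emits '*'.
    """
    msg = email.message_from_string(raw)
    toks = set()
    for part in msg.walk():
        ctype = part.get_content_type()  # e.g. "text/html"
        name = (part.get_filename() or "").lower()
        if ctype == "text/html":
            toks.add("*has-html*")
        if name.endswith(".scr"):  # Windows screen-saver attachment
            toks.add("*has-screensaver*")
        if ctype.startswith("image/"):
            data = part.get_payload(decode=True) or b""
            # Byte-identical images in different messages share this token.
            toks.add("*img:" + hashlib.sha1(data).hexdigest()[:16] + "*")
    return toks
```

An exact digest only matches byte-identical images. Catching images that spammers alter slightly would need a more tolerant fingerprint.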

Does anyone have any opinions on this approach?

-Steve



Thread overview: 7+ messages
2003-09-01 12:48 steve.simon [this message]
2003-09-01 14:16 ` Fco.J.Ballesteros
2003-09-01 14:19   ` David Presotto
2003-09-01 14:30     ` Dan Cross
     [not found] <758707399@snellwilcox.com>
2003-09-01 14:36 ` steve.simon
2003-09-01 14:41   ` Fco.J.Ballesteros
2003-09-01 14:49     ` boyd, rounin
