From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <b89f5f87b787fdcd291b8f80d2d87dd8@yourdomain.dom>
From: steve.simon@snellwilcox.com
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: [9fans] spam filtering fs
Date: Mon,  1 Sep 2003 13:48:39 +0100
Topicbox-Message-UUID: 2738bd0e-eacc-11e9-9e20-41e7f4b1d025

Hi,

I am starting to think about spam filtering again.

I plan to use Paul Graham's ideas plus the changes
suggested by Gary Robinson; basically a naive Bayesian
classifier.
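
A minimal sketch of the Robinson-style scoring, in Python purely
for illustration. The function names, the message-count arguments
and the defaults s=1.0 and x=0.5 are assumptions made for this
example, not part of any existing code.

```python
import math

def token_prob(spam_count, ham_count, nspam, nham, s=1.0, x=0.5):
    """Smoothed spam probability of one token (Robinson's f(w)).

    spam_count/ham_count: messages of each class containing the token.
    nspam/nham: total messages of each class seen so far.
    s: strength given to the prior; x: prior for an unseen token.
    """
    n = spam_count + ham_count
    if n == 0:
        return x                      # never seen: fall back to the prior
    spam_freq = spam_count / max(nspam, 1)
    ham_freq = ham_count / max(nham, 1)
    p = spam_freq / (spam_freq + ham_freq)
    return (s * x + n * p) / (s + n)  # pull p toward x when n is small

def spam_score(probs):
    """Combine token probabilities into one indicator in [0, 1]."""
    if not probs:
        return 0.5
    # Clamp so the logarithms below are always defined.
    probs = [min(max(p, 1e-9), 1.0 - 1e-9) for p in probs]
    n = len(probs)
    # Geometric means, computed in log space to avoid underflow.
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)
    if P + Q == 0:
        return 0.5
    return (1.0 + (P - Q) / (P + Q)) / 2.0
```

A score near 1 suggests spam and a score near 0 suggests valid
mail; the cutoff between them would have to be tuned.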

I think it's best to implement it as a filesystem which overlays
upas/fs and is transparent to valid email but opaque to spam.
This means the token frequency database remains in RAM and does
not have to be reloaded to test each new email.

In order for the filter to learn, the user must be able to classify
the few spams that slip through, so I propose to wstat() emails
to zero length before deletion if they are spam and simply delete
them if they are valid.
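
The feedback rule above could be sketched as follows, again in
Python for illustration only. TokenDB, Mailbox and their methods
are hypothetical names, and the sketch assumes the overlay still
has the original message text when the delete arrives, since the
truncation is only noted by the overlay and not applied.

```python
import re
from collections import Counter

class TokenDB:
    """In-memory token counts, kept for the life of the filter."""
    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, text, is_spam):
        # Count each token once per message.
        tokens = set(re.findall(r"[a-z0-9$'-]+", text.lower()))
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

class Mailbox:
    """Hypothetical stand-in for the overlay's view of a mailbox."""
    def __init__(self, db):
        self.db = db
        self.bodies = {}     # message id -> original text
        self.marked = set()  # ids truncated to zero length

    def deliver(self, msgid, text):
        self.bodies[msgid] = text

    def wstat_length(self, msgid, length):
        # A wstat to length 0 is the user's "this is spam" signal.
        if length == 0 and msgid in self.bodies:
            self.marked.add(msgid)

    def remove(self, msgid):
        # Deletion ends the message's life; train on how it ended.
        text = self.bodies.pop(msgid)
        is_spam = msgid in self.marked
        self.marked.discard(msgid)
        self.db.train(text, is_spam)
        return is_spam
```

Training only at deletion time means every message the user reads
and removes contributes to one class or the other without any
extra step for valid mail.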

In the longer term I have some ideas about fingerprinting attached
images, adding tokens for "is html email", "has MS screen-saver attached"
etc.

Anyone any opinions on this approach?

-Steve