From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <054a01c382c2$491fe840$b9844051@insultant.net>
From: "boyd, rounin" <boyd@insultant.net>
To: <9fans@cse.psu.edu>
References: <e723225c6010f7a5755fc4d889f87f5b@juice.thebigchoice.com> <3F716844.2050005@acm.org>
Subject: Re: [9fans] spam
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Date: Wed, 24 Sep 2003 19:35:43 +0200
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: 47508afe-eacc-11e9-9e20-41e7f4b1d025

here is my smtp level spam killer idea.

we need some terms first:

    Pok    = probability that it's ok to deliver
    Pspam  = probability that it's spam
    Pgood  = some value <= Pspam

i think Pspam = 1 - Pok and Pok == 0.001 [1/1000, 1 message in 1000]

   Pbip = probability of a bad IP address
   Pbm  = probability of a bad sender/address/message [MAIL FROM <...>]

so then we need a black and a white list (per user or global or a mix).
these must be small, otherwise we have a 9 mil round in the foot.

black list:

    seeded with a small number of open smtp relays/whatever IP
    addresses [dotted quads] which a human can administer.

white list:

    seeded with a small number (or none) of people you 'like' which
    a human can administer.

both lists are a key/value pair.  the key is the dotted quad or the person
you like.  the value is a number.

so as soon as we get the MAIL FROM we calculate [dc follows]:

    Pbip Pbm * Pbip Pbm * 1 Pbip - 1 Pbm - *  +  /

and we call that Pgood
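
unrolled from dc into infix, that expression reads
Pbip*Pbm / (Pbip*Pbm + (1 - Pbip)*(1 - Pbm)).  a minimal python sketch of
the same arithmetic follows; the function name pgood is made up for
illustration and is not part of any existing tool:

```python
def pgood(pbip: float, pbm: float) -> float:
    """Infix form of the dc expression
    Pbip Pbm * Pbip Pbm * 1 Pbip - 1 Pbm - * + /
    """
    both_bad = pbip * pbm              # Pbip Pbm *
    both_ok = (1 - pbip) * (1 - pbm)   # 1 Pbip - 1 Pbm - *
    # the denominator is zero only when one input is exactly 1
    # and the other is exactly 0
    return both_bad / (both_bad + both_ok)


if __name__ == "__main__":
    print(pgood(0.5, 0.5))
```

with both inputs at 0.5 the two products are equal, so the result is 0.5.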

and if the result is:

    >  Pspam it gets returned
    <= Pspam it gets delivered

now, before you say 'division by zero':

    - iff the IP address is not found Pok is returned
    - iff the 'person' you like is not found Pok is returned

Pbip = 1 1 n / - iff n > 1    [dc again, i.e. 1 - 1/n]

Pbm = 1/n iff n > 1

0 means 'not found' and in this and all other cases Pok is returned.
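
the two lookups can be sketched like this, assuming n is the number stored
against the key and 0 stands for 'not found'.  the names POK, pbip and pbm
are illustrative only, and the 0.001 is the value guessed at earlier in
this message:

```python
POK = 0.001  # Pok as proposed above


def pbip(n: int) -> float:
    """Score for a dotted quad whose black list value is n."""
    # 1 - 1/n only when n > 1; 0 (not found) and 1 fall back to Pok
    return 1 - 1 / n if n > 1 else POK


def pbm(n: int) -> float:
    """Score for a sender whose white list value is n."""
    # 1/n only when n > 1; 0 (not found) and 1 fall back to Pok
    return 1 / n if n > 1 else POK


if __name__ == "__main__":
    print(pbip(4), pbm(4), pbip(0))
```

for n > 1 both values stay strictly between 0 and 1, so the division in
the Pgood calculation cannot hit zero.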

if you've got this far then the interesting stuff happens:

  law 1: it MUST fail safe

a message that has Pgood <= Pspam gets delivered and 1 of 2 things
happens when the Pgood is evaluated:

    1) Pgood  > Pspam : 'bad' dotted quads have their n++
    2) Pgood <= Pspam : good 'people' have their n++ ['bad' dotted quads
       could have their n--]

well it's more than that, 'cos you can say in the case where
Pgood > Pspam that the dotted quad is _automatically_
added to the black list.
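
a rough sketch of that learning step, with the two lists as plain python
dicts.  everything named here (score, handle, the forgive flag) is
hypothetical.  two assumptions are mine, not part of the idea above: only
senders already on the white list get their n++, and the optional n-- is
off by default and never goes below 0.

```python
POK = 0.001
PSPAM = 1 - POK


def score(black: dict, white: dict, ip: str, sender: str) -> float:
    """Pgood for one MAIL FROM, using the list values as n."""
    n_ip = black.get(ip, 0)        # 0 means 'not found'
    n_who = white.get(sender, 0)
    pbip = 1 - 1 / n_ip if n_ip > 1 else POK
    pbm = 1 / n_who if n_who > 1 else POK
    bad = pbip * pbm
    return bad / (bad + (1 - pbip) * (1 - pbm))


def handle(black: dict, white: dict, ip: str, sender: str,
           forgive: bool = False) -> str:
    """Return 'returned' or 'delivered' and update the lists in place."""
    if score(black, white, ip, sender) > PSPAM:
        # case 1: n++ for the dotted quad, adding it if it was not listed
        black[ip] = black.get(ip, 0) + 1
        return "returned"
    # case 2: n++ for a sender that is already on the white list
    if sender in white:
        white[sender] += 1
    # optional n-- for a listed dotted quad, floored at 0
    if forgive and black.get(ip, 0) > 0:
        black[ip] -= 1
    return "delivered"


if __name__ == "__main__":
    black = {"10.0.0.1": 2000}
    white = {"pal@example.com": 2}
    print(handle(black, white, "10.0.0.1", "pal@example.com"), black)
    print(handle(black, white, "192.0.2.7", "pal@example.com"), white)
```

the addresses in the usage lines are placeholders.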

using these techniques i believe it can 'learn'.

when Pgood > Pspam we kill 'em, potentially auditing the transaction, BUT
also sending a reply (iirc MAIL FROM <> is for that) so they can say i'm
not a T [bad guy] in a form that a machine/program could not (or it would
take a significant effort to defeat).

this is the moat.  the filter is the castle walls.

i would more than appreciate mail of the form:

    boyd, you fuckhead, you overlooked this case

this stuff is hard.  i know what i know, but;

    i'm just a small town white boy
    tryin' ta make ends meet

going back to 'law 1' any 'spam' must be saved in an easily retrievable form;
upas/deliver can do this.  it's a double edged sword, but disk is cheap.
the purpose is to get the machine to do '1 shot 1 kill', so you don't wind
up with a bunch of shit to sift through.

voilà


(c) Boyd Roberts <boyd@insultant.net> (All Rights Reserved)


ps.  i blame it all on 4 hours sleep, new 'zep DVD and red -- Kashmir!!