From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <054a01c382c2$491fe840$b9844051@insultant.net>
From: "boyd, rounin"
To: <9fans@cse.psu.edu>
References: <3F716844.2050005@acm.org>
Subject: Re: [9fans] spam
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Date: Wed, 24 Sep 2003 19:35:43 +0200
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: 47508afe-eacc-11e9-9e20-41e7f4b1d025

here is my smtp level spam killer idea.

we need some terms first:

    Pok   = probability that it's ok to deliver
    Pspam = the value that means spam
    Pgood = some value <= Pspam

i think Pspam = 1 - Pok and Pok == 0.001 [1/1000, 1 message in 1000]

    Pbip = probability of a bad IP address
    Pbm  = probability of a bad sender/address/message [MAIL FROM <...>]

so then we need a black and a white list (per user or global or a mix).
these must be small, otherwise we have a 9 mil round in the foot.

    black list: seeded with a small number of open smtp relays/whatever
                IP addresses [dotted quads] which a human can administer.

    white list: seeded with a small number (or none) of people you 'like'
                which a human can administer.

both lists are key/value pairs.  the key is the dotted quad or the person
you like.  the value is a number.

so as soon as we get the MAIL FROM we calculate [dc follows]:

    Pbip Pbm * Pbip Pbm * 1 Pbip - 1 Pbm - * + /

and we call that Pgood and if the result is:

    >  Pspam    it gets returned
    <= Pspam    it gets delivered

now, before you say 'division by zero':

    - iff the IP address is not found Pok is returned
    - iff the 'person' you like is not found Pok is returned

    Pbip = 1 1 n / -    iff n > 1
    Pbm  = 1/n          iff n > 1

0 means 'not found' and in this and all other cases Pok is returned.
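read as infix, the dc expression above is Pgood = (Pbip*Pbm) / (Pbip*Pbm + (1-Pbip)*(1-Pbm)). a minimal Python sketch of that calculation follows. the function names, the use of dicts for the two lists and the example addresses are assumptions made for illustration; only the formulas and the Pok/Pspam values come from the post.

```python
P_OK = 0.001           # from the post: 1 message in 1000
P_SPAM = 1 - P_OK      # threshold; above this the message is returned

def p_bad_ip(n):
    """Pbip = 1 - 1/n iff n > 1; otherwise (including 0, 'not found') Pok."""
    return 1 - 1 / n if n > 1 else P_OK

def p_bad_sender(n):
    """Pbm = 1/n iff n > 1; otherwise (including 0, 'not found') Pok."""
    return 1 / n if n > 1 else P_OK

def p_good(pbip, pbm):
    """Infix form of: Pbip Pbm * Pbip Pbm * 1 Pbip - 1 Pbm - * + /"""
    both = pbip * pbm
    return both / (both + (1 - pbip) * (1 - pbm))

def verdict(blacklist, whitelist, ip, sender):
    """Look up both counts (0 if absent) and compare Pgood with Pspam."""
    pbip = p_bad_ip(blacklist.get(ip, 0))
    pbm = p_bad_sender(whitelist.get(sender, 0))
    return "return" if p_good(pbip, pbm) > P_SPAM else "deliver"

# hypothetical lists: key is a dotted quad or a sender, value is the count n
blacklist = {"192.0.2.1": 5}
whitelist = {"friend@example.org": 4}
print(verdict(blacklist, whitelist, "198.51.100.7", "friend@example.org"))
```

because both inputs fall back to Pok and are otherwise strictly between 0 and 1, the denominator in this sketch never reaches zero.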
if you've got this far then the interesting stuff happens:

    law 1: it MUST fail safe

a message that has Pgood <= Pspam gets delivered and 2 things happen
when the Pgood is evaluated:

    1) Pgood >  Pspam : 'bad' dotted quads have their n++
    2) Pgood <= Pspam : good 'people' have their n++,
                        ['bad' dotted quads could have their n--]

well it's more than that, 'cos you can say in the case where Pgood > Pspam
that the dotted quad is _automatically_ added to the black list.

using these techniques i believe it can 'learn'.

when Pgood > Pspam we kill 'em, potentially auditing the transaction, BUT
also sending a reply (iirc MAIL FROM <> is for that) so they can say
i'm not a T [bad guy] in a form that a machine/program could not defeat
(or it would take a significant effort to defeat).

this is the moat.  the filter is the castle walls.

i would more than appreciate mail of the form:

    boyd, you fuckhead, you overlooked this case

this stuff is hard.  i know what i know, but;

    i'm just a small town white boy
    tryin' ta make ends meet

going back to 'law 1' any 'spam' must be saved in an easily retrievable
form; upas/deliver can do this.  it's a double-edged sword, but disk is
cheap.

the purpose is to get the machine to do '1 shot 1 kill', so you don't wind
up with a bunch of shit to sift through.

voilà

(c) Boyd Roberts (All Rights Reserved)

ps. i blame it all on 4 hours sleep, new 'zep DVD and red -- Kashmir!!
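the 'learn' step and law 1 could be sketched roughly as below. this is an illustration under assumptions, not the author's code: the name `learn`, the dict-based lists, the `saved` list standing in for whatever upas/deliver would store, and the choice to bump only senders already on the white list are all hypothetical. the optional n-- for 'bad' dotted quads is behind a flag because the post only says it 'could' happen.

```python
def learn(blacklist, whitelist, ip, sender, pgood, saved, message,
          pspam=0.999, decay=False):
    """Update the counts after Pgood has been evaluated for one message."""
    if pgood > pspam:
        # case 1: the dotted quad gets n++ and is added automatically
        # to the black list if it was not there yet
        blacklist[ip] = blacklist.get(ip, 0) + 1
        # law 1: keep the rejected message somewhere retrievable
        saved.append(message)
        return "return"
    # case 2: a liked sender gets n++ (assumption: only if already listed)
    if sender in whitelist:
        whitelist[sender] += 1
    # optional: a 'bad' dotted quad could have its n--
    if decay and blacklist.get(ip, 0) > 0:
        blacklist[ip] -= 1
    return "deliver"

# hypothetical usage
blacklist, whitelist, saved = {}, {"friend@example.org": 3}, []
learn(blacklist, whitelist, "192.0.2.1", "x@example.net", 0.9999, saved, b"msg")
learn(blacklist, whitelist, "192.0.2.1", "friend@example.org", 0.5, saved, b"hi")
```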