In-Reply-To: <56c4420c09a80cff05fd90deaef6c2b7@gandalf.orthanc.ca>
References: <56c4420c09a80cff05fd90deaef6c2b7@gandalf.orthanc.ca>
Date: Sat, 13 Nov 2010 18:15:20 -0500
Subject: Re: [9fans] opposite of bloom filter
From: Russ Cox
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>

> One problem with this is handling wildcarded addresses. How do you indicate
> (say) lyndon+* is allowable in a bloom filter, where the '+' is an
> arbitrary (to the upstream) symbol?

Tell the accepting site to strip +* from all the email addresses
before checking. There aren't that many cases like that; + is the
canonical one.

> spammers have a solution to this. they send to random hashes,
> e.g.
>
> ladd Nov 13 04:08:12 Disallowed gossinternational.com!ruiohfsd (gossinternational.com/124.172.212.142) to blocked name quanstro.net!b94cd358e11d3ffb43628c10bc786087
>
> i think the idea of spooling email is largely discredited.
> it opens up the possibility of backscatter spam, or the lack of
> delivery rejection notification. either one is not good. i think the
> accepting smtp server has to be in a position to make a definitive
> decision on disposition. (sorry.)

The solution I described (a Bloom filter of all the valid addresses)
would work fine for this. An optimally sized Bloom filter requires
about 4.8 bits per address for each factor of ten reduction in the
false-positive rate. If you want a 1 in 1000 chance of a spammy
address getting through and have n valid addresses, you need a
Bloom filter of size 3 * 4.8 * n = 14.4n bits.

Russ
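The 4.8 figure matches the standard optimal-size formula for a Bloom filter, m = n * log2(1/p) / ln 2 bits, since log2(10) / ln 2 is about 4.79; the matching optimal hash count is k = (m/n) * ln 2, which works out to about 10 for p = 1/1000. Below is a minimal Python sketch of that arithmetic plus a toy filter that strips a "+suffix" from the local part before inserting or checking, as suggested above. The function names, the SHA-256 double-hashing scheme, and the example.com addresses are illustrative assumptions, not anything taken from the thread.

```python
import hashlib
import math


def bloom_size(n, p):
    """Return (m_bits, k_hashes) for an optimally sized Bloom filter
    holding n items with false-positive rate p (standard formulas)."""
    bits_per_item = math.log2(1 / p) / math.log(2)  # about 4.79 bits per power of ten
    m = math.ceil(n * bits_per_item)
    k = max(1, round(bits_per_item * math.log(2)))  # equals log2(1/p)
    return m, k


def normalize(addr):
    """Hypothetical normalization: drop '+suffix' from the local part,
    so user+anything@host is checked as user@host."""
    local, sep, domain = addr.partition("@")
    local = local.split("+", 1)[0]
    return local + sep + domain


class BloomFilter:
    """Toy Bloom filter over normalized email addresses."""

    def __init__(self, n, p):
        self.m, self.k = bloom_size(n, p)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing (Kirsch-Mitzenmacher): derive k bit indices
        # from two 64-bit slices of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, addr):
        for pos in self._positions(normalize(addr)):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, addr):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(normalize(addr)))


if __name__ == "__main__":
    m, k = bloom_size(1000, 0.001)
    print(m, k, m / 1000)  # roughly 14.4 bits per address, about 10 hashes
    bf = BloomFilter(1000, 0.001)
    bf.add("user7@example.com")
    print("user7+list@example.com" in bf)
```

The accepting site would ship only the bit array upstream; the upstream checks each RCPT address against it and rejects at SMTP time on a miss, so a random-hash address like the one in the log is refused outright except for roughly 1 in 1000 false positives.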