9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] opposite of bloom filter
@ 2010-11-13 18:43 Enrico Weigelt
  2010-11-13 22:26 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-11-13 22:31 ` Russ Cox
  0 siblings, 2 replies; 14+ messages in thread
From: Enrico Weigelt @ 2010-11-13 18:43 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


Hi folks,


I'm currently looking for the opposite of an bloom filter, which
may have false negatives but no false positives.

The purpose is allowing an spooling (store+forward) mail relay
to learn which addresses are not accepted by the actual maildrop
(which is connected by an uucp-link, so no direct smtp chat),
to get rid of the thousands silly error bounces from brute force
attacks on email addresses.

An trivial idea would be simply using an hashtable as blacklist,
but obviously that would become very big. I need a more compact
form, perhaps a subset of regex'es, a colored (b)tree, etc ?


Any ideas ?


thx
--
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weigelt@metux.de
 mobile: +49 151 27565287  icq:   210169427         skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 18:43 [9fans] opposite of bloom filter Enrico Weigelt
@ 2010-11-13 22:26 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-11-13 22:48   ` erik quanstrom
  2010-11-14  2:52   ` Bruce Ellis
  2010-11-13 22:31 ` Russ Cox
  1 sibling, 2 replies; 14+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2010-11-13 22:26 UTC (permalink / raw)
  To: weigelt, 9fans

> The purpose is allowing an spooling (store+forward) mail relay
> to learn which addresses are not accepted by the actual maildrop
> (which is connected by an uucp-link, so no direct smtp chat),
> to get rid of the thousands silly error bounces from brute force
> attacks on email addresses.

Very(!) interesting approach to this.  I still transfer large chunks
of my mail over UUCP, so this is a problem I face daily.  I'd appreciate
being kept in the loop for anything you find out. (And collaborating on
a solution.)

--lyndon




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 18:43 [9fans] opposite of bloom filter Enrico Weigelt
  2010-11-13 22:26 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2010-11-13 22:31 ` Russ Cox
  2010-11-13 22:38   ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  1 sibling, 1 reply; 14+ messages in thread
From: Russ Cox @ 2010-11-13 22:31 UTC (permalink / raw)
  To: weigelt, Fans of the OS Plan 9 from Bell Labs

> I'm currently looking for the opposite of an bloom filter, which
> may have false negatives but no false positives.
>
> The purpose is allowing an spooling (store+forward) mail relay
> to learn which addresses are not accepted by the actual maildrop
> (which is connected by an uucp-link, so no direct smtp chat),
> to get rid of the thousands silly error bounces from brute force
> attacks on email addresses.
>
> An trivial idea would be simply using an hashtable as blacklist,
> but obviously that would become very big. I need a more compact
> form, perhaps a subset of regex'es, a colored (b)tree, etc ?
>
> Any ideas ?

Use a Bloom filter.

You are looking for the opposite of a Bloom filter in order
to ask the question "is this an invalid email address?".
Instead, use a Bloom filter to ask the question
"is this a valid email address?".

This requires the remote uucp site to give you a Bloom
filter with all the valid addresses inserted, but that seems
unavoidable.  I don't know how the opposite-of-Bloom-filter
approach would work anyway.

Russ


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 22:31 ` Russ Cox
@ 2010-11-13 22:38   ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-11-13 23:15     ` Russ Cox
  0 siblings, 1 reply; 14+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2010-11-13 22:38 UTC (permalink / raw)
  To: 9fans

> This requires the remote uucp site to give you a Bloom
> filter with all the valid addresses inserted, but that seems
> unavoidable.  I don't know how the opposite-of-Bloom-filter
> approach would work anyway.

One problem with this is handling wildcarded addresses. How do you indicate
(say) lyndon+* is allowable in a bloom filter, where the '+' is an
arbitrary (to the upstream) symbol.

One idea I've been looking at is exporting a representation of a trie that
can be interpreted by the upstream.

--lyndon




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 22:26 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2010-11-13 22:48   ` erik quanstrom
  2010-11-13 23:10     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-11-14  2:52   ` Bruce Ellis
  1 sibling, 1 reply; 14+ messages in thread
From: erik quanstrom @ 2010-11-13 22:48 UTC (permalink / raw)
  To: 9fans

On Sat Nov 13 17:28:09 EST 2010, lyndon@orthanc.ca wrote:
> > The purpose is allowing an spooling (store+forward) mail relay
> > to learn which addresses are not accepted by the actual maildrop
> > (which is connected by an uucp-link, so no direct smtp chat),
> > to get rid of the thousands silly error bounces from brute force
> > attacks on email addresses.
>
> Very(!) interesting approach to this.  I still transfer large chunks
> of my mail over UUCP, so this is a problem I face daily.  I'd appreciate
> being kept in the loop for anything you find out. (And collaborating on
> a solution.)

spammers have a solution to this.  they send to random hashes,
e.g.

ladd Nov 13 04:08:12 Disallowed gossinternational.com!ruiohfsd (gossinternational.com/124.172.212.142) to blocked name quanstro.net!b94cd358e11d3ffb43628c10bc786087

i think the idea of spooling email is largely discredited.
it opens up the possiblity for backscatter spam, or the lack of
delivery rejection notification.  either one is not good.  i think the
acepting smtp server has to be in a position to make a definitive
decision on disposition.  (sorry.)

- erik



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 22:48   ` erik quanstrom
@ 2010-11-13 23:10     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  0 siblings, 0 replies; 14+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2010-11-13 23:10 UTC (permalink / raw)
  To: quanstro, 9fans

> i think the idea of spooling email is largely discredited.

It's not a spam avoidance trick.  It's how I get around arbitrary
blockage of SMTP/submission port injection when I'm not sitting at
home.  If you read your mail on a laptop, it's the easiest way around
all the ISP/Hotel/Public-WIFI filtering nonsense that goes on.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 22:38   ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2010-11-13 23:15     ` Russ Cox
  2010-11-13 23:20       ` erik quanstrom
  2010-11-13 23:28       ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  0 siblings, 2 replies; 14+ messages in thread
From: Russ Cox @ 2010-11-13 23:15 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> One problem with this is handling wildcarded addresses. How do you indicate
> (say) lyndon+* is allowable in a bloom filter, where the '+' is an
> arbitrary (to the upstream) symbol.

Tell the accepting site to strip +* from all the email addresses
before checking.  There aren't that many cases like that.
+ is the canonical one.

> spammers have a solution to this.  they send to random hashes,
> e.g.
>
> ladd Nov 13 04:08:12 Disallowed gossinternational.com!ruiohfsd (gossinternational.com/124.172.212.142) to blocked name quanstro.net!b94cd358e11d3ffb43628c10bc786087
>
> i think the idea of spooling email is largely discredited.
> it opens up the possiblity for backscatter spam, or the lack of
> delivery rejection notification.  either one is not good.  i think the
> acepting smtp server has to be in a position to make a definitive
> decision on disposition.  (sorry.)

The solution I described (a Bloom filter of all the valid addresses)
would work fine for this.  An optimally sized Bloom filter requires
about 4.8 bits per power of ten per address.  If you want a 1 in 1000
chance of a spammy address getting through and have n valid addresses,
you need to a Bloom filter of size 3 * 4.8 * n = 14.4n bits.

Russ


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 23:15     ` Russ Cox
@ 2010-11-13 23:20       ` erik quanstrom
  2010-11-13 23:28       ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  1 sibling, 0 replies; 14+ messages in thread
From: erik quanstrom @ 2010-11-13 23:20 UTC (permalink / raw)
  To: 9fans

> > ladd Nov 13 04:08:12 Disallowed gossinternational.com!ruiohfsd (gossinternational.com/124.172.212.142) to blocked name quanstro.net!b94cd358e11d3ffb43628c10bc786087
> >
> > i think the idea of spooling email is largely discredited.
> > it opens up the possiblity for backscatter spam, or the lack of
> > delivery rejection notification.  either one is not good.  i think the
> > acepting smtp server has to be in a position to make a definitive
> > decision on disposition.  (sorry.)
>
> The solution I described (a Bloom filter of all the valid addresses)
> would work fine for this.  An optimally sized Bloom filter requires
> about 4.8 bits per power of ten per address.  If you want a 1 in 1000
> chance of a spammy address getting through and have n valid addresses,
> you need to a Bloom filter of size 3 * 4.8 * n = 14.4n bits.

i don't doubt the efficiency of bloom filters, but since
these addresses are never repeated, i don't see how it
would help.  (i've gotten 1000s of these/day.)

- erik



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 23:15     ` Russ Cox
  2010-11-13 23:20       ` erik quanstrom
@ 2010-11-13 23:28       ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-11-13 23:46         ` erik quanstrom
  1 sibling, 1 reply; 14+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2010-11-13 23:28 UTC (permalink / raw)
  To: 9fans

> Tell the accepting site to strip +* from all the email addresses
> before checking.  There aren't that many cases like that.

There aren't many, but at least one that I care about exists.  The
case is one-off throw away addresses.  When I send a message, I
generate an address crypto-based on the recipient and the time-frame I
expect a response.  I don't want mail coming back outside the
specified response period.  A bloom filter can't do this†.  A state
machine driven trie can.

The acceptance criteria are not relevant.  If they can be expressed as
part of the restriction list, they are valid.  And there are several
that I'm not going to get into arguments with people about over their
validity.

--lyndon




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 23:28       ` Lyndon Nerenberg (VE6BBM/VE7TFX)
@ 2010-11-13 23:46         ` erik quanstrom
  2010-11-13 23:51           ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  0 siblings, 1 reply; 14+ messages in thread
From: erik quanstrom @ 2010-11-13 23:46 UTC (permalink / raw)
  To: 9fans

> There aren't many, but at least one that I care about exists.  The
> case is one-off throw away addresses.  When I send a message, I
> generate an address crypto-based on the recipient and the time-frame I
> expect a response.  I don't want mail coming back outside the
> specified response period.  A bloom filter can't do this†.  A state
> machine driven trie can.

okay, there must be more to the story.  why do you need crypto
secure burner email addresses to avoid spam?

- erik



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 23:46         ` erik quanstrom
@ 2010-11-13 23:51           ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  0 siblings, 0 replies; 14+ messages in thread
From: Lyndon Nerenberg (VE6BBM/VE7TFX) @ 2010-11-13 23:51 UTC (permalink / raw)
  To: 9fans

> okay, there must be more to the story.  why do you need crypto
> secure burner email addresses to avoid spam?

If I could tell you that, I wouldn't need them.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-13 22:26 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
  2010-11-13 22:48   ` erik quanstrom
@ 2010-11-14  2:52   ` Bruce Ellis
  2010-11-14  2:54     ` Bruce Ellis
  1 sibling, 1 reply; 14+ messages in thread
From: Bruce Ellis @ 2010-11-14  2:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

grep -f is very efficient. you can extract it to a lib if you like.
please think about this and peek at the code before replying. i
understand the code because i was lucky enough to be in the room when
it was written.

a negatie bloom sounds good but your positives will (potentially) collide.

so the data structures i would recommend are a suffix tree
[McCreight], but reversed as the suffixes are common, or if you are
thinking of hashing go for a scapegoat tree.

brucee

On Sun, Nov 14, 2010 at 9:26 AM, Lyndon Nerenberg (VE6BBM/VE7TFX)
<lyndon@orthanc.ca> wrote:
>> The purpose is allowing an spooling (store+forward) mail relay
>> to learn which addresses are not accepted by the actual maildrop
>> (which is connected by an uucp-link, so no direct smtp chat),
>> to get rid of the thousands silly error bounces from brute force
>> attacks on email addresses.
>
> Very(!) interesting approach to this.  I still transfer large chunks
> of my mail over UUCP, so this is a problem I face daily.  I'd appreciate
> being kept in the loop for anything you find out. (And collaborating on
> a solution.)
>
> --lyndon
>
>
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-14  2:52   ` Bruce Ellis
@ 2010-11-14  2:54     ` Bruce Ellis
  2010-11-14  2:56       ` Bruce Ellis
  0 siblings, 1 reply; 14+ messages in thread
From: Bruce Ellis @ 2010-11-14  2:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

oops "prefixes are common". it's hot and i'm still wingless.

brucee

On Sun, Nov 14, 2010 at 1:52 PM, Bruce Ellis <bruce.ellis@gmail.com> wrote:
> grep -f is very efficient. you can extract it to a lib if you like.
> please think about this and peek at the code before replying. i
> understand the code because i was lucky enough to be in the room when
> it was written.
>
> a negatie bloom sounds good but your positives will (potentially) collide.
>
> so the data structures i would recommend are a suffix tree
> [McCreight], but reversed as the suffixes are common, or if you are
> thinking of hashing go for a scapegoat tree.
>
> brucee
>
> On Sun, Nov 14, 2010 at 9:26 AM, Lyndon Nerenberg (VE6BBM/VE7TFX)
> <lyndon@orthanc.ca> wrote:
>>> The purpose is allowing an spooling (store+forward) mail relay
>>> to learn which addresses are not accepted by the actual maildrop
>>> (which is connected by an uucp-link, so no direct smtp chat),
>>> to get rid of the thousands silly error bounces from brute force
>>> attacks on email addresses.
>>
>> Very(!) interesting approach to this.  I still transfer large chunks
>> of my mail over UUCP, so this is a problem I face daily.  I'd appreciate
>> being kept in the loop for anything you find out. (And collaborating on
>> a solution.)
>>
>> --lyndon
>>
>>
>>
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [9fans] opposite of bloom filter
  2010-11-14  2:54     ` Bruce Ellis
@ 2010-11-14  2:56       ` Bruce Ellis
  0 siblings, 0 replies; 14+ messages in thread
From: Bruce Ellis @ 2010-11-14  2:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

crap i was right the first time. suffixes from .com to .crazy-tool.com

On Sun, Nov 14, 2010 at 1:54 PM, Bruce Ellis <bruce.ellis@gmail.com> wrote:
> oops "prefixes are common". it's hot and i'm still wingless.
>
> brucee
>
> On Sun, Nov 14, 2010 at 1:52 PM, Bruce Ellis <bruce.ellis@gmail.com> wrote:
>> grep -f is very efficient. you can extract it to a lib if you like.
>> please think about this and peek at the code before replying. i
>> understand the code because i was lucky enough to be in the room when
>> it was written.
>>
>> a negatie bloom sounds good but your positives will (potentially) collide.
>>
>> so the data structures i would recommend are a suffix tree
>> [McCreight], but reversed as the suffixes are common, or if you are
>> thinking of hashing go for a scapegoat tree.
>>
>> brucee
>>
>> On Sun, Nov 14, 2010 at 9:26 AM, Lyndon Nerenberg (VE6BBM/VE7TFX)
>> <lyndon@orthanc.ca> wrote:
>>>> The purpose is allowing an spooling (store+forward) mail relay
>>>> to learn which addresses are not accepted by the actual maildrop
>>>> (which is connected by an uucp-link, so no direct smtp chat),
>>>> to get rid of the thousands silly error bounces from brute force
>>>> attacks on email addresses.
>>>
>>> Very(!) interesting approach to this.  I still transfer large chunks
>>> of my mail over UUCP, so this is a problem I face daily.  I'd appreciate
>>> being kept in the loop for anything you find out. (And collaborating on
>>> a solution.)
>>>
>>> --lyndon
>>>
>>>
>>>
>>
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-11-14  2:56 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-13 18:43 [9fans] opposite of bloom filter Enrico Weigelt
2010-11-13 22:26 ` Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-11-13 22:48   ` erik quanstrom
2010-11-13 23:10     ` Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-11-14  2:52   ` Bruce Ellis
2010-11-14  2:54     ` Bruce Ellis
2010-11-14  2:56       ` Bruce Ellis
2010-11-13 22:31 ` Russ Cox
2010-11-13 22:38   ` Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-11-13 23:15     ` Russ Cox
2010-11-13 23:20       ` erik quanstrom
2010-11-13 23:28       ` Lyndon Nerenberg (VE6BBM/VE7TFX)
2010-11-13 23:46         ` erik quanstrom
2010-11-13 23:51           ` Lyndon Nerenberg (VE6BBM/VE7TFX)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).