9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] tactic
       [not found] <QKKPXMWZAUGMQBSVGGFUZ@guanajuato.com>
@ 2004-03-31 14:38 ` ron minnich
  2004-03-31 17:02   ` Micah Stetson
  2004-03-31 21:05   ` boyd, rounin
  0 siblings, 2 replies; 19+ messages in thread
From: ron minnich @ 2004-03-31 14:38 UTC
  To: 9fans

you just can't beat those bayesian spam filters with a stick, eh?

:-)

ron

p.s. Yeah, I am sure there is no bayesian spam filter on this list, but
still, I betcha that last message (most appropriate name, eh?) would have
gone right through.



* Re: [9fans] tactic
  2004-03-31 14:38 ` [9fans] tactic ron minnich
@ 2004-03-31 17:02   ` Micah Stetson
  2004-04-01  3:55     ` ron minnich
  2004-03-31 21:05   ` boyd, rounin
  1 sibling, 1 reply; 19+ messages in thread
From: Micah Stetson @ 2004-03-31 17:02 UTC
  To: 9fans

> p.s. Yeah, I am sure there is no bayesian spam filter on this list, but
> still, I betcha that last message (most appropriate name, eh?) would have
> gone right through.

My bayesian filter caught it.

Micah



* Re: [9fans] tactic
  2004-03-31 14:38 ` [9fans] tactic ron minnich
  2004-03-31 17:02   ` Micah Stetson
@ 2004-03-31 21:05   ` boyd, rounin
  2004-04-01 15:10     ` Joel Salomon
  2004-04-01 16:08     ` Dave Lukes
  1 sibling, 2 replies; 19+ messages in thread
From: boyd, rounin @ 2004-03-31 21:05 UTC
  To: 9fans

> you just can't beat those bayesian spam filters with a stick, eh?

bayesian doan work.



* Re: [9fans] tactic
  2004-03-31 17:02   ` Micah Stetson
@ 2004-04-01  3:55     ` ron minnich
  0 siblings, 0 replies; 19+ messages in thread
From: ron minnich @ 2004-04-01  3:55 UTC
  To: 9fans

On Wed, 31 Mar 2004, Micah Stetson wrote:

> My bayesian filter caught it.

Wrong again. Darn. I gotta get me one of these things.

ron



* Re: [9fans] tactic
  2004-04-01 15:10     ` Joel Salomon
@ 2004-04-01 14:30       ` boyd, rounin
  2004-04-01 15:47         ` Jon Snader
  0 siblings, 1 reply; 19+ messages in thread
From: boyd, rounin @ 2004-04-01 14:30 UTC
  To: 9fans

> boyd, rounin said:
> >> you just can't beat those bayesian spam filters with a stick, eh?
> >
> > bayesian doan work.
> >
> >
> I've been getting spam that has significant amounts of Dissociated
> Press-style text after the sales pitch - they're trying to make bayesian
> filters break.

yes, i've seen that, hence my comment [above].



* Re: [9fans] tactic
  2004-03-31 21:05   ` boyd, rounin
@ 2004-04-01 15:10     ` Joel Salomon
  2004-04-01 14:30       ` boyd, rounin
  2004-04-01 16:08     ` Dave Lukes
  1 sibling, 1 reply; 19+ messages in thread
From: Joel Salomon @ 2004-04-01 15:10 UTC
  To: 9fans

boyd, rounin said:
>> you just can't beat those bayesian spam filters with a stick, eh?
>
> bayesian doan work.
>
>
I've been getting spam that has significant amounts of Dissociated
Press-style text after the sales pitch - they're trying to make bayesian
filters break.

In html mail spam, some of this is in microscopic font - I needed to zoom
to 32x to make out any words.

--Joel


* Re: [9fans] tactic
  2004-04-01 14:30       ` boyd, rounin
@ 2004-04-01 15:47         ` Jon Snader
  2004-04-01 17:00           ` Dave Lukes
  0 siblings, 1 reply; 19+ messages in thread
From: Jon Snader @ 2004-04-01 15:47 UTC
  To: 9fans

On Thu, Apr 01, 2004 at 04:30:14PM +0200, boyd, rounin wrote:
> > >
> > > bayesian doan work.
> > >
> > I've been getting spam that has significant amounts of Dissociated
> > Press-style text after the sales pitch - they're trying to make bayesian
> > filters break.
> 
> yes, i've seen that, hence my comment [above].

They're working fine for me.  The filter adapted to the garbage
salad at the end pretty quickly, and now it routinely drops all
those messages in the spam trap.
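A rough sketch of why padding need not defeat such a filter, assuming a Paul Graham style combiner (the "A Plan for Spam" scheme; not necessarily what Jon actually runs): only the handful of tokens whose spam probability is furthest from 0.5 are combined, so neutral salad words barely move the verdict. The function name, the 15-token cutoff and the token values are assumptions for illustration.

```python
import math


def spam_probability(token_probs, n_interesting=15):
    """Combine per-token spam probabilities, Graham style (a sketch).

    token_probs: P(spam | token) for each token in a message, assumed to be
    clamped to [0.01, 0.99] already so no factor is exactly 0 or 1.
    Only the n_interesting tokens furthest from 0.5 are used, so a pad of
    neutral "garbage salad" (p close to 0.5) barely moves the result.
    """
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5),
                         reverse=True)[:n_interesting]
    # Sum logs instead of multiplying to avoid floating-point underflow.
    log_spam = sum(math.log(p) for p in interesting)
    log_ham = sum(math.log(1.0 - p) for p in interesting)
    # prod(p) / (prod(p) + prod(1 - p)), rewritten in terms of the log sums
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


pitch = [0.99, 0.98, 0.95]  # hypothetical trained values for sales-pitch tokens
salad = [0.5] * 50          # hypothetical untrained, neutral padding words
print(spam_probability(pitch + salad))
```

Here the three pitch tokens padded with fifty neutral words still combine to better than 0.999, because the 0.5 factors cancel. Once salad words have been trained on a few padded spams they drift away from 0.5 themselves, which is plausibly the mechanism behind the worry about rising false positives.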

jcs


* Re: [9fans] tactic
  2004-03-31 21:05   ` boyd, rounin
  2004-04-01 15:10     ` Joel Salomon
@ 2004-04-01 16:08     ` Dave Lukes
  2004-04-01 16:46       ` George Michaelson
  2004-04-01 18:24       ` Tad Hunt
  1 sibling, 2 replies; 19+ messages in thread
From: Dave Lukes @ 2004-04-01 16:08 UTC
  To: 9fans

> bayesian doan work.
No, not in the absolute sense, but neither does anything else.

I've yet to see anything that works better than SpamAssassin
in a "real" situation (i.e. no "please confirm ..." etc.)

At the moment, we're getting slightly more spam than usual,
probably due to all the markov stew the spammers are using:
the tuning becomes somewhat more critical:
e.g. if I have insomnia and dump the spam into the processor
through the night, the intra-day spam drops off by ~50%.

BTW, FYI:
I'm backpedalling rapidly on SPF:
it looks like it already has enough traction to do some good ...

Also SPF will, hopefully, allow us to "sharpen" bayesian filters
by allowing through the good stuff and thus not polluting the filters so
much.

	Dave.




* Re: [9fans] tactic
  2004-04-01 16:08     ` Dave Lukes
@ 2004-04-01 16:46       ` George Michaelson
  2004-04-01 18:24       ` Tad Hunt
  1 sibling, 0 replies; 19+ messages in thread
From: George Michaelson @ 2004-04-01 16:46 UTC
  To: 9fans

On Thu, 01 Apr 2004 17:08:07 +0100 Dave Lukes <davel@anvil.com> wrote:

>> bayesian doan work.

Bayesian koan work.

the disciple was hit by the master with a stick and was enlightened. But the
spam kept coming, he just didn't care any more.

-George


* Re: [9fans] tactic
  2004-04-01 15:47         ` Jon Snader
@ 2004-04-01 17:00           ` Dave Lukes
  2004-04-01 17:24             ` Jon Snader
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Lukes @ 2004-04-01 17:00 UTC
  To: 9fans

> They're working fine for me.  The filter adapted to the garbage
> salad at the end pretty quickly, and now it routinely drops all
> those messages in the spam trap.

Yes, but ...
1) your database(s) just keep growing
2) you're fuzzying the line a lot:
   our well tuned spamassassin scores most stuff _very_ close
   to the "non-spam" score: a difference of .1/5 (2%) in the score
   means about another 20-30 spams getting through.
   So, as you "pollute" your filter, you increase the likelihood
   of false positives/negatives.
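The point about scores bunched near the cutoff can be made concrete with a toy count. Assuming the "/5" is SpamAssassin's usual threshold of 5.0, a downward drift of 0.1 (2% of 5) lets through exactly those spams currently scoring in [5.0, 5.1). The function name and the scores below are made up for illustration; they are not Dave's data.

```python
def newly_missed(spam_scores, threshold=5.0, drift=0.1):
    """Count spams tagged now (score >= threshold) that would slip
    through if every score dropped by `drift`."""
    return sum(1 for s in spam_scores if threshold <= s < threshold + drift)


# Hypothetical day's spam scores, some bunched just above the threshold.
scores = [4.9, 5.02, 5.07, 5.3, 6.0, 12.5]
print(newly_missed(scores))   # 2: the spams at 5.02 and 5.07
print(f"{0.1 / 5.0:.0%}")     # the relative drift, as a percentage
```

The more of a day's spam sits just above the line, the more a small drift costs, which is why a "polluted" filter hurts most when scores are tightly clustered.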

	Dave.



* Re: [9fans] tactic
  2004-04-01 17:00           ` Dave Lukes
@ 2004-04-01 17:24             ` Jon Snader
  0 siblings, 0 replies; 19+ messages in thread
From: Jon Snader @ 2004-04-01 17:24 UTC
  To: 9fans

On Thu, Apr 01, 2004 at 06:00:15PM +0100, Dave Lukes wrote:
> > They're working fine for me.  The filter adapted to the garbage
> > salad at the end pretty quickly, and now it routinely drops all
> > those messages in the spam trap.
> 
> Yes, but ...
> 1) your database(s) just keep growing
> 2) you're fuzzying the line a lot:
>    our well tuned spamassassin scores most stuff _very_ close
>    to the "non-spam" score: a difference of .1/5 (2%) in the score
>    means about another 20-30 spams getting through.
>    So, as you "pollute" your filter, you increase the likelihood
>    of false positives/negatives.
> 

I haven't used SpamAssassin, so I'll take your word for it.  My
use of a Bayesian filter is for my personal account, which is
relatively low volume (150-200 messages per day).  I doubt that
a Bayesian filter would work as well when it's filtering for
multiple people.

The point of my post, though, was only that Bayesian filters could
deal with the garbage salad.  They obviously aren't the perfect
solution, or even the best solution for all purposes.  They do,
however, work well for me.  Maybe one or two spams slip by on a
given day, and I get virtually no false positives.  YMMV, of
course.

jcs


* Re: [9fans] tactic
  2004-04-01 16:08     ` Dave Lukes
  2004-04-01 16:46       ` George Michaelson
@ 2004-04-01 18:24       ` Tad Hunt
  2004-04-02  0:26         ` Dave Lukes
  1 sibling, 1 reply; 19+ messages in thread
From: Tad Hunt @ 2004-04-01 18:24 UTC
  To: 9fans

In message <1080835687.17780.568.camel@zevon>, Dave Lukes said:
;I've yet to see anything that works better than SpamAssassin
;in a "real" situation (i.e. no "please confirm ..." etc.)

I found that spamassassin sucked rocks for filtering my email.

I am currently using CRM114: http://crm114.sourceforge.net/

The drawback is that you have to train it before it works very
well.  In the beginning it's about 50%, but after the first week
or two it has been well above 98% for me.  It usually misclassifies
a few messages out of about 150+/day for me.  I should note that
3/4 of my email is actually SPAM, so I'm pretty happy with it.

-Tad


* Re: [9fans] tactic
  2004-04-01 18:24       ` Tad Hunt
@ 2004-04-02  0:26         ` Dave Lukes
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Lukes @ 2004-04-02  0:26 UTC
  To: 9fans

First, don't get me wrong: I'm NOT a spamassassin fan, BUT ...
it's the best I've seen, and I've got ~80 people to keep happy.

> I found that spamassassin sucked rocks for filtering my email.

Yeah, it probably did: I've never tried it but
I'm sure it's uneconomical for small workloads.

> I am currently using CRM114: http://crm114.sourceforge.net/

Nice, but ...

> The drawback is that you have to train it before it works very
> well.  In the beginning it's about 50%, but after the first week
> or two it has been well above 98% for me.

And most of the usage I've seen has been on small,
low-volume connections.
i.e. even more than spamassassin it probably works better
in single-mbox situations.

OTOH out-of-the-box spamassassin does ~80%,
and we've got it up to ~96% (one server for all 80 people),
which is deemed acceptable by the powers that be.

If I tuned it a bit more wrt rbls etc. & did the
per-user profile stuff I'm sure I could hit 99%.
(The only negative there is that the per-user stuff is
 vulnerable to forgery:-(.)

>   It usually misclassifies
> a few messages out of about 150+/day for me.  I should note that
> 3/4 of my email is actually SPAM, so I'm pretty happy with it.

Strangely, we're only getting ~55% crap at the moment.
Maybe 'cos I bounce (not can) spam?

Back to plan9 programming ...,
	Dave.




* Re: [9fans] tactic
@ 2004-04-02  9:25 plan9fans
  0 siblings, 0 replies; 19+ messages in thread
From: plan9fans @ 2004-04-02  9:25 UTC
  To: 9fans

Hi,

For spam filtering I use the Plan9 auto one time certificate
system (please send me back an email with this magic string in
the subject line etc.) It does increase traffic but it works
very well.

Random thoughts on Bayesian filtering:

I don't understand why none of the filters seem to have an LRU
policy to keep the database size down; perhaps they do now, and I am
just a bit out of touch.
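One way to read the LRU suggestion, as a minimal sketch: keep per-token spam/ham counts in an ordered map and evict the least recently seen token once a fixed capacity is exceeded. The class name and capacity are hypothetical, and the scheme is not taken from any particular filter.

```python
from collections import OrderedDict


class LRUTokenDB:
    """Hypothetical bounded token database for a Bayesian filter.

    Maps token -> [spam_count, ham_count]; when more than `capacity`
    tokens are stored, the least recently seen token is evicted.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = OrderedDict()

    def train(self, tokens, is_spam):
        for tok in set(tokens):  # count each token once per message
            entry = self.counts.pop(tok, [0, 0])
            entry[0 if is_spam else 1] += 1
            self.counts[tok] = entry  # re-insert as most recently used
            if len(self.counts) > self.capacity:
                self.counts.popitem(last=False)  # evict least recently used

    def lookup(self, tok):
        entry = self.counts.get(tok)
        if entry is not None:
            self.counts.move_to_end(tok)  # a lookup also counts as a use
        return entry
```

Tokens from old spam runs that stop appearing then age out on their own, which would address the complaint that the databases "just keep growing", at the cost of forgetting rare but useful tokens.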

I am impressed with CRM114's sort-of Markov chain approach which looks
much more likely to succeed. I keep meaning to try it...

I have started to get emails consisting of text rendered into an image,
and no textual content at all. This is not as much a problem to filter
as it might appear: there are image fingerprinting techniques which
are very robust to size, resolution, and colour changes. These
fingerprints could then be used as another degree of freedom in a
bayesian framework.
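As an illustration of the kind of fingerprint meant here, a minimal "average hash" sketch: shrink a grayscale image to an 8x8 grid, then record which cells are brighter than the grid's mean. Image decoding is left out (the input is assumed to be a plain list of rows of 0-255 values), and average hashing is just one simple perceptual-hash technique, not necessarily the one Steve had in mind.

```python
def average_hash(pixels, size=8):
    """Average-hash fingerprint of a grayscale image.

    pixels: list of rows of grayscale values (0-255); the image must be
    at least size x size. Returns a size*size-bit integer.
    """
    h, w = len(pixels), len(pixels[0])
    # Downscale by averaging roughly equal blocks of source pixels.
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the mean.
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small distances suggest the same image."""
    return bin(a ^ b).count("1")
```

In simple cases, rescaling or a uniform brightness shift leaves the hash unchanged, so the hash (or a bucketed form of it) could plausibly be fed to a bayesian filter as one more token; that wiring is not shown.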

-Steve


* RE: [9fans] tactic
  2004-04-01 16:13 Tiit Lankots
@ 2004-04-01 16:56 ` Dave Lukes
  0 siblings, 0 replies; 19+ messages in thread
From: Dave Lukes @ 2004-04-01 16:56 UTC
  To: 9fans

> I backed out of SpamAssassin quickly. Turned out to be a memory/cpu
> hog.

Sure is: we're only ~80 people/5000 messages per day,
and we'll probably need to throw a load of hardware at it
to keep it going, but so what?

It's still a damned sight cheaper than 80 people
manually spamfiltering.

	Dave.




* RE: [9fans] tactic
@ 2004-04-01 16:13 Tiit Lankots
  2004-04-01 16:56 ` Dave Lukes
  0 siblings, 1 reply; 19+ messages in thread
From: Tiit Lankots @ 2004-04-01 16:13 UTC
  To: 9fans

> I've yet to see anything that works better than SpamAssassin
> in a "real" situation (i.e. no "please confirm ..." etc.)

I backed out of SpamAssassin quickly. Turned out to be a memory/cpu
hog.


* RE: [9fans] tactic
  2004-04-01 15:25 Tiit Lankots
  2004-04-01 15:26 ` Joel Salomon
@ 2004-04-01 15:32 ` Dave Lukes
  1 sibling, 0 replies; 19+ messages in thread
From: Dave Lukes @ 2004-04-01 15:32 UTC
  To: 9fans

> My bayesian is trained to trigger on html.

Sorry: I didn't understand that:
are you saying you've weighted your bayesian filter against html,
or that you just reject all html mail before the statistical stuff?

>  I do not receive
> non-spam html mail.

You are very lucky!

	Dave.




* RE: [9fans] tactic
  2004-04-01 15:25 Tiit Lankots
@ 2004-04-01 15:26 ` Joel Salomon
  2004-04-01 15:32 ` Dave Lukes
  1 sibling, 0 replies; 19+ messages in thread
From: Joel Salomon @ 2004-04-01 15:26 UTC
  To: 9fans

Tiit Lankots said:
>> In html mail spam, some of this is in microscopic font - I
>> needed to zoom
>> to 32x to make out any words.
>
> My bayesian is trained to trigger on html. I do not receive
> non-spam html mail.
>

Good idea. This, however, was a spamtrap email address - I get *no*
non-spam email there.

I just got curious about any new tactics, and adding random
"conversational" text (probably snarfed off usenet) is a pretty clever one
- expect the false positive rate to go up once you've trained your filter
on a couple of these.

--Joel


* RE: [9fans] tactic
@ 2004-04-01 15:25 Tiit Lankots
  2004-04-01 15:26 ` Joel Salomon
  2004-04-01 15:32 ` Dave Lukes
  0 siblings, 2 replies; 19+ messages in thread
From: Tiit Lankots @ 2004-04-01 15:25 UTC
  To: 9fans

> In html mail spam, some of this is in microscopic font - I 
> needed to zoom
> to 32x to make out any words.

My bayesian is trained to trigger on html. I do not receive
non-spam html mail.


end of thread [~2004-04-02  9:25 UTC]

Thread overview: 19+ messages
     [not found] <QKKPXMWZAUGMQBSVGGFUZ@guanajuato.com>
2004-03-31 14:38 ` [9fans] tactic ron minnich
2004-03-31 17:02   ` Micah Stetson
2004-04-01  3:55     ` ron minnich
2004-03-31 21:05   ` boyd, rounin
2004-04-01 15:10     ` Joel Salomon
2004-04-01 14:30       ` boyd, rounin
2004-04-01 15:47         ` Jon Snader
2004-04-01 17:00           ` Dave Lukes
2004-04-01 17:24             ` Jon Snader
2004-04-01 16:08     ` Dave Lukes
2004-04-01 16:46       ` George Michaelson
2004-04-01 18:24       ` Tad Hunt
2004-04-02  0:26         ` Dave Lukes
2004-04-01 15:25 Tiit Lankots
2004-04-01 15:26 ` Joel Salomon
2004-04-01 15:32 ` Dave Lukes
2004-04-01 16:13 Tiit Lankots
2004-04-01 16:56 ` Dave Lukes
2004-04-02  9:25 plan9fans
