caml-list - the Caml user's mailing list
* [Caml-list] the maddening filter
From: Brandon J. Van Every @ 2004-06-24 17:49 UTC
  To: caml

Try as I might, I cannot get a message about the Seattle ML SIG to go
through.  I am avoiding the use of certain words on the presumption that
I am filtered because of them.  I have two questions:

1) is there any practical way to test the filter's algorithm?

2) can't we have something else, like a "known ok posters" list?
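Question 2 can be sketched as a check that runs before the content filter. This is a minimal, hypothetical illustration: the names `decide` and `whitelist_of_list` and the `is_spam` parameter are invented here and do not describe caml-list's actual setup.

```ocaml
(* Hypothetical sketch of a "known OK posters" bypass. [is_spam] stands in
   for the real content filter; here it is just a parameter. *)
module StringSet = Set.Make (String)

type verdict = Accept | Reject

(* Compare addresses case-insensitively and ignore stray whitespace. *)
let normalize addr = String.lowercase_ascii (String.trim addr)

let whitelist_of_list addrs =
  List.fold_left (fun set a -> StringSet.add (normalize a) set) StringSet.empty addrs

(* Whitelisted senders skip the content filter; everyone else is
   judged on the message body alone. *)
let decide ~whitelist ~is_spam ~sender ~body =
  if StringSet.mem (normalize sender) whitelist then Accept
  else if is_spam body then Reject
  else Accept

let () =
  let whitelist = whitelist_of_list [ "poster@example.org" ] in
  let is_spam _body = true (* pretend the filter flags everything *) in
  assert (decide ~whitelist ~is_spam ~sender:"Poster@Example.org" ~body:"SIG meeting" = Accept);
  assert (decide ~whitelist ~is_spam ~sender:"stranger@example.org" ~body:"SIG meeting" = Reject)
```

One caveat with this design: the From address is trivial to forge, so a bare address whitelist lets through any spammer who happens to use a listed address.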

I notice that in the time I've been trying to get a legitimate
announcement through, we've gotten some nudie enhancement product
thingy.  I do not like the algorithm's priorities.  I feel like I'm
spelling out letters to avoid a smart animal hearing them.


Cheers,                     www.indiegamedesign.com
Brandon Van Every           Seattle, WA

"The pioneer is the one with the arrows in his back."
                          - anonymous entrepreneur

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] the maddening filter
From: Xavier Leroy @ 2004-06-24 18:26 UTC
  To: Brandon J. Van Every; +Cc: caml

> Try as I might, I cannot get a message about the Seattle ML SIG to go
> through.  I am avoiding the use of certain words on the presumption that
> I am filtered because of them.

You piqued my curiosity enough that I looked at the "spam" box for
caml-list.  Some general facts first:

- caml-list@inria.fr is a heavily spammed address -- 41000 spams since
  Jan 1st, or about 200 spams per day.

- Filtering is entirely automatic, using the "SpamOracle" Bayesian filter.
  No human would have the fortitude to filter 200+ messages per day.

- Yes, one or two spams occasionally get through, but that's still
  99.9% accuracy.

- An alternative would be to restrict posting to list members.
  Unfortunately, many members receive the list with one e-mail address
  and post with another, and Majordomo doesn't handle this.
  We'll reconsider when we switch caml-list to Mailman at some point
  this summer.
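Restricting posting to members only works if every address a member posts from is known. A minimal sketch, assuming a hypothetical member record that lists all of a person's addresses; it is not Majordomo's or Mailman's actual data model.

```ocaml
(* Hypothetical member record: one person, possibly several addresses
   (e.g. subscribed with one, posting from another). *)
type member = { name : string; addresses : string list }

let normalize addr = String.lowercase_ascii (String.trim addr)

(* Index every registered address to the member who owns it. *)
let index_members members =
  let tbl = Hashtbl.create 64 in
  List.iter
    (fun m -> List.iter (fun a -> Hashtbl.replace tbl (normalize a) m.name) m.addresses)
    members;
  tbl

(* Which member, if any, does this sender address belong to? *)
let poster_of tbl sender = Hashtbl.find_opt tbl (normalize sender)

(* A post is allowed if the sender matches any address of any member. *)
let may_post tbl sender = Option.is_some (poster_of tbl sender)
```

The point of the indirection is that membership is attached to a person rather than to the single address they subscribed with.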

Now, for the funny bit, here is why SpamOracle thinks that a number of
your messages are spam.  I'll disguise the tokens just to be safe.

For some reason, "Br*ndon" and "Se*ttle" have high spam probability.
You're just unlucky :-)

Moreover, your signature contains the words "entr*preneur", "certifi*d"
and "anti-vir*s", which are also strongly correlated with spam.  You
might want to change this.

You occasionally talk about "mon*y", "mark*ters", "prod*cts", and use
the adjectives "pa-id", "bri*f" and "spurio*s", all of which occur
disproportionately often in spam compared with regular caml-list messages.

On the other hand, you score good "ham" points by using "ocaml" (what
else?), "excessive", "complicated" and "crappy" (score one for strong
language!). "Cheers" is also a good ham indicator.
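The per-token "spam" and "ham" scores described above can be illustrated with a minimal sketch of how such a probability might be estimated from training counts. It follows the formula from Paul Graham's "A Plan for Spam" (ham counts doubled, results clamped to [0.01, 0.99]); SpamOracle's actual estimator and constants may differ, and the function and parameter names are hypothetical.

```ocaml
(* Graham-style estimate of P(spam | token) from training counts.
   spam_count / ham_count: training messages of each kind containing the token.
   n_spam / n_ham: total training messages of each kind (assumed non-zero). *)
let token_spam_prob ~spam_count ~ham_count ~n_spam ~n_ham =
  let freq count total = min 1.0 (float_of_int count /. float_of_int total) in
  let b = freq spam_count n_spam in
  (* Ham occurrences count double, biasing the filter against false positives. *)
  let g = freq (2 * ham_count) n_ham in
  if b +. g = 0.0 then 0.5 (* never seen in training: neutral *)
  else max 0.01 (min 0.99 (b /. (b +. g)))
```

A word like "ocaml" that appears almost only in ham ends up near 0.01, while a word seen mostly in spam is pushed toward 0.99, regardless of who wrote the message.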

There is no denying that all this filtering nonsense is getting crazy,
and I'm sorry it just went berserk on your messages.  But there is
still some truth in it: your messages differ somewhat in wording from
the "ham" used for training, and the Bayesian filter just notices
this...
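How individual token scores might turn into a single verdict can be sketched with the combination rule from Paul Graham's "A Plan for Spam": keep the tokens whose probability lies farthest from 0.5, then apply a naive-Bayes product. Whether SpamOracle uses this exact formula, this token count (15), or any particular threshold is an assumption here, not something stated in this thread.

```ocaml
(* First [n] elements of a list (or the whole list if shorter). *)
let take n l =
  let rec go n acc = function
    | x :: rest when n > 0 -> go (n - 1) (x :: acc) rest
    | _ -> List.rev acc
  in
  go n [] l

(* Combine per-token probabilities (assumed to lie in [0.01, 0.99]):
   keep the [n] most "interesting" tokens, i.e. those farthest from 0.5,
   then P = prod p / (prod p + prod (1 - p)). *)
let combine ?(n = 15) probs =
  let interesting =
    probs
    |> List.sort (fun a b -> compare (abs_float (b -. 0.5)) (abs_float (a -. 0.5)))
    |> take n
  in
  let prod = List.fold_left ( *. ) 1.0 in
  let s = prod interesting in
  let h = prod (List.map (fun p -> 1.0 -. p) interesting) in
  s /. (s +. h)
```

Because only the most extreme tokens are kept, a few strongly "spammy" words in a signature can outweigh many neutral ones, which matches the behaviour described above.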

At any rate, here is one of your messages that didn't get through, so
that everyone gets it.  I took the liberty of disguising some words just
in case :-)

- Xavier Leroy

--------------------

On Monday, June 14 we had a meeting of the (first ever?) Se*ttle OCaml
Special Interest Group. It was 3 people, the beer was good, and the
discussion was lively. I now want to broaden the miss*on statement to
include all ML'ers. I think we might get a few more people that way.

We agreed that The Stumbling Monk, a Belgian pub in Capitol Hill, was
a good venue for b*er. We also thought that meeting at roughly 3-week
intervals is the right pace. Yes, that's a funny number to remember,
but 2 weeks is too quick and once a month is too slow. So we
think. Anyways, I would like to organize the next meeting for the week
of July 5th, at The Monk again. Day and time to be decided.

Please e-ma*il me if interested.

--------------------


* RE: [Caml-list] the maddening filter
From: Brandon J. Van Every @ 2004-06-24 18:38 UTC
  To: caml

Surprised that this made it through, I'll attempt the next quantum:
- Seattle ML SIG meeting during week of July 5th, location and time TBD
- e-mail me if interested


