caml-list - the Caml user's mailing list
From: Xavier Leroy <xavier.leroy@inria.fr>
To: "Brandon J. Van Every" <vanevery@indiegamedesign.com>
Cc: caml <caml-list@inria.fr>
Subject: Re: [Caml-list] the maddening filter
Date: Thu, 24 Jun 2004 20:26:58 +0200
Message-ID: <20040624202658.A27072@pauillac.inria.fr>
In-Reply-To: <OOEALCJCKEBJBIJHCNJDAEFAHEAB.vanevery@indiegamedesign.com>; from vanevery@indiegamedesign.com on Thu, Jun 24, 2004 at 10:49:39AM -0700

> Try as I might, I cannot get a message about the Seattle ML SIG to go
> through.  I am avoiding the use of certain words on the presumption that
> I am filtered because of them.

You piqued my curiosity enough that I looked at the "spam" box for
caml-list.  Some general facts first:

- caml-list@inria.fr is a heavily spammed address -- 41000 spams since
  Jan 1st, or about 200 spams per day.

- Filtering is entirely automatic, using the "SpamOracle" Bayesian filter.
  No human would have the fortitude to filter 200+ messages per day.

- Yes, one or two spams occasionally get through, but that's still
  99.9% accuracy.

- An alternative would be to restrict posting to list members.
  Unfortunately, many members receive the list with one e-mail address
  and post with another, and Majordomo doesn't handle this.
  We'll reconsider when we switch caml-list to Mailman at some point
  this summer.
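
As a quick check of the volume quoted above, here is a minimal sketch of
the arithmetic.  It assumes the count runs from January 1st through the
date of this message (June 24, 2004) and counts both endpoints; the
variable names are only illustrative.

```python
from datetime import date

# Figures quoted in the message; the year comes from the message date.
spam_count = 41000
first_day = date(2004, 1, 1)
message_day = date(2004, 6, 24)

# Count both endpoints, since "since Jan 1st" includes January 1st itself.
days_elapsed = (message_day - first_day).days + 1
spams_per_day = spam_count / days_elapsed

print(f"{days_elapsed} days, about {spams_per_day:.0f} spams per day")
```

Under that assumption the daily rate comes out somewhat above 200, which
is consistent with the "200+" figure.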

Now, for the funny bit, here is why SpamOracle thinks that a number of
your messages are spam.  I'll disguise the tokens just to make sure.

For some reason, "Br*ndon" and "Se*ttle" have high spam probability.
You're just unlucky :-)

Moreover, your signature contains the words "entr*preneur", "certifi*d"
and "anti-vir*s", which are also strongly correlated with spam.  This
you might want to change.

You occasionally talk about "mon*y", "mark*ters", "prod*cts", and use
the adjectives "pa-id", "bri*f" and "spurio*s", all of which occur
disproportionately more in spam than in regular caml-list messages.

On the other hand, you score good "ham" points by using "ocaml" (what
else?), "excessive", "complicated" and "crappy" (score one for strong
language!). "Cheers" is also a good ham indicator.
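
To make the token scoring concrete, here is a minimal sketch of how a
Bayesian filter of this general kind can combine per-token probabilities
into one score.  This is not SpamOracle's actual code: the probabilities
are invented for illustration, the tokens are kept in their disguised
spelling, and the names `TOKEN_SPAM_PROB`, `UNKNOWN_PROB` and
`spam_score` are hypothetical.

```python
from math import prod

# Hypothetical per-token spam probabilities, invented for illustration.
# A real filter derives them from token counts in trained spam and ham.
TOKEN_SPAM_PROB = {
    "entr*preneur": 0.95,
    "certifi*d": 0.90,
    "mon*y": 0.92,
    "ocaml": 0.01,
    "complicated": 0.10,
    "cheers": 0.05,
}
UNKNOWN_PROB = 0.4  # assumed score for tokens never seen in training


def spam_score(tokens, max_tokens=15):
    """Combine per-token probabilities with the naive Bayes rule."""
    probs = [TOKEN_SPAM_PROB.get(t.lower(), UNKNOWN_PROB) for t in tokens]
    # Keep the most "interesting" tokens: those farthest from a neutral 0.5.
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:max_tokens]
    spam_like = prod(probs)
    ham_like = prod(1.0 - p for p in probs)
    return spam_like / (spam_like + ham_like)


signature_score = spam_score(["entr*preneur", "certifi*d", "mon*y"])
ham_score = spam_score(["ocaml", "complicated", "cheers"])
print(f"signature: {signature_score:.4f}, ham words: {ham_score:.6f}")
```

With these made-up numbers, a few spam-correlated signature words push
the score close to 1, while a handful of strong ham words pulls it close
to 0.  That is the mechanism by which a signature can outweigh an
otherwise on-topic message.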

There is no denying that all this filtering nonsense is getting crazy,
and I'm sorry it went berserk on your messages.  Still, it is true
that your messages are somewhat different in wording from the training
"ham" used; the Bayesian filter just notices this...

At any rate, here is one of your messages that didn't get through, so
that everyone gets it.  I took the liberty of disguising some words just
in case :-)

- Xavier Leroy

--------------------

On Monday, June 14 we had a meeting of the (first ever?) Se*ttle OCaml
Special Interest Group. It was 3 people, the beer was good, and the
discussion was lively. I now want to broaden the miss*on statement to
include all ML'ers. I think we might get a few more people that way.

We agreed that The Stumbling Monk, a Belgian pub in Capitol Hill, was
a good venue for b*er. We also thought that meetings at roughly 3-week
intervals are the right pace. Yes that's a funny number to remember,
but 2 weeks is too quick and once a month is too slow. So we
think. Anyways, I would like to organize the next meeting for the week
of July 5th, at The Monk again. Day and time to be decided.

Please e-ma*il me if interested.

--------------------

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

