From: Xavier Leroy
To: "Brandon J. Van Every"
Cc: caml
Date: Thu, 24 Jun 2004 20:26:58 +0200
Subject: Re: [Caml-list] the maddening filter
Message-ID: <20040624202658.A27072@pauillac.inria.fr>
In-Reply-To: ; from vanevery@indiegamedesign.com on Thu, Jun 24, 2004 at 10:49:39AM -0700
X-Spam: no; 0.00; caml-list:01 seattle:99 avoiding:01 caml-list:01 spamoracle:01 bayesian:01 majordomo:99 spamoracle:01 unlucky:01 bri:99 ham:99 crappy:01 ham:99 bayesian:01 lively:01

> Try as I might, I cannot get a message about the Seattle ML SIG to go
> through. I am avoiding the use of certain words on the presumption that
> I am filtered because of them.

You piqued my curiosity enough that I looked at the "spam" box for
caml-list. Some general facts first:

- caml-list@inria.fr is a heavily spammed address -- 41000 spams since
  Jan 1st, or about 200 spams per day.

- Filtering is entirely automatic, using the "SpamOracle" Bayesian filter.
  No human would have the fortitude to filter 200+ messages per day.

- Yes, one or two spams occasionally get through, but that's still 99.9%
  accuracy.

- An alternative would be to restrict posting to list members.
  Unfortunately, many members receive the list with one e-mail address
  and post with another, and Majordomo doesn't handle this. We'll
  reconsider when we switch caml-list to Mailman at some point this
  summer.

Now, for the funny bit, here is why SpamOracle thinks that a number of
your messages are spam. I'll disguise the tokens just to make sure.

For some reason, "Br*ndon" and "Se*ttle" have high spam probability.
You're just unlucky :-)

Moreover, your signature contains the words "entr*preneur", "certifi*d"
and "anti-vir*s", which are also strongly correlated with spam. This
you might want to change.

You occasionally talk about "mon*y", "mark*ters", "prod*cts", and use
the adjectives "pa-id", "bri*f" and "spurio*s", all of which occur
disproportionately more often in spam than in regular caml-list messages.

On the other hand, you score good "ham" points by using "ocaml" (what
else?), "excessive", "complicated" and "crappy" (score one for strong
language!). "Cheers" is also a good ham indicator.

There is no denying that all this filtering nonsense is getting crazy,
and I'm sorry it just went berserk on your messages. But there is some
truth in it: your messages differ somewhat in wording from the "ham"
used for training, and the Bayesian filter just notices this...

At any rate, here is one of your messages that didn't get through, so
that everyone gets it. I took the liberty of disguising some words
just in case :-)

- Xavier Leroy

--------------------

On Monday, June 14 we had a meeting of the (first ever?) Se*ttle OCaml
Special Interest Group. There were 3 people, the beer was good, and the
discussion was lively.

I now want to broaden the miss*on statement to include all ML'ers. I
think we might get a few more people that way.
We agreed that The Stumbling Monk, a Belgian pub on Capitol Hill, was a
good venue for b*er. We also thought that meeting at roughly 3-week
intervals is the right pace. Yes, that's a funny number to remember,
but 2 weeks is too quick and once a month is too slow. So we think.

Anyway, I would like to organize the next meeting for the week of July
5th, at The Monk again. Day and time to be decided. Please e-ma*il me
if interested.

--------------------

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners