Date: Sun, 4 Jan 2004 16:40:11 +0300 (MSK)
From: Vitaly Lugovsky
To: Sven Luther
Cc: Xavier Leroy, caml-list@inria.fr
Subject: Re: [Caml-list] posting policy and spam

On Sun, 4 Jan 2004, Sven Luther wrote:

> Well, on a similar subject, is there any chance of implementing a
> workaround in spamoracle to counter those spams specifically designed to
> fool the bayesian filters? You know, those that have 4 lines of random
> words in a text attachment, and then some html spam.

It's possible to calculate the entropy of a text. If the words aren't
correlated with one another, and the distribution of correlation weights
is flat enough, then it's random text without any meaning (information
content). This is one way advanced search engines work.

I'd be glad to implement this approach if I had some free time. :(
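
To make the entropy idea concrete, here is a minimal sketch in Python. It is not spamoracle code and not a tested filter. All function names and both thresholds are hypothetical, and "correlation" is approximated very crudely as the fraction of adjacent word pairs that also appear in a reference corpus of ordinary mail. A text is flagged as word salad when almost no word repeats (normalized entropy near 1) and almost no adjacent pair is known.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def word_entropy(words):
    """Shannon entropy, in bits per word, of the word-frequency distribution."""
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(words):
    """Entropy divided by its maximum, log2(len(words)).

    A value of 1.0 means no word repeats at all.
    """
    if len(words) < 2:
        return 0.0
    return word_entropy(words) / math.log2(len(words))


def bigram_coherence(words, known_bigrams):
    """Fraction of adjacent word pairs that also occur in the reference corpus."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p in known_bigrams) / len(pairs)


def looks_random(text, known_bigrams, entropy_min=0.95, coherence_max=0.05):
    """Heuristic only: both thresholds are arbitrary assumptions."""
    words = tokenize(text)
    return (normalized_entropy(words) >= entropy_min
            and bigram_coherence(words, known_bigrams) <= coherence_max)
```

In practice `known_bigrams` would be a set of word pairs collected from a large body of legitimate mail. Very short messages also score high on normalized entropy, so the pair check, not the entropy alone, does most of the work here.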