From: pinard@iro.umontreal.ca (François Pinard)
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Using Eric Raymond's bogofilter tool within Gnus
Date: Tue, 03 Sep 2002 09:46:00 -0400
To: Matthias Andree
Cc: Forum of ding/Gnus users
In-Reply-To: (Matthias Andree's message of "Tue, 03 Sep 2002 11:05:54 +0200")
User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i586-pc-linux-gnu)

[Matthias Andree]

> pinard@iro.umontreal.ca (François Pinard) writes:
>
>> Some of you might be aware of the speedy Graham filter written by Eric
>> Raymond last week.  [...]

> Sorry to be intrusive, but it looks as though "bogofilter" does not
> quite work for me, particularly, the -N option does not work (at least
> not in 0.6),

Give Eric a chance.  The whole project started around two weeks ago, and
many releases have brought major overhauls to his code.  Things will
stabilise.  For version 0.6, I use `-v', `-n' and `-s' with no serious
problems, but always with `-F' to avoid the split between a client and
a server.

> and I recently got a lot of false positives although I fed 2,000 non-spam
> mails to bogofilter -n and only one spam-mail to bogofilter -s.

People taking this seriously train Graham filters in batch, with corpora
holding thousands of messages, both ham and spam.  I'm happy with the
results I get from on-the-fly training within Gnus, with only a few hundred
messages each of ham and spam.  I would expect complete nonsense unless you
have at the very least a few dozen messages in each category.
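To see why such lopsided training is likely to give nonsense, here is a minimal sketch of the per-token scoring described in Paul Graham's essay "A Plan for Spam".  It is an illustration only: the names `token_prob' and `message_prob' are made up for this example, unknown tokens are simply skipped (the essay gives them a neutral-ish score instead), and bogofilter's actual computation may well differ.

```python
from collections import Counter
from math import prod


def token_prob(token, ham, spam, n_ham, n_spam):
    """Spam probability of one token, or None if it was seen too rarely.

    ham/spam map tokens to occurrence counts; n_ham/n_spam are the numbers
    of trained messages and are assumed to be greater than zero.
    """
    g = 2 * ham[token]  # ham counts are doubled to bias against false positives
    b = spam[token]
    if g + b < 5:
        return None  # not enough evidence for this token
    ham_rate = min(1.0, g / n_ham)
    spam_rate = min(1.0, b / n_spam)
    return max(0.01, min(0.99, spam_rate / (ham_rate + spam_rate)))


def message_prob(tokens, ham, spam, n_ham, n_spam, keep=15):
    """Combine the `keep` tokens whose scores lie farthest from 0.5."""
    probs = [p for t in set(tokens)
             if (p := token_prob(t, ham, spam, n_ham, n_spam)) is not None]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:keep]
    if not top:
        return 0.5  # no usable evidence either way
    spam_like = prod(top)
    ham_like = prod(1 - p for p in top)
    return spam_like / (spam_like + ham_like)


# Hypothetical counts: 2,000 ham messages trained, but a single spam.
ham = Counter({"meeting": 3})
spam = Counter({"meeting": 1})
print(token_prob("meeting", ham, spam, 2000, 1))  # 0.99 after clamping
```

With a single spam message, every word it contains has a spam rate of 1, so an ordinary word seen only a handful of times in 2,000 ham messages is pushed to the 0.99 ceiling.  A few such words in a legitimate message are enough for a false positive.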

> However, there are at least two competing projects that a "Bayesian" search
> on freshmeat dug up, but I have not yet had the time to look at them.

If you do, please share your impressions with us! :-)

> From what it looks, your script could easily also support spamprobe, it's
> similar to bogofilter in use, only that it uses cleartext operation mode
> specifiers rather than options as -n or -s (as bogofilter does).

> 1. spamprobe http://sourceforge.net/projects/spamprobe/
>    uses GNU gdbm

The maintainer of `spamprobe' wrote (so I've been told, I did not read him
directly) that he was not very satisfied with GNU gdbm performance in this
context, and was thinking about abandoning this approach.

> 2. bayespam http://www.garyarnold.com/projects.php
>    [...] but looks targeted at qmail

`qmail'?  Given the choice, I would stay away from Daniel Bernstein's works.
No doubt he is very competent; that is not the problem.  I have seen how he
relates to others, and I think they are surely not free, having to suffer
such haughtiness.  Yet I, for one, have never had the slightest problem with
Daniel so far.  As my feelings about free software are all mixed and blurred
with those of pleasure, collaboration and friendship, `qmail' is not free
software. :-)

Let me thank you for the two references above.  Here are other references
I have on Bayes filtering.  I did not look at the last three.

. @ http://www.paulgraham.com/spam.html
. @ http://www.ai.mit.edu/~jrennie/ifile/
. @ http://www.ai.mit.edu/~jhbrown/ifile-gnus.html
. @ http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/
. @ http://research.microsoft.com/~jplatt/cikm98.pdf
. : CRM114 on Sourceforge
. @ http://citeseer.nj.nec.com/blum98combining.html

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard