Gnus development mailing list
From: pinard@iro.umontreal.ca (François Pinard)
Cc: Forum of ding/Gnus users <ding@gnus.org>
Subject: Re: Using Eric Raymond's bogofilter tool within Gnus
Date: Tue, 03 Sep 2002 09:46:00 -0400
Message-ID: <oqfzwr6vaf.fsf@titan.progiciels-bpi.ca>
In-Reply-To: <m3lm6j4f4d.fsf@merlin.emma.line.org> (Matthias Andree's message of "Tue, 03 Sep 2002 11:05:54 +0200")

[Matthias Andree]

> pinard@iro.umontreal.ca (François Pinard) writes:
>
>> Some of you might be aware of the speedy Graham filter written by Eric
>> Raymond last week.  [...]

> Sorry to be intrusive, but it looks as though "bogofilter" does not
> quite work for me, particularly, the -N option does not work (at least
> not in 0.6),

Give Eric a chance.  The whole project started around two weeks ago, and many
releases brought major overhauls to his code.  Things will stabilise.

For version 0.6, I use `-v', `-n' and `-s' with no serious problems, but
always with `-F' to avoid the split between a client and a server.

> and I recently got a lot of false positives although I fed 2,000 non-spam
> mails to bogofilter -n and only one spam-mail to bogofilter -s.

People taking this seriously train Graham filters in batch, with corpora
holding thousands of messages, both ham and spam.  I'm happy with the results
of on-the-fly training within Gnus, with only a few hundred messages each of
ham and spam.  I would expect complete nonsense unless you have at the very
least a few dozen messages in each category.
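
A minimal sketch, assuming the per-token formula from Paul Graham's essay
(bogofilter's actual computation may differ, and the function names here are
illustrative only), shows why one spam message against 2,000 ham messages
misbehaves:

```python
def token_probability(good, bad, ngood, nbad):
    """Spam probability of one token, per the formula in Graham's essay.

    good, bad   -- number of times the token was seen in ham / spam
    ngood, nbad -- number of ham / spam messages trained on
    """
    if ngood < 1 or nbad < 1:
        raise ValueError("need at least one message in each category")
    g = 2 * good  # ham counts are doubled to bias against false positives
    b = bad
    if g + b < 5:
        return None  # token seen too rarely to be trusted
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))  # clamp away from 0 and 1


def combine(probs):
    """Combine per-token probabilities into one message score."""
    prod = 1.0
    inverse = 1.0
    for p in probs:
        prod *= p
        inverse *= 1.0 - p
    return prod / (prod + inverse)


if __name__ == "__main__":
    # A token seen once in the single spam message and three times in ham.
    print(token_probability(3, 1, 2000, 1))     # lopsided corpus
    print(token_probability(3, 1, 2000, 2000))  # balanced corpus
```

With a single spam message in the corpus, the spam ratio of any token from
that message saturates at 1, so the token clamps to 0.99 even though it shows
up more often in ham.  Two such tokens already push the combined score past
the 0.9 threshold the essay uses.  With 2,000 messages on each side, the same
counts give roughly 0.14.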

> However, there are at least two competing projects that a "Bayesian" search
> on freshmeat dug up, but I have not yet had the time to look at them.

If you do, please share your impression with us! :-)

> From the looks of it, your script could easily also support spamprobe; it's
> similar to bogofilter in use, except that it uses cleartext operation mode
> specifiers rather than options such as -n or -s (as bogofilter does).

> 1. spamprobe  http://sourceforge.net/projects/spamprobe/
>               uses GNU gdbm

The maintainer of `spamprobe' wrote (so I've been told; I did not read it from
him directly) that he was not very satisfied with GNU gdbm performance in this
context, and thought about abandoning this approach.

> 2. bayespam   http://www.garyarnold.com/projects.php
>               [...] but looks targeted at qmail

`qmail'?  Given the choice, I would stay away from Daniel Bernstein's works.
No doubt he is very competent; the problem is not there.  I saw how he relates
to others, and I think they are surely not free when they have to suffer such
haughtiness.  Yet, for one, I never had the slightest problem with Daniel so
far.  As my feelings about free software are all mixed and blurred with those
of pleasure, collaboration and friendship, `qmail' is not free software. :-)

Let me thank you for the two references above.  Here are other references I
have on Bayes filtering.  I did not look at the last three.

. @ http://www.paulgraham.com/spam.html
. @ http://www.ai.mit.edu/~jrennie/ifile/
. @ http://www.ai.mit.edu/~jhbrown/ifile-gnus.html
. @ http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/
. @ http://research.microsoft.com/~jplatt/cikm98.pdf
. : CRM114 on Sourceforge
. @ http://citeseer.nj.nec.com/blum98combining.html

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard


