Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
From: Ian Soboroff <org@acm.isoboroff>
Subject: Re: Bogofilter
Date: Wed, 29 Jan 2003 14:59:46 -0500
Message-ID: <9cf7kcnrakt.fsf@rogue.ncsl.nist.gov>
In-Reply-To: <848yx4ctyf.fsf@lucy.is.informatik.uni-duisburg.de>

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Ian Soboroff <org@acm.isoboroff> writes:
>
>> It's nice to see such a craze over naive-Bayes filtering techniques,
>> but they can get overtrained pretty easily.
>
> Yeah.  I don't know much about automatic classification, but I seem
> to recall that naive-Bayes is not the most effective method.
>
> So are there better algorithms around and is there an implementation
> that can be integrated into Gnus, similar to ifile?

There are boatloads of text classification algorithms.  Naive Bayes is
the canonical second-best solution to any problem, and has the
advantage of being fast.  Support Vector Machines are generally better,
but NB can get quite close on some data.  SVMs are hard to update
incrementally, but to be honest an email classifier would probably be
just fine retraining overnight.
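
To make the speed and update argument concrete, here is a minimal
sketch of a multinomial naive-Bayes classifier in Python.  It is not
what ifile or bogofilter actually do; the class name, the whitespace
tokenizer, the add-one smoothing, and the "spam"/"ham" labels are all
assumptions chosen for illustration.  The point is that training only
increments counters, so updating on one new message is cheap.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes over whitespace-separated tokens."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of training messages
        self.word_counts = defaultdict(Counter)   # label -> token -> occurrence count
        self.vocab = set()                        # every token seen in training

    def train(self, text, label):
        # Training is just counting, so adding one message is an O(tokens) update.
        tokens = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def scores(self, text):
        # Log prior plus summed log likelihoods, with add-one (Laplace) smoothing.
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        result = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            result[label] = score
        return result

    def classify(self, text):
        # Pick the label with the highest log score.
        scores = self.scores(text)
        return max(scores, key=scores.get)


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches now", "spam")
nb.train("gnus patch for nnir attached", "ham")
nb.train("meeting notes attached", "ham")

print(nb.classify("buy cheap pills"))   # spam
print(nb.classify("patch attached"))    # ham
```

An SVM has no equivalent of that one-line counter update, which is
why batch retraining (say, overnight) is the natural way to use one.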

My favorite classifier tool is Andrew McCallum's BOW toolkit.  It does
NB, SVM, kNN, EM, and probably three other things I forgot about, and
has nice support for doing measurements and experiments.  I was _this_
close to writing a scoring module for Gnus based on it, when I ran
across ifile.

The _right_ thing to do is something like nnir, that is, a classifier
framework that you can plug anything into underneath.  ifile-gnus.el
is probably most of what's needed (plus a couple more functions to
easily move mail without triggering a reclassification).
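
As a rough sketch of that plug-in idea (in Python rather than Emacs
Lisp, and not based on the real nnir or ifile-gnus.el code), the
framework only needs a small backend interface plus a way to move mail
without retraining.  Every name below (Backend, KeywordBackend,
MailSorter, the retrain flag) is hypothetical.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface any classifier (NB, SVM, kNN, ...) would implement."""

    @abstractmethod
    def train(self, text: str, label: str) -> None: ...

    @abstractmethod
    def classify(self, text: str) -> str: ...


class KeywordBackend(Backend):
    """Toy stand-in backend: each token votes for the label it was last trained with."""

    def __init__(self):
        self.seen = {}  # token -> label

    def train(self, text, label):
        for tok in text.lower().split():
            self.seen[tok] = label

    def classify(self, text):
        votes = {}
        for tok in text.lower().split():
            label = self.seen.get(tok)
            if label is not None:
                votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else "unknown"


class MailSorter:
    """Front end that knows about folders but nothing about the algorithm."""

    def __init__(self, backend):
        self.backend = backend
        self.folders = {}  # folder name -> list of message texts

    def file_message(self, text):
        # Ask the backend where the message belongs and store it there.
        folder = self.backend.classify(text)
        self.folders.setdefault(folder, []).append(text)
        return folder

    def move(self, text, src, dst, retrain=True):
        # retrain=False moves mail without teaching the classifier anything.
        self.folders[src].remove(text)
        self.folders.setdefault(dst, []).append(text)
        if retrain:
            self.backend.train(text, dst)


sorter = MailSorter(KeywordBackend())
sorter.backend.train("cheap pills", "spam")
sorter.backend.train("gnus patch", "ham")

folder = sorter.file_message("cheap watches")
sorter.move("cheap watches", "spam", "archive", retrain=False)
after_move = sorter.backend.classify("cheap watches")
print(folder, after_move)   # spam spam
```

Swapping in a different algorithm means writing another Backend
subclass; MailSorter does not change.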

Ian



Thread overview: 9+ messages
     [not found] <874r7vclji.fsf@ibook.optushome.com.au>
     [not found] ` <844r7vjc3l.fsf@lucy.is.informatik.uni-duisburg.de>
     [not found]   ` <81k7gq1z6a.fsf@shasta.cs.uiuc.edu>
     [not found]     ` <m2r8ayh7zm.fsf_-_@bluesteel.grierwhite.com>
2003-01-27 19:01       ` Bogofilter David Z Maze
     [not found]         ` <9cfvg09rx37.fsf@rogue.ncsl.nist.gov>
2003-01-29  7:10           ` Bogofilter Kai Großjohann
2003-01-29 19:59             ` Ian Soboroff [this message]
     [not found]               ` <4nlm12ehn5.fsf@lockgroove.bwh.harvard.edu>
2003-01-30 18:09                 ` Bogofilter Kai Großjohann
2003-01-31 16:46                   ` Bogofilter Ted Zlatanov
     [not found] ` <87ptqiodvb.fsf@unix.home>
2003-01-29 11:06   ` spam assassin filtering Alain Picard
2003-01-29 15:35     ` Michael Below
     [not found]       ` <87u1fr7lwn.fsf@jan.korger>
2003-01-29 21:14         ` Vasily Korytov
2003-01-29 21:52           ` Tim Haynes
