From: Ian Soboroff
Newsgroups: gmane.emacs.gnus.user
Subject: Re: Bogofilter
Date: Wed, 29 Jan 2003 14:59:46 -0500
Organization: University of Maryland, Baltimore County
Message-ID: <9cf7kcnrakt.fsf@rogue.ncsl.nist.gov>
References: <874r7vclji.fsf@ibook.optushome.com.au> <844r7vjc3l.fsf@lucy.is.informatik.uni-duisburg.de> <81k7gq1z6a.fsf@shasta.cs.uiuc.edu> <9cfvg09rx37.fsf@rogue.ncsl.nist.gov> <848yx4ctyf.fsf@lucy.is.informatik.uni-duisburg.de>

kai.grossjohann@uni-duisburg.de (Kai Großjohann) writes:

> Ian Soboroff writes:
>
>> It's nice to see such a craze over naive-Bayes filtering techniques,
>> but they can get overtrained pretty easily.
>
> Yeah.
> I don't know much about automatic classification, but I seem
> to recall that naive-Bayes is not the most effective method.
>
> So are there better algorithms around and is there an implementation
> that can be integrated into Gnus, similar to ifile?

There are boatloads of text classification algorithms.  Naive Bayes is
the canonical second-best solution to any problem, and has the
advantage of being fast.  Support Vector Machines are better, but NB
can get quite close on some data.  SVMs are hard to update
incrementally, but to be honest an email classifier would probably be
just fine retraining overnight.

My favorite classifier tool is Andrew McCallum's BOW toolkit.  It does
NB, SVM, kNN, EM, and probably three other things I forgot about, and
has nice support for doing measurements and experiments.  I was _this_
close to writing a scoring module for Gnus based on it when I ran
across ifile.

The _right_ thing to do is something like nnir, that is, a classifier
framework that you can plug anything into underneath.  ifile-gnus.el
is probably most of what's needed (plus a couple more functions to
easily move mail without triggering a reclassification).

Ian
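
To make the "plug anything in underneath" idea concrete, here is a rough
sketch in Python rather than Emacs Lisp.  Everything in it is an assumption
made up for illustration: the names Backend, NaiveBayes and MailSorter are
hypothetical and are not the API of BOW, ifile, nnir or ifile-gnus.el.  The
backend is a plain multinomial naive Bayes with add-one smoothing, which may
well differ from what any of those tools actually do.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Protocol


class Backend(Protocol):
    """Hypothetical interface: what the front end needs from any classifier."""

    def train(self, tokens: Iterable[str], label: str) -> None: ...

    def classify(self, tokens: Iterable[str]) -> str: ...


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (one possible backend)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()                       # every token seen in training

    def train(self, tokens, label):
        tokens = list(tokens)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, tokens):
        if not self.doc_counts:
            raise ValueError("no training data yet")
        tokens = list(tokens)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # never-seen tokens carry no evidence either way
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class MailSorter:
    """Front end: tokenizes a message and delegates to the backend it was given."""

    def __init__(self, backend):
        self.backend = backend

    @staticmethod
    def tokenize(text):
        # Assumption: whitespace tokenization is good enough for a sketch.
        return text.lower().split()

    def learn(self, text, folder):
        self.backend.train(self.tokenize(text), folder)

    def file_under(self, text):
        return self.backend.classify(self.tokenize(text))
```

Usage would be along the lines of MailSorter(NaiveBayes()), a few calls to
learn(text, folder), then file_under(text).  Swapping in an SVM or kNN would
mean writing another class with the same two methods; the front end would not
need to change.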