Gnus development mailing list
From: clemens fischer <ino-eb2dced1@spotteswoode.de.eu.org>
Subject: Re: Email filing.
Date: Thu, 05 Sep 2002 18:00:36 +0200
Message-ID: <8z2g8lzv.fsf@spotteswoode.dnsalias.org>
In-Reply-To: <oydznvjj5sv.fsf_-_@bert.cs.rice.edu>

Scott A Crosby <scrosby@cs.rice.edu> writes:

> Been done. Look at ifile. I've heard it's slow though.
>
> And that although it works, it doesn't always work as nicely as you'd
> like, in that it classifies right only about 80-90% of the time, but
> that 10% will annoy you. IMHO, it's only really useful when you have
> email that's uncategorizable by any other means.

i have much better numbers with ifile.  where it fails, the failure can
be attributed to not doing MIME, but /that would slow it!/.  mr. browne
has proposed a workaround, though, which just identifies the MIME
parts and/or encodings.  this would make ifile more accurate in the
typical text-group, where it is intended to block spam.  then there's
always the possibility to mime-decode messages before classifying them.
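
A minimal sketch of the "mime-decode before classifying" idea, written in
Python with the standard library `email` package. It is not part of ifile
or of the proposed workaround; the function name `extract_text` is made up
for illustration. It assumes the classifier only needs the subject and the
decoded text/plain parts.

```python
from email import message_from_string, policy


def extract_text(raw: str) -> str:
    """Return the subject plus all decoded text/plain parts of a raw message.

    Hypothetical helper: undoes base64 / quoted-printable transfer
    encodings so a word-based classifier sees readable tokens.
    """
    msg = message_from_string(raw, policy=policy.default)
    chunks = [str(msg.get("Subject", ""))]
    # walk() yields the message itself for single-part mail,
    # and every nested part for multipart mail.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            chunks.append(part.get_content())
    return "\n".join(chunks)
```

The output of such a step could then be piped to the classifier instead of
the raw message, at the cost of the extra decoding time mentioned above.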

my experiments show that ifile is good, both regarding accuracy and
speed, but i do use a tuned system with a procmail preprocessor.  the
"recipes" don't do any classification, they throw out chinese and
spamtool-generated garbage.  incidentally, procmail uses most of the
time needed to categorize my email.

> It could try to identify mailing lists by noting list-headers but I
> wouldn't want to bet on perfect reliability.

it is easy to support this:  i have procmail tag messages to
mailing lists with a simple "X-Mailinglist: true" header early on, and
ifile adjusts nicely, including it in its statistics.
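
The tagging step might look roughly like the following. This is a Python
stand-in, not the procmail recipe actually in use, and the choice of
headers to test for is an assumption (List-Id, List-Post and
List-Unsubscribe are standard list headers; Mailing-List is a common
non-standard one). `tag_mailing_list` is a hypothetical name.

```python
from email import message_from_string, policy

# Headers whose presence is taken to mean "this came via a mailing list".
# Assumption for this sketch; adjust to the lists you actually receive.
LIST_HEADERS = ("List-Id", "List-Post", "List-Unsubscribe", "Mailing-List")


def tag_mailing_list(raw: str) -> str:
    """Add "X-Mailinglist: true" when any list header is present."""
    msg = message_from_string(raw, policy=policy.default)
    # Header lookups on a message object are case-insensitive.
    if any(h in msg for h in LIST_HEADERS) and "X-Mailinglist" not in msg:
        msg["X-Mailinglist"] = "true"
    return msg.as_string()
```

Because the tag is an ordinary header line, a word-based classifier picks
it up as just another token, which matches the behaviour described above.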

> For spam-checking, I'm doing an implementation of something that does
> naive Bayesian, but is flexible enough to be used for this. A *very
> fast* implementation... (my benchmark right now for the statistics
> building is 5 seconds on a 35 MB, 7500-message corpus. V2 should be 30%
> faster.)

sounds very impressive.  is it for spam/non-spam checking only?
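
For reference, a naive Bayes classifier is not inherently limited to two
classes. The sketch below is a minimal multi-folder version in Python with
add-one smoothing. It is neither the implementation described in the quote
nor ifile's code; the class name, the tokenizer and the folder labels are
assumptions made for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes over any number of labels (folders)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()              # label -> messages seen
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Crude tokenizer: lowercase runs of letters, digits and a few symbols.
        return re.findall(r"[a-z0-9$'-]+", text.lower())

    def train(self, label, text):
        tokens = self.tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing so unseen tokens never give log(0).
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Training with more than two labels is all it takes to use the same
statistics for filing rather than only for spam/non-spam decisions.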

-- 
clemens