From: clemens fischer <ino-eb2dced1@spotteswoode.de.eu.org>
Subject: Re: Email filing.
Date: Thu, 05 Sep 2002 18:00:36 +0200
Message-ID: <8z2g8lzv.fsf@spotteswoode.dnsalias.org>
In-Reply-To: <oydznvjj5sv.fsf_-_@bert.cs.rice.edu>

Scott A Crosby <scrosby@cs.rice.edu> writes:
> Been done. Look at ifile. I've heard it's slow though.
>
> And that although it works, it doesn't always work as nicely as you'd
> like, in that it classifies right only about 80-90% of the time, but
> that 10% will annoy you. IMHO, it's only really useful when you have
> email that's uncategorizable by any other means.

i have much better numbers with ifile. where it fails, the failures can
be attributed to its not doing MIME, but /that would slow it!/. mr. browne
has proposed a workaround, though, which just identifies the MIME
parts and/or encodings. this would make ifile more accurate in the
typical text-group, where it is intended to block spam. then there's
always the possibility of mime-decoding messages before classifying them.

my experiments show that ifile is good, both regarding accuracy and
speed, but i do use a tuned system with a procmail preprocessor. the
"recipes" don't do any classification; they just throw out chinese and
spamtool-generated garbage. incidentally, procmail uses most of the
time needed to categorize my email.

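a minimal sketch of the "mime-decode before classifying" idea, in python rather than anything ifile or procmail ships. it assumes the standard library email package; the function name decoded_text is made up for illustration, and how the result would be handed to a classifier is left open.

```python
from email import message_from_string, policy


def decoded_text(raw_message: str) -> str:
    """Return the subject plus every text/plain part, transfer-decoded.

    Hypothetical helper: base64 and quoted-printable bodies are turned
    back into readable text so a word-based classifier sees real words.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        # multipart containers and non-text attachments are skipped
        if part.get_content_type() == "text/plain":
            chunks.append(part.get_content())
    return "\n".join(chunks)
```

the output is plain text, so it could be piped into whatever classifier is in use; messages with unknown charsets would need extra error handling that is not shown here.
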
> It could try to identify mailing lists by noting list-headers but I
> wouldn't want to bet on perfect reliability.

it is easy to support this: i have procmail tag messages sent to
mailing lists with a simple "X-Mailinglist: true" header early on, and
ifile adjusts nicely, including it in its statistics.

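the tagging step can be sketched like this. the setup described above does it in a procmail recipe that is not shown, so the python below is only a stand-in; the LIST_HEADERS tuple (which headers count as "list headers") and the name tag_mailinglist are assumptions.

```python
from email import message_from_string

# Assumption: these common list headers are what marks a list message.
LIST_HEADERS = ("List-Id", "List-Post", "List-Unsubscribe", "Mailing-List")


def tag_mailinglist(raw_message: str) -> str:
    """Add 'X-Mailinglist: true' when any known list header is present."""
    msg = message_from_string(raw_message)
    # header lookups on Message objects are case-insensitive
    if any(h in msg for h in LIST_HEADERS) and "X-Mailinglist" not in msg:
        msg["X-Mailinglist"] = "true"
    return msg.as_string()
```

because the tag is an ordinary header, a statistical classifier that reads headers will simply pick it up as one more token.
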
> For spam-checking, I'm doing an implementation of something that does
> naive bayesian, but is flexible enough to be used for this. A *very
> fast* implementation.... my benchmark right now for the statistics
> building is 5 seconds on a 35mb, 7500 message corpus. V2 should be 30%
> faster.
sounds very impressive. is it for spam/non-spam checking only?
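
for context on why the question matters: naive bayes itself is not limited to two classes. the toy sketch below is neither scott's implementation nor ifile's code; the class name, the whitespace tokenizer and the add-one smoothing are all assumptions chosen to keep it short. it only shows that the same statistics work for any number of folders.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> messages seen

    def train(self, category, text):
        self.word_counts[category].update(text.lower().split())
        self.doc_counts[category] += 1

    def classify(self, text):
        words = text.lower().split()
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # log prior for the category
            score = math.log(self.doc_counts[category] / total_docs)
            for w in words:
                # add-one smoothing so unseen words do not zero the score
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best, best_score = category, score
        return best
```

training it with one category per mail folder instead of just "spam" and "ham" is all it takes to use it for filing.
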
--
clemens