From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/46185
Path: main.gmane.org!not-for-mail
From: Ted Zlatanov <tzz@lifelogs.com>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Paul Graham on fighting SPAM
Date: Mon, 19 Aug 2002 12:23:51 -0400
Organization: =?koi8-r?q?=F4=C5=CF=C4=CF=D2=20=FA=CC=C1=D4=C1=CE=CF=D7?= @
 Cienfuegos
Sender: owner-ding@hpc.uh.edu
Message-ID: <m3d6seddgo.fsf@heechee.beld.net>
References: <uy9b6spad.fsf@adobe.com> <87d6sf42ys.fsf@emacswiki.org>
	<m3lm73ccif.fsf@heechee.beld.net> <871y8u7un8.fsf@emacswiki.org>
NNTP-Posting-Host: localhost.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1029774177 20661 127.0.0.1 (19 Aug 2002 16:22:57 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 19 Aug 2002 16:22:57 +0000 (UTC)
Cc: ding@gnus.org
Return-path: <owner-ding@hpc.uh.edu>
Original-Received: from malifon.math.uh.edu ([129.7.128.13])
	by main.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 17gpIn-0005Mt-00
	for <ding-account@gmane.org>; Mon, 19 Aug 2002 18:22:49 +0200
Original-Received: from sina.hpc.uh.edu ([129.7.128.10] ident=lists)
	by malifon.math.uh.edu with esmtp (Exim 3.20 #1)
	id 17gpIs-0001dL-00; Mon, 19 Aug 2002 11:22:54 -0500
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Mon, 19 Aug 2002 11:23:25 -0500 (CDT)
Original-Received: from sclp3.sclp.com (qmailr@sclp3.sclp.com [209.196.61.66])
	by sina.hpc.uh.edu (8.9.3/8.9.3) with SMTP id LAA23355
	for <ding@hpc.uh.edu>; Mon, 19 Aug 2002 11:23:09 -0500 (CDT)
Original-Received: (qmail 22792 invoked by alias); 19 Aug 2002 16:22:31 -0000
Original-Received: (qmail 22787 invoked from network); 19 Aug 2002 16:22:31 -0000
Original-Received: from ns1.beld.net (208.229.215.81)
  by gnus.org with SMTP; 19 Aug 2002 16:22:31 -0000
Original-Received: from heechee.beld.net (dhcp-0-50-8b-df-51-5e.cpe.beld.net [65.202.179.253])
	by ns1.beld.net (Postfix) with ESMTP
	id 088FB3B9E7; Mon, 19 Aug 2002 12:21:21 -0400 (EDT)
Original-To: Alex Schroeder <alex@emacswiki.org>
X-Face: bd.DQ~'29fIs`T_%O%C\g%6jW)yi[zuz6;d4V0`@y-~$#3P_Ng{@m+e4o<4P'#(_GJQ%TT= D}[Ep*b!\e,fBZ'j_+#"Ps?s2!4H2-Y"sx"
Mail-Followup-To: Alex Schroeder <alex@emacswiki.org>, ding@gnus.org
In-Reply-To: <871y8u7un8.fsf@emacswiki.org> (Alex Schroeder's message of
 "Mon, 19 Aug 2002 17:09:15 +0200")
Original-Lines: 45
User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2
 (i386-redhat-linux-gnu)
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:46185
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:46185

On Mon, 19 Aug 2002, alex@emacswiki.org wrote:
> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> Do you want to integrate it with the current spam.el contents?  You
>> just need to add a function that uses spam-stats.el, that can be
>> invoked on a message buffer to return t or a number if spam is
>> detected, and nil otherwise.  See the spam-split function, it
>> already invokes the blackholes and whitelist/blacklist checks, and
>> would invoke your function as well.  I have to write the code to
>> make those checks user-selectable via some symbols, but that's a
>> separate thing.
> 
> I would not mind adding the glue -- if people would use it.
> Remember that you have to create that dictionary, first!  So this is
> not something that just works out of the box -- unless we distribute
> such a dictionary from the web.  Maybe not a bad idea:
> 
> ~% cat .spam-stat.el | gzip - | wc --bytes
>  192390
> 
> Did you use spam-stat.el to create a dictionary for yourself to test
> it with?  I'd be interested in hearing about it.

I haven't had the chance to use spam-stat.el myself (I'm job-hunting
at the moment).  The code looks good, though.

I would suggest letting users build a dictionary from the messages
they mark as spam.  Gnus supports a spam mark; besides splitting
rules, another direction I wanted to take spam.el is summary exit
hooks where spam gets processed somehow (blacklisted, submitted to a
spam detection center, etc.).

It would seem that the ideal place to build the dictionary, therefore,
is on a summary exit hook.  The list of articles marked as spam is
available (gnus-summary-spam-marked) so all you need to give me is a
function that, given a message we know is spam, can increment the user
dictionary.  I'll write the function that applies that function to the
list of spam messages, unless you feel like doing that part too :)
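
To make that division of work concrete, here is a rough sketch of the
two pieces, written in Python rather than Emacs Lisp so it can be run
on its own.  Everything in it is an assumption for illustration: the
names increment_spam_dictionary and process_spam_marked are
hypothetical, the tokenizer is a guess, and counting each word once
per message is just one possible choice.  It is not the spam-stat.el
dictionary format or the Gnus API.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, and a few symbols.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split message text into lowercase tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def increment_spam_dictionary(dictionary, message_text):
    """Given one message known to be spam, bump the count of each
    distinct word in it (assumption: once per message, not per
    occurrence)."""
    dictionary.update(set(tokenize(message_text)))
    return dictionary


def process_spam_marked(dictionary, spam_messages):
    """Apply the increment function to every message marked as spam,
    as a summary exit hook might do."""
    for message_text in spam_messages:
        increment_spam_dictionary(dictionary, message_text)
    return dictionary


spam_dictionary = Counter()
process_spam_marked(
    spam_dictionary,
    ["Buy cheap pills now", "Cheap loans, act now, cheap cheap"],
)
```

The first function is the part I am asking for; the second is the
part I would write, called with whatever the list of spam-marked
articles turns out to contain.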

Letting the user determine what's spam is probably the best strategy.
I imagine some corporate memos, for instance, could get classified as
spam with a default dictionary.

Thanks
Ted