From: Ted Zlatanov
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Paul Graham on fighting SPAM
Date: Mon, 19 Aug 2002 12:23:51 -0400
Organization: Теодор Златанов @ Cienfuegos
Original-To: Alex Schroeder
Cc: ding@gnus.org
References: <87d6sf42ys.fsf@emacswiki.org> <871y8u7un8.fsf@emacswiki.org>
In-Reply-To: <871y8u7un8.fsf@emacswiki.org> (Alex Schroeder's message of "Mon, 19 Aug 2002 17:09:15 +0200")
User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-redhat-linux-gnu)

On Mon, 19 Aug 2002, alex@emacswiki.org wrote:
> Ted Zlatanov writes:
>
>> Do you want to integrate it with the current spam.el contents?  You
>> just need to add a function that uses spam-stats.el, that can be
>> invoked on a message buffer to return t or a number if spam is
>> detected, and nil otherwise.  See the spam-split function, it
>> already invokes the blackholes and whitelist/blacklist checks, and
>> would invoke your function as well.  I have to write the code to
>> make those checks user-selectable via some symbols, but that's a
>> separate thing.
>
> I would not mind adding the glue -- if people would use it.
> Remember that you have to create that dictionary, first!  So this is
> not something that just works out of the box -- unless we distribute
> such a dictionary from the web.  Maybe not a bad idea:
>
> ~% cat .spam-stat.el | gzip - | wc --bytes
> 192390
>
> Did you use spam-stat.el to create a dictionary for yourself to test
> it with?  I'd be interested in hearing about it.

I haven't had the chance to use spam-stat.el myself (I'm job-hunting
at the moment).  The code looks good, though.

I would suggest that users can build a dictionary based on messages
they mark as spam.  Gnus supports a spam mark; another direction I
wanted to take spam.el besides splitting rules is to do summary exit
hooks where spam gets processed somehow (blacklisted, submitted to a
spam detection center, etc.).  It would seem that the ideal place to
build the dictionary, therefore, is on a summary exit hook.
The list of articles marked as spam is available
(gnus-summary-spam-marked), so all you need to give me is a function
that, given a message we know is spam, can increment the user
dictionary.  I'll write the function that applies it to the list of
spam messages, unless you feel like doing that part too :)

Letting the user determine what's spam is probably the best strategy.
I imagine some corporate memos, for instance, could get classified as
spam with a default dictionary.

Thanks
Ted
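
A rough sketch of the division of labor proposed above, written in
Python rather than Emacs Lisp so that it runs standalone.  Everything
here is an assumption for illustration: the function names, the
tokenizer, and the use of a plain counter as the "user dictionary" are
hypothetical and are not the spam-stat.el or spam.el API.  The list
passed to the second function stands in for the texts of the articles
the user marked as spam.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, and a few symbols.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def increment_spam_dictionary(dictionary, message_text):
    """Given a message known to be spam, count each distinct token once."""
    tokens = {token.lower() for token in TOKEN_RE.findall(message_text)}
    dictionary.update(tokens)  # Counter.update adds 1 per element of the set
    return dictionary


def process_spam_marked(dictionary, spam_messages):
    """Apply the increment function to every message marked as spam."""
    for text in spam_messages:
        increment_spam_dictionary(dictionary, text)
    return dictionary


if __name__ == "__main__":
    # Stand-in for what a summary exit hook would hand over.
    user_dictionary = Counter()
    marked = ["Buy cheap pills now", "Cheap loans, apply now"]
    process_spam_marked(user_dictionary, marked)
    print(user_dictionary.most_common(3))
```

Keeping the per-message function separate from the loop mirrors the
split suggested in the message: one side supplies the dictionary
update, the other decides when and on which articles to call it.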