From: Alex Schroeder
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Paul Graham on fighting SPAM
Date: Mon, 19 Aug 2002 11:23:07 +0200
Message-ID: <87d6sf42ys.fsf@emacswiki.org>

Danny Siu writes:

> since we had much discussion on spams lately, it is worthwhile to see read
> about what lisp guru thinks the content based filters can effectively kill spams.

I posted some code to g.e.sources to implement the basics. If anybody
feels like fooling around with it, I'd be happy to read about it.
There's also a comment by Kai on it in g.e.help.

* http://www.emacswiki.org/cgi-bin/wiki.pl?SpamStat

Things I'd like to see:

Efficient storage and retrieval of the data from disk. Based on 3351
mails, 298 of them being spam, I got a dictionary of 650k; preparing
it used an intermediary file of 7m.

Once saving is fast, I'd like to update the stats as we go along to
avoid the long preparation times. Updating the stats requires the
original 7m of data, however.

So before delving into all of this, I'd prefer to see whether it works,
see what other people think, collect some ideas and patches...

Alex.
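
For anybody who wants to play with the "update the stats as we go along" idea, here is a minimal sketch in Python. It is not the code posted to g.e.sources. It assumes the dictionary keeps raw per-word counts plus the two mail totals rather than finished probabilities, so that one more mail can be folded in without the intermediary data. The class and method names (SpamStats, add_mail, score) are made up for illustration, and the scoring rule is the one from Paul Graham's essay as I understand it: good counts doubled, words seen fewer than 5 times ignored, result clamped to 0.01..0.99.

```python
from collections import Counter


class SpamStats:
    """Per-word counts for spam scoring (hypothetical structure, for illustration)."""

    def __init__(self):
        self.good = Counter()  # word -> occurrences in non-spam mails
        self.bad = Counter()   # word -> occurrences in spam mails
        self.ngood = 0         # number of non-spam mails seen so far
        self.nbad = 0          # number of spam mails seen so far

    def add_mail(self, text, is_spam):
        """Fold a single mail into the counts; no earlier mail is needed."""
        words = text.lower().split()  # crude tokenizer, an assumption
        if is_spam:
            self.bad.update(words)
            self.nbad += 1
        else:
            self.good.update(words)
            self.ngood += 1

    def score(self, word):
        """Return the spam probability of a word, or None if seen too rarely."""
        g = 2 * self.good[word]  # good counts are doubled to limit false positives
        b = self.bad[word]
        if g + b < 5 or self.ngood == 0 or self.nbad == 0:
            return None
        bad_ratio = min(1.0, b / self.nbad)
        good_ratio = min(1.0, g / self.ngood)
        return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


stats = SpamStats()
for _ in range(5):
    stats.add_mail("buy pills now", is_spam=True)
    stats.add_mail("meeting notes attached", is_spam=False)

print(stats.score("pills"))    # 0.99
print(stats.score("meeting"))  # 0.01
```

Since the probabilities are computed on demand from the counts, only the counts and the two totals would have to be saved to disk. Whether that is fast enough for a dictionary of the size mentioned above is a separate question that this sketch does not answer.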