From: Piers Cawley
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Paul Graham on fighting SPAM
Date: 28 Aug 2002 07:40:52 +0100
Message-ID: <84znv7o58r.fsf@despairon.bofh.org.uk>
In-Reply-To: <87wuqd5lp9.fsf@emacswiki.org>
To: Alex Schroeder
Cc: ding@gnus.org

Alex Schroeder writes:

> Anyway, what shall we do with spam-stat.el, now? An ifile user
> suggested I write code to reduce the dictionary size again -- perhaps
> I should remove all the words occurring less than 5 times,

Then how do you incrementally change the dictionary? Where do you
magically remember that some word has been seen 4 times already and
should therefore go in the dictionary this time?

> and all words whose spaminess is close to 0.5 (common words occurring
> both in spam and non-spam),

That sounds like a bad idea. Maybe if a word hangs around the .5 mark
for a certain number of dictionary builds...

> and only the first few kb of all mails should be analyzed.

Only if you can tune that.

-- 
Piers

   "It is a truth universally acknowledged that a language in
    possession of a rich syntax must be in need of a rewrite."
         -- Jane Austen?
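Piers's first objection is that if rare words are deleted from the only copy of the counts, an incremental update can no longer know that a word has already been seen four times. A minimal sketch of one way around that, assuming the filter can afford to keep every raw count: prune only the scoring table produced at build time, never the counts themselves. It is written in Python purely for illustration and is not spam-stat.el's code (which is Emacs Lisp). The class name `IncrementalDictionary`, its parameters (`min_count`, `neutral_band`, `neutral_builds`, `max_chars`) and the simple frequency-ratio spaminess formula are all hypothetical. The sketch also models his suggestion for near-0.5 words (omit them only after several consecutive neutral builds) and makes the "first few kb" cut-off a tunable parameter.

```python
import re
from collections import defaultdict


class IncrementalDictionary:
    """Hypothetical sketch: raw counts are never pruned; thresholds are
    applied only to the scoring table returned by build()."""

    def __init__(self, min_count=5, neutral_band=0.1, neutral_builds=3,
                 max_chars=4096):
        self.min_count = min_count            # words in fewer messages stay out of the table
        self.neutral_band = neutral_band      # |p - 0.5| < band counts as "neutral"
        self.neutral_builds = neutral_builds  # consecutive neutral builds before omitting a word
        self.max_chars = max_chars            # tunable: only the first max_chars characters are scanned
        self.spam = defaultdict(int)          # word -> number of spam messages containing it
        self.ham = defaultdict(int)           # word -> number of non-spam messages containing it
        self.nspam = 0
        self.nham = 0
        self.neutral_streak = defaultdict(int)

    def tokens(self, text):
        # Truncate before tokenizing, then lowercase and keep runs of letters/digits.
        return set(re.findall(r"[a-z0-9']+", text[:self.max_chars].lower()))

    def add(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        for word in self.tokens(text):
            counts[word] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spaminess(self, word):
        # Assumed formula: ratio of per-corpus frequencies; 0.5 if unseen.
        s = self.spam.get(word, 0) / max(self.nspam, 1)
        h = self.ham.get(word, 0) / max(self.nham, 1)
        return s / (s + h) if s + h else 0.5

    def build(self):
        table = {}
        for word in set(self.spam) | set(self.ham):
            if self.spam.get(word, 0) + self.ham.get(word, 0) < self.min_count:
                continue  # too rare for now; its count is kept for later builds
            p = self.spaminess(word)
            if abs(p - 0.5) < self.neutral_band:
                self.neutral_streak[word] += 1
                if self.neutral_streak[word] >= self.neutral_builds:
                    continue  # neutral for several builds in a row: omit it
            else:
                self.neutral_streak[word] = 0  # left the neutral band: reset
            table[word] = p
        return table
```

Because `build()` never deletes anything from `spam` or `ham`, a word with four sightings simply stays out of the table until a fifth message pushes it over `min_count`. The trade-off is that the raw counts keep growing, which is the size problem the ifile user raised; the sketch spends memory to keep incremental updates correct.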