From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Paul Graham on fighting SPAM
Date: Sat, 17 Aug 2002 21:43:05 +0200

Danny Siu writes:

> since we had much discussion on spams lately, it is worthwhile to
> see read about what lisp guru thinks the content based filters can
> effectively kill spams.

He has clearly seen the light :-)

There is a research field known as "information filtering", "(automatic)
text classification", or "text categorization".  I don't know the
details of the theory, but people in that community speak of "naive
Bayes classifiers" as one way to do it; maybe that's similar to his
approach.  Other buzzwords that come to mind are kNN (k-nearest
neighbors) and support vector machines.

The effectiveness (quality) figures that people report for text
classifiers are quite good, usually above 70%, and that is on much
harder problems.  Recognizing spam should be a no-brainer by
comparison.

Maybe ShengHuo knows more and can elaborate.  I'm not an expert, just
aware that the field exists.

kai
-- 
A large number of young women don't trust men with beards.  (BFBS Radio)
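For readers who haven't met the term, the "naive Bayes classifiers" mentioned above can be sketched in a few lines. This is a minimal textbook multinomial naive Bayes, not Paul Graham's method or the code of any real filter; the class name `NaiveBayes`, the whitespace tokenizer, and the toy training messages are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


class NaiveBayes:
    """Toy two-class (spam/ham) multinomial naive Bayes classifier."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count one more document of this class and add its word counts.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in ("spam", "ham"):
            if self.doc_counts[label] == 0:
                continue  # no training data for this class
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Work in log space so long messages don't underflow to 0.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one (Laplace) smoothing: unseen words don't zero the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up training messages.
nb = NaiveBayes()
for text in ("buy cheap pills now", "make money fast now"):
    nb.train(text, "spam")
for text in ("gnus group parameters question", "patch for nnimap splitting"):
    nb.train(text, "ham")
print(nb.classify("cheap money now"))                  # spam
print(nb.classify("question about nnimap splitting"))  # ham
```

The "naive" part is the assumption that words occur independently given the class, which is what lets the per-word probabilities simply be multiplied (added, in log space).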