From: Russ Allbery
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Spam spam spam spam spam
Date: Sun, 31 Mar 2002 10:19:44 -0800
Organization: The Eyrie
References: <87it7exbdo.fsf@enberg.org>
In-Reply-To: (Lars Magne Ingebrigtsen's message of "Sun, 31 Mar 2002 17:23:58 +0200")
User-Agent: Gnus/5.090005 (Oort Gnus v0.05) XEmacs/21.4 (Common Lisp, sparc-sun-solaris2.6)

Lars Magne Ingebrigtsen writes:

> Now, about the white-list thing -- there should also be a black-list
> thing as well, I guess. "Classify all mail from this From address as
> spam." `C-u M-y', perhaps.
>
> What should the format of the black-and-white-lists be? It could
> just be a one-address-per-line thing, but perhaps it would be nice to
> allow regexps? Or perhaps a GLOB thing would be better?
>
> larsi@gnus.org
> larsi@gnus\.org
> .*@gnus\.org
> *@gnus.org
>
> I think it's easier for people to edit GLOBs than to edit the
> regexps, and I don't think people really need regexps here...

It would be great to somehow plug in the same logic that scoring has so that we can use all of the same options that we can with score file entries. If you think glob would be particularly useful, it could be added as a scoring type too. :)

Incidentally, while this isn't a good heuristic for everyone, for those of us who don't speak any Asian language and don't live in Asia, the following is remarkably good. All by itself, it catches something like 90% of all of my spam. (Asian language spam has increased drastically in the past year or so.)

    '(nnmail-split-abbrev-alist
      [...]
       (cons 'content-spam
             (concat "big5\\|gb2312\\|ks_c_.*\\|shift_jis"
                     "\\|default_charset"))
       (cons 'subject-spam
             (concat ".*=\\?\\(big5\\|gb2312\\|ks_c_\\|shift_jis"
                     "\\|euc[-_]kr\\).*"
                     "\\|.*[¹²°¶÷¾].*"))

I then check content-spam against the Content-Type header and subject-spam against the Subject header. The last bit of subject-spam catches a lot of spam with unencoded Asian languages in the Subject header by looking for characters that are fairly unlikely to be in the Subject for ISO 8859-1 or -15 languages, but again it isn't going to be applicable to everyone.

Not sure if something like this would be worth offering as an option or example.
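For anyone who hasn't used abbreviations as split values, here is a minimal sketch of how the two checks described above might be wired into fancy splitting. It assumes the `content-spam` and `subject-spam` entries are present in `nnmail-split-abbrev-alist` (fancy splitting expands a symbol in the FIELD or VALUE position of a split through that alist). The list rule and the group names `list.ding`, `spam`, and `mail.misc` are hypothetical placeholders, not anyone's actual setup.

```elisp
;; Sketch only: the list rule and all group names are hypothetical.
(setq nnmail-split-methods 'nnmail-split-fancy)
(setq nnmail-split-fancy
      '(|
        ;; Mailing-list traffic is split out first, so list mail never
        ;; reaches the charset checks below.
        ("to\\|cc" "ding@gnus\\.org" "list.ding")
        ;; These VALUE symbols are expanded via nnmail-split-abbrev-alist.
        ("content-type" content-spam "spam")
        ("subject" subject-spam "spam")
        ;; Anything left over.
        "mail.misc"))
```

Because `|` stops at the first split that matches, the order of the entries is what keeps list mail away from the spam rules.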
It's wholly inappropriate for people who correspond in Asian languages, or in any language that uses those code points (which may include any UTF-8-encoded text), but for those of us who only correspond in English or western European languages it's staggeringly effective.

Obviously, this is the sort of check that you want to put after you have already split out your mailing-list traffic, so as not to get false positives on mail to public mailing lists from people who just want to spell their name correctly.

-- 
Russ Allbery (rra@stanford.edu)