From: Russ Allbery
Newsgroups: gmane.emacs.gnus.general
Subject: Re: nnml splitting on encoded headers
Date: Tue, 28 May 2002 17:31:36 -0700
Organization: The Eyrie
References: <87off09f2s.fsf@nwalsh.com>
In-Reply-To: (Mark Thomas's message of "Tue, 28 May 2002 18:17:54 -0400")

Mark Thomas writes:

> Sometimes I get spam where the Content-Type is multipart/alternative
> and there is no charset listed in the headers.  For these, I use the
> following rule to catch un-encoded spam:

>   ("mail.spam.asian" "^subject:.*[¡-ÿ]\\{4,\\}")

> I figure any mail with more than four high-bit characters in a row in
> the subject is probably not one I'm going to be able to read.

I've had extremely good luck with the following regex:

    .*[¹²³°¶÷¾].*

It still passes pretty much anything that's ISO 8859-1 or -15, and it
catches unencoded Korean and Cyrillic pretty reliably.  Adjust to taste if
you get unencoded subject headers in character sets other than ISO 8859-1,
of course.

-- 
Russ Allbery (rra@stanford.edu)
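
To try the two rules outside of Gnus, here is a rough sketch in Python that runs them against raw, undecoded subject bytes. It is only an illustration, not part of either poster's setup: the function names and the sample subjects are made up, and the character classes are written as hex escapes (0xA1-0xFF for the quoted rule; 0xB9, 0xB2, 0xB3, 0xB0, 0xB6, 0xF7, 0xBE for the shorter class). The samples assume EUC-KR for Korean and Windows-1251 for Cyrillic; other encodings put their letters on other bytes, so results will differ.

```python
import re

# Quoted rule: four or more consecutive bytes in the range 0xA1-0xFF.
HIGH_BIT_RUN = re.compile(rb"[\xa1-\xff]{4,}")

# Shorter class from the reply, as raw bytes. Read as ISO 8859-1 these are
# the rarely used characters: superscript 1/2/3, degree, pilcrow, division
# sign, three quarters.
RARE_LATIN1 = re.compile(rb"[\xb9\xb2\xb3\xb0\xb6\xf7\xbe]")


def has_high_bit_run(raw_subject: bytes) -> bool:
    """True if the subject has four or more high-bit bytes in a row."""
    return HIGH_BIT_RUN.search(raw_subject) is not None


def has_rare_latin1_byte(raw_subject: bytes) -> bool:
    """True if the subject contains any byte from the shorter class."""
    return RARE_LATIN1.search(raw_subject) is not None


if __name__ == "__main__":
    # Hypothetical sample subjects, encoded the way unencoded spam would
    # put them on the wire.
    samples = {
        "latin-1 German": "Grüße aus München".encode("latin-1"),
        "latin-1 French": "Café déjà vu".encode("latin-1"),
        "EUC-KR Korean": "가격".encode("euc_kr"),
        "cp1251 Russian": "что".encode("cp1251"),
    }
    for label, raw in samples.items():
        print(label, has_high_bit_run(raw), has_rare_latin1_byte(raw))
```

The point of matching on bytes rather than decoded text is that the mail carries no charset, so the only thing to go on is which byte values show up. The shorter class works by picking values that ordinary Western European text almost never uses but that other encodings use for common letters.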