From: Eric Abrahamsen
Newsgroups: gmane.emacs.gnus.general
Subject: Re: sometime splits
Date: Fri, 02 Nov 2012 15:50:24 +0800
Message-ID: <87wqy4qw8v.fsf@ericabrahamsen.net>
References: <87obrhnayk.fsf@ericabrahamsen.net> <87limlpzp8.fsf@windlord.stanford.edu> <877gy5n356.fsf@ericabrahamsen.net>
To: ding@gnus.org


So, months after first having this problem, I think I've finally
figured out what's going on. To recap, I have this split in
`nnmail-split-fancy':

("from" "info@paper-republic.org"
 (| ("subject" "New Comment"
     (| ("subject" ,(rx "MARKED SPAM" eol) "mail.PRSpam")
        "mail.PRham"))

When messages come in with "MARKED SPAM" at the end of the subject
header, this _sometimes_ matches, and sometimes doesn't. These messages
are sent via a Django website, through Google Apps email service.

I figured out that if there are non-ASCII characters in the subject
header, something (probably Google's mail service) messes with the
header. Using "C-u g" in the summary buffer shows that a pure-ASCII
subject header looks just like you'd expect it to, while a header
containing non-ASCII characters ends up actually looking like this:

--8<---------------cut here---------------start------------->8---
Subject: =?utf-8?q?=5BPaper_Republic=5D_New_Comment_on_French_Rendition_of_Fan_Wen?=
 =?utf-8?b?4oCZcyDigJxIYXJtb25pb3VzIExhbmTigJ0gdG8gTGF1bmNoIGJ5IGVhcmx5?=
 =?utf-8?q?_2013_MARKED_SPAM?=
--8<---------------cut here---------------end--------------->8---

Not surprisingly, the call to (rx "MARKED SPAM" eol) fails on this,
because of the extra "?=" at the end of the header, and the underscore
between MARKED and SPAM.
That underscore means I would need two different rules for the
differently-encoded headers.

Is there anything built into Gnus that might allow me to somehow
translate this header into a "real" UTF-8 string, instead of what
Google gives me? Or have the split performed on the decoded string,
rather than the literal string?

At any rate, I'm pleased to know that I'm not actually crazy.

E

Eric Abrahamsen writes:

> On Tue, Mar 27 2012, Russ Allbery wrote:
>
>> Eric Abrahamsen writes:
>>
>>> I'm having an irritating issue where one type of common email message
>>> gets split incorrectly. I run a website that emails me automatically
>>> with spam notifications, so I can catch false positives before they're
>>> automatically deleted. The top of my `nnmail-split-fancy' looks like
>>> this:
>>
>>> '(|
>>>   ("From" "info@paper-republic.org"
>>>    (| ("Subject" "\\[Paper Republic\\]"
>>
>> This kept catching me too. You have to be careful about regexes; Gnus
>> adds an implicit word boundary on either end of the regex, but Emacs
>> doesn't consider the transition from a non-alphanumeric to another
>> non-alphanumeric to be a word boundary. So if your regex begins or ends
>> with some non-alphanumeric characters, the regex won't match the way you
>> expect.
>>
>> Short version: change that to ".*\\[Paper Republic\\].*" and I bet it will
>> start working.
>
> Ooh, I'll give that a shot, thank you!
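
The mismatch on the encoded Subject header can be reproduced outside of
mail splitting. What follows is a minimal sketch, not a tested fix. It
assumes that `rfc2047-decode-string' from Emacs's rfc2047 library
decodes the three encoded-words from the header in this message. It
also mentions `nnmail-mail-splitting-decodes', an option described in
the Gnus manual, as one setting that may make nnmail split on decoded
headers. The names `raw', `decoded' and `re' are made up for
illustration.

```elisp
(require 'rfc2047)
(require 'rx)

;; Compare the split regexp against the raw and the decoded Subject.
(let* ((raw (concat
             "=?utf-8?q?=5BPaper_Republic=5D_New_Comment_on_French_Rendition_of_Fan_Wen?="
             " =?utf-8?b?4oCZcyDigJxIYXJtb25pb3VzIExhbmTigJ0gdG8gTGF1bmNoIGJ5IGVhcmx5?="
             " =?utf-8?q?_2013_MARKED_SPAM?="))
       ;; Assumption: this turns the encoded-words into a plain string.
       (decoded (rfc2047-decode-string raw))
       ;; Same regexp as in the split rule.
       (re (rx "MARKED SPAM" eol)))
  ;; First element: match against the raw header (contains "MARKED_SPAM?=").
  ;; Second element: match against the decoded header.
  (list (string-match-p re raw)
        (string-match-p re decoded)))

;; Assumption: with this set, nnmail decodes headers before splitting,
;; so one rule could cover both the ASCII and the encoded subjects.
(setq nnmail-mail-splitting-decodes t)
```

Evaluating the `let*' form should give nil for the raw header, since
that string never contains "MARKED SPAM" with a space. If decoding
behaves as assumed, the second element should be a non-nil position.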