Gnus development mailing list
* sometime splits
@ 2012-03-27 18:41 Eric Abrahamsen
  2012-03-27 20:16 ` Russ Allbery
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Abrahamsen @ 2012-03-27 18:41 UTC
  To: ding

Hi,

I'm having an irritating issue where one type of common email message
gets split incorrectly. I run a website that emails me automatically
with spam notifications, so I can catch false positives before they're
automatically deleted. The top of my `nnmail-split-fancy' looks like
this:

'(|
  ("From" "info@paper-republic.org"
    (| ("Subject" "\\[Paper Republic\\]"
        (| ("Subject" "\\(MARKED SPAM\\|INTERNAL\\)" "mail.PRspam") "mail.PRham") t)

Those are the first filters. The problem is that a lot of the emails
that should go into mail.PRspam end up in mail.misc instead. Recently, a
whole run of eight went there, all with these headers:

From: info@paper-republic.org
Subject: [Paper Republic] New Comment on NYT: In China, Objections to Google’s Book Scans MARKED SPAM

That's copy-n-paste. Others that look identical to this, except with
different stuff between the "[Paper Republic]" and the "MARKED SPAM"
strings, go into the correct mail.PRspam.

What could be doing this? It's not a huge problem, but just common
enough and just irritating enough that I wanted to check with the
gnurus.

Thanks,
Eric

-- 
GNU Emacs 24.0.94.1 (i686-pc-linux-gnu, GTK+ Version 2.24.10)
 of 2012-03-06 on pellet
Ma Gnus v0.4





* Re: sometime splits
  2012-03-27 18:41 sometime splits Eric Abrahamsen
@ 2012-03-27 20:16 ` Russ Allbery
  2012-03-27 21:29   ` Eric Abrahamsen
  0 siblings, 1 reply; 6+ messages in thread
From: Russ Allbery @ 2012-03-27 20:16 UTC
  To: ding

Eric Abrahamsen <eric@ericabrahamsen.net> writes:

> I'm having an irritating issue where one type of common email message
> gets split incorrectly. I run a website that emails me automatically
> with spam notifications, so I can catch false positives before they're
> automatically deleted. The top of my `nnmail-split-fancy' looks like
> this:

> '(|
>   ("From" "info@paper-republic.org"
>     (| ("Subject" "\\[Paper Republic\\]"

This kept catching me too.  You have to be careful about regexes; Gnus
adds an implicit word boundary on either end of the regex, but Emacs
doesn't consider the transition from a non-alphanumeric to another
non-alphanumeric to be a word boundary.  So if your regex begins or ends
with some non-alphanumeric characters, the regex won't match the way you
expect.

Short version: change that to ".*\\[Paper Republic\\].*" and I bet it will
start working.
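
The effect can be tried outside Gnus. The following is a minimal sketch, assuming the split value is simply wrapped in \< ... \> as described above; `my/wrapped-match-p' and `my/subject' are hypothetical names introduced for illustration, and the subject is a shortened, ASCII-only stand-in for the one in the report.

```elisp
;; Sketch only.  `my/wrapped-match-p' is a hypothetical helper, not part of
;; Gnus; it imitates the implicit word-boundary wrapping described above.
(defun my/wrapped-match-p (value text)
  "Return t if VALUE, wrapped in word-boundary markers, matches TEXT."
  (with-syntax-table (standard-syntax-table)
    (and (string-match (concat "\\<\\(" value "\\)\\>") text) t)))

;; Shortened, ASCII-only stand-in for the subject line in the report.
(defvar my/subject "Subject: [Paper Republic] New Comment MARKED SPAM")

;; "[" and "]" are not word characters, so the boundaries cannot match
;; directly next to them.
(my/wrapped-match-p "\\[Paper Republic\\]" my/subject)      ; nil
;; With ".*" on both sides the boundaries can fall on "Subject" and "SPAM".
(my/wrapped-match-p ".*\\[Paper Republic\\].*" my/subject)  ; t
```

Evaluating the two calls in *scratch* shows the difference between the original value and the ".*" version.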

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>




* Re: sometime splits
  2012-03-27 20:16 ` Russ Allbery
@ 2012-03-27 21:29   ` Eric Abrahamsen
  2012-11-02  7:50     ` Eric Abrahamsen
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Abrahamsen @ 2012-03-27 21:29 UTC
  To: ding

On Tue, Mar 27 2012, Russ Allbery wrote:

> Eric Abrahamsen <eric@ericabrahamsen.net> writes:
>
>> I'm having an irritating issue where one type of common email message
>> gets split incorrectly. I run a website that emails me automatically
>> with spam notifications, so I can catch false positives before they're
>> automatically deleted. The top of my `nnmail-split-fancy' looks like
>> this:
>
>> '(|
>>   ("From" "info@paper-republic.org"
>>     (| ("Subject" "\\[Paper Republic\\]"
>
> This kept catching me too.  You have to be careful about regexes; Gnus
> adds an implicit word boundary on either end of the regex, but Emacs
> doesn't consider the transition from a non-alphanumeric to another
> non-alphanumeric to be a word boundary.  So if your regex begins or ends
> with some non-alphanumeric characters, the regex won't match the way you
> expect.
>
> Short version: change that to ".*\\[Paper Republic\\].*" and I bet it will
> start working.

Ooh, I'll give that a shot, thank you!






* Re: sometime splits
  2012-03-27 21:29   ` Eric Abrahamsen
@ 2012-11-02  7:50     ` Eric Abrahamsen
  2012-11-02  9:00       ` Katsumi Yamaoka
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Abrahamsen @ 2012-11-02  7:50 UTC
  To: ding

So, months after first having this problem, I think I've finally figured
out what's going on. To recap, I have this split in
`nnmail-split-fancy':

("from" "info@paper-republic.org"
        (| ("subject" "New Comment"
           (|
           ("subject" ,(rx "MARKED SPAM" eol) "mail.PRSpam")
           "mail.PRham"))

When messages come in with "MARKED SPAM" at the end of the subject
header, this _sometimes_ matches, and sometimes doesn't.

These messages are sent from a Django website, through the Google Apps
email service.

I figured out that if there are non-ASCII characters in the subject
header, something (probably Google's mail service) messes with the
header. Using "C-u g" in the summary buffer shows that a pure-ASCII
subject header looks just like you'd expect it to, while a header
containing non-ASCII characters ends up actually looking like this:

--8<---------------cut here---------------start------------->8---
Subject: =?utf-8?q?=5BPaper_Republic=5D_New_Comment_on_French_Rendition_of_Fan_Wen?=
	=?utf-8?b?4oCZcyDigJxIYXJtb25pb3VzIExhbmTigJ0gdG8gTGF1bmNoIGJ5IGVhcmx5?=
	=?utf-8?q?_2013_MARKED_SPAM?=
--8<---------------cut here---------------end--------------->8---

Not surprisingly, the call to (rx "MARKED SPAM" eol) fails on this,
because of the extra "?=" at the end of the header, and the underscore
between MARKED and SPAM.
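
For reference, the "=?utf-8?q?...?=" form is the RFC 2047 encoded-word syntax for non-ASCII header text, in which "_" stands for a space. A small sketch of the mismatch, using only the last raw header line from above; the `my/' variable names are made up for illustration.

```elisp
(require 'rx)

;; The pattern from the split rule; it expands to "MARKED SPAM$".
(defvar my/spam-re (rx "MARKED SPAM" eol))

;; Last line of the raw header, copied from the message above.
(defvar my/raw-tail "\t=?utf-8?q?_2013_MARKED_SPAM?=")

;; Hand-decoded equivalent: in Q-encoding "_" stands for a space.
(defvar my/plain-tail " 2013 MARKED SPAM")

(string-match my/spam-re my/raw-tail)    ; nil: "_" is not " ", and "?=" follows
(string-match my/spam-re my/plain-tail)  ; non-nil
```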

That underscore means I would need two different rules for the
differently-encoded headers. Is there anything built into Gnus that
might allow me to somehow translate this header into a "real" UTF-8
string, instead of what Google gives me? Or have the split performed on
the decoded string, rather than the literal string?

At any rate, I'm pleased to know that I'm not actually crazy.

E








* Re: sometime splits
  2012-11-02  7:50     ` Eric Abrahamsen
@ 2012-11-02  9:00       ` Katsumi Yamaoka
  2012-11-02  9:39         ` Eric Abrahamsen
  0 siblings, 1 reply; 6+ messages in thread
From: Katsumi Yamaoka @ 2012-11-02  9:00 UTC
  To: ding

Eric Abrahamsen wrote:
> Subject: =?utf-8?q?=5BPaper_Republic=5D_New_Comment_on_French_Rendition_of_Fan_Wen?=
> 	=?utf-8?b?4oCZcyDigJxIYXJtb25pb3VzIExhbmTigJ0gdG8gTGF1bmNoIGJ5IGVhcmx5?=
> 	=?utf-8?q?_2013_MARKED_SPAM?=

> Not surprisingly, the call to (rx "MARKED SPAM" eol) fails on this,
> because of the extra "?=" at the end of the header, and the underscore
> between MARKED and SPAM.

How about setting `nnmail-mail-splitting-decodes' to t?
This makes Gnus decode encoded headers before splitting, though
it might slow splitting down a bit.
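
As a sketch, the setting is a single line; where it goes (for example the file that already sets `nnmail-split-fancy') is an assumption. The rest only illustrates the effect by decoding the last raw header line by hand with `rfc2047-decode-string'; it is not meant to show how Gnus does the decoding internally, and `my/decoded-tail' is a made-up name.

```elisp
;; Goes wherever the rest of the mail-splitting setup lives; the exact
;; init file is an assumption.
(setq nnmail-mail-splitting-decodes t)

;; Illustration only: decode the last raw header line by hand and retry
;; the pattern from the split rule.
(require 'rx)
(require 'rfc2047)
(defvar my/decoded-tail
  (rfc2047-decode-string "=?utf-8?q?_2013_MARKED_SPAM?="))

(string-match (rx "MARKED SPAM" eol) my/decoded-tail)  ; non-nil once decoded
```

With the header decoded, one rule covers both the pure-ASCII and the encoded subjects.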




* Re: sometime splits
  2012-11-02  9:00       ` Katsumi Yamaoka
@ 2012-11-02  9:39         ` Eric Abrahamsen
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Abrahamsen @ 2012-11-02  9:39 UTC
  To: ding

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> Eric Abrahamsen wrote:
>> Subject: =?utf-8?q?=5BPaper_Republic=5D_New_Comment_on_French_Rendition_of_Fan_Wen?=
>> 	=?utf-8?b?4oCZcyDigJxIYXJtb25pb3VzIExhbmTigJ0gdG8gTGF1bmNoIGJ5IGVhcmx5?=
>> 	=?utf-8?q?_2013_MARKED_SPAM?=
>
>> Not surprisingly, the call to (rx "MARKED SPAM" eol) fails on this,
>> because of the extra "?=" at the end of the header, and the underscore
>> between MARKED and SPAM.
>
> How about setting `nnmail-mail-splitting-decodes' to t?
> This makes Gnus decode encoded headers before splitting, though
> it might slow splitting down a bit.

That was the exact solution to my problem! I don't know whether to feel
a little embarrassed that I didn't find that variable, or to just throw
up my hands at the unbelievable number of variables you have to be aware
of to make Gnus behave the way you want it to...

Either way, my thanks!

E



