From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.user/790
Path: news.gmane.org!not-for-mail
From: Michael Slass
Newsgroups: gmane.emacs.gnus.user
Subject: spam-splitter.el
Date: Sat, 20 Jul 2002 00:18:56 GMT
Organization: Verio
Message-ID:
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: sea.gmane.org 1138667702 8423 80.91.229.2 (31 Jan 2006 00:35:02 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Tue, 31 Jan 2006 00:35:02 +0000 (UTC)
Original-X-From: nobody Tue Jan 17 17:28:07 2006
Original-Path: quimby.gnus.org!lackawana.kippona.com!news.stealth.net!news.stealth.net!news.maxwell.syr.edu!news.xnet.com!dfw-peer!news.verio.net!sea-read.news.verio.net.POSTED!not-for-mail
Original-Followup-To: gnu.emacs.gnus
Original-Sender: mikesl@thneed.na.wrq.com
Original-Newsgroups: gnu.emacs.sources,gnu.emacs.gnus
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.1
Original-NNTP-Posting-Host: 150.215.91.153
Original-X-Complaints-To: abuse@verio.net
Original-X-Trace: sea-read.news.verio.net 1027124336 150.215.91.153 (Sat, 20 Jul 2002 00:18:56 GMT)
Original-NNTP-Posting-Date: Sat, 20 Jul 2002 00:18:56 GMT
Original-Xref: bridgekeeper.physik.uni-ulm.de gnus-emacs-gnus:930
Original-Lines: 278
X-Gnus-Article-Number: 930 Tue Jan 17 17:28:07 2006
Xref: news.gmane.org gmane.emacs.gnus.user:790
Archived-At:

--=-=-=

Here's something I've been using to search the *body* of incoming
mails for spam-like phrases.  It started with the tip on how to do so
from the Gnus manual, and kinda grew from there.

Critiques and suggestions for improvement are appreciated.

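For a rough idea of the core check before opening the attachment, here
is a minimal sketch of "return the first regexp in a list that matches
some text".  The helper name `my-first-matching-regexp' is made up for
illustration and is not part of spam-splitter.el; the attached code
searches a mail buffer with `re-search-forward' rather than a string.

```elisp
;; Hypothetical helper, not part of spam-splitter.el: return the first
;; regexp in REGEXPS that matches somewhere in TEXT, or nil if none do.
(defun my-first-matching-regexp (regexps text)
  "Return the first element of REGEXPS that matches TEXT, or nil."
  (let ((case-fold-search t)            ; ignore case while matching
        (found nil))
    (while (and regexps (not found))
      (when (string-match (car regexps) text)
        (setq found (car regexps)))
      (setq regexps (cdr regexps)))
    found))

;; Example call with two of the phrases from the quick start:
(my-first-matching-regexp
 '("to be removed" "to unsubscribe")
 "Click here TO UNSUBSCRIBE from this list")
;; => "to unsubscribe"
```

Returning the matching regexp rather than just t makes it easy to
report which phrase triggered the split.
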
--=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment; filename=spam-splitter.el
Content-Description: spam-splitter.el

;;; spam-splitter.el - search incoming mail bodies for spam-indicating phrases
;;
;; Written by Mike Slass
;;
;; $Id: //users/mikesl/emacs-lisp/spam-splitter.el#1 $
;;
;;; Commentary:

;; spam-splitter.el provides functions and variables to let you search
;; the BODY of an incoming or respooled message for phrases that indicate
;; spam-iness.  It returns a split of the name of your spam group if the
;; mail matches, or nil if it doesn't.

;; You must be using nnmail-split-fancy to use this module.

;; Quick start:

;; - define some regexps that will match spams
;;   (setq nnmail-split-spam-phrases
;;         '("to be removed"
;;           "to unsubscribe"
;;           "to learn more"
;;           "\\(no longer\\|do not\\) +\\(want\\|wish\\) to receive.*"
;;           "for \\(further\\|more\\) info\\(rmation\\)?"
;;           "you have received this"
;;           "financial\\(ly\\)? \\(independen\\(t\\|ce\\)\\|abundance\\|freedom\\)"
;;           "sex\\(y\\|iest\\) +\\(girl\\|teen\\)s?"
;;           ;; more body matches here
;;           ))

;; - add a form to your nnmail-split-fancy list that calls
;;   nnmail-split-search-spam-body:
;;   (setq nnmail-split-fancy
;;         '(|
;;           ; all your other forms
;;           (: nnmail-split-search-spam-body )
;;           "mail.misc"))  ;; likely your last form

;; - With just that, any incoming mail containing phrases matching
;;   any of the regular expressions will be spooled to the "spam" group
;;   (unless they are split earlier in your nnmail-split-fancy list)

;; Read the function docs for nnmail-split-search-spam-body
;; for additional features.

;;; Code:

(require 'nnmail)

(defun nnmail-split-search-spam-body ()
  "Search incoming or respooled mail body and generate spam split.
This function searches the body of incoming or respooled mail for any
of the regexps in `nnmail-split-spam-phrases'.
If any of the regexps match, this function returns the name of the
group specified by `nnmail-split-spam-group'.

This works in conjunction with `nnmail-split-fancy' (which see).
You use this by adding this form to your `nnmail-split-fancy':

\(: nnmail-split-search-spam-body )

You must also add the regexps which you want to use as indicators of
spam email to the `nnmail-split-spam-phrases' list.  An example of
this might be:

\(setq nnmail-split-spam-phrases
      '(\"to be removed\"
        \"to learn more\"
        \"you have received this\"
        \"financial \\\\(independence\\\\|abundance\\\\)\"
        \"sex\\\\(y\\\\|iest\\\\) +\\\\(girl\\\\|teen\\\\)s?\"
        ;; more body matches here
        ))

This list will match many spams which include instructions on how to
be removed from the spam list, or directions on how to learn more
about the spammer's product, or spams promising \"financial
independence\" or \"the sexiest teens on the web\".

If you want to see why this function has decided to send a mail to
your spam group, you can set the variable
`nnmail-split-search-spam-body-trace' to non-nil.  If this is set,
each time this function matches an incoming or respooled mail to one
of your spam-indicating regexps, `nnmail-split-show-spam-split' will
be called, and will generate a temporary buffer showing you what
matched.

Exceptions:

You may also have a list of regexps which indicate that a message is
NOT spam, no matter what other content is in the message body.  If
`nnmail-split-spam-exceptions' is non-nil, it should be a list of
regexps which indicate that an incoming mail is NOT a spam.  The mail
will be searched for matches to the `nnmail-split-spam-exceptions'
BEFORE it is searched for `nnmail-split-spam-phrases'.  If any of the
exceptions match, the function will return a split of nil."
  (let ((mbuf (or (get-buffer " *nnmail incoming*")
                  (get-buffer " *nnml move*")
                  (get-buffer " *Original Article*"))))
    (if mbuf
        (with-current-buffer mbuf
          ;; bind case-fold-search only after switching buffers: it is
          ;; buffer-local, and a let binding restores the old value on
          ;; every exit path, including the exception branch
          (let ((case-fold-search t))
            (save-restriction
              (save-excursion
                ;; if the buffer we're looking at is " *nnmail incoming*"
                ;; then we don't want to widen, because it contains _all_
                ;; the mails retrieved in the last pop, and gnus will take
                ;; care of narrowing the buffer to the appropriate message
                (if (not (string-match " \\*nnmail incoming\\*" (buffer-name)))
                    (widen))
                (goto-char (point-min))
                (let ((spam-strings nnmail-split-spam-phrases)
                      (exception-strings nnmail-split-spam-exceptions))
                  ;; if there are exception strings, and any of them match,
                  ;; return nil (meaning this split doesn't match)
                  (if (progn
                        (while (and exception-strings
                                    (not (re-search-forward
                                          (car exception-strings) nil t)))
                          (setq exception-strings (cdr exception-strings)))
                        exception-strings)
                      ;; we got a match on exception-strings, so the split is nil
                      nil
                    (progn
                      ;; we got here because no exception strings matched,
                      ;; so now try to match each of the spam strings until
                      ;; we get a match, or the list is exhausted
                      (while (and spam-strings
                                  (not (re-search-forward
                                        (car spam-strings) nil t)))
                        (setq spam-strings (cdr spam-strings)))
                      (if spam-strings
                          ;; this form is executed when a spam-string matched
                          (progn
                            (when nnmail-split-search-spam-body-trace
                              (nnmail-split-show-spam-split
                               mbuf
                               (match-string 0)
                               (car spam-strings)
                               (match-beginning 0)))
                            ;; the final value of the progn form is the spam split
                            nnmail-split-spam-group)
                        ;; this nil means the spam string list was exhausted w/o a match
                        nil))))))))
      ;; this nil means we weren't able to find a buffer to search,
      ;; so we return nil
      nil)))

(defcustom nnmail-split-search-spam-body-trace nil
  "*Non-nil means show why `nnmail-split-search-spam-body' has matched a spam.
See also: `nnmail-split-show-spam-split'"
  :type 'boolean
  :group 'nnmail-split)

(defcustom nnmail-split-spam-phrases nil
  "*List of REGEXPS: any mail containing one will be spooled to `nnmail-split-spam-group'
Any mail whose body matches any of the regexps in
`nnmail-split-spam-phrases' will be spooled to the
`nnmail-split-spam-group' (which see).

See also `nnmail-split-search-spam-body'"
  :type '(repeat regexp)
  :group 'nnmail-split)

(defcustom nnmail-split-spam-exceptions nil
  "*REGEXPS: mail containing one will NEVER be matched by `nnmail-split-search-spam-body'
The phrases in this list are searched for before those in
`nnmail-split-spam-phrases'.  If a mail matches,
`nnmail-split-search-spam-body' will immediately exit and return a
split of nil."
  :type '(repeat regexp)
  :group 'nnmail-split)

(defun nnmail-split-show-spam-split (article-buf offending-phrase matching-regexp match-position)
  "Display buffer showing why `nnmail-split-search-spam-body' has matched a spam.
Creates a new buffer, shows which of the `nnmail-split-spam-phrases'
has matched, where it matched, and which group the spam is headed to.
If `font-lock-mode' is loaded, highlight the match with
`font-lock-warning-face'"
  (set-buffer (get-buffer-create "*Spam Match Trace*"))
  ;; append to whatever earlier traces are already in the buffer
  (goto-char (point-max))
  (if (not (eq (point-min) (point-max)))
      (insert "================================================================================\n"))
  ;; copy the accessible portion of the article into the trace buffer
  (insert-buffer-substring article-buf)
  (if (boundp 'font-lock-warning-face)
      (highlight-regexp matching-regexp font-lock-warning-face))
  (goto-char (point-max))
  (insert
   (format
    (concat "\n\n"
            "================================================================================\n"
            "This mail will be split to %s\n\n"
            "article phrase : \"%s\"\n"
            "at position : %d\n"
            "matches spam regexp : \"%s\"\n")
    nnmail-split-spam-group
    offending-phrase
    match-position
    matching-regexp))
  (goto-char (point-max))
  (message "Spam email split to \"%s\".  See buffer \"*Spam Match Trace*\" for details"
           nnmail-split-spam-group))

(defcustom nnmail-split-spam-group "spam"
  "*Group to which mail matching `nnmail-split-spam-phrases' will be spooled.
See also `nnmail-split-search-spam-body'"
  :type 'string
  :group 'nnmail-split)

;;; spam-splitter.el ends here

(setq nnmail-split-spam-phrases
      '("to be removed"
        "to unsubscribe"
        "to learn more"
        "\\(no longer\\|do not\\) +\\(want\\|wish\\) to receive.*"
        "for \\(further\\|more\\) info\\(rmation\\)?"
        "you have received this"
        "financial\\(ly\\)? \\(independen\\(t\\|ce\\)\\|abundance\\|freedom\\)"
        "sex\\(y\\|iest\\) +\\(girl\\|teen\\)s?"
        ;; more body matches here
        ))
(setq nnmail-split-spam-group "junk-mail")
(setq nnmail-split-spam-exceptions '("host-intensive environments?"))
(setq nnmail-split-search-spam-body-trace t)

--=-=-=

-- 
Mike Slass

,---- WRQ, Inc.
|
| We specialize in integration software and services that let you
| quickly adapt your host-intensive environment to meet new business
| needs.
`----

--=-=-=--