From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/53014
Path: main.gmane.org!not-for-mail
From: Eric Knauel
Newsgroups: gmane.emacs.gnus.general
Subject: Re: [Eric Knauel] spam.el and spamoracle
Date: Thu, 05 Jun 2003 11:32:04 +0200
Sender: ding-owner@lists.math.uh.edu
Message-ID:
References: <4nel2tecrk.fsf@lockgroove.bwh.harvard.edu>
NNTP-Posting-Host: main.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Trace: main.gmane.org 1054805535 9834 80.91.224.249 (5 Jun 2003 09:32:15 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Thu, 5 Jun 2003 09:32:15 +0000 (UTC)
Original-X-From: ding-owner+M1558@lists.math.uh.edu Thu Jun 05 11:32:12 2003
Return-path:
Original-Received: from malifon.math.uh.edu ([129.7.128.13]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 19Nr5g-0002U3-00 for ; Thu, 05 Jun 2003 11:31:24 +0200
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu) by malifon.math.uh.edu with smtp (Exim 3.20 #1) id 19Nr7K-00072p-00; Thu, 05 Jun 2003 04:33:06 -0500
Original-Received: from sclp3.sclp.com ([64.157.176.121]) by malifon.math.uh.edu with smtp (Exim 3.20 #1) id 19Nr75-00072h-00 for ding@lists.math.uh.edu; Thu, 05 Jun 2003 04:32:51 -0500
Original-Received: (qmail 3339 invoked by alias); 5 Jun 2003 09:32:51 -0000
Original-Received: (qmail 3334 invoked from network); 5 Jun 2003 09:32:50 -0000
Original-Received: from mx1.informatik.uni-tuebingen.de (134.2.12.5) by sclp3.sclp.com with SMTP; 5 Jun 2003 09:32:50 -0000
Original-Received: from jimi.informatik.uni-tuebingen.de (jimi [134.2.12.83]) by mx1.informatik.uni-tuebingen.de (Postfix) with ESMTP id 757BA868 for ; Thu, 5 Jun 2003 11:32:19 +0200 (MST)
Original-Received: from jimi.informatik.uni-tuebingen.de (localhost [127.0.0.1]) by jimi.informatik.uni-tuebingen.de (8.12.9/8.12.2) with ESMTP id h559WJ1A010518 for ; Thu, 5 Jun 2003 11:32:19 +0200 (CEST)
Original-Received: (from knauel@localhost) by jimi.informatik.uni-tuebingen.de (8.12.9/8.12.2/Submit) id h559WFIR010517; Thu, 5 Jun 2003 11:32:15 +0200 (CEST)
X-Authentication-Warning: jimi.informatik.uni-tuebingen.de: knauel set sender to knauel@informatik.uni-tuebingen.de using -f
Original-To: ding@gnus.org
In-Reply-To: <4nel2tecrk.fsf@lockgroove.bwh.harvard.edu> (Ted Zlatanov's message of "Tue, 20 May 2003 13:41:51 -0400")
User-Agent: Gnus/5.1003 (Gnus v5.10.3) XEmacs/21.4 (Portable Code, darwin)
Precedence: bulk
Xref: main.gmane.org gmane.emacs.gnus.general:53014
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:53014

--=-=-=

Ted Zlatanov writes:

> On Tue, 13 May 2003, knauel@informatik.uni-tuebingen.de wrote:
>> Ted Zlatanov writes:
>>
>>> Does Eric have FSF papers on file?
>>
>> No. What do I have to do to get one? The FSF website says something
>> about logging onto mysterious GNU machines where the forms can be
>> found.
>
> The first rule of FSF club is, you don't talk about FSF club :)

Seems so. ;) Finally, I got the FSF papers and sent them back this week.

>>> - should there be a spam/ham processor for spamoracle?
>>
>> I think that would this would be a convenient way for training
>> spamoracle. I'll add that feature and send you the patch.
>
> No problem, I hope it's not too hard. If you could also look at the
> documentation - basically, copy the SpamAssassin section and modify
> accordingly, I'll update the menus, check the syntax, and add it in if
> you are not familiar with Texinfo. Not that I'm an expert either, but
> I do find Texinfo very manageable and easy to learn.

That would be great! I've never written texinfo before, so I probably
made many mistakes. Here is an updated version with documentation and
spam/ham-processors, diffed against CVS this time.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: attachment; filename=spamoracle-3.patch

Index: lisp/gnus.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus.el,v
retrieving revision 6.189
diff -u -r6.189 gnus.el
--- lisp/gnus.el 3 Jun 2003 14:11:36 -0000 6.189
+++ lisp/gnus.el 5 Jun 2003 09:34:26 -0000
@@ -1843,6 +1843,9 @@
     "The Gmane reporting summary exit spam processor.
 Only applicable to NNTP groups with articles from Gmane. See spam-report.el")
 
+  (defvar gnus-group-spam-exit-processor-spamoracle "spamoracle-spam"
+    "The spamoracle summary exit spam processor.")
+
   (defvar gnus-group-ham-exit-processor-ifile "ifile-ham"
     "The ifile summary exit ham processor.
 Only applicable to non-spam (unclassified and ham) groups.")
@@ -1867,6 +1870,10 @@
     "The ham copy exit ham processor.
 Only applicable to non-spam (unclassified and ham) groups.")
 
+  (defvar gnus-group-ham-exit-processor-spamoracle "spamoracle-ham"
+    "The spamoracle summary exit ham processor.
+Only applicable to non-spam (unclassified and ham) groups.")
+
   (gnus-define-group-parameter
    spam-process
    :type list
@@ -1879,12 +1886,14 @@
                  (variable-item gnus-group-spam-exit-processor-bogofilter)
                  (variable-item gnus-group-spam-exit-processor-blacklist)
                  (variable-item gnus-group-spam-exit-processor-report-gmane)
+                 (variable-item gnus-group-spam-exit-processor-spamoracle)
                  (variable-item gnus-group-ham-exit-processor-bogofilter)
                  (variable-item gnus-group-ham-exit-processor-ifile)
                  (variable-item gnus-group-ham-exit-processor-stat)
                  (variable-item gnus-group-ham-exit-processor-whitelist)
                  (variable-item gnus-group-ham-exit-processor-BBDB)
-                 (variable-item gnus-group-ham-exit-processor-copy))))
+                 (variable-item gnus-group-ham-exit-processor-copy)
+                 (variable-item gnus-group-ham-exit-processor-spamoracle))))
    :function-document
    "Which spam or ham processors will be applied to the GROUP articles at summary exit."
    :variable gnus-spam-process-newsgroups
Index: lisp/spam.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam.el,v
retrieving revision 6.97
diff -u -r6.97 spam.el
--- lisp/spam.el 30 May 2003 20:05:05 -0000 6.97
+++ lisp/spam.el 5 Jun 2003 09:34:26 -0000
@@ -182,24 +182,10 @@
   :type 'boolean
   :group 'spam)
 
-(defcustom spam-install-hooks (or
-                               spam-use-dig
-                               spam-use-blacklist
-                               spam-use-whitelist
-                               spam-use-whitelist-exclusive
-                               spam-use-blackholes
-                               spam-use-hashcash
-                               spam-use-regex-headers
-                               spam-use-bogofilter-headers
-                               spam-use-bogofilter
-                               spam-use-BBDB
-                               spam-use-BBDB-exclusive
-                               spam-use-ifile
-                               spam-use-stat)
-  "Whether the spam hooks should be installed, default to t if one of
-the spam-use-* variables is set."
-  :group 'gnus-registry
-  :type 'boolean)
+(defcustom spam-use-spamoracle nil
+  "Whether spamoracle should be used by spam-split."
+  :type 'boolean
+  :group 'spam)
 
 (defcustom spam-split-group "spam"
   "Group name where incoming spam should be put by spam-split."
@@ -309,6 +295,19 @@
                  (const :tag "Use the default"))
   :group 'spam-ifile)
 
+(defcustom spam-spamoracle-database nil
+  "Location of spamoracle database file. When nil, use the default
+spamoracle database."
+  :type '(choice (directory :tag "Location of spamoracle database file.")
+                 (const :tag "Use the default"))
+  :group 'spam-spamoracle)
+
+(defcustom spam-spamoracle-binary (executable-find "spamoracle")
+  "Location of the spamoracle binary."
+  :type '(choice (directory :tag "Location of the spamoracle binary")
+                 (const :tag "Use the default"))
+  :group 'spam-spamoracle)
+
 ;;; Key bindings for spam control.
 
 (gnus-define-keys gnus-summary-mode-map
@@ -380,6 +379,9 @@
 (defun spam-group-spam-processor-ifile-p (group)
   (spam-group-processor-p group 'gnus-group-spam-exit-processor-ifile))
 
+(defun spam-group-spam-processor-spamoracle-p (group)
+  (spam-group-processor-p group 'gnus-group-spam-exit-processor-spamoracle))
+
 (defun spam-group-ham-processor-ifile-p (group)
   (spam-group-processor-p group 'gnus-group-ham-exit-processor-ifile))
 
@@ -401,11 +403,16 @@
 (defun spam-group-ham-processor-copy-p (group)
   (spam-group-processor-p group 'gnus-group-ham-exit-processor-copy))
 
+(defun spam-group-ham-processor-spamoracle-p (group)
+  (spam-group-processor-p group 'gnus-group-ham-exit-processor-spamoracle))
+
 ;;; Summary entry and exit processing.
 
 (defun spam-summary-prepare ()
   (spam-mark-junk-as-spam-routine))
 
+(add-hook 'gnus-summary-prepare-hook 'spam-summary-prepare)
+
 ;; The spam processors are invoked for any group, spam or ham or neither
 (defun spam-summary-prepare-exit ()
   (unless gnus-group-is-exiting-without-update-p
@@ -432,6 +439,10 @@
       (gnus-message 5 "Registering spam with the Gmane report")
       (spam-report-gmane-register-routine))
 
+    (when (spam-group-spam-processor-spamoracle-p gnus-newsgroup-name)
+      (gnus-message 5 "Registering spam with spamoracle")
+      (spam-spamoracle-learn-spam))
+
     (if spam-move-spam-nonspam-groups-only
         (when (not (spam-group-spam-contents-p gnus-newsgroup-name))
           (spam-mark-spam-as-expired-and-move-routine
@@ -460,7 +471,10 @@
         (spam-stat-register-ham-routine))
       (when (spam-group-ham-processor-BBDB-p gnus-newsgroup-name)
         (gnus-message 5 "Registering ham with the BBDB")
-        (spam-BBDB-register-routine)))
+        (spam-BBDB-register-routine))
+      (when (spam-group-ham-processor-spamoracle-p gnus-newsgroup-name)
+        (gnus-message 5 "Registering ham with spamoracle")
+        (spam-spamoracle-learn-ham)))
 
     (when (spam-group-ham-processor-copy-p gnus-newsgroup-name)
       (gnus-message 5 "Copying ham")
@@ -608,7 +622,8 @@
     (spam-use-blackholes . spam-check-blackholes)
     (spam-use-hashcash . spam-check-hashcash)
     (spam-use-bogofilter-headers . spam-check-bogofilter-headers)
-    (spam-use-bogofilter . spam-check-bogofilter))
+    (spam-use-bogofilter . spam-check-bogofilter)
+    (spam-use-spamoracle . spam-check-spamoracle))
   "The spam-list-of-checks list contains pairs associating a parameter
 variable with a spam checking function. If the parameter variable is
 true, then the checking function is called, and its value decides what
@@ -622,7 +637,7 @@
 definitely a spam.")
 
 (defvar spam-list-of-statistical-checks
-  '(spam-use-ifile spam-use-stat spam-use-bogofilter)
+  '(spam-use-ifile spam-use-stat spam-use-bogofilter spam-use-spamoracle)
   "The spam-list-of-statistical-checks list contains all the mail
 splitters that need to have the full message body available.")
 
@@ -1073,33 +1088,62 @@
         (spam-bogofilter-register-with-bogofilter
          (spam-get-article-as-string article) nil))))
 
-
-;;;; Hooks
-
-(defun spam-install-hooks-function ()
-  "Install the spam.el hooks"
-  (interactive)
-  ;; Add hooks for loading and saving the spam stats
-  (when spam-use-stat
-    (add-hook 'gnus-save-newsrc-hook 'spam-maybe-spam-stat-save)
-    (add-hook 'gnus-get-top-new-news-hook 'spam-maybe-spam-stat-load)
-    (add-hook 'gnus-startup-hook 'spam-maybe-spam-stat-load))
-  (add-hook 'gnus-summary-prepare-exit-hook 'spam-summary-prepare-exit)
-  (add-hook 'gnus-summary-prepare-hook 'spam-summary-prepare)
-  (add-hook 'gnus-get-new-news-hook 'spam-setup-widening))
-
-(defun spam-unload-hook ()
-  "Uninstall the spam.el hooks"
-  (interactive)
-  (remove-hook 'gnus-save-newsrc-hook 'spam-maybe-spam-stat-save)
-  (remove-hook 'gnus-get-top-new-news-hook 'spam-maybe-spam-stat-load)
-  (remove-hook 'gnus-startup-hook 'spam-maybe-spam-stat-load)
-  (remove-hook 'gnus-summary-prepare-exit-hook 'spam-summary-prepare-exit)
-  (remove-hook 'gnus-summary-prepare-hook 'spam-summary-prepare)
-  (remove-hook 'gnus-get-new-news-hook 'spam-setup-widening))
+^L
+;;; spamoracle
+(defun spam-check-spamoracle ()
+  "Run spamoracle on an article to determine whether it's spam."
+  (let ((article-buffer-name (buffer-name)))
+    (with-temp-buffer
+      (let ((temp-buffer-name (buffer-name)))
+        (save-excursion
+          (set-buffer article-buffer-name)
+          (let ((status
+                 (apply 'call-process-region
+                        (point-min) (point-max)
+                        spam-spamoracle-binary
+                        nil temp-buffer-name nil
+                        (if spam-spamoracle-database
+                            `("-f" ,spam-spamoracle-database "mark")
+                          '("mark")))))
+            (if (zerop status)
+                (progn
+                  (set-buffer temp-buffer-name)
+                  (goto-char (point-min))
+                  (when (re-search-forward "^X-Spam: yes;" nil t)
+                    spam-split-group))
+              (error "Error running spamoracle" status))))))))
+
+(defun spam-spamoracle-learn (article article-is-spam-p)
+  "Run spamoracle in training mode."
+  (with-temp-buffer
+    (let ((temp-buffer-name (buffer-name)))
+      (save-excursion
+        (goto-char (point-min))
+        (insert-string (spam-get-article-as-string article))
+        (let* ((arg (if article-is-spam-p "-spam" "-good"))
+               (status
+                (apply 'call-process-region
+                       (point-min) (point-max)
+                       spam-spamoracle-binary
+                       nil temp-buffer-name nil
+                       (if spam-spamoracle-database
+                           `("-f" ,spam-spamoracle-database
+                             "add" ,arg)
+                         `("add" ,arg)))))
+          (when (not (zerop status))
+            (error "Error running spamoracle" status)))))))
+
+(defun spam-spamoracle-learn-ham ()
+  (spam-generic-register-routine
+   nil
+   (lambda (article)
+     (spam-spamoracle-learn article nil))))
 
-(when spam-install-hooks
-  (spam-install-hooks-function))
+(defun spam-spamoracle-learn-spam ()
+  (spam-generic-register-routine
+   (lambda (article)
+     (spam-spamoracle-learn article t))
+   nil))
 
 (provide 'spam)
 
Index: texi/gnus.texi
===================================================================
RCS file: /usr/local/cvsroot/gnus/texi/gnus.texi,v
retrieving revision 6.521
diff -u -r6.521 gnus.texi
--- texi/gnus.texi 3 Jun 2003 19:23:39 -0000 6.521
+++ texi/gnus.texi 5 Jun 2003 09:34:27 -0000
@@ -22441,6 +22441,7 @@
 * Bogofilter::
 * ifile spam filtering::
 * spam-stat spam filtering::
+* SpamOracle::
 * Extending the spam elisp package::
 @end menu
 
@@ -22832,6 +22833,109 @@
 Bogofilter does not require external programs. A spam and a ham
 processor, and the @code{spam-use-stat} variable for @code{spam-split}
 are provided.
+
+@node SpamOracle
+@subsubsection Using SpamOracle with Gnus
+@cindex spam filtering
+@cindex SpamOracle
+@cindex spam
+
+An easy way to filter out spam is to use SpamOracle. SpamOracle is a
+statistical mail filtering tool written by Xavier Leroy and needs to be
+installed separately. The latest version can be found at
+@uref{http://pauillac.inria.fr/~xleroy/software.html}.
+
+There are several ways to use SpamOracle with Gnus. In all cases, your
+mail is piped through SpamOracle in its "mark" mode. SpamOracle will
+then add an `X-Spam' header indicating whether it regards the mail as
+spam or not.
+
+One possibility is to run SpamOracle as a @code{:prescript} from the
+mail source (@pxref{Mail Source Specifiers}), analogous to the way
+SpamAssassin is run on incoming mail (@pxref{SpamAssassin}). This
+method has the advantage that the user can see the `X-Spam' headers.
+
+The easiest method is to make @code{spam.el} (@pxref{Filtering Spam
+Using The Spam ELisp Package}) call SpamOracle.
+@vindex spam-use-spamoracle
+For this to happen, set the variable `spam-use-spamoracle' to @code{t}
+and configure `nnmail-split-fancy' or `nnimap-split-fancy' as
+described in @ref{Filtering Spam
+Using The Spam ELisp Package}. In this example the `INBOX' of an
+`nnimap' server is filtered using SpamOracle. Mails recognized as spam
+will be moved to @code{spam-split-group}, `Junk' in this case. Ham
+messages stay in `INBOX':
+
+@example
+(setq spam-use-spamoracle t
+      spam-split-group "Junk"
+      nnimap-split-inbox '("INBOX")
+      nnimap-split-rule 'nnimap-split-fancy
+      nnimap-split-fancy '(| (: spam-split) "INBOX"))
+@end example
+
+@defvar spam-use-spamoracle
+Set to @code{t} if you want Gnus to enable spam filtering using
+SpamOracle.
+@end defvar
+
+@defvar spam-spamoracle-binary
+By default, Gnus uses the SpamOracle binary found in the PATH. Set
+@code{spam-spamoracle-binary} to use a different binary.
+@end defvar
+
+@defvar spam-spamoracle-database
+By default, SpamOracle uses the file `.spamoracle.db' as a database to
+store its analyses. This is controlled by the variable
+@code{spam-spamoracle-database} which defaults to @code{nil}. That means
+the default SpamOracle database will be used. In case you want your
+database to live somewhere special, set
+@code{spam-spamoracle-database} to this path.
+@end defvar
+
+SpamOracle employs a statistical algorithm to determine whether a
+message is spam or not. In order to get good results, meaning few false
+hits or misses, SpamOracle needs training. SpamOracle learns the
+characteristics of your spam mails. Using the `add' mode (training mode)
+one has to feed good (ham) and spam mails to SpamOracle. This can be
+done by pressing `|' in the Summary buffer and piping the mail to
+a SpamOracle process or by using @code{spam.el}'s spam- and
+ham-processors, which is much more convenient. @xref{Filtering Spam
+Using The Spam ELisp Package}, for a detailed description of spam- and
+ham-processors.
+
+@defvar gnus-group-spam-exit-processor-spamoracle
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameter or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is added
+to a group's @code{spam-process} parameter, spam-marked articles will be
+sent to SpamOracle as samples of spam.
+@end defvar
+
+@defvar gnus-group-ham-exit-processor-spamoracle
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameter or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is added
+to a group's @code{spam-process} parameter, the ham-marked articles in
+@emph{ham} groups will be sent to SpamOracle as samples of ham
+messages. Note that this ham processor has no effect in @emph{spam} or
+@emph{unclassified} groups.
+@end defvar
+
+@emph{Example:} These are the Group Parameters of a group that has been
+classified as a ham group, meaning that it should only contain ham
+messages.
+@example
+ ((spam-contents gnus-group-spam-classification-ham)
+  (spam-process
+   (gnus-group-spam-exit-processor-spamoracle)))
+@end example
+For this group the `gnus-group-spam-exit-processor-spamoracle' is
+installed. If the group contains spam messages (e.g. because SpamOracle
+has not had enough sample messages yet) and the user marks some
+messages as spam messages, these messages will be processed by
+`gnus-group-spam-exit-processor-spamoracle'. This processor sends the
+messages to SpamOracle as new samples of spam.
 
 @node Extending the spam elisp package
 @subsubsection Extending the spam elisp package

--=-=-=

-Eric
-- 
"Excuse me --- Di Du Du Duuuuh Di Dii --- Huh Weeeheeee" (Albert King)

--=-=-=--