* Re: [Eric Knauel] spam.el and spamoracle
2003-05-20 17:41 ` Ted Zlatanov
@ 2003-06-05 9:32 ` Eric Knauel
2003-06-09 20:53 ` Ted Zlatanov
From: Eric Knauel @ 2003-06-05 9:32 UTC
[-- Attachment #1: Type: text/plain, Size: 1219 bytes --]
Ted Zlatanov <tzz@lifelogs.com> writes:
> On Tue, 13 May 2003, knauel@informatik.uni-tuebingen.de wrote:
>> Ted Zlatanov <tzz@lifelogs.com> writes:
>>
>>> Does Eric have FSF papers on file?
>>
>> No. What do I have to do to get one? The FSF website says something
>> about logging onto mysterious GNU machines where the forms can be
>> found.
>
> The first rule of FSF club is, you don't talk about FSF club :)
Seems so. ;) Finally, I got the FSF papers and sent them back this
week.
>>> - should there be a spam/ham processor for spamoracle?
>>
>> I think this would be a convenient way of training
>> spamoracle. I'll add that feature and send you the patch.
>
> No problem, I hope it's not too hard. If you could also look at the
> documentation - basically, copy the SpamAssassin section and modify
> accordingly, I'll update the menus, check the syntax, and add it in if
> you are not familiar with Texinfo. Not that I'm an expert either, but
> I do find Texinfo very manageable and easy to learn.
That would be great! I've never written texinfo before, so I probably
made many mistakes.
Here is an updated version with documentation and spam/ham-processors,
diffed against CVS this time.
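For anyone who wants to try it, a minimal sketch of the settings the patch introduces might look like the following. It assumes the patch below is applied and that spam.el is on the load path; the database location and the binary path are made-up examples, not defaults set by the patch.

```elisp
;; Sketch only: assumes the spamoracle patch is applied to spam.el.
(require 'spam)

;; Let spam-split consult SpamOracle (variable added by the patch).
(setq spam-use-spamoracle t)

;; Hypothetical location; leave this nil to use SpamOracle's own default.
(setq spam-spamoracle-database (expand-file-name "~/Mail/spamoracle.db"))

;; Only needed when spamoracle is not found on PATH.
;; The path below is an assumption for illustration.
;; (setq spam-spamoracle-binary "/usr/local/bin/spamoracle")
```

Training then happens at summary exit for groups whose spam-process parameter lists the new spamoracle processors, as described in the documentation part of the patch.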
[-- Attachment #2: spamoracle-3.patch --]
[-- Type: text/x-patch, Size: 15087 bytes --]
Index: lisp/gnus.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/gnus.el,v
retrieving revision 6.189
diff -u -r6.189 gnus.el
--- lisp/gnus.el 3 Jun 2003 14:11:36 -0000 6.189
+++ lisp/gnus.el 5 Jun 2003 09:34:26 -0000
@@ -1843,6 +1843,9 @@
"The Gmane reporting summary exit spam processor.
Only applicable to NNTP groups with articles from Gmane. See spam-report.el")
+ (defvar gnus-group-spam-exit-processor-spamoracle "spamoracle-spam"
+ "The spamoracle summary exit spam processor.")
+
(defvar gnus-group-ham-exit-processor-ifile "ifile-ham"
"The ifile summary exit ham processor.
Only applicable to non-spam (unclassified and ham) groups.")
@@ -1867,6 +1870,10 @@
"The ham copy exit ham processor.
Only applicable to non-spam (unclassified and ham) groups.")
+ (defvar gnus-group-ham-exit-processor-spamoracle "spamoracle-ham"
+ "The spamoracle summary exit ham processor.
+Only applicable to non-spam (unclassified and ham) groups.")
+
(gnus-define-group-parameter
spam-process
:type list
@@ -1879,12 +1886,14 @@
(variable-item gnus-group-spam-exit-processor-bogofilter)
(variable-item gnus-group-spam-exit-processor-blacklist)
(variable-item gnus-group-spam-exit-processor-report-gmane)
+ (variable-item gnus-group-spam-exit-processor-spamoracle)
(variable-item gnus-group-ham-exit-processor-bogofilter)
(variable-item gnus-group-ham-exit-processor-ifile)
(variable-item gnus-group-ham-exit-processor-stat)
(variable-item gnus-group-ham-exit-processor-whitelist)
(variable-item gnus-group-ham-exit-processor-BBDB)
- (variable-item gnus-group-ham-exit-processor-copy))))
+ (variable-item gnus-group-ham-exit-processor-copy)
+ (variable-item gnus-group-ham-exit-processor-spamoracle))))
:function-document
"Which spam or ham processors will be applied to the GROUP articles at summary exit."
:variable gnus-spam-process-newsgroups
Index: lisp/spam.el
===================================================================
RCS file: /usr/local/cvsroot/gnus/lisp/spam.el,v
retrieving revision 6.97
diff -u -r6.97 spam.el
--- lisp/spam.el 30 May 2003 20:05:05 -0000 6.97
+++ lisp/spam.el 5 Jun 2003 09:34:26 -0000
@@ -182,24 +182,10 @@
:type 'boolean
:group 'spam)
-(defcustom spam-install-hooks (or
- spam-use-dig
- spam-use-blacklist
- spam-use-whitelist
- spam-use-whitelist-exclusive
- spam-use-blackholes
- spam-use-hashcash
- spam-use-regex-headers
- spam-use-bogofilter-headers
- spam-use-bogofilter
- spam-use-BBDB
- spam-use-BBDB-exclusive
- spam-use-ifile
- spam-use-stat)
- "Whether the spam hooks should be installed, default to t if one of
-the spam-use-* variables is set."
- :group 'gnus-registry
- :type 'boolean)
+(defcustom spam-use-spamoracle nil
+ "Whether spamoracle should be used by spam-split."
+ :type 'boolean
+ :group 'spam)
(defcustom spam-split-group "spam"
"Group name where incoming spam should be put by spam-split."
@@ -309,6 +295,19 @@
(const :tag "Use the default"))
:group 'spam-ifile)
+(defcustom spam-spamoracle-database nil
+ "Location of spamoracle database file. When nil, use the default
+spamoracle database."
+ :type '(choice (file :tag "Location of spamoracle database file")
+ (const :tag "Use the default"))
+ :group 'spam-spamoracle)
+
+(defcustom spam-spamoracle-binary (executable-find "spamoracle")
+ "Location of the spamoracle binary."
+ :type '(choice (file :tag "Location of the spamoracle binary")
+ (const :tag "Use the default"))
+ :group 'spam-spamoracle)
+
;;; Key bindings for spam control.
(gnus-define-keys gnus-summary-mode-map
@@ -380,6 +379,9 @@
(defun spam-group-spam-processor-ifile-p (group)
(spam-group-processor-p group 'gnus-group-spam-exit-processor-ifile))
+(defun spam-group-spam-processor-spamoracle-p (group)
+ (spam-group-processor-p group 'gnus-group-spam-exit-processor-spamoracle))
+
(defun spam-group-ham-processor-ifile-p (group)
(spam-group-processor-p group 'gnus-group-ham-exit-processor-ifile))
@@ -401,11 +403,16 @@
(defun spam-group-ham-processor-copy-p (group)
(spam-group-processor-p group 'gnus-group-ham-exit-processor-copy))
+(defun spam-group-ham-processor-spamoracle-p (group)
+ (spam-group-processor-p group 'gnus-group-ham-exit-processor-spamoracle))
+
;;; Summary entry and exit processing.
(defun spam-summary-prepare ()
(spam-mark-junk-as-spam-routine))
+(add-hook 'gnus-summary-prepare-hook 'spam-summary-prepare)
+
;; The spam processors are invoked for any group, spam or ham or neither
(defun spam-summary-prepare-exit ()
(unless gnus-group-is-exiting-without-update-p
@@ -432,6 +439,10 @@
(gnus-message 5 "Registering spam with the Gmane report")
(spam-report-gmane-register-routine))
+ (when (spam-group-spam-processor-spamoracle-p gnus-newsgroup-name)
+ (gnus-message 5 "Registering spam with spamoracle")
+ (spam-spamoracle-learn-spam))
+
(if spam-move-spam-nonspam-groups-only
(when (not (spam-group-spam-contents-p gnus-newsgroup-name))
(spam-mark-spam-as-expired-and-move-routine
@@ -460,7 +471,10 @@
(spam-stat-register-ham-routine))
(when (spam-group-ham-processor-BBDB-p gnus-newsgroup-name)
(gnus-message 5 "Registering ham with the BBDB")
- (spam-BBDB-register-routine)))
+ (spam-BBDB-register-routine))
+ (when (spam-group-ham-processor-spamoracle-p gnus-newsgroup-name)
+ (gnus-message 5 "Registering ham with spamoracle")
+ (spam-spamoracle-learn-ham)))
(when (spam-group-ham-processor-copy-p gnus-newsgroup-name)
(gnus-message 5 "Copying ham")
@@ -608,7 +622,8 @@
(spam-use-blackholes . spam-check-blackholes)
(spam-use-hashcash . spam-check-hashcash)
(spam-use-bogofilter-headers . spam-check-bogofilter-headers)
- (spam-use-bogofilter . spam-check-bogofilter))
+ (spam-use-bogofilter . spam-check-bogofilter)
+ (spam-use-spamoracle . spam-check-spamoracle))
"The spam-list-of-checks list contains pairs associating a parameter
variable with a spam checking function. If the parameter variable is
true, then the checking function is called, and its value decides what
@@ -622,7 +637,7 @@
definitely a spam.")
(defvar spam-list-of-statistical-checks
- '(spam-use-ifile spam-use-stat spam-use-bogofilter)
+ '(spam-use-ifile spam-use-stat spam-use-bogofilter spam-use-spamoracle)
"The spam-list-of-statistical-checks list contains all the mail
splitters that need to have the full message body available.")
@@ -1073,33 +1088,62 @@
(spam-bogofilter-register-with-bogofilter
(spam-get-article-as-string article) nil))))
-\f
-;;;; Hooks
-
-(defun spam-install-hooks-function ()
- "Install the spam.el hooks"
- (interactive)
- ;; Add hooks for loading and saving the spam stats
- (when spam-use-stat
- (add-hook 'gnus-save-newsrc-hook 'spam-maybe-spam-stat-save)
- (add-hook 'gnus-get-top-new-news-hook 'spam-maybe-spam-stat-load)
- (add-hook 'gnus-startup-hook 'spam-maybe-spam-stat-load))
- (add-hook 'gnus-summary-prepare-exit-hook 'spam-summary-prepare-exit)
- (add-hook 'gnus-summary-prepare-hook 'spam-summary-prepare)
- (add-hook 'gnus-get-new-news-hook 'spam-setup-widening))
-
-(defun spam-unload-hook ()
- "Uninstall the spam.el hooks"
- (interactive)
- (remove-hook 'gnus-save-newsrc-hook 'spam-maybe-spam-stat-save)
- (remove-hook 'gnus-get-top-new-news-hook 'spam-maybe-spam-stat-load)
- (remove-hook 'gnus-startup-hook 'spam-maybe-spam-stat-load)
- (remove-hook 'gnus-summary-prepare-exit-hook 'spam-summary-prepare-exit)
- (remove-hook 'gnus-summary-prepare-hook 'spam-summary-prepare)
- (remove-hook 'gnus-get-new-news-hook 'spam-setup-widening))
+^L
+;;; spamoracle
+(defun spam-check-spamoracle ()
+ "Run spamoracle on an article to determine whether it's spam."
+ (let ((article-buffer-name (buffer-name)))
+ (with-temp-buffer
+ (let ((temp-buffer-name (buffer-name)))
+ (save-excursion
+ (set-buffer article-buffer-name)
+ (let ((status
+ (apply 'call-process-region
+ (point-min) (point-max)
+ spam-spamoracle-binary
+ nil temp-buffer-name nil
+ (if spam-spamoracle-database
+ `("-f" ,spam-spamoracle-database "mark")
+ '("mark")))))
+ (if (zerop status)
+ (progn
+ (set-buffer temp-buffer-name)
+ (goto-char (point-min))
+ (when (re-search-forward "^X-Spam: yes;" nil t)
+ spam-split-group))
+ (error "Error running spamoracle: %s" status))))))))
+
+(defun spam-spamoracle-learn (article article-is-spam-p)
+ "Run spamoracle in training mode."
+ (with-temp-buffer
+ (let ((temp-buffer-name (buffer-name)))
+ (save-excursion
+ (goto-char (point-min))
+ (insert (spam-get-article-as-string article))
+ (let* ((arg (if article-is-spam-p "-spam" "-good"))
+ (status
+ (apply 'call-process-region
+ (point-min) (point-max)
+ spam-spamoracle-binary
+ nil temp-buffer-name nil
+ (if spam-spamoracle-database
+ `("-f" ,spam-spamoracle-database
+ "add" ,arg)
+ `("add" ,arg)))))
+ (when (not (zerop status))
+ (error "Error running spamoracle: %s" status)))))))
+
+(defun spam-spamoracle-learn-ham ()
+ (spam-generic-register-routine
+ nil
+ (lambda (article)
+ (spam-spamoracle-learn article nil))))
-(when spam-install-hooks
- (spam-install-hooks-function))
+(defun spam-spamoracle-learn-spam ()
+ (spam-generic-register-routine
+ (lambda (article)
+ (spam-spamoracle-learn article t))
+ nil))
(provide 'spam)
Index: texi/gnus.texi
===================================================================
RCS file: /usr/local/cvsroot/gnus/texi/gnus.texi,v
retrieving revision 6.521
diff -u -r6.521 gnus.texi
--- texi/gnus.texi 3 Jun 2003 19:23:39 -0000 6.521
+++ texi/gnus.texi 5 Jun 2003 09:34:27 -0000
@@ -22441,6 +22441,7 @@
* Bogofilter::
* ifile spam filtering::
* spam-stat spam filtering::
+* SpamOracle::
* Extending the spam elisp package::
@end menu
@@ -22832,6 +22833,109 @@
Bogofilter does not require external programs. A spam and a ham
processor, and the @code{spam-use-stat} variable for @code{spam-split}
are provided.
+
+@node SpamOracle
+@subsubsection Using SpamOracle with Gnus
+@cindex spam filtering
+@cindex SpamOracle
+@cindex spam
+
+An easy way to filter out spam is to use SpamOracle. SpamOracle is a
+statistical mail filtering tool written by Xavier Leroy and needs to be
+installed separately. The latest version can be found at
+@uref{http://pauillac.inria.fr/~xleroy/software.html}.
+
+There are several ways to use SpamOracle with Gnus. In all cases, your
+mail is piped through SpamOracle in its "mark" mode. SpamOracle will
+then add an `X-Spam' header indicating whether it regards the mail as
+spam or not.
+
+One possibility is to run SpamOracle as a @code{:prescript} from the
+mail source (@pxref{Mail Source Specifiers}), analogous to the way
+SpamAssassin (@pxref{SpamAssassin}) is run on incoming mail. This method
+has the advantage that the user can see the `X-Spam' headers.
+
+The easiest method is to make @code{spam.el} (@pxref{Filtering Spam
+Using The Spam ELisp Package}) call SpamOracle.
+@vindex spam-use-spamoracle
+To do this, set the variable @code{spam-use-spamoracle} to @code{t}
+and configure @code{nnmail-split-fancy} or @code{nnimap-split-fancy} as
+described in @ref{Filtering Spam
+Using The Spam ELisp Package}. In this example the `INBOX' of an
+`nnimap' server is filtered using SpamOracle. Mail recognized as spam
+is moved to @code{spam-split-group}, `Junk' in this case. Ham
+messages stay in `INBOX':
+
+@example
+(setq spam-use-spamoracle t
+ spam-split-group "Junk"
+ nnimap-split-inbox '("INBOX")
+ nnimap-split-rule 'nnimap-split-fancy
+ nnimap-split-fancy '(| (: spam-split) "INBOX"))
+@end example
+
+@defvar spam-use-spamoracle
+Set to @code{t} if you want Gnus to enable spam filtering using
+SpamOracle.
+@end defvar
+
+@defvar spam-spamoracle-binary
+By default, Gnus uses the SpamOracle binary found in the @env{PATH}. Set
+@code{spam-spamoracle-binary} to use a different binary.
+@end defvar
+
+@defvar spam-spamoracle-database
+By default, SpamOracle uses the file `.spamoracle.db' as a database to
+store its analyses. This is controlled by the variable
+@code{spam-spamoracle-database}, which defaults to @code{nil}, meaning
+that the default SpamOracle database is used. If you want your
+database to live somewhere else, set
+@code{spam-spamoracle-database} to its path.
+@end defvar
+
+SpamOracle employs a statistical algorithm to determine whether a
+message is spam or not. In order to get good results, meaning few false
+hits or misses, SpamOracle needs training. SpamOracle learns the
+characteristics of your spam mails. Using the `add' mode (training mode)
+one has to feed good (ham) and spam mails to SpamOracle. This can be
+done by pressing `|' in the Summary buffer and piping the mail to
+a SpamOracle process, or by using @code{spam.el}'s spam and
+ham processors, which is much more convenient. @xref{Filtering Spam
+Using The Spam ELisp Package}, for a detailed description of spam and
+ham processors.
+
+@defvar gnus-group-spam-exit-processor-spamoracle
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameter or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is added
+to a group's @code{spam-process} parameter, spam-marked articles will be
+sent to SpamOracle as spam samples.
+@end defvar
+
+@defvar gnus-group-ham-exit-processor-spamoracle
+Add this symbol to a group's @code{spam-process} parameter by
+customizing the group parameter or the
+@code{gnus-spam-process-newsgroups} variable. When this symbol is added
+to a group's @code{spam-process} parameter, the ham-marked articles in
+@emph{ham} groups will be sent to SpamOracle as samples of ham
+messages. Note that this ham processor has no effect in @emph{spam} or
+@emph{unclassified} groups.
+@end defvar
+
+@emph{Example:} These are the Group Parameters of a group that has been
+classified as a ham group, meaning that it should only contain ham
+messages.
+@example
+ ((spam-contents gnus-group-spam-classification-ham)
+ (spam-process
+ (gnus-group-spam-exit-processor-spamoracle)))
+@end example
+For this group the `gnus-group-spam-exit-processor-spamoracle' is
+installed. If the group contains spam messages (e.g. because SpamOracle
+has not had enough sample messages yet) and the user marks some
+messages as spam, these messages will be processed by
+`gnus-group-spam-exit-processor-spamoracle'. This processor sends the
+messages to SpamOracle as new samples of spam.
@node Extending the spam elisp package
@subsubsection Extending the spam elisp package
[-- Attachment #3: Type: text/plain, Size: 82 bytes --]
-Eric
--
"Excuse me --- Di Du Du Duuuuh Di Dii --- Huh Weeeheeee" (Albert King)