From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/56135
Path: main.gmane.org!not-for-mail
From: Ted Zlatanov <tzz@lifelogs.com>
Newsgroups: gmane.emacs.gnus.general
Subject: Re: spam.el: generic bayes interface?
Date: Tue, 20 Jan 2004 19:08:14 -0500
Organization: =?koi8-r?q?=F4=C5=CF=C4=CF=D2=20=FA=CC=C1=D4=C1=CE=CF=D7?= @
 Cienfuegos
Sender: ding-owner@lists.math.uh.edu
Message-ID: <4nptdei2oh.fsf@collins.bwh.harvard.edu>
References: <v9broy1fsd.fsf@marauder.physik.uni-ulm.de>
NNTP-Posting-Host: deer.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: sea.gmane.org 1074643803 6581 80.91.224.253 (21 Jan 2004 00:10:03 GMT)
X-Complaints-To: usenet@sea.gmane.org
NNTP-Posting-Date: Wed, 21 Jan 2004 00:10:03 +0000 (UTC)
Cc: Hubert Chan <hubert@uhoreg.ca>
Original-X-From: ding-owner+M4675@lists.math.uh.edu Wed Jan 21 01:09:50 2004
Return-path: <ding-owner+M4675@lists.math.uh.edu>
Original-Received: from malifon.math.uh.edu ([129.7.128.13])
	by deer.gmane.org with esmtp (Exim 3.35 #1 (Debian))
	id 1Aj5wM-0000NL-00
	for <ding-account@gmane.org>; Wed, 21 Jan 2004 01:09:50 +0100
Original-Received: from localhost ([127.0.0.1] helo=lists.math.uh.edu)
	by malifon.math.uh.edu with smtp (Exim 3.20 #1)
	id 1Aj5wA-0005WX-00; Tue, 20 Jan 2004 18:09:38 -0600
Original-Received: from justine.libertine.org ([66.139.78.221] ident=postfix)
	by malifon.math.uh.edu with esmtp (Exim 3.20 #1)
	id 1Aj5w5-0005WS-00
	for ding@lists.math.uh.edu; Tue, 20 Jan 2004 18:09:33 -0600
Original-Received: from clifford.bwh.harvard.edu (clifford.bwh.harvard.edu [134.174.9.41])
	by justine.libertine.org (Postfix) with ESMTP id 3CDD13A0083
	for <ding@gnus.org>; Tue, 20 Jan 2004 18:09:33 -0600 (CST)
Original-Received: from collins.bwh.harvard.edu (collins [134.174.9.80])
	by clifford.bwh.harvard.edu (8.10.2+Sun/8.11.0) with ESMTP id i0L08LU17655;
	Tue, 20 Jan 2004 19:08:21 -0500 (EST)
Original-Received: from collins.bwh.harvard.edu (localhost [127.0.0.1])
	by collins.bwh.harvard.edu (8.12.9+Sun/8.11.0) with ESMTP id i0L08EuB012299;
	Tue, 20 Jan 2004 19:08:14 -0500 (EST)
Original-Received: (from tzz@localhost)
	by collins.bwh.harvard.edu (8.12.9+Sun/8.12.9/Submit) id i0L08E5d012296;
	Tue, 20 Jan 2004 19:08:14 -0500 (EST)
Original-To: ding@gnus.org
X-Face: bd.DQ~'29fIs`T_%O%C\g%6jW)yi[zuz6;d4V0`@y-~$#3P_Ng{@m+e4o<4P'#(_GJQ%TT= D}[Ep*b!\e,fBZ'j_+#"Ps?s2!4H2-Y"sx"
Mail-Followup-To: ding@gnus.org, Hubert Chan <hubert@uhoreg.ca>
In-Reply-To: <v9broy1fsd.fsf@marauder.physik.uni-ulm.de> (Reiner Steib's
 message of "Tue, 20 Jan 2004 22:17:06 +0100")
User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3.50 (usg-unix-v)
Precedence: bulk
Xref: main.gmane.org gmane.emacs.gnus.general:56135
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:56135

On Tue, 20 Jan 2004, 4.uce.03.r.s@nurfuerspam.de wrote:

> in the German Gnus group someone asked how to use the
> SpamAssassin/Bayes (see sa-learn(1)) thingie with Gnus.  I happily
> pointed him to `spam.el' and the fine manual.  But it turned out
> that there is no interface for SpamAssassin/Bayes in `spam.el' (or
> at least I couldn't locate it).

Yes, spam-use-regex-headers will do the right thing for splitting
incoming mail, but there's no SA-specific backend.  Hubert Chan wrote
an SA backend, and I have been late in replying to his questions.
It's coming, though.

> I assume that SpamAssassin/Bayes works very similar to bogofilter
> [1], so it probably works by abusing the `spam-bogofilter-*' [2]
> variables.  But this is a quite dubious approach, IMHO.  Wouldn't it
> make sense to add a generic bayes interface with say
> `spam-bayes-...' variables (similar to the `browse-url-generic*'
> variables) instead of adding a set of variables for each (new)
> Bayesian filter?

The problem is that you then force people into a single Bayesian
approach (how would SA and bogofilter work together?), and I'm not
sure that's a good idea.  Granted, most people use just one Bayesian
filter, so it would probably be nice to switch filters by changing
just one setting.

But consider that the registry must track which Bayesian backend has
registered which message.  Let's say the registry knows that
spam-use-bayesian has registered message A, and that was bogofilter at
the time, but the user switches to SA later.  Now the registry can't
tell that SA has never registered message A, so spam.el will not
re-register message A.  It's just an example, but things will be
slightly harder to track in general.
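
To make the tracking problem concrete, here is a minimal sketch in
Python rather than Emacs Lisp.  It is not spam.el's registry code: the
dict-based registry and the helpers register and needs_registration
are hypothetical, and "spam-use-spamassassin" is an assumed name for a
backend symbol that does not exist yet.  It only shows why one generic
key hides a change of filter while per-backend keys do not.

```python
# Hypothetical model of the tracking problem.  None of these helpers are
# spam.el functions; the registry is just a dict from message ID to the
# set of keys that have registered that message.

def register(registry, message_id, key):
    """Record that `key` has registered `message_id`."""
    registry.setdefault(message_id, set()).add(key)

def needs_registration(registry, message_id, key):
    """Return True if `key` has not yet registered `message_id`."""
    return key not in registry.get(message_id, set())

# Case 1: one generic key shared by every Bayesian filter.
generic = {}
register(generic, "A", "spam-use-bayesian")   # bogofilter was active
# The user switches to SA, but the key is unchanged, so message A
# looks as if the new filter had already registered it.
generic_needs_sa = needs_registration(generic, "A", "spam-use-bayesian")

# Case 2: one key per backend ("spam-use-spamassassin" is an assumed
# name, used here only for illustration).
specific = {}
register(specific, "A", "spam-use-bogofilter")
specific_needs_sa = needs_registration(
    specific, "A", "spam-use-spamassassin")

print(generic_needs_sa)    # False: the switch to SA goes unnoticed
print(specific_needs_sa)   # True: SA still has to register message A
```

With the generic key, the registry would have to store the filter
that was active at registration time as extra data to recover the
same information.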

Also, I can't drop the current Bayesian spam-use-* backends that users
are using.  So we would end up with the general case of
spam-use-bayesian plus the specific backends, which seems pretty
confusing.

I would prefer to make adding new Bayesian backends easy, but give
them separate spam-use-BACKEND symbols.  Hubert's work will be
helpful here, because I've been too lazy/busy to write a good
example :)

Ted