From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hubert Chan
Newsgroups: gmane.emacs.gnus.general
Subject: Re: spam.el: generic bayes interface?
Date: Tue, 20 Jan 2004 23:02:15 -0500
Message-ID: <87d69e54qg.fsf@uhoreg.ca>
References: <4nptdei2oh.fsf@collins.bwh.harvard.edu>
In-Reply-To: <4nptdei2oh.fsf@collins.bwh.harvard.edu> (Ted Zlatanov's message of "Tue, 20 Jan 2004 19:08:14 -0500")
Mail-Followup-To: ding@gnus.org, Hubert Chan
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Gnus/5.1006 (Gnus v5.10.6)
 Emacs/21.3.50 (gnu/linux)

>>>>> "Ted" == Ted Zlatanov writes:

[...]

Ted> Yes, spam-use-regex-headers will do the right thing for splitting
Ted> incoming mail, but there's no SA specific backend.  Hubert Chan
Ted> wrote a SA backend, and I have been late in replying to his
Ted> questions.  It's coming, though.

I see it in CVS now. ;-)  I promised to write documentation too, but
that won't happen until sometime next week at the earliest.  In the
meantime, though, the variable documentation should probably suffice
for most people.

[...]

Ted> The problem is that then you force people into just one Bayesian
Ted> approach (how would SA and bogofilter work together?), and I'm not
Ted> sure it's a good idea.  Granted, most people use just one Bayesian
Ted> filter, so it's probably nice to switch filters with just one
Ted> thing.

Well, there are at least some good reasons that someone might want to
use multiple Bayesian filters.  For example, one might want to try out
a new filter while keeping the original filter as a backup.  Also, if
one wishes to switch Bayesian filters and does not have a corpus of
spam/ham to train the new filter, there would have to be a transition
period during which the new filter is trained while the old filter is
still being used for splitting.  And, of course, during this period one
would still want to keep training the old filter as well.

This got me thinking, though, Ted, that the registration code for the
spam/ham processors is pretty similar from one backend to the next.
They seem to mostly work in one of two ways: either registering one
article at a time, or registering multiple articles at once in an
mbox-style format.  I think they all feed the articles via standard
input.  I would imagine that we would be able to share a lot of common
code.
Maybe write a function that feeds the article(s) to the registration
program, and pass the name of the program and its arguments as
arguments to that function.  Then the registration functions just have
to call that function with the appropriate arguments.

Hmm.  I'll have to look at the code to see if that would actually
work...

-- 
Hubert Chan - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA
Key available at wwwkeys.pgp.net.   Encrypted e-mail preferred.
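
The shared-helper idea in the message above could look roughly like the following. This is only a minimal sketch, written in Python for illustration rather than in the Emacs Lisp that spam.el uses. The names feed_articles, STAND_IN, register_spam and register_spam_batch are hypothetical, the mbox framing is deliberately crude (no "From " escaping), and the stand-in program is a Python one-liner, not a real spam filter.

```python
import subprocess
import sys
from functools import partial
from typing import Iterable, List, Sequence


def feed_articles(program: str, args: Sequence[str],
                  articles: Iterable[str], batch: bool = False) -> List[int]:
    """Feed articles to `program` on standard input; return the exit codes.

    batch=False: run the program once per article.
    batch=True:  run it once, with all articles joined mbox-style.
    """
    articles = list(articles)
    if not articles:
        return []
    if batch:
        # Crude mbox-style framing (an assumption of this sketch):
        # a "From " separator line before each article, blank line after.
        payloads = ["".join("From sketch@localhost\n" + a.rstrip("\n") + "\n\n"
                            for a in articles)]
    else:
        payloads = articles
    codes = []
    for payload in payloads:
        result = subprocess.run([program, *args], input=payload, text=True,
                                stdout=subprocess.DEVNULL)
        codes.append(result.returncode)
    return codes


# Stand-in "registration program": a Python one-liner that only reads stdin.
# A real setup would name the filter's binary and its spam/ham flags here.
STAND_IN = [sys.executable, "-c", "import sys; sys.stdin.read()"]

# Each registration function reduces to the shared helper plus its arguments.
register_spam = partial(feed_articles, STAND_IN[0], STAND_IN[1:])
register_spam_batch = partial(feed_articles, STAND_IN[0], STAND_IN[1:],
                              batch=True)
```

With this shape, calling register_spam(list_of_articles) runs the program once per article, while register_spam_batch(list_of_articles) runs it once over the whole batch. Only the program name, its arguments, and the one-at-a-time versus batch choice differ between backends.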