Gnus development mailing list
* Re: Training for ham and training for spam
  2003-10-29 19:54 ` Michael Shields
@ 2003-10-29 14:26   ` Ian Dobbie
  2003-10-30 14:39     ` Michael Shields
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Dobbie @ 2003-10-29 14:26 UTC (permalink / raw)
  Cc: Jake Colman, ding

Michael Shields <shields@msrl.com> writes:

> In message <76u15ru93j.fsf@newjersey.ppllc.com>,
> Jake Colman <colman@ppllc.com> wrote:
>> Is it truly necessary to train for ham or can I just train for spam?
>
> You need to train for both; Bayesian filters work not only by
> recognizing what features are correlated with spam, but also what
> features indicate that mail is likely to be ham.

...And if you don't, all your mail will end up in the spam group. I had
just set up the spam side and didn't quite get the ham config right, so
it wasn't working: all my mail went into the spam folder until I worked
out what I had done wrong.
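
Why spam-only training pushes everything towards the spam group can be
seen in a toy token filter. The sketch below is not spam.el or any real
backend: `TokenFilter`, its scoring rule (a simplified per-token spam
ratio, clamped and combined naively), and the sample messages are all
assumptions made for illustration.

```python
from collections import Counter


class TokenFilter:
    """Toy statistical filter; a hypothetical stand-in for a real backend."""

    def __init__(self):
        # Number of training messages of each kind that contained a token.
        self.seen = {"spam": Counter(), "ham": Counter()}
        self.total = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.seen[label].update(set(text.lower().split()))
        self.total[label] += 1

    def _freq(self, label, token):
        n = self.total[label]
        return self.seen[label][token] / n if n else 0.0

    def token_spamminess(self, token):
        s = self._freq("spam", token)
        h = self._freq("ham", token)
        if s + h == 0:
            return 0.5  # never-seen token: no evidence either way
        # Clamp so a single token can never be absolutely decisive.
        return min(0.99, max(0.01, s / (s + h)))

    def spam_probability(self, text):
        p_spam = p_ham = 1.0
        for token in set(text.lower().split()):
            p = self.token_spamminess(token)
            p_spam *= p
            p_ham *= 1.0 - p
        return p_spam / (p_spam + p_ham)


f = TokenFilter()
f.train("spam", "buy the cheap pills now")
f.train("spam", "click for the best offer")

legit = "please review the agenda for the meeting"
before = f.spam_probability(legit)  # spam-only training

f.train("ham", "the agenda for the meeting is attached")
f.train("ham", "please review the patch for gnus")
after = f.spam_probability(legit)   # trained on both kinds
```

With spam only, ordinary words such as "the" and "for" have been seen
in spam and nowhere else, so they count as strong spam evidence and
`before` comes out close to 1. Once ham has been seen too, the same
words become neutral or ham-leaning and `after` drops close to 0.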

I would recommend against having large-volume mailing lists in the ham
filter all the time. Maybe train on a few hundred messages and then
don't bother. However, I am using an old, slow machine, so CPU is a
major factor.

Ian





* Training for ham and training for spam
@ 2003-10-29 19:39 Jake Colman
  2003-10-29 19:50 ` Ted Zlatanov
  2003-10-29 19:54 ` Michael Shields
  0 siblings, 2 replies; 5+ messages in thread
From: Jake Colman @ 2003-10-29 19:39 UTC (permalink / raw)



Is it truly necessary to train for ham or can I just train for spam?  I have
a number of mailing lists that filter into their own folders and are mostly
spam-free.  Is there any benefit to classifying them as ham folders and
specifying an exit processor?  Or will it just add processing time?

On the other hand, I have a group that mostly gets spam so I have classified
it as a spam group and I do pass it through the exit processor.

Does this make sense?

-- 
Jake Colman                     

Principia Partners LLC                    Phone: (201) 209-2467
Harborside Financial Center                 Fax: (201) 946-0320
902 Plaza Two                          E-mail: colman@ppllc.com
Jersey City, NJ 07311                 www.principiapartners.com




* Re: Training for ham and training for spam
  2003-10-29 19:39 Training for ham and training for spam Jake Colman
@ 2003-10-29 19:50 ` Ted Zlatanov
  2003-10-29 19:54 ` Michael Shields
  1 sibling, 0 replies; 5+ messages in thread
From: Ted Zlatanov @ 2003-10-29 19:50 UTC (permalink / raw)
  Cc: ding

On Wed, 29 Oct 2003, colman@ppllc.com wrote:

> Is it truly necessary to train for ham or can I just train for spam?
> I have a number of mailing lists that filter into their own folders
> and are mostly spam-free.  Is there any benefit to classifying them
> as ham folders and specifying an exit processor?  Or will it just
> add processing time?

Training for ham is useful, and pretty much every statistical filter
recommends it.  The decision of whether it's worth the CPU time and
memory is up to you and depends on the particular filter.

> On the other hand, I have a group that mostly gets spam so I have
> classified it as a spam group and I do pass it through the exit
> processor.
> 
> Does this make sense?

Sure.

Ted




* Re: Training for ham and training for spam
  2003-10-29 19:39 Training for ham and training for spam Jake Colman
  2003-10-29 19:50 ` Ted Zlatanov
@ 2003-10-29 19:54 ` Michael Shields
  2003-10-29 14:26   ` Ian Dobbie
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Shields @ 2003-10-29 19:54 UTC (permalink / raw)
  Cc: ding

In message <76u15ru93j.fsf@newjersey.ppllc.com>,
Jake Colman <colman@ppllc.com> wrote:
> Is it truly necessary to train for ham or can I just train for spam?

You need to train for both; Bayesian filters work not only by
recognizing what features are correlated with spam, but also what
features indicate that mail is likely to be ham.
-- 
Shields.





* Re: Training for ham and training for spam
  2003-10-29 14:26   ` Ian Dobbie
@ 2003-10-30 14:39     ` Michael Shields
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Shields @ 2003-10-30 14:39 UTC (permalink / raw)
  Cc: Jake Colman, ding

In message <m38yn4t90x.fsf@biostaff03.nuigalway.ie>,
Ian Dobbie <ian.dobbie@nuigalway.ie> wrote:
> I would recommend against having large-volume mailing lists in the ham
> filter all the time. Maybe train on a few hundred messages and then
> don't bother. However, I am using an old, slow machine, so CPU is a
> major factor.

If learning is slow, then maybe you should do it asynchronously: use
the settings that copy ham and spam into folders for training, and
then have a cron job learn from those in the background.  I use:

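    ;; On exiting any group matching ^INBOX, run the ham copy processor.
    (setq gnus-spam-process-newsgroups
          '(("^INBOX" (gnus-group-ham-exit-processor-copy))))
    ;; Messages marked as spam go to this folder for later training.
    (setq gnus-spam-process-destinations
          '(("^INBOX" "INBOX.SA-spam")))
    ;; Ham is copied to this folder for later training.
    (setq gnus-ham-process-destinations
          '(("^INBOX" "INBOX.SA-ham")))
    ;; Move spam out of every matching group, not only out of non-spam groups.
    (setq spam-move-spam-nonspam-groups-only nil)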

The major motivation for that feature was to have filtering done on a
different machine, the IMAP server.  But it also means that I don't
have to wait for either learning or filtering, since they happen in
the background from cron and procmail respectively.
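
The cron side might look something like the sketch below. It assumes
the filter is SpamAssassin (suggested only by the `SA-` folder names)
and that the training folders are reachable as directories on the
machine running cron. The paths and the function names are
hypothetical; only `sa-learn --spam` and `sa-learn --ham` are real
commands.

```python
import subprocess


def build_learn_commands(spam_dir, ham_dir):
    """Return the sa-learn invocations for one training pass."""
    return [
        ["sa-learn", "--spam", spam_dir],
        ["sa-learn", "--ham", ham_dir],
    ]


def learn(spam_dir, ham_dir, dry_run=True):
    """Run the training commands, or just print them when dry_run is set."""
    for command in build_learn_commands(spam_dir, ham_dir):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Hypothetical paths; substitute wherever the copied mail actually lives.
commands = build_learn_commands("/srv/mail/SA-spam", "/srv/mail/SA-ham")
```

A crontab entry would then call `learn(..., dry_run=False)` at whatever
interval suits the machine, so Gnus never waits on the learner.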

Another idea would be to have a knob that allowed you to train on only
every n-th message; it would make processing n times faster, and since
the Bayesian filters are statistical they would still work ok.  You
would only set this knob after building up a database of a few hundred
messages.
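
Such a knob does not exist in the thread; as a rough sketch of the
idea, the selection could be as simple as the generator below. The
name `select_for_training`, its parameters, and the default warm-up of
300 messages are all made up for illustration.

```python
def select_for_training(messages, trained_so_far, n, warmup=300):
    """Yield the messages that should be fed to the learner.

    Every message is used until `warmup` messages have been trained on;
    after that only every n-th message is used.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    position = 0  # index among the messages seen after the warm-up
    for message in messages:
        if trained_so_far < warmup:
            trained_so_far += 1
            yield message
        else:
            if position % n == 0:
                trained_so_far += 1
                yield message
            position += 1


# Two messages short of the warm-up, then every 4th message afterwards.
picked = list(select_for_training(range(10), trained_so_far=298, n=4))
```

Here the first two messages complete the warm-up, and from then on
three out of every four are skipped.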
-- 
Shields.





Thread overview: 5+ messages
2003-10-29 19:39 Training for ham and training for spam Jake Colman
2003-10-29 19:50 ` Ted Zlatanov
2003-10-29 19:54 ` Michael Shields
2003-10-29 14:26   ` Ian Dobbie
2003-10-30 14:39     ` Michael Shields
