spam/ham exit processors

Gnus development mailing list

* spam/ham exit processors
@ 2003-11-03 13:47 Jake Colman
  2003-11-03 18:41 ` Ted Zlatanov
  0 siblings, 1 reply; 16+ messages in thread
From: Jake Colman @ 2003-11-03 13:47 UTC (permalink / raw)



From my reading of the manual, it seems that the suggested or anticipated
modus operandi is to move spam in a ham-classified group to a spam-classified
group for further processing and to move ham from a spam-classified group
over to a ham-classified group for further processing.  In such a situation,
do you need both types of exit processors in both types of groups?  Or do you
just need a ham exit processor for the ham-classified group and a spam exit
processor for the spam-classified group?  I'm assuming that having an exit
processor when you don't need one costs you a bit in performance.

-- 
Jake Colman                     

Principia Partners LLC                    Phone: (201) 209-2467
Harborside Financial Center                 Fax: (201) 946-0320
902 Plaza Two                          E-mail: colman@ppllc.com
Jersey City, NJ 07311                 www.principiapartners.com




* Re: spam/ham exit processors
  2003-11-03 13:47 spam/ham exit processors Jake Colman
@ 2003-11-03 18:41 ` Ted Zlatanov
  2003-11-03 20:29   ` Kai Grossjohann
  0 siblings, 1 reply; 16+ messages in thread
From: Ted Zlatanov @ 2003-11-03 18:41 UTC (permalink / raw)
  Cc: ding

On Mon, 03 Nov 2003, colman@ppllc.com wrote:

> From my reading of the manual, it seems that the suggested or
> anticipated modus operandi is to move spam in a ham-classified group
> to a spam-classified group for further processing and to move ham
> from a spam-classified group over to a ham-classified group for
> further processing.

Nope, you can process the spam at the point of origin or in a central
group.  Most people like a central "spam" group better.

> In such a situation, do you need both types of exit processors in
> both types of groups?  Or do you just need a ham exit processor for
> the ham-classified group and a spam exit processor for the
> spam-classified group?

You can do it either way, and it is just a different approach to the
same problem.

> I'm assuming that having an exit processor when you don't need one
> costs you a bit in performance.

The only performance cost is for the analysis of spam or ham; if
there's nothing to be done, you won't pay any performance penalty at
all.  So you can choose to batch up the processing in one group or
break it up over several groups.  It depends on your system, the
particular spam/ham processor's speed, the size of the messages...

Ted
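
A minimal sketch of the "point of origin" option, assuming Bogofilter
as the back end and a placeholder group name "nnml:mail" (neither is
given in this message), and assuming spam.el is already loaded.  It
uses the variable equivalent of the spam-process group parameter; the
processor symbol follows the naming in the Gnus manual of this period,
and later Gnus releases spell it differently, so check the manual for
your version:

```lisp
;; Sketch only: register spam with Bogofilter on leaving the group
;; where it was found.  "nnml:mail" is a placeholder group name.
(setq gnus-spam-process-newsgroups
      '(("^nnml:mail" (gnus-group-spam-exit-processor-bogofilter))))
;; Newer Gnus versions write the processor as (spam spam-use-bogofilter).
```

As noted above, a group with nothing to process at exit should not
incur any analysis cost.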


* Re: spam/ham exit processors
  2003-11-03 18:41 ` Ted Zlatanov
@ 2003-11-03 20:29   ` Kai Grossjohann
  2003-11-03 20:31     ` Russ Allbery
  2003-11-03 21:25     ` Ted Zlatanov
  0 siblings, 2 replies; 16+ messages in thread
From: Kai Grossjohann @ 2003-11-03 20:29 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> On Mon, 03 Nov 2003, colman@ppllc.com wrote:
>
>> From my reading of the manual, it seems that the suggested or
>> anticipated modus operandi is to move spam in a ham-classified group
>> to a spam-classified group for further processing and to move ham
>> from a spam-classified group over to a ham-classified group for
>> further processing.
>
> Nope, you can process the spam at the point of origin or in a central
> group.  Most people like a central "spam" group better.

Oh!  Does this mean that people could set
gnus-spam-process-destinations to nnml:spam, say, then set the
spam-process parameter just on nnml:spam, for it to be added to the
blacklist or whatever?  (Whereas spam-process remains unset/nil in all
other groups except nnml:spam?)

Fascinating.







* Re: spam/ham exit processors
  2003-11-03 20:29   ` Kai Grossjohann
@ 2003-11-03 20:31     ` Russ Allbery
  2003-11-03 21:26       ` Ted Zlatanov
  2003-11-03 21:25     ` Ted Zlatanov
  1 sibling, 1 reply; 16+ messages in thread
From: Russ Allbery @ 2003-11-03 20:31 UTC (permalink / raw)


Kai Grossjohann <kai@emptydomain.de> writes:

> Oh!  Does this mean that people could set gnus-spam-process-destinations
> to nnml:spam, say, then set the spam-process parameter just on
> nnml:spam, for it to be added to the blacklist or whatever?  (Whereas
> spam-process remains unset/nil in all other groups except nnml:spam?)

When I tried to do that, I discovered that I had to look at all of the
spam twice in order to get it registered.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>




* Re: spam/ham exit processors
  2003-11-03 20:29   ` Kai Grossjohann
  2003-11-03 20:31     ` Russ Allbery
@ 2003-11-03 21:25     ` Ted Zlatanov
  2003-11-03 22:10       ` Kai Grossjohann
  2003-11-04 15:08       ` Jake Colman
  1 sibling, 2 replies; 16+ messages in thread
From: Ted Zlatanov @ 2003-11-03 21:25 UTC (permalink / raw)
  Cc: ding

On Mon, 03 Nov 2003, kai@emptydomain.de wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> On Mon, 03 Nov 2003, colman@ppllc.com wrote:
>>
>>> From my reading of the manual, it seems that the suggested or
>>> anticipated modus operandi is to move spam in a ham-classified
>>> group to a spam-classified group for further processing and to
>>> move ham from a spam-classified group over to a ham-classified
>>> group for further processing.
>>
>> Nope, you can process the spam at the point of origin or in a
>> central group.  Most people like a central "spam" group better.
> 
> Oh!  Does this mean that people could set
> gnus-spam-process-destinations to nnml:spam, say, then set the
> spam-process parameter just on nnml:spam, for it to be added to the
> blacklist or whatever?  (Whereas spam-process remains unset/nil in
> all other groups except nnml:spam?)
> 
> Fascinating.

Sure, you could have spam moved from all groups to "nnml:spam" [1] and
then process spam only in "nnml:spam".  I do that, and furthermore I
have the spam-process-destination parameter of "nnml:spam" set to
"nnml:train" so I can run SpamAssassin directly on the "nnml:spam"
group's file contents.  In "nnml:spam" I have the
ham-process-destination set to "nnml:mail" and thus when I tick an
article in the spam group it gets popped and ham-processed back into
"nnml:mail".  It's a pretty tidy system.

Ted

[1] "nnimap+mail.lifelogs.com:spam" actually, but the prefix is not
important
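
A sketch of that routing, using the variable equivalents of the group
parameters named above.  The group names are the ones from this
message; the "^nnml:mail" pattern for the source groups is an
assumption, as is configuring this through variables rather than
per-group parameters:

```lisp
;; Sketch only, assuming spam.el is loaded and set up.
;; Treat nnml:spam as a spam-classified group.
(setq gnus-spam-newsgroup-contents
      '(("^nnml:spam$" gnus-group-spam-classification-spam)))

;; Where spam-marked articles are moved at summary exit.
(setq gnus-spam-process-destinations
      '(("^nnml:mail"  "nnml:spam")     ; spam found in ordinary mail
        ("^nnml:spam$" "nnml:train")))  ; reviewed spam, ready for training

;; Where ham rescued from the spam group is moved.
(setq gnus-ham-process-destinations
      '(("^nnml:spam$" "nnml:mail")))
```

Which marks count as ham inside the spam group (the tick mark in this
setup) is controlled separately, by the ham-marks group parameter.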




* Re: spam/ham exit processors
  2003-11-03 20:31     ` Russ Allbery
@ 2003-11-03 21:26       ` Ted Zlatanov
  2003-11-03 21:43         ` Russ Allbery
  0 siblings, 1 reply; 16+ messages in thread
From: Ted Zlatanov @ 2003-11-03 21:26 UTC (permalink / raw)
  Cc: ding

On Mon, 03 Nov 2003, rra@stanford.edu wrote:

> Kai Grossjohann <kai@emptydomain.de> writes:
> 
>> Oh!  Does this mean that people could set
>> gnus-spam-process-destinations to nnml:spam, say, then set the
>> spam-process parameter just on nnml:spam, for it to be added to the
>> blacklist or whatever?  (Whereas spam-process remains unset/nil in
>> all other groups except nnml:spam?)
> 
> When I tried to do that, I discovered that I had to look at all of
> the spam twice in order to get it registered.

I'm not sure what you mean by "look at" and "registered" - can you
clarify the process and where the bug/problem was?

Thanks
Ted




* Re: spam/ham exit processors
  2003-11-03 21:26       ` Ted Zlatanov
@ 2003-11-03 21:43         ` Russ Allbery
  2003-11-04  2:28           ` Ted Zlatanov
  0 siblings, 1 reply; 16+ messages in thread
From: Russ Allbery @ 2003-11-03 21:43 UTC (permalink / raw)

Ted Zlatanov <tzz@lifelogs.com> writes:
> On Mon, 03 Nov 2003, rra@stanford.edu wrote:
>> Kai Grossjohann <kai@emptydomain.de> writes:

>>> Oh!  Does this mean that people could set
>>> gnus-spam-process-destinations to nnml:spam, say, then set the
>>> spam-process parameter just on nnml:spam, for it to be added to the
>>> blacklist or whatever?  (Whereas spam-process remains unset/nil in all
>>> other groups except nnml:spam?)

>> When I tried to do that, I discovered that I had to look at all of the
>> spam twice in order to get it registered.

> I'm not sure what you mean by "look at" and "registered" - can you
> clarify the process and where the bug/problem was?

Oh, that's right, it wasn't to get it registered.  I'm remembering more of
this now.

What happened when I set a process destination is that all the spam would
get moved into that group and show up as new unread messages there.  So
when I went to do my false positive scan, I would end up scanning through
all those messages again, even though I'd already manually confirmed that
they were spam.

And depending on how I set things up, I'd either end up registering that
spam twice (once when I moved it into that group and again when I scanned
the group for false positives), or I'd end up not registering any spam
that bogofilter stuck directly into the spam group.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>


* Re: spam/ham exit processors
  2003-11-03 21:25     ` Ted Zlatanov
@ 2003-11-03 22:10       ` Kai Grossjohann
  2003-11-04  2:20         ` Ted Zlatanov
  2003-11-04 15:08       ` Jake Colman
  1 sibling, 1 reply; 16+ messages in thread
From: Kai Grossjohann @ 2003-11-03 22:10 UTC (permalink / raw)

Ted Zlatanov <tzz@lifelogs.com> writes:

> Sure, you could have spam moved from all groups to "nnml:spam" [1] and
> then process spam only in "nnml:spam".  I do that, and furthermore I
> have the spam-process-destination parameter of "nnml:spam" set to
> "nnml:train" so I can run SpamAssassin directly on the "nnml:spam"
> group's file contents.  In "nnml:spam" I have the
> ham-process-destination set to "nnml:mail" and thus when I tick an
> article in the spam group it gets popped and ham-processed back into
> "nnml:mail".  It's a pretty tidy system.

Why do you have two groups, nnml:spam and nnml:train?  Hm.  Ah, maybe
it is in order to avoid training SA on all spam.  You look at
nnml:spam first, then you catch all ham from there.  Only *real* spam
is moved to nnml:train.  So you're not training SA with "fake spam".

So it's a system similar to mine; I have nnimap:INBOX.spam instead of
nnml:spam and nnimap:INBOX.makespam instead of nnml:train.  And I only
train Bogofilter on the borderline cases, so I don't move all spam
from nnimap:INBOX.spam to nnimap:INBOX.makespam.

But maybe it would be a good idea to do that.

Kai


* Re: spam/ham exit processors
  2003-11-03 22:10       ` Kai Grossjohann
@ 2003-11-04  2:20         ` Ted Zlatanov
  0 siblings, 0 replies; 16+ messages in thread
From: Ted Zlatanov @ 2003-11-04  2:20 UTC (permalink / raw)
  Cc: ding

On Mon, 03 Nov 2003, kai@emptydomain.de wrote:

> Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>> Sure, you could have spam moved from all groups to "nnml:spam" [1]
>> and then process spam only in "nnml:spam".  I do that, and
>> furthermore I have the spam-process-destination parameter of
>> "nnml:spam" set to "nnml:train" so I can run SpamAssassin directly
>> on the "nnml:spam" group's file contents.  In "nnml:spam" I have
>> the ham-process-destination set to "nnml:mail" and thus when I tick
>> an article in the spam group it gets popped and ham-processed back
>> into "nnml:mail".  It's a pretty tidy system.
> 
> Why do you have two groups, nnml:spam and nnml:train?  Hm.  Ah,
> maybe it is in order to avoid training SA on all spam.  You look at
> nnml:spam first, then you catch all ham from there.  Only *real*
> spam is moved to nnml:train.  So you're not training SA with "fake
> spam".

Right.  Where I said "I can run SpamAssassin directly on the
'nnml:spam' group" I actually meant "nnml:train" and you knew what I
meant :)

> So it's a system similar to mine; I have nnimap:INBOX.spam instead
> of nnml:spam and nnimap:INBOX.makespam instead of nnml:train.  And I
> only train Bogofilter on the borderline cases, so I don't move all
> spam from nnimap:INBOX.spam to nnimap:INBOX.makespam.

Well, it takes time to move the articles, but then again I just have
to hit `q' and it's automatic so I don't mind doing it once a day.  I
also empty out my training group after I'm done training.

Ted




* Re: spam/ham exit processors
  2003-11-03 21:43         ` Russ Allbery
@ 2003-11-04  2:28           ` Ted Zlatanov
  2003-11-04  3:03             ` Russ Allbery
  2003-11-04 15:11             ` Jake Colman
  0 siblings, 2 replies; 16+ messages in thread
From: Ted Zlatanov @ 2003-11-04  2:28 UTC (permalink / raw)
  Cc: ding

On Mon, 03 Nov 2003, rra@stanford.edu wrote:

> What happened when I set a process destination is that all the spam
> would get moved into that group and show up as new unread messages
> there.  So when I went to do my false positive scan, I would end up
> scanning through all those messages again, even though I'd already
> manually confirmed that they were spam.
> 
> And depending on how I set things up, I'd either end up registering
> that spam twice (once when I moved it into that group and again when
> I scanned the group for false positives), or I'd end up not
> registering any spam that bogofilter stuck directly into the spam
> group.

I think you want an intermediate "spam" group with its
process-destination set to "train", and then run spam-processors on
"train" only.  So all spam will flow to "spam" and then you can pop
ham back out of "spam" before it all gets moved into "train."

I like the "train" approach also because Bogofilter can be easily run
on all those messages from the command line.

Ted
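
The last step of that pipeline, running the spam processor on "train"
only, might look like the following.  This is a sketch assuming
Bogofilter and a group named "nnml:train" (the message only says
"train"); the processor symbol is the older Gnus spelling and may
differ in your version:

```lisp
;; Sketch only: no other group is given a spam exit processor, with
;; the intent that each message is registered once, in the training
;; group, after ham has been pulled back out of the "spam" group.
(setq gnus-spam-process-newsgroups
      '(("^nnml:train$" (gnus-group-spam-exit-processor-bogofilter))))
```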




* Re: spam/ham exit processors
  2003-11-04  2:28           ` Ted Zlatanov
@ 2003-11-04  3:03             ` Russ Allbery
  2003-11-04 15:11             ` Jake Colman
  1 sibling, 0 replies; 16+ messages in thread
From: Russ Allbery @ 2003-11-04  3:03 UTC (permalink / raw)


Ted Zlatanov <tzz@lifelogs.com> writes:

> I think you want an intermediate "spam" group with its
> process-destination set to "train", and then run spam-processors on
> "train" only.  So all spam will flow to "spam" and then you can pop ham
> back out of "spam" before it all gets moved into "train."

> I like the "train" approach also because Bogofilter can be easily run
> on all those messages from the command line.

Oh, that's a really good idea.  I hadn't thought of that at all.  Thank
you!

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>




* Re: spam/ham exit processors
  2003-11-03 21:25     ` Ted Zlatanov
  2003-11-03 22:10       ` Kai Grossjohann
@ 2003-11-04 15:08       ` Jake Colman
  1 sibling, 0 replies; 16+ messages in thread
From: Jake Colman @ 2003-11-04 15:08 UTC (permalink / raw)
  Cc: ding

>>>>> "TZ" == Ted Zlatanov <tzz@lifelogs.com> writes:

   TZ> Sure, you could have spam moved from all groups to "nnml:spam" [1] and
   TZ> then process spam only in "nnml:spam".  I do that, and furthermore I
   TZ> have the spam-process-destination parameter of "nnml:spam" set to
   TZ> "nnml:train" so I can run SpamAssassin directly on the "nnml:spam"
   TZ> group's file contents.  In "nnml:spam" I have the
   TZ> ham-process-destination set to "nnml:mail" and thus when I tick an
   TZ> article in the spam group it gets popped and ham-processed back into
   TZ> "nnml:mail".  It's a pretty tidy system.

But if spam training is done on all marked spam regardless of the group's
classification, assuming I read the docs correctly, then why bother moving
the spam anywhere else for processing?  Why not just specify a spam exit
processor on your ham-classified or unclassified group and let the spam get
trained that way?  Am I missing some benefit by not moving my spam into a
spam-classified group?

-- 
Jake Colman                     

Principia Partners LLC                    Phone: (201) 209-2467
Harborside Financial Center                 Fax: (201) 946-0320
902 Plaza Two                          E-mail: colman@ppllc.com
Jersey City, NJ 07311                 www.principiapartners.com


* Re: spam/ham exit processors
  2003-11-04  2:28           ` Ted Zlatanov
  2003-11-04  3:03             ` Russ Allbery
@ 2003-11-04 15:11             ` Jake Colman
  2003-11-04 16:17               ` Ted Zlatanov
  1 sibling, 1 reply; 16+ messages in thread
From: Jake Colman @ 2003-11-04 15:11 UTC (permalink / raw)
  Cc: ding

>>>>> "TZ" == Ted Zlatanov <tzz@lifelogs.com> writes:

   TZ> I think you want an intermediate "spam" group with its
   TZ> process-destination set to "train", and then run spam-processors on
   TZ> "train" only.  So all spam will flow to "spam" and then you can pop
   TZ> ham back out of "spam" before it all gets moved into "train."

   TZ> I like the "train" approach also because Bogofilter can be easily run
   TZ> on all those messages from the command line.

But if the messages were already filtered into the "intermediate" spam group
it would seem to indicate that spam.el (through whatever statistical tool is
being used) already determined it to be spam.  So why do you need to ever
move messages from one spam group into another?

-- 
Jake Colman                     

Principia Partners LLC                    Phone: (201) 209-2467
Harborside Financial Center                 Fax: (201) 946-0320
902 Plaza Two                          E-mail: colman@ppllc.com
Jersey City, NJ 07311                 www.principiapartners.com


* Re: spam/ham exit processors
  2003-11-04 15:11             ` Jake Colman
@ 2003-11-04 16:17               ` Ted Zlatanov
  2003-11-04 17:37                 ` Jake Colman
  0 siblings, 1 reply; 16+ messages in thread
From: Ted Zlatanov @ 2003-11-04 16:17 UTC (permalink / raw)
  Cc: Russ Allbery, ding

On Tue, 04 Nov 2003, colman@ppllc.com wrote:

>>>>>> "TZ" == Ted Zlatanov <tzz@lifelogs.com> writes:
> 
>    TZ> I think you want an intermediate "spam" group with its
>    TZ> process-destination set to "train", and then run
>    TZ> spam-processors on "train" only.  So all spam will flow to
>    TZ> "spam" and then you can pop ham back out of "spam" before it
>    TZ> all gets moved into "train."
> 
>    TZ> I like the "train" approach also because Bogofilter can be
>    TZ> easily run on all those messages from the command line.
> 
> But if the messages were already filtered into the "intermediate"
> spam group it would seem to indicate that spam.el (through whatever
> statistical tool is being used) already determined it to be spam.  So
> why do you need to ever move messages from one spam group into
> another?

Because I like to verify (visually) that messages are spam before
training my filters on them.  This has given me the result of 2 spam
messages in the last week that got through, out of a few thousand
spams.  Some people don't want the inconvenience of spending time
looking at spams, so they train on whatever is in the spam folder.
That's fine too.

Ted




* Re: spam/ham exit processors
  2003-11-04 16:17               ` Ted Zlatanov
@ 2003-11-04 17:37                 ` Jake Colman
  2003-11-04 22:31                   ` Kai Grossjohann
  0 siblings, 1 reply; 16+ messages in thread
From: Jake Colman @ 2003-11-04 17:37 UTC (permalink / raw)
  Cc: ding

>>>>> "TZ" == Ted Zlatanov <tzz@lifelogs.com> writes:

   TZ> On Tue, 04 Nov 2003, colman@ppllc.com wrote:
   >>>>>>> "TZ" == Ted Zlatanov <tzz@lifelogs.com> writes:
   >> 
   TZ> I think you want an intermediate "spam" group with its
   TZ> process-destination set to "train", and then run
   TZ> spam-processors on "train" only.  So all spam will flow to
   TZ> "spam" and then you can pop ham back out of "spam" before it
   TZ> all gets moved into "train."
   >> 
   TZ> I like the "train" approach also because Bogofilter can be
   TZ> easily run on all those messages from the command line.
   >> 
   >> But if the messages were already filtered into the "intermediate"
   >> spam group it would seem to indicate that spam.el (through whatever
   >> statistical tool is being used) already determined it to be spam.  So
   >> why do you need to ever move messages from one spam group into
   >> another?

   TZ> Because I like to verify (visually) that messages are spam before
   TZ> training my filters on them.  This has given me the result of 2 spam
   TZ> messages in the last week that got through, out of a few thousand
   TZ> spams.  Some people don't want the inconvenience of spending time
   TZ> looking at spams, so they train on whatever is in the spam folder.
   TZ> That's fine too.

I MUST be missing something here.   Don't the filters _already know_ that
the messages were spam?  How else did they end up in the intermediate folder?
 Why do you need to train if it's been trained?

-- 
Jake Colman                     

Principia Partners LLC                    Phone: (201) 209-2467
Harborside Financial Center                 Fax: (201) 946-0320
902 Plaza Two                          E-mail: colman@ppllc.com
Jersey City, NJ 07311                 www.principiapartners.com




* Re: spam/ham exit processors
  2003-11-04 17:37                 ` Jake Colman
@ 2003-11-04 22:31                   ` Kai Grossjohann
  0 siblings, 0 replies; 16+ messages in thread
From: Kai Grossjohann @ 2003-11-04 22:31 UTC (permalink / raw)

Jake Colman <colman@ppllc.com> writes:

> I MUST be missing something here.   Don't the filters _already know_ that
> the messages were spam?  How else did they end up in the intermediate folder?
>  Why do you need to train if it's been trained?

No, the filters don't _know_.  They just _predict_ that with a
sufficiently high probability, the message is spam.

Sometimes, they predict wrong so it's good to double-check.

Double-checking is not a big problem because finding the one ham in
100 messages is quite easy, whereas finding 50 hams in 100 messages is
difficult and tedious.

Kai

