spam filtering using IMAP ?

Gnus development mailing list

* spam filtering using IMAP ?
@ 2003-01-09 19:20 Arnd Kohrs
  2003-01-09 19:38 ` Ted Zlatanov
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Arnd Kohrs @ 2003-01-09 19:20 UTC


Hi,

Enticed by the lively discussion about spam.el filtering, I feel the urge
to ask this silly question again:

Can one spam filter when using nnimap for all email?

I guess the answer is still "NO!".

IMHO the reason for this is that when nnimap splits, it only loads
the headers and not the bodies of the articles, so that the spam
recognition which is based on the bodies may not be used.

However, wouldn't it be possible to tell nnimap.el to download articles
completely (with bodies) when splitting?  AFAIK, the spam stuff should
then be feasible.  (Once articles are downloaded, they might as well be
copied to the cache or the Agent too.)

On a related note, if articles were downloaded completely, they could even
be split into non-IMAP backends.

Can there be spam filtering for nnimap?

Cheers,
Arnd.

* Re: spam filtering using IMAP ?
  2003-01-09 19:20 spam filtering using IMAP ? Arnd Kohrs
@ 2003-01-09 19:38 ` Ted Zlatanov
  2003-01-10  2:02   ` Simon Josefsson
  2003-01-16  7:20 ` Mats Lidell
  2003-01-16 15:32 ` Kai Großjohann
  2 siblings, 1 reply; 11+ messages in thread
From: Ted Zlatanov @ 2003-01-09 19:38 UTC
  Cc: ding

On Thu, 09 Jan 2003, kohrs@castify.net wrote:
> Can one spam filter when using nnimap for all email?
> 
> I guess the answer is still "NO!".
> 
> IMHO the reason for this is that when nnimap splits, it only loads
> the headers and not the bodies of the articles, so that the spam
> recognition which is based on the bodies may not be used.

As far as spam.el is concerned, it only looks at whatever is in the
current buffer when spam-split is invoked.  Spam and ham processors
use the full body of the message at summary exit, so in spam-split we
have the only difference of opinion between the nnimap and other
backends.

It's worth noting that 

1) many people don't use spam.el (the majority)

2) we don't want nnimap to download full articles when splitting, even
   with spam.el, because whitelist/blacklist/BBDB/blackhole splitters
   don't need the full article

So I would suggest that this could be an option that spam-split could
set when dealing with a nnimap backend and a splitter that needs the
full message.  For instance, spam-use-ifile would require full nnimap
downloads, but spam-use-blackholes wouldn't.
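A minimal Python model of the option proposed above, assuming the decision is made from which spam-use-* splitters are enabled. Which splitters need the body follows this message (ifile does; whitelist, blacklist, BBDB and blackholes only look at headers). The table and function are hypothetical illustrations, not spam.el code.

```python
# Hypothetical model, not spam.el code: does an enabled spam-use-* splitter
# need the article body, according to the discussion above?
NEEDS_BODY = {
    "spam-use-ifile": True,
    "spam-use-whitelist": False,
    "spam-use-blacklist": False,
    "spam-use-BBDB": False,
    "spam-use-blackholes": False,
}


def needs_full_article(enabled: dict) -> bool:
    """Return True if any enabled splitter needs more than the headers."""
    # Splitters missing from the table are treated conservatively as needing the body.
    return any(NEEDS_BODY.get(name, True) for name, on in enabled.items() if on)
```

With only header-based splitters switched on, nnimap could keep fetching headers alone; enabling spam-use-ifile would flip the answer to a full download.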

nnimap gurus, is that possible?

Ted

* Re: spam filtering using IMAP ?
  2003-01-09 19:38 ` Ted Zlatanov
@ 2003-01-10  2:02   ` Simon Josefsson
  0 siblings, 0 replies; 11+ messages in thread
From: Simon Josefsson @ 2003-01-10  2:02 UTC
  Cc: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Thu, 09 Jan 2003, kohrs@castify.net wrote:
>> Can one spam filter when using nnimap for all email?
>> 
>> I guess the answer is still "NO!".
>> 
>> IMHO the reason for this is that when nnimap splits, it only loads
>> the headers and not the bodies of the articles, so that the spam
>> recognition which is based on the bodies may not be used.
>
> As far as spam.el is concerned, it only looks at whatever is in the
> current buffer when spam-split is invoked.  Spam and ham processors
> use the full body of the message at summary exit, so in spam-split we
> have the only difference of opinion between the nnimap and other
> backends.
>
> It's worth noting that 
>
> 1) many people don't use spam.el (the majority)
>
> 2) we don't want nnimap to download full articles when splitting, even
>    with spam.el, because whitelist/blacklist/BBDB/blackhole splitters
>    don't need the full article
>
> So I would suggest that this could be an option that spam-split could
> set when dealing with a nnimap backend and a splitter that needs the
> full message.  For instance, spam-use-ifile would require full nnimap
> downloads, but spam-use-blackholes wouldn't.
>
> nnimap gurus, is that possible?

Fancy splitting works for nnimap, so maybe what you propose simply
works?  (Assuming spam.el uses fancy splitting; I'm afraid I haven't
had time to look at it.)  Adding a variable that makes nnimap download
entire articles (and then limit them to the headers, like nnmail does)
should be doable if it is needed.
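A small Python sketch of "download the whole article, then limit it to the headers", assuming the raw message is already in hand (over IMAP that would be a `BODY.PEEK[]` fetch instead of `BODY.PEEK[HEADER]`; both are standard IMAP4rev1 fetch items). In an RFC 822 message the headers end at the first empty line, so a body-based classifier can see everything while header-only code gets just the first part. The function name is hypothetical, and this is not nnmail's actual code.

```python
from typing import Tuple


def split_message(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw RFC 822 message into (headers, body) at the first empty line.

    Handles both CRLF (as delivered over IMAP) and bare LF line endings.
    If there is no empty line, the whole message is treated as headers.
    """
    candidates = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos >= 0:
            candidates.append((pos, len(sep)))
    if not candidates:
        return raw, b""
    pos, width = min(candidates)  # whichever separator occurs first
    # Keep the last header's line terminator, drop the blank line itself.
    return raw[: pos + width // 2], raw[pos + width:]
```

A splitter that needs the body would receive `raw`, while everything else would work on `split_message(raw)[0]`.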

* Re: spam filtering using IMAP ?
  2003-01-09 19:20 spam filtering using IMAP ? Arnd Kohrs
  2003-01-09 19:38 ` Ted Zlatanov
@ 2003-01-16  7:20 ` Mats Lidell
  2003-01-16  9:40   ` Ted Zlatanov
  2003-01-16 15:32 ` Kai Großjohann
  2 siblings, 1 reply; 11+ messages in thread
From: Mats Lidell @ 2003-01-16  7:20 UTC


>>>>> Arnd wrote:

Arnd> Can there be spam filtering for nnimap?

Isn't it natural, when using IMAP, to do the spam filtering on the
server?  That way all your MUAs benefit from the filtering.

The problem from the MUA's perspective is how to feed reclassifications
back to the server.  The best I have come up with so far, not yet
tested, is to move the articles to special folders for reclassification
and then use a cron job to feed them to the statistics engine and
eventually move them to the correct folder.

This scheme would suggest that spam.el should support a simple move to
a special mark-as-ham or mark-as-spam folder. A quick look at spam.el
doesn't reveal such simple functionality. It seems targeted to feeding
the articles to the statistics engine directly. (Or am I missing
something?)

Related: Has anybody tried the scheme outlined above with IMAP and
statistical spam filtering?

Yours
-- 
%% Mats

* Re: spam filtering using IMAP ?
  2003-01-16  7:20 ` Mats Lidell
@ 2003-01-16  9:40   ` Ted Zlatanov
  2003-01-16 10:43     ` Mats Lidell
  0 siblings, 1 reply; 11+ messages in thread
From: Ted Zlatanov @ 2003-01-16  9:40 UTC
  Cc: ding

On Thu, 16 Jan 2003, matsl@contactor.se wrote:
> Isn't it natural when using imap to do the spam filtering on the
> server? This way all your MUAs will benefit from the filtering.

Some people (quite a few, in fact) don't have login access to the IMAP
server.  Most people can't install software on the IMAP server.  So
it's not always that easy to do classification on the server.

Some people may want to classify IMAP mail into other folders (nnml,
for instance).  That's tough if the IMAP server is not where you
normally run Gnus.

> The problem from the MUAs perspective is how to feedback
> reclassifications to the server. The best I have come up with so
> far, not tested, is to move the articles to special folders for
> reclassification and then use a cronjob for feeding the articles to
> the statistic engine and eventually the correct folder.
> 
> This scheme would suggest that spam.el should support a simple move
> to a special mark-as-ham or mark-as-spam folder. A quick look at
> spam.el doesn't reveal such simple functionality. It seems targeted
> to feeding the articles to the statistics engine directly. (Or am I
> missing something?)

What exactly would you like moved to that special folder?  I'm not
sure how spam.el could help you for server-side splitting.

Thanks
Ted

* Re: spam filtering using IMAP ?
  2003-01-16  9:40   ` Ted Zlatanov
@ 2003-01-16 10:43     ` Mats Lidell
  2003-01-16 11:22       ` Ted Zlatanov
  0 siblings, 1 reply; 11+ messages in thread
From: Mats Lidell @ 2003-01-16 10:43 UTC


>>>>> Ted wrote:

Ted> Some people (quite a few, in fact) don't have login access to the
Ted> IMAP server.  Most people can't install software on the IMAP
Ted> server.  So it's not always that easy to do classification on the
Ted> server.

Agreed. Even if it is natural it might be impossible for practical
reasons.

Ted> What exactly would you like moved to that special folder?  I'm
Ted> not sure how spam.el could help you for server-side splitting.

I haven't designed the details (much less tested them), but what I
envision is this: move an article that is spam but was not classified
as such to one folder, let's say mark-as-spam, and move good articles
that were classified as spam to another folder, let's say mark-as-ham.
Then some server code would feed these articles to the spam statistics
engine and, at the same time, move the mark-as-spam articles to the
spam folder and the mark-as-ham articles to the inbox(!?).
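A rough Python sketch of that server-side cron job, assuming an IMAP account reachable with the standard `imaplib` module, the hypothetical folder names `mark-as-spam`, `mark-as-ham`, `spam` and `INBOX`, and bogofilter as the statistics engine (`-s` registers a message as spam, `-n` as ham). The trainer is a parameter so another engine can be swapped in; none of this is part of spam.el.

```python
import subprocess
from typing import Callable, List

# Hypothetical layout: source folder -> (destination folder, is_spam).
ROUTES = {
    "mark-as-spam": ("spam", True),
    "mark-as-ham": ("INBOX", False),
}


def bogofilter_train(raw: bytes, is_spam: bool) -> None:
    """Register one message with bogofilter: -s for spam, -n for ham."""
    # Exit status is not checked here; see bogofilter's documentation for its meaning.
    subprocess.run(["bogofilter", "-s" if is_spam else "-n"], input=raw)


def fetched_bodies(data) -> List[bytes]:
    """imaplib returns (envelope, literal) tuples interleaved with b')' items."""
    return [item[1] for item in data if isinstance(item, tuple)]


def process_reclassification(conn, train: Callable[[bytes, bool], None] = bogofilter_train) -> int:
    """Drain each mark-as-* folder: train the engine, file the message, delete the original."""
    moved = 0
    for source, (dest, is_spam) in ROUTES.items():
        typ, _ = conn.select(source)
        if typ != "OK":
            continue  # folder does not exist (yet)
        typ, data = conn.uid("SEARCH", None, "ALL")
        uids = data[0].split() if typ == "OK" and data and data[0] else []
        for uid in uids:
            typ, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
            bodies = fetched_bodies(data) if typ == "OK" else []
            if not bodies:
                continue  # unreadable; leave it for the next run
            for raw in bodies:
                train(raw, is_spam)
            typ, _ = conn.uid("COPY", uid, dest)
            if typ != "OK":
                continue  # keep the original if filing it failed
            conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
            moved += 1
        conn.expunge()  # permanently remove the messages flagged above
    return moved
```

From cron this would be driven by something like `conn = imaplib.IMAP4_SSL(host)`, `conn.login(user, password)`, `process_reclassification(conn)`, `conn.logout()`. Since the MUA only has to move messages into the mark-as-* folders, any client can take part.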

What this all is about is to feed the article back to the server so
that the spam processing can be adjusted. One alternative is to mail
the article back to the server and there could be other ways
too. Using imap directly seems natural though.

This is where spam.el comes in, supporting this scheme in gnus.  If I
understand how it works, it uses different marks to intelligently find
out whether articles need reclassification, based also on normal
operations on the article.  (Philosophy: if I do this with the article,
then it must be spam, or it must be ham.)

On the other hand, a very natural thing to do with ham found in the
spam folder is to move it directly to the folder where it belongs.
This deletes the article from the spam folder, and I don't know whether
spam.el, on exit from the summary buffer, will still be able to access
the article so that it could be copied to the mark-as-ham folder for
later statistics processing.

There might also be other problems with this approach, which I haven't
noticed yet, that could make spam.el the wrong vehicle for this.

Yours
-- 
%% Mats

* Re: spam filtering using IMAP ?
  2003-01-16 10:43     ` Mats Lidell
@ 2003-01-16 11:22       ` Ted Zlatanov
  2003-01-16 12:41         ` Mats Lidell
  0 siblings, 1 reply; 11+ messages in thread
From: Ted Zlatanov @ 2003-01-16 11:22 UTC
  Cc: ding

On Thu, 16 Jan 2003, matsl@contactor.se wrote:
> I haven't designed the details (much less tested it) but what I
> envision is to move an article that is spam and not classified as
> such to one folder, lets say mark-as-spam, and move good articles
> that are classified as spam to another folder, lets say
> mark-as-ham. 

How do you know if an article is spam or ham?  The user has to
determine that, right?  Or are you talking about automated server-side
classification?  I'm a little confused, maybe it would help if you did
a scenario of what happens to a message on the mail server and on the
client machine - what programs get invoked, what the user has to do...

> Then I would have some server code that feeds these articles to the
> spam statistics engine and at the same time move the mark-as-spam
> articles to the spam folder and the mark-as-ham articles to the
> inbox(!?).

I'm not sure how doing this on the server makes a big difference.
spam.el supports all that on the client side, and the only penalty you
pay is retrieving the article body.  Maybe you want to have a single
place to store message statistics?

You could run Gnus on the server, I guess...

> It is here that spam.el comes in for supporting this scheme in
> gnus. If I get how this works it uses different marks to
> intelligently find out whether articles need reclassification based
> also on normal operations on the article. (Philosophy: If I do this
> with the article then it must be spam or it must be ham.)

> On the other hand a very natural thing to do with ham found in the
> spam folder is to move them directly to the folder where they should
> be. This deletes the article in the spam folder and I don't know if
> spam.el on exit from the summary buffer will be able to access the
> article so that it could be copied to the mark-as-ham folder for
> later statistics processing.

There's the spam-process-destination and ham-process-destination group
parameters, which let you move spam or ham articles at summary exit.
You can set them for a group, a topic, or a regex matching the group
name.

They are set to nil by default, which means "expire spam, leave ham
alone."  When you set those parameters, spam articles get moved from
any group into a spam-process-destination group.  Ham articles must be
in a spam group to be moved to a ham-process-destination.
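The asymmetry between the two parameters can be restated as a tiny Python model, assuming three inputs: whether the article is marked as spam, whether it sits in a spam group, and the two destination parameters (with None standing for nil). It only mirrors the behavior described in this message; it is not how spam.el implements it.

```python
from typing import Optional


def summary_exit_action(is_spam: bool, in_spam_group: bool,
                        spam_dest: Optional[str], ham_dest: Optional[str]) -> str:
    """Model of what happens to one article at summary exit, per the rules above."""
    if is_spam:
        # Spam is moved out of *any* group when a destination is set;
        # with the default (nil) it is expired instead.
        return "move to " + spam_dest if spam_dest else "expire"
    if in_spam_group and ham_dest:
        # Ham is only moved when it is found inside a spam group.
        return "move to " + ham_dest
    return "leave"
```

For example, ham sitting in an ordinary group stays put even when a ham-process-destination is set, while ham found in a spam group is moved there.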

Is that helpful?  Or do you have something else in mind?

You should probably look at the CVS Gnus manual if you haven't
already.  It explains all the spam.el behavior and parameters.

Thanks
Ted

* Re: spam filtering using IMAP ?
  2003-01-16 11:22       ` Ted Zlatanov
@ 2003-01-16 12:41         ` Mats Lidell
  2003-01-16 14:11           ` Ted Zlatanov
  0 siblings, 1 reply; 11+ messages in thread
From: Mats Lidell @ 2003-01-16 12:41 UTC


>>>>> Ted wrote:

Ted> How do you know if an article is spam or ham?  The user has to
Ted> determine that, right?  Or are you talking about automated server-side
Ted> classification?  I'm a little confused, maybe it would help if you did
Ted> a scenario of what happens to a message on the mail server and on the
Ted> client machine - what programs get invoked, what the user has to do...

Ok. I'll describe the setup.

On the server the messages are classified on arrival.  This is done
using procmail and some spam filtering software.  I currently use my
own Bayesian filter program, but it could just as well be bogofilter,
ifile or whatever.  (In fact I plan to switch to bogofilter soon.)

This files spam into the spam folder.  Ham is filed into other
folders based on standard procmail rules.

So when the user starts his MUA, he will see a bunch of IMAP folders.

Some articles might then need to be reclassified because they went into
the wrong folders.  This is the user's responsibility, but it should be
an easy and fast procedure.  spam.el seems to be a good candidate for
this when using gnus.

Since the spam filtering is done on the server, the articles must also
be reclassified on the server.  So it is a matter of finding a good
scheme for server-side reclassification, and of whether spam.el can
support it.

Ted> I'm not sure how doing this on the server makes a big difference.

Doing this on the server makes all my MUAs see the same thing.  Both
normal procmail filing into folders and spam handling are done on the
server, so everything behaves the same way in different MUAs and from
different locations.

Ted> spam.el supports all that on the client side, and the only
Ted> penalty you pay is retrieving the article body.  Maybe you want
Ted> to have a single place to store message statistics?

Yes. Since the mails are classified and filed to folders on the server
the database needs to be updated on the server. (So the problem I want
to solve is how to feed back the reclassifications to the server.)

As you might read between the lines, I would like to come up with a
scheme that also works for MUAs other than gnus.  My personal interest
is then how to support that scheme in gnus.

Ted> There's the spam-process-destination and ham-process-destination
Ted> group parameters, which let you move spam or ham articles at
Ted> summary exit.  You can set them for a group, a topic, or a regex
Ted> matching the group name.

Aha. Sorry. I have missed those.

Ted> Is that helpful?  Or do you have something else in mind?

They sound good. I'll have a deeper look at it.

Ted> You should probably look at the CVS Gnus manual if you haven't
Ted> already.  It explains all the spam.el behavior and parameters.

RTFM. I know. ;-)

Yours
-- 
%% Mats

* Re: spam filtering using IMAP ?
  2003-01-16 12:41         ` Mats Lidell
@ 2003-01-16 14:11           ` Ted Zlatanov
  0 siblings, 0 replies; 11+ messages in thread
From: Ted Zlatanov @ 2003-01-16 14:11 UTC
  Cc: ding

You could actually write a wrapper (let's say for Bogofilter) which
invokes Bogofilter on the server.  Modify spam-bogofilter-path, and
have the wrapper invoke bogofilter on the server over SSH or however
you want.  It will be slow, but it can be done.  Then you can just use
bogofilter as a spam/ham processor in spam.el, and it will update the
database on the server.
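A minimal sketch of such a wrapper in Python, assuming passwordless SSH to a hypothetical host `mail.example.org` where `bogofilter` is on the PATH. It forwards its arguments (such as `-s` or `-n` when registering spam or ham) and its standard input to the remote bogofilter, and returns the remote exit status, so the caller sees roughly what a local bogofilter would produce.

```python
import shlex
import subprocess
from typing import List

REMOTE_HOST = "mail.example.org"   # hypothetical mail/IMAP server
REMOTE_BOGOFILTER = "bogofilter"   # assumed to be on the server's PATH


def remote_command(args: List[str]) -> List[str]:
    """Build the ssh command line that runs bogofilter remotely with the given args.

    ssh joins the remote arguments into one shell command, so each one is quoted.
    """
    remote = " ".join(shlex.quote(a) for a in [REMOTE_BOGOFILTER] + list(args))
    return ["ssh", "-q", REMOTE_HOST, remote]


def run_remote(args: List[str]) -> int:
    """Run the remote bogofilter; stdin (the message) and stdout pass straight through."""
    return subprocess.run(remote_command(args)).returncode
```

Saved as an executable script ending in `sys.exit(run_remote(sys.argv[1:]))` (plus `import sys`), for instance under the hypothetical path `/usr/local/bin/remote-bogofilter`, it could be named in `spam-bogofilter-path`. Each invocation opens a fresh SSH connection, which is a likely source of the slowness mentioned above.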

You could also, on the server side, periodically connect to IMAP and
process any new messages in the spam/ham folders you define.  That's
completely outside of the Gnus/spam.el domain though - you would only
be using Gnus/spam.el to move messages around.

For incoming mail, use spam-check-bogofilter-headers instead of
spam-check-bogofilter, and set up bogofilter to process your mail on
the IMAP server.  All spam mail (with the "X-Bogosity: Yes" header)
will go to the spam-split-group.
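The header-based check can be modeled in a few lines of Python, assuming bogofilter on the server has already added an `X-Bogosity` header whose first comma-separated field is the verdict (`Yes` for spam, as described here; the exact wording depends on the bogofilter version and configuration, so it is a parameter). The function is a hypothetical illustration of the idea, not spam-check-bogofilter-headers itself, and the default group name "spam" is likewise an assumption.

```python
from email import message_from_bytes
from typing import Iterable, Optional


def bogosity_split(raw: bytes, spam_group: str = "spam",
                   spam_verdicts: Iterable[str] = ("Yes",)) -> Optional[str]:
    """Return spam_group if the X-Bogosity header marks the message as spam, else None."""
    header = message_from_bytes(raw).get("X-Bogosity")
    if header is None:
        return None  # not processed by bogofilter; let other split rules decide
    # The verdict is the first comma-separated field, e.g. "Yes, tests=bogofilter".
    verdict = str(header).split(",", 1)[0].strip().lower()
    return spam_group if verdict in {v.lower() for v in spam_verdicts} else None
```

Because only headers are inspected, this kind of check works with nnimap's header-only splitting; no body download is needed.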

Ted

* Re: spam filtering using IMAP ?
  2003-01-09 19:20 spam filtering using IMAP ? Arnd Kohrs
  2003-01-09 19:38 ` Ted Zlatanov
  2003-01-16  7:20 ` Mats Lidell
@ 2003-01-16 15:32 ` Kai Großjohann
  2003-01-16 20:02   ` Mats Lidell
  2 siblings, 1 reply; 11+ messages in thread
From: Kai Großjohann @ 2003-01-16 15:32 UTC


Arnd Kohrs <kohrs@castify.net> writes:

> Can one spam filter when using nnimap for all email?

What's going through my mind is a generalization of this: is it
possible to set up server-side splitting with an automatic classifier,
such as ifile?  I've got a Cyrus server and it would be really cool
if it was possible to invoke it.

My special situation is that people have an account on the server, so
for the splitting part, procmail could be used to invoke ifile.  But
whenever I move a message to another group, Cyrus also needs to
invoke ifile.  That's the part that I'm unsure about.
-- 
Ambibibentists unite!

* Re: spam filtering using IMAP ?
  2003-01-16 15:32 ` Kai Großjohann
@ 2003-01-16 20:02   ` Mats Lidell
  0 siblings, 0 replies; 11+ messages in thread
From: Mats Lidell @ 2003-01-16 20:02 UTC
  Cc: ding

>>>>> Kai wrote:

Kai> What's going through my mind is a generalization of this: is it
Kai> possible to set up server-side splitting with an automatic
Kai> classifier, such as ifile?  I've got a Cyrus server and it would
Kai> be really cool if it was possible to invoke it.

It is very possible to do that.  I use procmail with a statistical spam
filter (my own, but it could just as well be ifile, bogofilter or
whatever).  The spam splitter stores all spam in a spam folder.  The rest
is filtered by normal procmail rules into other folders.  It is very
convenient if you use different MUAs or access your mail from different
locations.

Kai> My special situation is that people have an account on the
Kai> server, so for the splitting part, procmail could be used to
Kai> invoke ifile.  But whenever I move a message to another group,
Kai> Cyrus also needs to invoke ifile.  That's the part that I'm
Kai> unsure about.

So am I.  In my current setup there is no dynamic reclassification.  I
just use IMAP to store mail in spam and ham folders.  Then, once in a
while, I generate new statistics based on the spam and ham folders.

I have been thinking of making reclassification dynamic by using
temporary folders together with a cron job.  I haven't tried it yet,
so I haven't found out what problems there might be with such an
approach.

Yours
-- 
%% Mats

Thread overview: 11+ messages
2003-01-09 19:20 spam filtering using IMAP ? Arnd Kohrs
2003-01-09 19:38 ` Ted Zlatanov
2003-01-10  2:02   ` Simon Josefsson
2003-01-16  7:20 ` Mats Lidell
2003-01-16  9:40   ` Ted Zlatanov
2003-01-16 10:43     ` Mats Lidell
2003-01-16 11:22       ` Ted Zlatanov
2003-01-16 12:41         ` Mats Lidell
2003-01-16 14:11           ` Ted Zlatanov
2003-01-16 15:32 ` Kai Großjohann
2003-01-16 20:02   ` Mats Lidell