wallowing out of the spam quagmire

Gnus development mailing list

* wallowing out of the spam quagmire
From: Harry Putnam @ 2004-06-19 18:27 UTC


We all notice how bad it is getting spamwise...  

I'm about to give up with my old approach of running spamassassin and
helping it out by assembling an ever more complex .procmailrc to take
the heavy lifting.  I've been doing it that way mainly because SA is
quite expensive to run against LOTS of spam.  Even without involving
online data base checks.

So ... I've been culling out the easy stuff with procmail.  But it's
getting too hard to ID this stuff.

I'd like to try the bogofilter approach in addition to my in-place
spamassassin/procmail setup (which doesn't involve gnus other than as
mail reader, and doesn't contact online databases either).

That is, I'd like to try the method where you have spam/ham groups
inside gnus and invoke bogofilter on them, building up a bogofilter
database.  

Listening to various posts on that, it seems to have all the earmarks of
being a pain in the butt.

I wondered if anyone can direct me to some examples of a setup like
below or maybe even better, post their setup.

Summary of possible setup:

   1) procmail/SpamAssassin based pre filtering (before gnus)

   Then for whatever gets thru that barrier:
    (And here is where I'd like some help)
   2) Inside gnus: Bogofilter based spam/ham database building tools.

The old 1 2 punch...

=====

I'd like to see some examples of point 2 that describe how this can
be done.





* Re: wallowing out of the spam quagmire
From: Jonas Steverud @ 2004-06-20  6:58 UTC

Harry Putnam <reader@newsguy.com> writes:

> Listening to various posts on that, it seems to have all the earmarks of
> being a pain in the butt.

Both yes and no. The problem is to understand how spam.el works. It is
not complex, the documentation is simply not yet complete. Read it
before you continue with this email.

>    1) procmail/SpamAssassin based pre filtering (before gnus)

I assume it places all spam in a specific group; let's, for the
discussion, call it nnfolder:Spam.

>    2) Inside gnus: Bogofilter based spam/ham database building tools.

I have a specific group into which all my spam is split during
fetching; all other groups are unclassified - they may contain spam as
well as ham, and spam.el makes no assumption either way. Any ham in
the spam group (nnfolder:Spam) is marked with ! and respooled upon
exit.

I use bogofilter and I leave all encountered spam in the group it ends
up in and let the expire process delete it. I would like to make Gnus
delete it at once since some of my groups have a long expire-wait, 3-4
weeks. Whether this is possible, I don't know.

I use Group Topics in the *Group* buffer.

My set up:

I have added (: spam-split) to the beginning of the nnmail-split-fancy
variable and before it I have  (: gnus-registry-split-fancy-with-parent).
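A minimal sketch of the split described above, assuming mail is fetched through the ordinary nnmail splitting machinery and that the registry has been initialized (gnus-registry-initialize, as in the .emacs below). The catch-all group "mail.misc" is a hypothetical placeholder, and any split rules of your own would go between spam-split and the catch-all:

```elisp
;; Sketch of the fancy split described above.  "mail.misc" is a
;; hypothetical catch-all group; substitute your own.
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy
      '(|
        ;; Send follow-ups to the group their parent article went to.
        (: gnus-registry-split-fancy-with-parent)
        ;; Send whatever spam.el classifies as spam to spam-split-group.
        (: spam-split)
        ;; ...your own split rules, if any, go here...
        ;; Everything else ends up in the catch-all group.
        "mail.misc"))
```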

The group parameters I've added:

On the top email topic (all email groups are below this one):
((comment
  (spam-contents gnus-group-spam-classification-ham))
 (spam-process
  ((spam spam-use-bogofilter)
   (ham spam-use-bogofilter)))
 (spam-process-destination) ;; No process destination!
 (comment
  (ham-marks
   (gnus-del-mark gnus-read-mark gnus-killed-mark gnus-kill-file-mark gnus-low-score-mark gnus-expirable-mark gnus-ancient-mark))))

The two commented lines are nice when you train bogofilter the first
time. Just enter each non-spam group and then exit and bogofilter will
train on all as ham. Delete or comment away afterwards.
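For that first training pass, the topic parameters above with the two comment wrappers removed would read roughly as follows; this is only the same values unwrapped, not a separately tested configuration:

```elisp
;; Topic parameters for the initial ham-training pass only.
;; Re-wrap (or delete) spam-contents and ham-marks afterwards.
((spam-contents gnus-group-spam-classification-ham)
 (spam-process
  ((spam spam-use-bogofilter)
   (ham spam-use-bogofilter)))
 (spam-process-destination)             ; no process destination
 (ham-marks
  (gnus-del-mark gnus-read-mark gnus-killed-mark gnus-kill-file-mark
   gnus-low-score-mark gnus-expirable-mark gnus-ancient-mark)))
```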

On my spam group, i.e. nnfolder:Spam, I have added:
((expiry-wait . immediate)
 (ham-process-destination respool)
 (spam-contents gnus-group-spam-classification-spam)
 (ham-marks
  (gnus-ticked-mark)))

Expire at once, respool all found ham, all ticked articles are ham and
everything else is spam.

And finally, my .emacs:

(setq spam-split-group "Spam" ; Important that  "nnfolder:" is *NOT* included!
      spam-use-bogofilter t
      spam-use-BBDB t ;; Use BBDB as a whitelist
      spam-log-to-registry t
      spam-mark-ham-unread-before-move-from-spam-group t
      spam-move-spam-nonspam-groups-only nil ; No moving at all.
      spam-disable-spam-split-during-ham-respool t
      )

(spam-initialize) ;; Loads the spam.el package etc.
(gnus-registry-initialize)

I think that's it. You might be interested in the autodetection
mechanism, with which spam.el tries to detect spam in groups you enter,
but this is not necessary since you use spamassassin and will probably
use spam-split. It might be a nice idea if you read USENET.

HTH!

-- 
(        http://hem.bredband.net/steverud/        !     Wei Wu Wei     )
(        Meaning of U2 Lyrics, Roleplaying        !  To Do Without Do  )


* Re: wallowing out of the spam quagmire
From: Kevin Ryde @ 2004-06-20 23:44 UTC


Harry Putnam <reader@newsguy.com> writes:
>
> I've been doing it that way mainly because SA is
> quite expensive to run against LOTS of spam.

I'd found it helped to keep one running copy.  That's what spamc/spamd
is meant to do, except it seems to fork for each message, which was a
killer on my poor pc.




* Re: wallowing out of the spam quagmire
From: Daniel Pittman @ 2004-06-21  4:28 UTC


On 21 Jun 2004, Kevin Ryde wrote:
> Harry Putnam <reader@newsguy.com> writes:
>>
>> I've been doing it that way mainly because SA is
>> quite expensive to run against LOTS of spam.
>
> I'd found it helped to keep one running copy.  That's what spamc/spamd
> is meant to do, except it seems to fork for each message, which was a
> killer on my poor pc.

You may find that the 'amavisd-new' package, which uses the SpamAssassin
core to do spam identification, better suits your needs. The resource
use seems to be more managed than the spam[cd] combination, in my
experience.

        Daniel
-- 
No, no, you're not thinking, you're just being logical.
        -- Niels Bohr





* Re: wallowing out of the spam quagmire
From: Ted Zlatanov @ 2004-06-21 14:35 UTC
  Cc: ding

On Sat, 19 Jun 2004, reader@newsguy.com wrote:

> Listening to various posts on that [spam.el] seems to have all the
> earmarks of being a pain in the butt.

The CVS manual has working configurations (mine and one other).

The latest spam.el which I posted here last week for testing
simplifies things further, eliminating some of the more confusing
options.  I'll do a writeup on the changes when I commit, but it's
definitely an improvement.

Also there was a patch for the spam.el manual section posted on this
newsgroup, which needs a second pair of eyes.  If you could take a
look at that and see how well it works for you (being a potential new
spam.el user) that would be very helpful.

Generally, any feedback on the setup difficulty of spam.el (not the
manual, but the complexity of the data structures and procedures) is
very welcome.

Thanks
Ted


* Re: wallowing out of the spam quagmire
From: Harry Putnam @ 2004-06-22  1:21 UTC

Jonas Steverud <tvrud@bredband.net> writes:

> Harry Putnam <reader@newsguy.com> writes:
>
>> Listening to various posts on that, it seems to have all the earmarks of
>> being a pain in the butt.
>
> Both yes and no. The problem is to understand how spam.el works. It is
> not complex, the documentation is simply not yet complete. Read it
> before you continue with this email.

I'm not sure we're from the same planetary system... or at a bare
minimum you must have a rather bizarre notion of what `not complex'
means.  I went glassy eyed after the first couple hundred lines.

I'm introduced to black lists, black holes, hash-cash payments,
bogofilters, online databases, bbdb as white list, some absolutely
convoluted processing that seems to require `split fancy', which I've
never used, and some use of the gnus registry, which I have also never
messed with.  Then come many lines of variable discussion which
apparently are supposed to spell out what 2780 lines of elisp in
spam.el do.

In my world, this is quite `complex'.
>
>>    1) procmail/SpamAssassin based pre filtering (before gnus)
>
> I assume it places all spam in a specific group; let's, for the
> discussion, call it nnfolder:Spam.

No, but sort of similar.  I used plain splitting inside gnus for a few
years but gave it up a couple years ago in favor of procmail.  For
some time now I've left all splitting to procmail/SpamAssassin.  What
gets past procmail/sa ends up in a single inbox where I deal with all
of it by hand.  That inbox is getting an increasing amount of spam.
Stuff that is hard to identify, etc.

So to summarize.  I let procmail/sa do most splitting and culling out
of spam.  When that is done, the rest comes to my inbox and I deal
with it by hand.  I hoped to introduce bogofilter at that stage.

Many thanks for posting your setup... However it seems fantastically
complicated to me.

I had visions of leaving all spam that SpamAssassin and procmail find
out of the equation.  Then, for whatever gets to my single inbox, I had
visions of marking any spam as such and moving it to a spam group.
Maybe copy ham to a ham group.  Then let these messages be the
training tools for bogofilter.  After showing bogofilter enough
examples, isn't it supposed to take it from there?

As training begins I'd introduce splitting into my single inbox as
the tools learn what is what.

I'm not sure what this training actually does in practice, but it
sounds like bogofilter begins to know what is spam.  If so, then I'd
tell bogofilter to remove what it thinks is spam. No other splitting
would be needed.

Not at all clear why a fancy split is required to do that.  In fact
it's kind of hard to imagine a split rule at all.

Seems like one would just invoke bogofilter on each message and send
each one to spam or ham.  Technically a split, I guess, but not very
complicated.  The complicated part seems to be what goes on inside
bogofilter.  The messages it will be seeing have already skirted SA's
complex set of interrelated rules, plus my own homeboy procmail rules
and tweaks to SA.  So it will be hard to find a pattern or anything
else in this mail to help identify it.

The above semi-diagram seems fairly simple to me.  But I don't see
how it can be done with the current documentation. I have no idea how
to implement this.

I've probably talked myself right into a hole but how can I set up
the simple system described above?

Have I overlooked step-by-step instructions?  I'm assuming by
documentation people mean the stuff at:
  Filtering Spam Using The Spam ELisp Package
I haven't found a step by step there.

I guess ...Spam ELisp Package Sequence of Events
is as close as it gets..  Sounds like I need the auto-detect method
and would set G p on my single inbox group to something that tells
spam.el to `auto-detect' in it.

My case should be the simplest possible example of using spam.el and
bogofilter, but I'm not sure about involving gnus registry etc.
Or what `exactly' needs doing.

I'm going to look for Ted's patch to the docs right now.


* Re: wallowing out of the spam quagmire
From: Harry Putnam @ 2004-06-22  1:40 UTC

"Ted Zlatanov" <tzz@lifelogs.com> writes:

[...]

> Also there was a patch for the spam.el manual section posted on this
> newsgroup, which needs a second pair of eyes.  If you could take a
> look at that and see how well it works for you (being a potential new
> spam.el user) that would be very helpful.

About how long ago?  Looking at the last 2000 messages I haven't
been able to identify it.  Is it already in the CVS manual?  I just
updated and read some of the docs on spam.el.  It seems horribly
complicated.

> Generally, any feedback on the setup difficulty of spam.el (not the
> manual, but the complexity of the data structures and procedures) is
> very welcome.

My usage will probably be the simplest one would find.  I want to run
bogofilter against only one group (my setup is described in my previous
post).  That group gets whatever has made it past SA and dozens of my
own procmail rules, including splitting out many list-server messages
from dozens of subscribed lists.  So this group doesn't get lots of
mail.  However, the spam that shows up there is fairly sophisticated
and will be hard for computer tools to identify as spam.

It is readily identifiable by humans though, so I thought if I showed
enough of it to bogofilter, that filter would eventually become able to
identify most of it.  Maybe that isn't how Bogofilter works..?

At this stage I'd like to see a few steps that would get this process
started.  Something like (examples are made up):

1) put (some elisp) in G p of this group (auto-detect I guess)
2) install bogofilter and set (`bogo-on' to t)
3) set variables (spam-group "groupname") (ham-group "groupname")
4) enter group and mark messages spam or ham
5) let bogofilter wallow around in spam-group ham-group several times
   a day.
Do steps four/five 7,535 times and then bogofilter will know what is
what....


* Re: wallowing out of the spam quagmire
From: Jody Klymak @ 2004-06-22  1:53 UTC

Hi Harry,

Harry Putnam <reader@newsguy.com> writes:

> In my world, this is quite `complex'.

heh heh.  I think it was simpler when I started and has kind of
ballooned.  However, my setup still works....

> I've probably talked myself right into a hole but how can I set up
> the simple system described above?

Step 1:  this goes in your .gnus.el:
(setq
 nnimap-split-rule 'nnimap-split-fancy
 nnimap-split-inbox "INBOX"
 nnimap-split-fancy '(|
                      (: spam-split)
                      ;; For example (any ".*bbdb.*" "mail/ZIn.bbdb")
                      ;; default mailbox
                      "mail/Inbox.good"))

Step 2: - I do this with customize, but I suspect you can do it with
setq-s.  

 '(spam-bogofilter-path "/usr/local/bin/bogofilter")
 '(spam-junk-mailgroups (quote ("mail/junk")))
 '(spam-process-ham-in-spam-groups t)
 '(spam-split-group "mail/junk")
 '(spam-use-bogofilter t)
 '(spam-use-bogofilter-headers nil)
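As Jody suspects, the same settings can probably be written as plain setq forms in .gnus.el instead. A sketch, assuming they are set before spam.el is loaded; the bogofilter path and the group name are Jody's and will differ on your system:

```elisp
;; setq equivalents of the customize entries above.  The path and the
;; group name are Jody's; adjust them for your own setup.
(setq spam-bogofilter-path "/usr/local/bin/bogofilter"
      spam-junk-mailgroups '("mail/junk")
      spam-process-ham-in-spam-groups t
      spam-split-group "mail/junk"
      spam-use-bogofilter t
      spam-use-bogofilter-headers nil)
```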

Step 3: Set parameters in mail/junk 
I think you need to go to group-customize to do this (G c on the
group).  I find this personally annoying, and it adds a layer of mystery
to the whole thing.  I think there is a way around it, but I'm not
sure what it is.

a) Check Group contents spam/ham classification to make it "spam"
b) Set the Spam summary exit processor to "Ham: Bogofilter" - this
will allow you to mark Ham in the spam group for training when you
get a false positive.
c) Choose a "Destination for ham articles"  I just put them in my
mail/Inbox.good.
d) Set Ham mark choices - I use gnus-read-mark and gnus-ticked-mark.
e) I set Expire wait to 2 days.  

You could also set this group to train on the spam, but bogofilter
already knew it was spam, so that may be overtraining.  

This comes out in my groups parameters looking like:

((uidvalidity . "1074123432")
 (spam-contents gnus-group-spam-classification-spam)
 (spam-process
  ((ham spam-use-bogofilter)))
 (ham-process-destination "nnimap+opg1.ucsd.edu:mail/Inbox.good")
 (ham-marks
  (gnus-read-mark gnus-ticked-mark))
 (expiry-wait . 2.0))

Let me know if this doesn't make sense.

Note that I do this over IMAP, but bogofilter runs locally.  It
should, however, work with any backend.

Cheers,  Jody.

-- 
Jody Klymak      http://opg1.ucsd.edu/~jklymak/
mailto:jklymak@ucsd.edu   


* Re: wallowing out of the spam quagmire
From: Jonas Steverud @ 2004-06-22  7:52 UTC

Harry Putnam <reader@newsguy.com> writes:

Note: I use No Gnus v0.2.

> Jonas Steverud <tvrud@bredband.net> writes:
>
[...]
>> Both yes and no. The problem is to understand how spam.el works. It is
>> not complex, the documentation is simply not yet complete. Read it
>> before you continue with this email.
>
> I'm not sure we're from the same planetary system... or at a bare
> minimum you must have a rather bizarre notion of what `not complex'
> means.  I went glassy eyed after the first couple hundred lines.

As I said, the documentation is not yet finished. ;-) You only
confirmed what I said: "The problem is to understand how spam.el
works. [...] the documentation is simply not yet complete."

>>>    1) procmail/SpamAssassin based pre filtering (before gnus)
>>
>> I assume it places all spam in a specific group; let's, for the
>> discussion, call it nnfolder:Spam.
[...]
> So to summarize.  I let procmail/sa do most splitting and culling out
> of spam.  When that is done, the rest comes to my inbox and I deal
> with it by hand.  I hoped to introduce bogofilter at that stage.

OK.

First: Fancy splitting is the same as splitting (which you have already
used) but allows more complex rules. If you don't want Gnus to
filter/split your mail, leave it out.

The way bogofilter works is to eat the email it is given and either

a. If it is told to train, bogofilter updates its databases of words
that exist in spams and in hams (whether the email is considered spam
or ham is set by command-line parameters).

b. If it is told to classify, it checks its databases and from that
calculates the probability that the email is spam or ham. It reports
YES or NO.

Bogofilter doesn't give a d*mn about what spam.el does. It eats emails
and either trains on them or classifies them. Period.
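spam.el drives bogofilter for you, so you never need to write this yourself, but a hypothetical sketch may make the two modes concrete. It assumes a bogofilter binary on your PATH and uses its -s (register as spam) and -n (register as non-spam) options; called with neither, bogofilter classifies and reports through its exit status (0 spam, 1 ham, 2 unsure, 3 error). The toy-* names are made up for illustration and are not part of spam.el:

```elisp
;; Hypothetical helpers, not spam.el code.  Each works on the whole
;; current buffer, which is assumed to hold one complete message.

(defun toy-bogofilter-verdict (status)
  "Translate a bogofilter exit STATUS into a symbol."
  (cond ((eq status 0) 'spam)
        ((eq status 1) 'ham)
        ((eq status 2) 'unsure)
        (t 'error)))

(defun toy-bogofilter-train (spamp)
  "Train bogofilter on the current buffer as spam if SPAMP, else as ham."
  ;; -s registers the message as spam, -n as non-spam.
  (call-process-region (point-min) (point-max) "bogofilter"
                       nil nil nil (if spamp "-s" "-n")))

(defun toy-bogofilter-classify ()
  "Ask bogofilter about the current buffer.
Return spam, ham, unsure or error."
  ;; Output is discarded (BUFFER is nil); only the exit status matters.
  (toy-bogofilter-verdict
   (call-process-region (point-min) (point-max) "bogofilter" nil nil nil)))
```

With a raw message in a buffer, (toy-bogofilter-train t) would teach bogofilter that it is spam, and (toy-bogofilter-classify) would return one of the four symbols.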

Spam.el does not care about which program you use for training and
classifying. It has an interface to different backends and lets them
handle that - the same approach Gnus has toward messages, nntp, pop,
imap, all messages in one file or one file for each message and so on.

What you need to do is to tell spam.el what it shall do and with which
backends.

First, some terminology:

I will call the main mailbox you described Inbox; this is where all
mail that procmail and SA haven't handled ends up. Some will be spam
and the rest will be ham. There are also two other groups: Spam and
Ham.

So, now we are set.

You can do two different things. You can move any found spam in Inbox
to Spam and train bogofilter in Spam or you can train bogofilter in
Inbox and leave the spam there. If you use expire in Inbox the latter
is IMHO preferred, but it is all about taste. There is no correct
answer in this case. I will assume you want it moved to Spam and
bogofilter to train in Spam.

First, tell spam.el to use bogofilter. Which backend you use doesn't
matter so if you want to use another backend later, just search and
replace with your new backend.

Add (setq spam-use-bogofilter t) to your .gnus.el.
Also add (spam-initialize)  and make sure it is the last line of all
spam related code in .gnus.el. I.e. add any further spam.el related
stuff *before* this line.

I think you need (setq spam-move-spam-nonspam-groups-only t) as well.
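Assembled, the .gnus.el lines from the last two paragraphs might look like the sketch below. Jonas himself is unsure whether spam-move-spam-nonspam-groups-only is needed, so that setting is marked tentative:

```elisp
;; Tell spam.el to use bogofilter as its backend.
(setq spam-use-bogofilter t)

;; Possibly needed; tentative (see the note above).
(setq spam-move-spam-nonspam-groups-only t)

;; Any further spam.el settings go ABOVE this line.
(spam-initialize)
```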

You need to tell spam.el that any spam found in Inbox is to be moved
to Spam. Edit the group parameters of Inbox (I assume you know how to
do that, ask otherwise) to contain the following lines:

 (spam-process-destination "Spam") ;; You might need to add "nnfolder:"
                                   ;; or whatever you use as mail backend.

In case you want all ham (everything else) to be moved to Ham, add
these lines:
 (ham-process-destination "Ham") ;; Read comment above.
 (ham-marks
   (gnus-read-mark gnus-killed-mark)) ;; All according to taste.

Now, all spam will be moved to Spam when you exit Inbox. Mark every
mail you consider to be spam with M-d or S x (same function). If you
want spam.el to go through your Inbox folder and mark all spam as such
for you (i.e. all emails bogofilter considers to be spam), add the
following lines to the Inbox group parameters:

 (spam-autodetect-methods spam-use-bogofilter)
 (spam-autodetect . t)

The group parameters shall contain:
 Spam: (spam-contents gnus-group-spam-classification-spam)
 Ham:  (spam-contents gnus-group-spam-classification-ham)
So spam.el knows what to expect in the groups.
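Collected in one place, the group parameters from the steps above might look like this in the three G p buffers. The "nnfolder:" prefix is an assumption (use whatever backend your groups live on), and the ham-marks choice is a matter of taste:

```elisp
;; Inbox (G p on the Inbox group):
((spam-process-destination "nnfolder:Spam")
 (ham-process-destination "nnfolder:Ham")
 (ham-marks
  (gnus-read-mark gnus-killed-mark))
 (spam-autodetect-methods spam-use-bogofilter)
 (spam-autodetect . t))

;; Spam (G p on the Spam group):
((spam-contents gnus-group-spam-classification-spam))

;; Ham (G p on the Ham group):
((spam-contents gnus-group-spam-classification-ham))
```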

If I got everything right, all mails in Inbox will be checked for spam
upon entry. Any spam will be marked with $. Upon exit, all spam
(autodetected and marked by you) will be moved to Spam and all ham
(what is considered ham is decided by the ham-marks above) will be
moved to Ham. Everything that is marked as neither spam nor ham will
stay in place.

When you exit Ham and Spam, bogofilter will train on them as ham and
spam respectively.

It is as important to train on ham as on spam, since otherwise
bogofilter will not know how to detect ham and will consider everything
spam (your email address will be present in the spam, and bogofilter
will consider its presence a sure sign of spam - if you train on ham as
well, it will see that your address is also a sure sign of ham, i.e.
not a word to go by).

(load-library "std-disclaimer") ;-) This is all from the top of my
head and I might have missed something.

> Seems like one would just invoke bogofilter on each message and send
> each one to spam or ham.  Technically a split, I guess, but not very
> complicated.  The complicated part seems to be what goes on inside
> bogofilter.  The messages it will be seeing have already skirted SA's
> complex set of interrelated rules, plus my own homeboy procmail rules
> and tweaks to SA.  So it will be hard to find a pattern or anything
> else in this mail to help identify it.

Bogofilter keeps a statistical database of all the words that occur in
emails, and knows whether each email was considered (by you) ham or
spam. When detecting spam, it checks the database for each word and
applies a mathematical formula.

The database can look like this (my spam database):
FDA-Approved 1 20040610
FDA-approved 2 20040613
FDZTb0mAPVS 1 20040607
FEEL 8 20040417
FFFF00 1 20040416

Google for a description of Bayesian filters; it is quite simple,
actually. Bogofilter will detect spams that the static rules in SA
have missed, e.g. all the different spellings of Viagra: V1agra, V1ag.ra etc.

The idea is "keep a database of all good words and all bad words and
check the email and whichever has the highest ranking classifies the
message".
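To make the idea concrete, here is a toy word-counting scorer in Emacs Lisp. It is a deliberately simplified naive-Bayes sketch, not bogofilter's actual formula (bogofilter's calculation is more elaborate), and every toy-* name is hypothetical. It also shows why ham training matters: a word that turns up equally often in both piles scores a neutral 0.5, whereas with an empty ham table every word ever seen in spam would score as spammy.

```elisp
;; Toy illustration only: a simplified naive-Bayes word scorer.
;; It is NOT bogofilter's algorithm; all toy-* names are hypothetical.

(defun toy-train (messages table)
  "Count in TABLE how many of MESSAGES contain each word.
Each message is a list of word strings."
  (dolist (msg messages)
    ;; Count a word once per message, however often it appears.
    (dolist (word (delete-dups (copy-sequence msg)))
      (puthash word (1+ (gethash word table 0)) table))))

(defun toy-word-spam-prob (word spam ham nspam nham)
  "Estimate the probability that a message containing WORD is spam.
SPAM and HAM are count tables filled by `toy-train'; NSPAM and NHAM
are the numbers of spam and ham messages trained on."
  (let ((s (/ (float (gethash word spam 0)) (max nspam 1)))
        (h (/ (float (gethash word ham 0)) (max nham 1))))
    (if (= (+ s h) 0.0)
        0.5                             ; never seen: neutral
      ;; Clamp so that no single word can decide the verdict alone.
      (min 0.99 (max 0.01 (/ s (+ s h)))))))

(defun toy-spamicity (words spam ham nspam nham)
  "Combine the per-word estimates for WORDS, assuming independence."
  (let ((p 1.0) (q 1.0))
    (dolist (word words)
      (let ((pw (toy-word-spam-prob word spam ham nspam nham)))
        (setq p (* p pw)
              q (* q (- 1.0 pw)))))
    (/ p (+ p q))))
```

Training the spam table on ("cheap" "v1agra" "now") and ("cheap" "fda-approved" "pills") and the ham table on ("gnus" "spam.el" "bogofilter") and ("meeting" "now" "gnus"), the word list ("cheap" "v1agra") then scores close to 1, ("gnus" "bogofilter") close to 0, and "now", seen once in each pile, exactly 0.5.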

> My case should be the simplest possible example of using spam.el and
> bogofilter, but I'm not sure about involving gnus registry etc.
> Or what `exactly' needs doing.

The registry is a database (a lisp list actually) of all message ids
and which group they exist in. Some lines in the documentation suggest
that you need to use it for autodetection (can someone else
confirm?). In that case, add
(setq spam-log-to-registry t)
(gnus-registry-initialize)

HTH.

-- 
(        http://hem.bredband.net/steverud/        !     Wei Wu Wei     )
(        Meaning of U2 Lyrics, Roleplaying        !  To Do Without Do  )


* Re: wallowing out of the spam quagmire
From: Harry Putnam @ 2004-06-22 10:56 UTC

Jody Klymak <jklymak@ucsd.edu> writes:

[...] Snipped good instructions ... thanks

> Step 3: Set parameters in mail/junk 
> I think you need to go to group-customize to do this (G c on the
> group).  I find this personally annoying, and it adds a layer of mystery
> to the whole thing.  I think there is a way around it, but I'm not
> sure what it is.

Just an aside about your annoyance:
Maybe I'm missing what you mean above... but you could probably do it
by regex using `gnus-parameters'... open gnus manual and press:
 i gnus-parameters <RET> for a nifty example.  But maybe the needed 
lisp won't work from there?
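A sketch of that suggestion, moving Jody's mail/junk parameters into gnus-parameters. Each entry is a regexp matched against the full group name followed by the parameters for matching groups; the regexp is an assumption, the server-maintained uidvalidity entry is deliberately left out, and whether a given Gnus version lets spam.el pick up every one of these parameters this way is untested:

```elisp
;; Hypothetical gnus-parameters version of Jody's G p settings for
;; mail/junk.  The regexp is matched against the full group name,
;; e.g. "nnimap+opg1.ucsd.edu:mail/junk".
(setq gnus-parameters
      '(("mail/junk\\'"
         (spam-contents gnus-group-spam-classification-spam)
         (spam-process ((ham spam-use-bogofilter)))
         (ham-process-destination "nnimap+opg1.ucsd.edu:mail/Inbox.good")
         (ham-marks (gnus-read-mark gnus-ticked-mark))
         (expiry-wait . 2.0))))
```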

> Let me know if this doesn't make sense.

Working with your instructions and the further info posted by Jonas. 
I'll report back how it goes.  Should have it done sometime this
morning.

Many thanks for the input...


* Re: wallowing out of the spam quagmire
From: Jody Klymak @ 2004-06-22 15:03 UTC


Harry Putnam <reader@newsguy.com> writes:

> Maybe I'm missing what you mean above... but you could probably do it
> by regex using `gnus-parameters'... open gnus manual and press:
>  i gnus-parameters <RET> for a nifty example.  But maybe the needed 
> lisp won't work from there?

I *knew* there was a way.  There always is!  

Thanks,  Jody

-- 
Jody Klymak      http://opg1.ucsd.edu/~jklymak/
mailto:jklymak@ucsd.edu   





* Re: wallowing out of the spam quagmire
From: Jody Klymak @ 2004-06-22 15:18 UTC

Hi all,

Jonas Steverud <tvrud@bredband.net> writes:

> If I got everything right, all mails in Inbox will be checked for spam
> upon entry. Any spam will be marked with $. Upon exit, all spam
> (autodetected and marked by you) will be moved to Spam and all ham
> (what is considered ham is decided by the ham-marks above) will be
> moved to Ham. Everything that is marked as neither spam nor ham will
> stay in place.
>
> When you exit Ham and Spam, bogofilter will train on them as ham and
> spam respectively.

I never quite understood this "trinary" approach to spam.  The setup
I gave assumes a binary approach - either it's spam or not.  On
splitting my Inbox:
 ham -> mail/Inbox.good
 spam -> mail/junk

I read mail in mail/Inbox.good.  If somehow a spam got in there I
"S x" on the mail.  My spam-exit-processor in mail/Inbox.good is set
to Spam: bogofilter, so bogofilter is trained on the bad mail and then
it is moved to mail/junk.

I scan over mail/junk.  Everything is marked as spam ("$").  If I see
something that is not spam I tick it ("!") and quit the group.
Bogofilter is trained on the ham and then it is moved to
mail/Inbox.good.

I think this is the simplest setup possible.  It's how Netscape and
Apple's Mail work.

More complicated setups occur if you want to train bogofilter (or
other software) offline, e.g. you run a cron job that runs bogofilter
on the contents of your "Ham" folder, or you have an IMAP server where
you can run bogofilter and procmail, but you need a mechanism to
communicate with it from the local machine where you run gnus.

Cheers,  Jody 

-- 
Jody Klymak      http://opg1.ucsd.edu/~jklymak/
mailto:jklymak@ucsd.edu   


* Re: wallowing out of the spam quagmire
From: Jody Klymak @ 2004-06-22 15:20 UTC



Hello Harry,

I realized that I have a couple of settings in my Inbox.good as
well.  Sigh.  I'll try to put these all into .gnus.el.  But for now -
if you want spam processed on exit, I believe you need to set spam-process...

;;; Editing the group parameters for `nnimap+opg1.ucsd.edu:mail/Inbox.good'.
;; Type `C-c C-c' after you've finished editing.

((uidvalidity . "1078971070")
 (auto-expire . t)
 (spam-process
  ((spam spam-use-bogofilter)))
 (spam-process-destination)
 (expiry-wait . 10.0))

Sorry for leaving that out....

Cheers,  Jody
-- 
Jody Klymak      http://opg1.ucsd.edu/~jklymak/
mailto:jklymak@ucsd.edu   





* Re: wallowing out of the spam quagmire
From: Ted Zlatanov @ 2004-06-22 16:32 UTC
  Cc: ding

On Mon, 21 Jun 2004, reader@newsguy.com wrote:

> I'm not sure we're from the same planetary system... or at a bare
> minimum you must have a rather bizarre notion of what `not complex'
> means.  I went glassy eyed after the first couple hundred lines.
> 
> I'm introduced to black lists, black holes, hash-cash payments,
> bogofilters, online databases, bbdb as white list, some absolutely
> convoluted processing that seems to require `split fancy', which I've
> never used, and some use of the gnus registry, which I have also never
> messed with.  Then come many lines of variable discussion which
> apparently are supposed to spell out what 2780 lines of elisp in
> spam.el do.

Heh.  The *new* spam.el is better organized and should be easier to
understand.  Specifically, the backend-specific code that confused
you is separated from the program logic.

The gnus registry integration with spam.el is optional and off by default.

I agree though, the code is complex.  It has to be in order to do all
it does.

> I guess ...Spam ELisp Package Sequence of Events
> is as close as it gets..  Sounds like I need the auto-detect method
> and would set G p on my single inbox group to something that tells
> spam.el to `auto-detect' in it.

That would be OK.  The auto-detect stuff basically runs spam-split on
a temporary buffer with the message.

I'll follow up on your other messages too...

> My case should be the simplest possible example of using spam.el and
> bogofilter, but I'm not sure about involving gnus registry etc.
> Or what `exactly' needs doing.
> 
> I'm going to look for Ted's patch to the docs right now.

The doc patch is not by me; sorry for confusing you.

Ted




* Re: wallowing out of the spam quagmire
  2004-06-22  7:52     ` Jonas Steverud
  2004-06-22 15:18       ` Jody Klymak
@ 2004-06-22 16:34       ` Ted Zlatanov
  1 sibling, 0 replies; 22+ messages in thread
From: Ted Zlatanov @ 2004-06-22 16:34 UTC (permalink / raw)

On Tue, 22 Jun 2004, tvrud@bredband.net wrote:

> The registry is a database (a Lisp list, actually) of all message IDs
> and which groups they exist in.

Yes.

> Some lines in the documentation suggest that you need to use it for
> autodetection (can someone else confirm?).

The registry is used so messages don't get registered as spam more
than once, based on their message ID.  I don't think it's used by
autodetection now, but it may be in the future (so you remember the
classification of a message ID, and don't have to re-run the
spam-split checks on it).
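If you do want the optional registry integration, here is a minimal
sketch.  It assumes a Gnus that ships gnus-registry.el and a spam.el
that has the spam-log-to-registry option.  Check both names against
your version's manual before relying on them.

```elisp
;; Optional: spam.el works without the registry (it is off by default).
;; Assumption: gnus-registry-initialize and spam-log-to-registry exist
;; in your Gnus version.
(gnus-registry-initialize)      ; start recording Message-ID -> group
(setq spam-log-to-registry t)   ; log spam/ham registrations by Message-ID,
                                ; so an article is not registered twice
```

These lines would go next to your (spam-initialize) call in ~/.gnus.el.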

Ted


* Re: wallowing out of the spam quagmire
  2004-06-22  1:40   ` Harry Putnam
@ 2004-06-22 16:45     ` Ted Zlatanov
  0 siblings, 0 replies; 22+ messages in thread
From: Ted Zlatanov @ 2004-06-22 16:45 UTC (permalink / raw)
  Cc: ding

On Mon, 21 Jun 2004, reader@newsguy.com wrote:

> My usage will probably be the simplest one would find.  I want to run
> bogofilter against only one group (my setup is described in a previous
> post).  That group gets whatever has made it past SA and dozens of my
> own procmail rules, which also split out many list-server messages
> from dozens of subscribed lists.  So this group doesn't get lots of
> mail.  However, the spam that does show up there is fairly
> sophisticated and will be hard for computer tools to identify as
> spam.
> 
> It is readily identifiable by humans, though, so I thought that if I
> showed enough of it to bogofilter, the filter would eventually be able
> to identify most of it.  Maybe that isn't how bogofilter works?
> 
> At this stage I'd like to see a few steps that would get this process
> started.  Something like (examples are made up):
> 
> 1) put (some elisp) in G p of this group (auto-detect I guess)

Yes, auto-detect with spam-use-bogofilter.

> 2) install bogofilter and set (`bogo-on' to t)

You should (setq spam-use-bogofilter t), and the most important thing
is to do it BEFORE you call (spam-initialize).
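Here is a minimal ~/.gnus.el sketch of that ordering.  It assumes a
spam.el recent enough to provide spam-initialize.

```elisp
;; Set every spam-use-* variable you want first...
(setq spam-use-bogofilter t)   ; let spam.el use bogofilter
;; ...and only then initialize spam.el.
(spam-initialize)
```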

> 3) set variables (spam-group "groupname") (ham-group "groupname")

You mean setting the group's spam/ham classification.  You can do that
on a topic too; it's easier for many groups.

> 4) enter group and mark messages spam or ham
> 5) let bogofilter wallow around in spam-group ham-group several times
>    a day.
> Do steps four/five 7,535 times and then bogofilter will know what is
> what....

Yes, this is very sensible.

Your workflow then is:

1. Enter ham group A with parameters:
     auto-detect with spam-use-bogofilter
     spam-process-destination "S"
     spam-process (spam spam-use-bogofilter)
1.1. Mark spam in ham group A (if any was missed by auto-detect).
1.2. Quit ham group A.
1.3. The marked spam goes to the spam-process-destination, after
     being processed with the bogofilter processor.

2. Enter spam group S with parameters:
     spam-process-destination nil (just delete spam)
     ham-process-destination "A"
     spam-process (ham spam-use-bogofilter)
2.1. Mark ham (with a ham mark, e.g. !).
2.2. Quit spam group S.
2.3. The marked ham goes to the ham-process-destination, after
     being processed with bogofilter.

Regarding how spam/ham articles are treated and moved on exit, you
may want to look at the new spam.el, in particular the
spam-summary-exit-behavior variable.

I also like spam-mark-ham-unread-before-move-from-spam-group set to t,
so that instead of keeping the ! mark, ham is marked unread when it
goes to ham group A.
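The workflow above could be written in ~/.gnus.el roughly as follows.
This is a sketch under several assumptions.  "nnml:inbox" (group A)
and "nnml:spam" (group S) are hypothetical names.  The exact list
nesting of each parameter may also differ between spam.el versions, so
compare it with what G p or G c writes for you.

```elisp
;; Sketch only: group names are hypothetical, and the parameter nesting
;; should be checked against what G c writes in your Gnus version.
(setq gnus-parameters
      '(("^nnml:inbox$"                             ; ham group A
         (spam-contents gnus-group-spam-classification-ham)
         (spam-autodetect . t)                      ; auto-detect on entry...
         (spam-autodetect-methods spam-use-bogofilter) ; ...with bogofilter
         (spam-process ((spam spam-use-bogofilter))) ; train on spam at exit
         (spam-process-destination . "nnml:spam"))   ; then move it to S
        ("^nnml:spam$"                              ; spam group S
         (spam-contents gnus-group-spam-classification-spam)
         (spam-process ((ham spam-use-bogofilter)))  ; train on ham at exit
         (ham-process-destination . "nnml:inbox")))) ; then move it to A
;; No spam-process-destination is set for S, so processed spam is
;; deleted, as in step 2 of the workflow.
(setq spam-mark-ham-unread-before-move-from-spam-group t)
```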

I hope that explains things...

Ted


* Re: wallowing out of the spam quagmire
  2004-06-22  1:21   ` Harry Putnam
                       ` (2 preceding siblings ...)
  2004-06-22 16:32     ` Ted Zlatanov
@ 2004-06-25 13:37     ` Kai Grossjohann
  2004-06-25 14:26       ` Daniel Pittman
  2004-06-26 10:18       ` wallowing out of the spam quagmire Harry Putnam
  3 siblings, 2 replies; 22+ messages in thread
From: Kai Grossjohann @ 2004-06-25 13:37 UTC (permalink / raw)

Harry Putnam <reader@newsguy.com> writes:

> So to summarize.  I let procmail/sa do most splitting and culling out
> of spam.  When that is done, the rest comes to my inbox and I deal
> with it by hand.  I hoped to introduce bogofilter at that stage.

My suggestion for you is to tell procmail to run all mail through
bogofilter.  You can tell bogofilter to add a header saying whether
the mail is spam.

Then you can just do header-based splitting, either from procmail or
from nnmail-split-methods.  If you decide to use the fancy spam.el
thing, then you'll need to migrate to nnmail-split-fancy, and to tell
spam.el to split according to bogofilter headers.  You can do that by
setting spam-use-bogofilter-headers to t.
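Here is a sketch of the spam.el route.  It assumes procmail already
pipes every message through bogofilter, so that each one carries
bogofilter's X-Bogosity header.  "inbox" is a hypothetical catch-all
group name.

```elisp
;; Assumption: procmail has already added bogofilter's header, so
;; spam.el only needs to read it, not run bogofilter itself.
(setq spam-use-bogofilter-headers t)   ; set before spam-initialize
(spam-initialize)
(setq nnmail-split-methods 'nnmail-split-fancy
      nnmail-split-fancy
      '(| (: spam-split)  ; returns spam-split-group ("spam" by default)
                          ; for spam, or nil otherwise
          "inbox"))       ; everything else goes to the inbox
```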

Now the remaining problem is to train bogofilter.  To do this, invoke
G c on the groups/topics involved and set the "spam exit processor"
to "bogofilter".  Also, invoke G c on the spam group and tell it that
the "ham exit processor" is also bogofilter.

Kai


* Re: wallowing out of the spam quagmire
  2004-06-25 13:37     ` Kai Grossjohann
@ 2004-06-25 14:26       ` Daniel Pittman
  2004-06-25 18:46         ` Chris Green
  2004-06-26 10:34         ` Harry Putnam
  2004-06-26 10:18       ` wallowing out of the spam quagmire Harry Putnam
  1 sibling, 2 replies; 22+ messages in thread
From: Daniel Pittman @ 2004-06-25 14:26 UTC (permalink / raw)

On 25 Jun 2004, Kai Grossjohann wrote:
> Harry Putnam <reader@newsguy.com> writes:
>
>> So to summarize.  I let procmail/sa do most splitting and culling out
>> of spam.  When that is done, the rest comes to my inbox and I deal
>> with it by hand.  I hoped to introduce bogofilter at that stage.
>
> My suggestion for you is to tell procmail to run all mail through
> bogofilter.  You can tell bogofilter to add a header saying whether
> the mail is spam.

This is a good suggestion, and the way I generally prefer to deal with
spam detection -- doing it before I even see the message in my "new
mail" indicator.

OTOH, I have a preference for SpamAssassin, which does the same
statistical stuff as Bogofilter, and a bunch of other useful testing as
well.

If you have control over your SMTP server, take a look at the
'amavisd-new' package, which hooks into the loop and deals with SPAM
tagging with SpamAssassin as well as virus detection, and works very
nicely.

        Daniel
-- 
Should I abide by the rules until they're changed,
or help speed the change by breaking them?
        -- Ashleigh Brilliant


* Re: wallowing out of the spam quagmire
  2004-06-25 14:26       ` Daniel Pittman
@ 2004-06-25 18:46         ` Chris Green
  2004-06-26 10:34         ` Harry Putnam
  1 sibling, 0 replies; 22+ messages in thread
From: Chris Green @ 2004-06-25 18:46 UTC (permalink / raw)


Daniel Pittman <daniel@rimspace.net> writes:

> OTOH, I have a preference for SpamAssassin, which does the same
> statistical stuff as Bogofilter, and a bunch of other useful testing as
> well.

FYI, what I used to do for spam was pipe mail through SA, then use
bogofilter to weed out the false positives from what SA marked as
spam.  Then I would run what SA declared as ham through bogofilter to
weed out false negatives.

My ISP complained about the amount of CPU time my spamc instance was
using, so I switched to pure bogofilter.  With version 0.17 and an 8 MB
database with the words seen only once weeded out, I'm a pretty happy
camper, with the exception of HTML password-reset requests from web
sites.
-- 
Chris Green <cmg@dok.org>
Chicken's thinkin'





* Re: wallowing out of the spam quagmire
  2004-06-25 13:37     ` Kai Grossjohann
  2004-06-25 14:26       ` Daniel Pittman
@ 2004-06-26 10:18       ` Harry Putnam
  1 sibling, 0 replies; 22+ messages in thread
From: Harry Putnam @ 2004-06-26 10:18 UTC (permalink / raw)

Kai Grossjohann <kai@emptydomain.de> writes:

> My suggestion for you is to tell procmail to run all mail through
> bogofilter.  You can tell bogofilter to add a header saying whether
> the mail is spam.

As usual, Kai, you've supplied a good scheme.  I'm itching to try
this stuff, but I'm tied up rebuilding my garage for another few days.

Thanks to all for the good input.  Once I've tried some of it, I'll
report back.


* Re: wallowing out of the spam quagmire
  2004-06-25 14:26       ` Daniel Pittman
  2004-06-25 18:46         ` Chris Green
@ 2004-06-26 10:34         ` Harry Putnam
  2004-06-26 14:55           ` [OT] Dual-MTA setup and spam filtering (was Re: wallowing out of the spam quagmire) Daniel Pittman
  1 sibling, 1 reply; 22+ messages in thread
From: Harry Putnam @ 2004-06-26 10:34 UTC (permalink / raw)

Daniel Pittman <daniel@rimspace.net> writes:

> If you have control over your SMTP server, take a look at the
> 'amavisd-new' package, which hooks into the loop and deals with SPAM
> tagging with SpamAssassin as well as virus detection, and works very
> nicely.

My MTA is sendmail:
checking this out at: http://www.ijs.si/software/amavisd/

It mentions a `dual sendmail set-up' being required to use it with
sendmail and says it works best with Postfix.

I've seen that term ( `Dual Sendmail setup' ) a few times lately but
haven't really seen anything telling what it means.

Do you know what that reference is about?

==
(at http://www.ijs.si/software/amavisd/)
[...]  

  It is written in Perl for maintainability, without paying a
  significant price for speed. It talks to MTA via (E)SMTP or LMTP, or
  by using helper programs. Best with Postfix, fine with dual-sendmail
  setup and Exim v4, works with sendmail/milter, or with any MTA as a
  SMTP relay. 'Howto' for qmail available as well.

[...]
==


* [OT] Dual-MTA setup and spam filtering (was Re: wallowing out of the spam quagmire)
  2004-06-26 10:34         ` Harry Putnam
@ 2004-06-26 14:55           ` Daniel Pittman
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Pittman @ 2004-06-26 14:55 UTC (permalink / raw)

On 26 Jun 2004, Harry Putnam wrote:
> Daniel Pittman <daniel@rimspace.net> writes:
>
>> If you have control over your SMTP server, take a look at the
>> 'amavisd-new' package, which hooks into the loop and deals with SPAM
>> tagging with SpamAssassin as well as virus detection, and works very
>> nicely.
>
> My MTA is sendmail:
> checking this out at: http://www.ijs.si/software/amavisd/
>
> It mentions a `dual sendmail set-up' being required to use it with
> sendmail and says it works best with Postfix.

Ah. What that means is that sendmail lacks some of the routing
flexibility of newer MTA systems, requiring you to run two complete and
distinct instances of it to get inline SMTP filtering...

> I've seen that term ( `Dual Sendmail setup' ) a few times lately but
> haven't really seen anything telling what it means.

Basically, this is the diagram for mail delivery:

  +----------+     +---------+     +-------------+     +---------+
  | internet +-----+ MTA ext +-----+ amavisd-new +-----+ MTA int |
  +----------+     +---------+     +-------------+     +---------+

Mail comes in to the first sendmail on port 25, and is queued to disk. 
It is then sent through amavisd-new via SMTP, which sends it to a second
sendmail. The second sendmail then delivers the mail to the end user.

You can, but don't have to, run both of the sendmail instances on the
same system.

One reason for doing this is that there is a firewall between the two
systems, and you don't want your internal mail server exposed directly,
and vice versa.

Another is to get something like amavisd-new inline to the SMTP delivery
cycle.

Postfix, for reference, can run a single instance, with only one
configuration file, etc., and achieve the same result: effectively,
one SMTP port that passes mail to a content filter, and another that
doesn't.[1]

> Do you know what that reference is about?
>
> ==
> (at http://www.ijs.si/software/amavisd/)
> [...]  
>
> It is written in Perl for maintainability, without paying a
> significant price for speed. It talks to MTA via (E)SMTP or LMTP, or
> by using helper programs. Best with Postfix, fine with dual-sendmail
> setup and Exim v4, works with sendmail/milter, or with any MTA as a
> SMTP relay. 'Howto' for qmail available as well.

The 'milter' interface is a way to put something inline to sendmail
during the initial SMTP conversation, rather than after the mail is
queued on disk for the first time.

This works, but I don't personally recommend it as a model. This
document covers why in some detail:

<http://www.postfix.org/SMTPD_PROXY_README.html>

It deals with Postfix, but the 'milter' is connected within sendmail in
the same spot as the 'before queue filter' in Postfix.

    Daniel

Footnotes: 
[1]  Recent versions can even have a single port which does this; I have
     that configuration, and it is quite nice.

-- 
CAUTION: This product exerts a force on every other object in the
Universe, proportional to the product of their masses divided by the
square of the distance between them, center to center.


end of thread

Thread overview: 22+ messages
2004-06-19 18:27 wallowing out of the spam quagmire Harry Putnam
2004-06-20  6:58 ` Jonas Steverud
2004-06-22  1:21   ` Harry Putnam
2004-06-22  1:53     ` Jody Klymak
2004-06-22 10:56       ` Harry Putnam
2004-06-22 15:03         ` Jody Klymak
2004-06-22 15:20         ` Jody Klymak
2004-06-22  7:52     ` Jonas Steverud
2004-06-22 15:18       ` Jody Klymak
2004-06-22 16:34       ` Ted Zlatanov
2004-06-22 16:32     ` Ted Zlatanov
2004-06-25 13:37     ` Kai Grossjohann
2004-06-25 14:26       ` Daniel Pittman
2004-06-25 18:46         ` Chris Green
2004-06-26 10:34         ` Harry Putnam
2004-06-26 14:55           ` [OT] Dual-MTA setup and spam filtering (was Re: wallowing out of the spam quagmire) Daniel Pittman
2004-06-26 10:18       ` wallowing out of the spam quagmire Harry Putnam
2004-06-20 23:44 ` Kevin Ryde
2004-06-21  4:28   ` Daniel Pittman
2004-06-21 14:35 ` Ted Zlatanov
2004-06-22  1:40   ` Harry Putnam
2004-06-22 16:45     ` Ted Zlatanov
