Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* Spam.el, ham groups and INBOX
@ 2003-07-24 11:31 Jesse F. Hughes
  2003-07-24 14:20 ` David Z Maze
       [not found] ` <4nznj3uccd.fsf@chubby.bwh.harvard.edu>
  0 siblings, 2 replies; 5+ messages in thread
From: Jesse F. Hughes @ 2003-07-24 11:31 UTC


Hey ho.

I'm just getting started with spam.el.  I think my set-up looks okay,
but I am not sure that I understand how ham is handled.

What is the intended meaning of "ham groups"?  I thought that,
whenever I have a piece of mail in my INBOX that isn't spam, then I
should consider it ham.  Therefore, it makes sense to mark the INBOX
as a ham group, as long as I am careful to mark each false negative as
spam.

However, I saw a recent question in the gnus.ding list which worried
about whether messages in the INBOX will be processed as ham every
time one leaves the group.  Here is the excerpt from Ted's response.

     On Mon, 21 Jul 2003, jklymak@coas.oregonstate.edu wrote:
     
     > I didn't want to set my Inbox as a ham group because I didn't
     > want all the mails in the group processed *every* time I exit
     > the group.

     Ah yes, I've been meaning to use the gnus-registry for that, so
     messages are only processed once.  It's not hard, I'm just unable
     to do it right now.  The tracking data needs to be stored in the
     "extra" field of the registry entry, the rest is already handled
     by the registry.

This suggests that I'm wrong to set my INBOX as a ham group -- or does
it?

As a rule, I don't fetch previously read messages, so I guess those
won't be processed.  Of course, once in a while, I enter the group
with C-u so that I can search for an old email.  I reckon that when I
do that, every email fetched is re-processed as ham.

But, all those old articles are marked ancient (or expired), not read,
deleted, killed, etc.  So, they can't be re-processed, can they?

Is it or is it not a good idea to have INBOX as a ham group?

Thanks for any help.
-- 
"Come on people!!!  The US just blew up a lot of people in Iraq, don't
you realize that a person with my exposure might just end up dead, by
mysterious circumstances?" 
  --James Harris, on the dangers of "proving" Fermat's last theorem



* Re: Spam.el, ham groups and INBOX
  2003-07-24 11:31 Spam.el, ham groups and INBOX Jesse F. Hughes
@ 2003-07-24 14:20 ` David Z Maze
       [not found]   ` <878yqo0z9c.fsf@phiwumbda.localnet>
       [not found] ` <4nznj3uccd.fsf@chubby.bwh.harvard.edu>
  1 sibling, 1 reply; 5+ messages in thread
From: David Z Maze @ 2003-07-24 14:20 UTC
  Cc: Jesse F. Hughes

jesseh@cs.kun.nl (Jesse F. Hughes) writes:

> What is the intended meaning of "ham groups"?  I thought that,
> whenever I have a piece of mail in my INBOX that isn't spam, then I
> should consider it ham.  Therefore, it makes sense to mark the INBOX
> as a ham group, as long as I am careful to mark each false negative as
> spam.

My understanding of spam/ham/other groups is that, given a perfect
spam filter, a spam group would contain only spam, a ham group would
contain no spam, and a group that's neither isn't handled by the
filter (e.g., news groups).

> However, I saw a recent question in the gnus.ding list which worried
> about whether messages in the INBOX will be processed as ham every
> time one leaves the group.

While Ted wrote the code and almost certainly understands it better
than I do, my understanding is that only previously unseen messages
("." mark) are processed.  Certainly it doesn't process messages that
aren't listed in the summary buffer when you exit the group.

> Is it or is it not a good idea to have INBOX as a ham group?

I think it is; otherwise, you'll never have a source for ham, which is
poor if you're using a Bayesian classifier.

-- 
David Maze             dmaze@mit.edu          http://www.mit.edu/~dmaze/
"Theoretical politics is interesting.  Politicking should be illegal."
	-- Abra Mitchell



* Re: Spam.el, ham groups and INBOX
       [not found]     ` <y683cgvq0ju.fsf@no-knife.mit.edu>
@ 2003-07-24 18:09       ` Jesse F. Hughes
       [not found]         ` <y68n0f3oe6x.fsf@no-knife.mit.edu>
  0 siblings, 1 reply; 5+ messages in thread
From: Jesse F. Hughes @ 2003-07-24 18:09 UTC


David Z Maze <dmaze@mit.edu> writes:

> jesseh@cs.kun.nl (Jesse F. Hughes) writes:
>
>> Your emailed copy, by the way, was marked "spam" by spam-stat.
>
> Exciting.  :-)

I see your email was *again* marked spam, but not by the Bayesian
filters this time.  From my message log:

(DIG): positive blackhole check '86.21.7.18.relays.ordb.org. 300 IN A
127.0.0.2'

What's that mean, exactly?  Does it mean that
melbourne-city-street.mit.edu has been blacklisted?  And so, since it
was in your list of relays, your message is presumed spam?

That's how I understand the blackhole servers to work, but I'm
surprised to see an mit.edu machine listed.
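
For reference, the log line above follows the usual DNS blackhole list
convention: the relay's IPv4 octets are reversed and looked up as a
name under the list's zone, and an A record in 127.0.0.0/8 means
"listed".  A minimal sketch in Python, assuming that convention; the
function names are hypothetical, nothing here is spam.el code, and no
real DNS query is made:

```python
import ipaddress
from typing import Optional


def dnsbl_query_name(relay_ip: str, zone: str) -> str:
    """Build the DNS name a blackhole-list client would look up.

    The IPv4 octets are reversed and prepended to the list's zone.
    """
    # IPv4Address validates the input and normalizes it to dotted form.
    octets = str(ipaddress.IPv4Address(relay_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(answer: Optional[str]) -> bool:
    """Interpret a lookup result (illustrative helper, not a real API).

    `answer` is the A record returned for the query name, or None when
    the name does not exist (the relay is not on the list).
    """
    if answer is None:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")


if __name__ == "__main__":
    # The relay address is read back from the reversed name in the log.
    print(dnsbl_query_name("18.7.21.86", "relays.ordb.org"))
    print(is_listed("127.0.0.2"))
```

Read this way, the name in the log corresponds to the relay address
18.7.21.86, and the 127.0.0.2 answer is what marks it as listed.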

>> I'm not too surprised by false positives.  Ifile has had a couple
>> already, due to some HTML ham (ptui!) I received.  No idea why yours
>> was marked, but I haven't "trained" the filters properly yet.
>
> That's very important for the Bayesian classifiers.  I know the
> ifile-gnus package has a shell script that you can use to populate
> ifile's database, and the spam-stat manual tells you how to set that
> up (though I might have written Lisp to make it do more than one group
> at a time, or I might have just let it come up with an empty initial
> database, I don't remember).
>
> ...which reminds me, the one big problem I had with ifile was that it
> had this tendency to mark all HTML mail as spam, regardless of other
> factors.  That, and it was slow (big files and AFS will do that,
> though).

Is speed a big concern for IMAP?  I've been trying to decide whether I
prefer POP3 anyway, and maybe I'll change due to speed issues with
spam.el.  I only care about IMAP when I'm on vacation, after all.

>
>> I find myself impatiently waiting on real spam to arrive today, so
>> that I can see if the filters can catch it.  It's a bit perverse.  I
>> coulda sworn I got more spam than I'm receiving today.
>
> nnml:mail.misc.spam seems to have 399 messages at the moment.  Want
> some?  :-)

Oh, I found some spam on the spam archive site to play with -- at
least with bogofilter, which is easy to use on the command line.  This
allows me to train the filter on spam, but I don't have so much ham
lying around for training.  I've managed to prune my email groups too
recently.

-- 
"No feeling sympathy for mathematicians who start marching with signs
like 'Will work for food' in the future...  I will not show mercy
going forward.  I was trained as a soldier in the United States Army
after all... We play to win." --James Harris, feel his wrath!



* Re: Spam.el, ham groups and INBOX
       [not found]   ` <87vftrg7dh.fsf@phiwumbda.localnet>
@ 2003-07-24 18:25     ` Ted Zlatanov
  0 siblings, 0 replies; 5+ messages in thread
From: Ted Zlatanov @ 2003-07-24 18:25 UTC
  Cc: Jesse F. Hughes

On Thu, 24 Jul 2003, jesseh@cs.kun.nl wrote:
> As an aside, can you tell me about the registry?  I wrote some
> hateful-bastard functions some time ago, but I'd like to alter them
> so that they are applied only once to each message.  Right now, they
> are applied each time an article is prepared for the article buffer.
> Is the registry helpful for fixing this?
> 
> Where can I learn about it?

You can see gnus-registry.el in the lisp subdirectory of the Gnus CVS.
It's undocumented externally in the Gnus manual for these reasons: 
1) it's far from ready, and may change radically as I use it to help
spam.el; 2) Gnus is in a feature freeze and I'm trying to avoid major
work in the manual or in the code.

There is an "extra data" field for each article (as identified by the
message ID) that currently has a 'mtime key for storing the
modification time.  You can use any key in the extra data, similarly
to this:

write: (gnus-registry-store-extra-entry id 'mtime (current-time))
read:  (gnus-registry-fetch-extra id 'mtime)

Let me know if you do this, because it has direct application to the
spam/ham registration (essentially I want to make sure an article is
processed by a particular spam/ham processor just once).
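
The "process just once" idea can be sketched independently of Gnus.
The following Python is only an illustration of the bookkeeping,
assuming a per-message "extra" record keyed by message ID; the class
and method names are hypothetical and are not the gnus-registry API:

```python
from typing import Callable, Dict, Set


class ProcessedRegistry:
    """Toy stand-in for a registry with an "extra" field per message ID."""

    def __init__(self) -> None:
        # message ID -> names of processors that have already handled it
        self._extra: Dict[str, Set[str]] = {}

    def process_once(
        self,
        message_id: str,
        processor_name: str,
        processor: Callable[[str], None],
    ) -> bool:
        """Run `processor` on the message unless it already ran.

        Returns True if the processor ran, False if it was skipped.
        """
        done = self._extra.setdefault(message_id, set())
        if processor_name in done:
            return False
        processor(message_id)
        # Record the run only after the processor finished without error.
        done.add(processor_name)
        return True


if __name__ == "__main__":
    seen = []
    registry = ProcessedRegistry()
    registry.process_once("<id-1@example.invalid>", "ham-trainer", seen.append)
    registry.process_once("<id-1@example.invalid>", "ham-trainer", seen.append)
    print(len(seen))  # the second call is skipped
```

Keying the record by processor name as well as message ID lets two
different processors each see the same article once.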

Ted



* Re: Spam.el, ham groups and INBOX
       [not found]         ` <y68n0f3oe6x.fsf@no-knife.mit.edu>
@ 2003-07-24 21:21           ` Jesse F. Hughes
  0 siblings, 0 replies; 5+ messages in thread
From: Jesse F. Hughes @ 2003-07-24 21:21 UTC


David Z Maze <dmaze@mit.edu> writes:

> Fundamentally, the black hole lists contain lists of servers that have
> a mail policy the list operator disagrees with.  This is not
> necessarily a list of machines being actively used to forward spam;
> use at your own risk.

I've turned off black holes for now.  

How do black holes, white lists and filters work together?  I'd
perhaps want something like: 

  It's spam if (it's in the black hole list, but not my white list) or
  the filter says so.
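
Written out as a predicate, the rule reads as follows.  This is a
minimal Python sketch of the rule exactly as stated, with hypothetical
names; it does not describe how spam.el actually combines its checks:

```python
def is_spam(on_blackhole_list: bool, on_whitelist: bool, filter_says_spam: bool) -> bool:
    """Spam if (blackholed and not whitelisted) or the filter says so."""
    return (on_blackhole_list and not on_whitelist) or filter_says_spam


if __name__ == "__main__":
    # A whitelisted sender behind a blackholed relay passes the first test...
    print(is_spam(True, True, False))
    # ...but under this rule the filter can still mark a whitelisted sender.
    print(is_spam(False, True, True))
```

Note that, as stated, the white list only overrides the black hole
list; it does not override the filter's verdict.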

Anyway, for the present, I don't reckon I need black holes.  We'll see
as time passes.

-- 
"If you see math knowledge as a tool--as a hammer--with which
you can attack other people then ... you defeat rational discourse."
"I get to call my proof the Hammer.  It's more powerful than *any*
physical object.  It is overwhelming force."  -- Two JSH quotes



