[Q]: Gmane groups and spam filtering

Gnus development mailing list

* [Q]: Gmane groups and spam filtering
@ 2004-03-11 21:22 Xavier Maillard
  2004-03-11 23:38 ` Kevin Greiner
  0 siblings, 1 reply; 4+ messages in thread
From: Xavier Maillard @ 2004-03-11 21:22 UTC


Hi,

I am trying (hard) to use exclusively Gmane for _all_ my old mailing
lists.

As usual, some groups are quite big and contain a lot of spam.

I am using the agent here and I usually fetch *all* articles for
groups I read. Currently this means I also fetch spam, which is
inefficient and wastes a lot of bandwidth. I want to avoid this at some
level.

I know Gmane is tagging all spam messages with an 'xref' and that
spam.el provides some way to detect spam.

I tried to use this but it doesn't seem to work for me :(

Here are my settings for gmane groups (note I make heavy use of
gnus-parameters and nothing has been defined anywhere else):


             ("^nntp\\+.*:gmane\\..*"
              (auto-detect . t)
              (spam-use-gmane-xref t)
              (gnus-agent-consider-all-articles . t)
              (gnus-agent-enable-expiration . (quote (DISABLE)))
              (spam-contents gnus-group-spam-classification-ham)
              (spam-process
               (gnus-group-spam-exit-processor-report-gmane))
              (agent-predicate (not spam)))


This doesn't work! I just don't know if 'spam' as an agent-predicate
does anything. Currently I set the agent-predicate on a per-group basis
and it is generally set to 'true', so I download *all* articles
(including spam).

Now I just want to know if my spam-use-gmane-xref setting is correct,
since I don't see any difference whether it is set or not.

Thank you,

zeDek

P.S.: How can I tell whether a parameter should be in dotted-pair format or not?
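
On the dotted-pair question: at the Lisp level the two notations differ
only in what the cdr of the entry is. A minimal sketch in plain Emacs
Lisp follows. The names `some-flag' and `some-list' are hypothetical and
are not Gnus parameters; which form a particular Gnus parameter expects
still has to be checked in its own documentation.

```elisp
;; Dotted pair: the cdr is the value itself.
(cdr '(some-flag . t))                     ; => t

;; Proper list: the cdr is a list holding the value.
(cdr '(some-list t))                       ; => (t)

;; These two read as the same object, because (a . (b)) is just
;; another way of writing (a b).
(equal '(some-list . (t)) '(some-list t))  ; => t

;; Inspecting the cdr tells the two forms apart.
(consp (cdr '(some-flag . t)))             ; => nil, the value is an atom
(consp (cdr '(some-list t)))               ; => t, the value is wrapped in a list
```

Evaluating an entry's cdr this way (for example with M-: or in
*scratch*) shows whether the value being stored is an atom or a list.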
-- 
Xavier Maillard
http://www.gnu-rox.org/~zedek/cgi-bin/wiki.pl






* Re: [Q]: Gmane groups and spam filtering
  2004-03-11 21:22 [Q]: Gmane groups and spam filtering Xavier Maillard
@ 2004-03-11 23:38 ` Kevin Greiner
  2004-03-12 15:18   ` Xavier Maillard
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Greiner @ 2004-03-11 23:38 UTC


Xavier Maillard <zedek@gnu-rox.org> writes:

> Hi,
>
> I am trying (hard) to use exclusively Gmane for _all_ my old mailing
> lists.
>
> As usual, some groups are quite big and contain a lot of spam.
>
> I am using the agent here and I usually fetch *all* articles for
> groups I read. Currently this means I also fetch spam, which is
> inefficient and wastes a lot of bandwidth. I want to avoid this at some
> level.
>
> I know Gmane is tagging all spam messages with an 'xref' and that
> spam.el provides some way to detect spam.
>
> I tried to use this but it doesn't seem to work for me :(
>
> Here are my settings for gmane groups (note I make heavy use of
> gnus-parameters and nothing has been defined anywhere else):
>
>
>              ("^nntp\\+.*:gmane\\..*"
>               (auto-detect . t)
>               (spam-use-gmane-xref t)
>               (gnus-agent-consider-all-articles . t)
>               (gnus-agent-enable-expiration . (quote (DISABLE)))
>               (spam-contents gnus-group-spam-classification-ham)
>               (spam-process
>                (gnus-group-spam-exit-processor-report-gmane))
>               (agent-predicate (not spam)))
>
>
> This doesn't work! I just don't know if 'spam' as an agent-predicate
> does anything. Currently I set the agent-predicate on a per-group basis
> and it is generally set to 'true', so I download *all* articles
> (including spam).
>
> Now I just want to know if my spam-use-gmane-xref setting is correct,
> since I don't see any difference whether it is set or not.

You might find it more informative, and faster, to simply look at the code.

However, since you asked, it appears that gnus-agent-spam-p (the
function that implements the spam predicate) is a rather misleading
placeholder.  It does a gnus-gethash on gnus-agent-spam-hashtb so it
seems to do something yet I can't find any code, anywhere, that
populates gnus-agent-spam-hashtb.

Instead of using (not spam) you could use (not x-spam) where x-spam is
the name of a spam detection function of your own design.
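
A minimal sketch of such a predicate follows. It assumes that the Agent
calls a user-defined predicate with no arguments while `gnus-headers' is
dynamically bound to the article's headers (as the Gnus manual describes
for custom category predicates), and that Gmane marks spam by
cross-posting it to gmane.spam.detected so that this group name appears
in the Xref header. Whether the Xref field is actually available in the
overview data when the predicate runs has not been verified here.

```elisp
(require 'nnheader)   ; provides `mail-header-xref'

(defvar gnus-headers) ; bound dynamically by the Agent around predicate calls

(defun x-spam ()
  "Return non-nil when the current article's Xref names gmane.spam.detected.
The group name is an assumption about how Gmane tags spam."
  (let ((xref (mail-header-xref gnus-headers)))
    (and (stringp xref)
         (string-match "gmane\\.spam\\.detected" xref)
         t)))
```

The group's predicate would then be (not x-spam), so that only articles
for which the function returns nil are downloaded.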

On the other hand, if your spam processing results in an article
score, you could set the agent predicate to select articles higher
than that score.

That is, (agent-predicate (not low)).
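
As a hedged illustration of what `low' compares against: per the Gnus
manual, the `low' predicate is true for articles whose download score is
below `gnus-agent-low-score'. The threshold below is an arbitrary value
chosen for the example, and which score files the Agent consults when
computing that score depends on the category's score rule, which is not
shown here.

```elisp
;; Articles whose download score is below this value satisfy `low',
;; so a predicate of (not low) skips them.
;; -100 is an arbitrary illustration, not a recommended threshold.
(setq gnus-agent-low-score -100)
```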

Kevin




* Re: [Q]: Gmane groups and spam filtering
  2004-03-11 23:38 ` Kevin Greiner
@ 2004-03-12 15:18   ` Xavier Maillard
  2004-03-29 21:21     ` Ted Zlatanov
  0 siblings, 1 reply; 4+ messages in thread
From: Xavier Maillard @ 2004-03-12 15:18 UTC

Kevin Greiner <kgreiner <at> xpediantsolutions.com> writes:

> > Now I just want to know if my spam-use-gmane-xref setting is correct,
> > since I don't see any difference whether it is set or not.
> 
> You might find it more informative, and faster, to simply look at the code.

OK, I finally found why it didn't work. It works when set this way:

              (spam-autodetect-methods spam-use-gmane-xref spam-use-BBDB)
              (spam-autodetect . t)
              (spam-process '(spam spame-use-gmane))
;;               (gnus-group-spam-exit-processor-report-gmane)) ;; obsoleted
              (gnus-agent-consider-all-articles . t)
              (gnus-agent-enable-expiration . (quote (DISABLE)))
              (spam-contents gnus-group-spam-classification-ham)
              (agent-predicate . true) ;; and (not spam) (not old))

But now I see many messages like this one:

/------------------------------------------------------------- 
|Fetching headers for nntp+news.gmane.org:gmane.linux.debian.user...done
|Dictionary used: american in group nntp+news.gmane.org:gmane.linux.debian.user
|[2 times]
|Scoring...done
|Making sparse threads...done
|Sorting threads...done
|Generating summary...done
|spam-split: widening the buffer (spam-use-bogofilter requires it)
|spam-split: calling the spam-check-gmane-xref function
|spam-split: calling the spam-check-BBDB function
|Article 138023 has a nil data header [3 times]
|Article 138023 has no message ID!
|spam-generate-fake-headers: article 138023 didn't have a valid header
|spam-split: widening the buffer (spam-use-bogofilter requires it)
|spam-split: calling the spam-check-gmane-xref function
|spam-split: calling the spam-check-BBDB function
|Article 138024 has a nil data header [3 times]
|Article 138024 has no message ID!
|spam-generate-fake-headers: article 138024 didn't have a valid header
|spam-split: widening the buffer (spam-use-bogofilter requires it)
|spam-split: calling the spam-check-gmane-xref function
|spam-split: calling the spam-check-BBDB function
|spam-split: widening the buffer (spam-use-bogofilter requires it)
|spam-split: calling the spam-check-gmane-xref function
|spam-split: calling the spam-check-BBDB function
|Article 138026 has a nil data header [3 times]
|Article 138026 has no message ID!
\-------------------------------------------------------------

First, I would like to know whether I can temporarily deactivate the
spam-use-bogofilter method. I thought (wrongly) that setting
`spam-autodetect-methods` in the group parameters would override it, but
it seems it does not.

Secondly, I don't know why gnus (or spam.el) is complaining about headers. I
haven't looked at a message header yet, but I certainly will, since this
increases spam.el's load and processing time.

> However, since you asked, it appears that gnus-agent-spam-p (the
> function that implements the spam predicate) is a rather misleading
> placeholder.  It does a gnus-gethash on gnus-agent-spam-hashtb so it
> seems to do something yet I can't find any code, anywhere, that
> populates gnus-agent-spam-hashtb.

So did I.

> On the other hand, if your spam processing results in an article
> score, you could set the agent predicate to select articles higher
> than that score.

Hmm, the problem is that the spam scores come from the bogofilter check, and I
can't predict whether a message is good (i.e. interesting, or not spam) based
only on its score ;)

Regards,

zeDek


* Re: [Q]: Gmane groups and spam filtering
  2004-03-12 15:18   ` Xavier Maillard
@ 2004-03-29 21:21     ` Ted Zlatanov
  0 siblings, 0 replies; 4+ messages in thread
From: Ted Zlatanov @ 2004-03-29 21:21 UTC
  Cc: ding

On Fri, 12 Mar 2004, zedek@gnu-rox.org wrote:

> But now I see many messages like this one:
> 
> /------------------------------------------------------------- 
> |Article 138023 has a nil data header [3 times] 
> |Article 138023 has no message ID!  
> \-------------------------------------------------------------

This is caused by the spam-generate-fake-headers function, which will
use the overview data to build fake headers when possible.

See if you can find what's causing the nil data header by tracing
through the spam-find-spam function.  It calls
(spam-generate-fake-headers article) when possible.

> First, I would like to know whether I can temporarily deactivate the
> spam-use-bogofilter method. I thought (wrongly) that setting
> `spam-autodetect-methods` in the group parameters would override it,
> but it seems it does not.

It should.  What does `G p' show for the group in question?

> Secondly, I don't know why gnus (or spam.el) is complaining about
> headers. I haven't looked at a message header yet, but I certainly
> will, since this increases spam.el's load and processing time.

With spam-autodetect, all headers of hitherto unseen messages are
checked when you enter a group, but spam.el tries to do it without
retrieving the whole article.  That's the reason for the whole
spam-generate-fake-headers rigamarole.

Ted
