filtering nntp messages

Gnus development mailing list

* filtering nntp messages
@ 2009-10-23 17:06 Harry Putnam
  2009-10-23 22:33 ` Adam Sjøgren
  0 siblings, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2009-10-23 17:06 UTC (permalink / raw)
  To: ding

Can anyone show some examples of how to filter nntp groups using Gnus?
I've filtered my mail with procmail for a very long time... but never
really tried to filter nntp groups.

Some nntp groups I like to frequent, like comp.unix.shell, have been
nearly destroyed by spam.  I'm guessing Gnus is capable of filtering
all that mess... but a few cursory tries at the manual have come up
dry.

Maybe some use of gnus-parameters can get it done?


* Re: filtering nntp messages
  2009-10-23 17:06 filtering nntp messages Harry Putnam
@ 2009-10-23 22:33 ` Adam Sjøgren
  2009-10-23 22:55   ` Ted Zlatanov
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Sjøgren @ 2009-10-23 22:33 UTC (permalink / raw)
  To: ding

On Fri, 23 Oct 2009 12:06:56 -0500, Harry wrote:

> Some nntp groups I like to frequent like comp.unix.shell have been
> nearly destroyed by spam.  I'm guessing gnus is capable of filtering
> all that mess... but a few cursory tries at the manual have come up
> dry.

Uhm, look at scoring?


  Best regards,

-- 
 "It's a way to escalate the annoyance for added              Adam Sjøgren
  amusement."                                            asjo@koldfront.dk





* Re: filtering nntp messages
  2009-10-23 22:33 ` Adam Sjøgren
@ 2009-10-23 22:55   ` Ted Zlatanov
  2009-10-24  1:51     ` Harry Putnam
  0 siblings, 1 reply; 19+ messages in thread
From: Ted Zlatanov @ 2009-10-23 22:55 UTC (permalink / raw)
  To: ding

On Sat, 24 Oct 2009 00:33:11 +0200 asjo@koldfront.dk (Adam Sjøgren) wrote: 

AS> On Fri, 23 Oct 2009 12:06:56 -0500, Harry wrote:
>> Some nntp groups I like to frequent like comp.unix.shell have been
>> nearly destroyed by spam.  I'm guessing gnus is capable of filtering
>> all that mess... but a few cursory tries at the manual have come up
>> dry.

AS> Uhm, look at scoring?

You can filter nntp with spam.el, same as any other message source.  You
just can't move spam articles out of a group, but you can copy them to
another backend or feed directly into the spam.el backends for spam
training.

Statistical spam backends will require fetching every message body,
though, which could be painful.  Unfortunately that's the best solution
nowadays.  You may want to look into integrating some anti-spam solution
with leafnode on arrival or something like it (I don't know if it's
possible!).  Then you can just score on headers, with spam.el or not.

Ted


* Re: filtering nntp messages
  2009-10-23 22:55   ` Ted Zlatanov
@ 2009-10-24  1:51     ` Harry Putnam
  2009-10-24  2:12       ` Harry Putnam
                         ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Harry Putnam @ 2009-10-24  1:51 UTC (permalink / raw)
  To: ding

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Sat, 24 Oct 2009 00:33:11 +0200 asjo@koldfront.dk (Adam Sjøgren) wrote: 
>
> AS> On Fri, 23 Oct 2009 12:06:56 -0500, Harry wrote:
>>> Some nntp groups I like to frequent like comp.unix.shell have been
>>> nearly destroyed by spam.  I'm guessing gnus is capable of filtering
>>> all that mess... but a few cursory tries at the manual have come up
>>> dry.
>
> AS> Uhm, look at scoring?
>
> You can filter nntp with spam.el, same as any other message source.  You
> just can't move spam articles out of a group, but you can copy them to
> another backend or feed directly into the spam.el backends for spam
> training.
>
> Statistical spam backends will require fetching every message body,
> though, which could be painful.  Unfortunately that's the best solution
> nowadays.  You may want to look into integrating some anti-spam solution
> with leafnode on arrival or something like it (I don't know if it's
> possible!).  Then you can just score on headers, with spam.el or not.

Sounds like scoring would be a better and easier solution, eh Ted?  I mean
you can score on just the headers, right... and mark things read by
scoring or the like, not having to download all bodies.  Or am I
missing your point?

Actually I could use a nice example of scoring to mark read or similar.

I've never really learned spam.el or even splitting mail much.  I've
relied on procmail for that chore for years, and finally have a
semi-decent understanding of using procmail.  It's such a durable and
versatile little tool... it's hard not to use...

Back in Quassia Gnus days, when I first started using Gnus, I had
already come to rely on procmail for sorting and despamming mail, so
never got too heavily involved in that side of Gnus.


* Re: filtering nntp messages
  2009-10-24  1:51     ` Harry Putnam
@ 2009-10-24  2:12       ` Harry Putnam
  2009-10-24 11:20         ` Adam Sjøgren
  2009-10-24  5:04       ` Ted Zlatanov
  2009-10-24 10:13       ` Steinar Bang
  2 siblings, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2009-10-24  2:12 UTC (permalink / raw)
  To: ding

Harry Putnam <reader@newsguy.com> writes:

>> Statistical spam backends will require fetching every message body,
>> though, which could be painful.  Unfortunately that's the best solution
>> nowadays.  You may want to look into integrating some anti-spam solution
>> with leafnode on arrival or something like it (I don't know if it's
>> possible!).  Then you can just score on headers, with spam.el or not.
>
> Sounds like scoring would be a better and easier solution eh Ted?.  I mean
> you can score on just the headers right... and mark things read by
> scoring or the like, not having to download all bodies.  Or am I
> missing your point?
>
> Actually I could use a nice example of scoring to mark read or similar.

My god... I've spent the last 20-30 minutes just browsing through the
stuff on scoring.  It looks like the biggest morass of PITA-generating
rules, with all manner of this way and that way, that I've seen in
years.

I remember starting to work on this stuff years ago... now I see why I
didn't stick with it.  It must be less complex than it looks (I say
hopefully).

I'd like to try it a bit I guess, but can I tie the scoring into
gnus-parameters so it applies to just select groups?

Anyone have an example of that?

Certain groups have become just really spammed to death... one example
is comp.unix.shell.

But posters there tell me they don't see much spam... that it's my nntp
server at fault.  That may be... are there any other newsguy users
here?

I've been with them since they were still zippo.com (before the
cigarette lighter co. sued them) sometime in the mid or late 90s.  But
looking around the home site I'm not finding a way to filter nntp,
just pop3.  Anyone know better?


* Re: filtering nntp messages
  2009-10-24  1:51     ` Harry Putnam
  2009-10-24  2:12       ` Harry Putnam
@ 2009-10-24  5:04       ` Ted Zlatanov
  2009-10-24 10:13       ` Steinar Bang
  2 siblings, 0 replies; 19+ messages in thread
From: Ted Zlatanov @ 2009-10-24  5:04 UTC (permalink / raw)
  To: ding

On Fri, 23 Oct 2009 20:51:45 -0500 Harry Putnam <reader@newsguy.com> wrote: 

HP> Ted Zlatanov <tzz@lifelogs.com> writes:
>> You can filter nntp with spam.el, same as any other message source.  You
>> just can't move spam articles out of a group, but you can copy them to
>> another backend or feed directly into the spam.el backends for spam
>> training.
>> 
>> Statistical spam backends will require fetching every message body,
>> though, which could be painful.  Unfortunately that's the best solution
>> nowadays.  You may want to look into integrating some anti-spam solution
>> with leafnode on arrival or something like it (I don't know if it's
>> possible!).  Then you can just score on headers, with spam.el or not.

HP> Sounds like scoring would be a better and easier solution eh Ted?.  I mean
HP> you can score on just the headers right... and mark things read by
HP> scoring or the like, not having to download all bodies.  Or am I
HP> missing your point?

Spam changes too fast for scoring rules.  See "A Plan for Spam" which
sort of started the statistical analysis of spam a while back.  Nowadays
companies use a mix of statistical, blackhole, and static filters, but
individual users can't keep up with the latter two.

If you find static (scoring) rules sufficient, that's wonderful.  If the
posting path for your NNTP spam always goes through a particular host,
for example, use that in your scoring rules.  Your NNTP host may already
be doing some spam filtering; check the headers.

If static scoring won't work, you can try setting up statistical filters
just on the headers.  It won't be as accurate as filtering on the whole
body but it will be fast.

Ted


* Re: filtering nntp messages
  2009-10-24  1:51     ` Harry Putnam
  2009-10-24  2:12       ` Harry Putnam
  2009-10-24  5:04       ` Ted Zlatanov
@ 2009-10-24 10:13       ` Steinar Bang
  2009-10-24 15:10         ` Harry Putnam
  2 siblings, 1 reply; 19+ messages in thread
From: Steinar Bang @ 2009-10-24 10:13 UTC (permalink / raw)
  To: ding

>>>>> Harry Putnam <reader@newsguy.com>:

> Actually I could use a nice example of scoring to mark read or similar.

 - Enter comp.unix.shell
 - Select a spam article with an obviously spammy subject
 - Type `L s r p' and then edit the subject to become a regular
   expression matching as many similar subjects as possible

The rule you've just created will be in the file ~/News/comp.unix.shell.SCORE
and be specific to that group.  `V e' will take you to the score file
and let you adjust the score rules.

This is if comp.unix.shell is a group on your primary server.  If it is
on a secondary server, the file name will be prefixed by the server
name, e.g. ~/News/nntp+news.gmane.org:gmane.discuss.SCORE for this
group.
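
For illustration, the score file left behind by `L s r p' might contain
an entry roughly like the one below.  This is only a sketch: the subject
pattern is made up, and the -1000 is an arbitrary value picked for the
example.

```elisp
;; ~/News/comp.unix.shell.SCORE (hypothetical contents)
(("subject"
  ;; r = regexp match against the Subject header;
  ;; nil in the date slot means the rule is permanent (the `p' in L s r p)
  ("cheap .* watches" -1000 nil r)))
```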

But as Ted pointed out, static score rules won't get you very far these
days.

There's also adaptive scoring, where Gnus will add scoring rules
automatically in ADAPT files with the same naming scheme, but with
"ADAPT" in place of "SCORE".  I used that years ago to lose annoying
posters on USENET (if I skipped enough of the posters' articles, they
would eventually go away).


* Re: filtering nntp messages
  2009-10-24  2:12       ` Harry Putnam
@ 2009-10-24 11:20         ` Adam Sjøgren
  2009-10-24 16:22           ` Harry Putnam
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Sjøgren @ 2009-10-24 11:20 UTC (permalink / raw)
  To: ding

On Fri, 23 Oct 2009 21:12:08 -0500, Harry wrote:

> My god... I've spent the last 20-30 minutes just browsing thru the
> stuff on scoring. It looks like the biggest morass of pita generating
> rules and all manner of this ways and that ways, that I've seen in
> yrs.

I mostly use scoring to hide articles by authors I don't like (for some
reason or other.)

I do that by having something like this in my ~/News/all.SCORE file:

  (
   ("from"
    ("Very Annoying <very@annoying.example.invalid>" -10000 nil e)
    ("Less Annoying <litl@annoying.example.invalid>" -5000 nil e))
   ("references"
    ("<87[0-9a-z]+\\.fsf\\(_-_\\)?@.*koldfront.dk>" nil nil r))
   ("xref"
    ("gmane\\.spam\\.detected" -5000 nil r))
  )

And in my .gnus I have defined:

  ; Scoring, don't show the lowest of the low:
  (setq gnus-summary-expunge-below -9999)

This way I never see articles where the From: line exactly matches Very
Annoying, while articles by Less Annoying are scored down so they are
marked as read by default (and Y-marked, shown in italics.)

The references part matches my own articles, so follow-ups (from
non-braindamaged programs that handle the References: header) are
highlighted.

The xref part is for gmane, where spam is crossposted to
gmane.spam.detected, and easily marked as read.

I would guess that what you'd want to start out with is a subject part
matching the articles you don't want to see - perhaps starting with just
-5000'ing them.

> I remember starting to work on this stuff yrs ago... now I see why I
> didn't stick with it.  It must be less complex than it looks (I say
> hopefully). 

Well, you're using Gnus - it is supposed to be highly configurable and
slightly confusing at first sight, right?

> I'd like to try it a bit I guess, but can I tie the scoring into
> gnus-parameters so its just select groups?

> Anyone have an example of that"

You can have score-files per group: ~/News/<groupname>.SCORE

I'm not using it a lot, but here is an example:

,----[ nntp+news.gmane.org:gmane.config.SCORE ]
| (("subject"
|   ("Cron <root@.*" -5000 nil r)
|   ("CVS update of .*" -5000 nil r)))
`----


  Best regards,

-- 
 "It's a way to escalate the annoyance for added              Adam Sjøgren
  amusement."                                            asjo@koldfront.dk





* Re: filtering nntp messages
  2009-10-24 10:13       ` Steinar Bang
@ 2009-10-24 15:10         ` Harry Putnam
  2009-10-24 16:38           ` Adam Sjøgren
  2009-10-24 22:05           ` Steinar Bang
  0 siblings, 2 replies; 19+ messages in thread
From: Harry Putnam @ 2009-10-24 15:10 UTC (permalink / raw)
  To: ding

Steinar Bang <sb@dod.no> writes:

> This is if comp.unix.shell is a group on your primary server.  If it is
> on a secondary server, the file name will be prefixed by the server
> name, e.g. ~/News/nntp+news.gmane.org:gmane.discuss.SCORE for this
> group.

Looking at the rule created and browsing through the emacs manual bit
concerning regular expressions... I'm still not sure if I'm getting it
right.

The rule ends up looking like:

(("from"
  ("@[0-9]+\\.com>" -1 nil s)))

I was shooting for matching any From: header with ATsign followed by 1
or more digits followed by a dot followed by com>

It appears the vast majority of spam in comp.unix.shell has that
regexp in its `From:' line.  But I think it would be quite rare for regular
folks to have that in their `From:' line.

If that regex does what I was shooting for, then from there how do I
get to where those are marked read? What I've done so far appears to
have had no effect on what I see when I open the group.

After creating the rule, I marked several hundred messages as unread,
left the group with ZZ and reopened it... I still see all the spam.
Nearly all those messages now have a Y on the left of summary line.

Should I not see the spam now? Is my regexp messed up?

How would I combine that rule with a more compound rule including some
other header?

If I go back into the group select a spam message and attempt to
create a filter on Subject (L s r p) after adjusting the regex and
press enter... the rule seems to just disappear.  Opening the SCORE
file I see the same `From:' rule and nothing else.


* Re: filtering nntp messages
  2009-10-24 11:20         ` Adam Sjøgren
@ 2009-10-24 16:22           ` Harry Putnam
  2009-10-24 16:48             ` Adam Sjøgren
  0 siblings, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2009-10-24 16:22 UTC (permalink / raw)
  To: ding

asjo@koldfront.dk (Adam Sjøgren) writes:

> You can have score-files per group: ~/News/<groupname>.SCORE
>
> I'm not using it a lot, but here is an example:
>
> ,----[ nntp+news.gmane.org:gmane.config.SCORE ]
> | (("subject"
> |   ("Cron <root@.*" -5000 nil r)
> |   ("CVS update of .*" -5000 nil r)))
> `----
>

Thanks, very helpful... I've been poring over the scoring section of
the manual and still a bit confused about the score file format.

In your example above, it can't mean that both items must appear in the
subject line... so apparently either appearing will do it.

If I wanted to have a rule that looked for `@[0-9]+\.com>' in From:
line and any of sale|free|discount in the subject line:

I guess it would look like:

,----
| (("from"
|   ("@[0-9]+\\.com>" -100 nil r))
|  ("subject"
|   ("sale\\|free\\|discount" -100 nil r)))
`----

But if I wanted each rule to operate on its own it would look like:

,----
| (("from"
|   ("@[0-9]+\\.com>" -100 nil r)))
| (("subject"
|   ("sale\\|free\\|discount" -100 nil r)))
`----

Is that right?  It really doesn't say, as far as I can see, in the manual.


* Re: filtering nntp messages
  2009-10-24 15:10         ` Harry Putnam
@ 2009-10-24 16:38           ` Adam Sjøgren
  2009-10-24 19:55             ` Harry Putnam
  2009-10-24 22:05           ` Steinar Bang
  1 sibling, 1 reply; 19+ messages in thread
From: Adam Sjøgren @ 2009-10-24 16:38 UTC (permalink / raw)
  To: ding

On Sat, 24 Oct 2009 10:10:02 -0500, Harry wrote:

> (("from"
>   ("@[0-9]+\\.com>" -1 nil s)))

If you say "s" it is a substring match. That is unlikely to work with
your regular expression.

See http://gnus.org/manual/gnus_266.html#SEC266


  Best regards,

-- 
 "It's a way to escalate the annoyance for added              Adam Sjøgren
  amusement."                                            asjo@koldfront.dk





* Re: filtering nntp messages
  2009-10-24 16:22           ` Harry Putnam
@ 2009-10-24 16:48             ` Adam Sjøgren
  2009-10-24 19:45               ` Harry Putnam
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Sjøgren @ 2009-10-24 16:48 UTC (permalink / raw)
  To: ding

On Sat, 24 Oct 2009 11:22:37 -0500, Harry wrote:

>> ,----[ nntp+news.gmane.org:gmane.config.SCORE ]
>> | (("subject"
>> |   ("Cron <root@.*" -5000 nil r)
>> |   ("CVS update of .*" -5000 nil r)))
>> `----

[...]

> In your example above, It can't mean that both items must appear in the
> subject line... so apparently either appearing will do it.

Yes, each regular expression is matched against the subject and the
corresponding score is applied if it matches.

> If I wanted to have a rule that looked for `@[0-9]+\.com>' in From:
> line and any of sale|free|discount in the subject line:

I don't know that you can do that, unless you use Advanced Scoring (see
http://gnus.org/manual/gnus_277.html#SEC277 )

> I guess it would look like:
> ,----
> | (("from"
> |   ("@[0-9]+\\.com>" -100 nil r))
> |  ("subject"
> |   ("sale\\|free\\|discount" -100 nil r)))
> `----

There is no implicit AND between the different matches, so I would
expect the above to give articles only matching on from -100, articles
only matching subject -100, and articles matching both -200.

> ,----
> | (("from"
> |   ("@[0-9]+\\.com>" -100 nil r)))
> | (("subject"
> |   ("sale\\|free\\|discount" -100 nil r)))
> `----

That doesn't look valid to me - you seem to be making up your own
syntax. The manual clearly states:

  "A score file is an emacs-lisp file that normally contains just a single form."
   - http://gnus.org/manual/gnus_266.html#SEC266
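
For the From-AND-Subject case asked about above, an Advanced Scoring
rule might look roughly like this.  It is an untested sketch that
assumes the syntax described in the Advanced Scoring section of the
manual (a list as the first element of the rule, the score as the
second); the two regexps are the ones quoted above and the -1000 is an
arbitrary choice.

```elisp
;; Still a single form: a list holding one advanced rule.
(((&
   ;; both sub-rules must match for the score to be applied
   ("from" "@[0-9]+\\.com>" r)
   ("subject" "sale\\|free\\|discount" r))
  -1000))
```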


  Best regards,

-- 
 "Gravity is arbitrary!"                                      Adam Sjøgren
                                                         asjo@koldfront.dk





* Re: filtering nntp messages
  2009-10-24 16:48             ` Adam Sjøgren
@ 2009-10-24 19:45               ` Harry Putnam
  0 siblings, 0 replies; 19+ messages in thread
From: Harry Putnam @ 2009-10-24 19:45 UTC (permalink / raw)
  To: ding

asjo@koldfront.dk (Adam Sjøgren) writes:

[...]

>> ,----
>> | (("from"
>> |   ("@[0-9]+\\.com>" -100 nil r)))
>> | (("subject"
>> |   ("sale\\|free\\|discount" -100 nil r)))
>> `----
>
> That doesn't look valid to me - you seem to be making up your own
> syntax. The manual clearly states:

He he.. it appears I have yes

>   "A score file is an emacs-lisp file that normally contains just a single form."
>    - http://gnus.org/manual/gnus_266.html#SEC266

OK, thanks.  This and your other answers have gone miles in getting me
to understand how it works.





* Re: filtering nntp messages
  2009-10-24 16:38           ` Adam Sjøgren
@ 2009-10-24 19:55             ` Harry Putnam
  2009-10-24 20:12               ` Adam Sjøgren
  0 siblings, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2009-10-24 19:55 UTC (permalink / raw)
  To: ding

asjo@koldfront.dk (Adam Sjøgren) writes:

> On Sat, 24 Oct 2009 10:10:02 -0500, Harry wrote:
>
>> (("from"
>>   ("@[0-9]+\\.com>" -1 nil s)))
>
> If you say "s" it is a substring match. That is unlikely to work with
> your regular expression.
>
> See http://gnus.org/manual/gnus_266.html#SEC266
>

I'm starting to get the hang of it, I guess, but something puzzles me.
I've managed to work out some scoring that marks much of the spam read
in comp.unix.shell.  But if I mark a bunch of stuff unread, to
simulate opening the group after lots of new messages have arrived, and
leave the group, then on re-entering I see my scoring has marked most of
it read... but it's still there.  I still see it when opening the group.
I have to do M-g (`gnus-summary-rescan-group') before they slither
away into hiding.

Is that how it's supposed to work?  Somehow I'd gotten the idea that being
marked read would hide it on open.


* Re: filtering nntp messages
  2009-10-24 19:55             ` Harry Putnam
@ 2009-10-24 20:12               ` Adam Sjøgren
  2009-10-24 22:07                 ` Harry Putnam
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Sjøgren @ 2009-10-24 20:12 UTC (permalink / raw)
  To: ding

On Sat, 24 Oct 2009 14:55:52 -0500, Harry wrote:

> On re-enter I see my scoring has marked most of it read... but it's
> still there. I still see it when opening the group. I have to do M-g
> (`gnus-summary-rescan-group') before they slither away in hiding.

As far as I understand it "read" (in this context) means you see them in
the list, but they are marked read (and skipped if you use e.g. 'n' to
read the next article). If you don't want to see them in the summary at
all, you need "expunge", i.e. something like setting:

,----[ C-h v gnus-summary-expunge-below RET ]
| `gnus-summary-expunge-below' is a variable declared in Lisp.
|   -- loaded from "gnus-sum"
| 
| Value: -9999
| 
| Documentation:
| All articles that have a score less than this variable will be expunged.
| This variable is local to the summary buffers.
`----

and scoring the spam down below that number.
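
A minimal sketch of the two pieces together, assuming a threshold of
-1500 and reusing the From: regexp from earlier in the thread; both
numbers are arbitrary, the only requirement being that the rule's score
ends up below the threshold:

```elisp
;; In ~/.gnus: hide articles whose score is below -1500.
(setq gnus-summary-expunge-below -1500)

;; In the group's SCORE file (a separate file, shown here as a comment):
;; (("from"
;;   ("@[0-9]+\\.com>" -2000 nil r)))
```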

  Best regards,

-- 
 "My brain always rejects attitude transplants."              Adam Sjøgren
                                                         asjo@koldfront.dk


* Re: filtering nntp messages
  2009-10-24 15:10         ` Harry Putnam
  2009-10-24 16:38           ` Adam Sjøgren
@ 2009-10-24 22:05           ` Steinar Bang
  2009-10-24 22:09             ` Steinar Bang
  1 sibling, 1 reply; 19+ messages in thread
From: Steinar Bang @ 2009-10-24 22:05 UTC (permalink / raw)
  To: ding

>>>>> Harry Putnam <reader@newsguy.com>:

> The rule ends up looking like:

> (("from"
>   ("@[0-9]+\\.com>" -1 nil s)))

Two things to mention:
 1. s (as Adam mentioned), means it is a substring match, rather than a
    regexp match.  An r at the end means it should be interpreted as a
    regexp 
 2. -1 is very little if you want to really zap it.  nil here would mean
    the default negative score, which (I think) would be -1000

> I was shooting for matching any From: header with ATsign followed by 1
> or more digits followed by a dot followed by com>

Should match reasonably well, if interpreted as a regexp...

> It appears the vast majority of spam in comp.unix.shell has that
> regexp in its `From:' line.  But I think it would be quite rare for regular
> folks to have that in their `From:' line.

> If that regex does what I was shooting for, then from there how do I
> get to where those are marked read? What I've done so far appears to
> have had no effect on what I see when I open the group.

> After creating the rule, I marked several hundred messages as unread,
> left the group with ZZ and reopened it... I still see all the spam.
> Nearly all those messages now have a Y on the left of summary line.

It means that they have gotten a negative score, but not enough to zap
them.  A score of -1000 or lower would help.

Here are the score settings from my ~/.gnus.el file.  They are too old
for me to remember what they actually mean...

(setq gnus-thread-sort-functions
      '(gnus-thread-sort-by-number gnus-thread-sort-by-total-score))
(setq gnus-score-find-score-files-function
      'gnus-score-find-hierarchical)
(setq gnus-thread-score-function 'max)
(setq gnus-use-scoring t)
(setq gnus-use-adaptive-scoring nil)
(setq gnus-decay-scores t)
(defun gnus-set-score-limit ()
  "Set the limit of when scored down articles are no longer displayed"
  (setq gnus-summary-expunge-below -1500))
(add-hook 'gnus-summary-mode-hook 'gnus-set-score-limit)

I think what the expunge-below means is that I zap everything with a
score lower than -1500.

Here's my nntp+news.gmane.org:SCORE file:

(("from"
  ("Steinar Bang" nil nil s))
 ("subject"
  ("^ANNOUNCE: " nil nil r)
  ("^ANN: " nil nil r)
  ("^\\[ANN] " nil nil r)
  ("^\\[SPAM] " -2000 nil r))
 ("xref"
  ("gmane.spam.detected" -2000 nil s))
 (decay 733571))

As you can see I zap two kinds of messages:
 - those that are spam-tagged by a mailing list's SpamAssassin
 - those that are detected by gmane's SpamAssassin, and crossposted to
   gmane.spam.detected

Messages that match either, I never see.

> Should I not see the spam now?

No.  You're not lowering the score enough, I think.  And you don't
expunge low scored messages.

> Is my regexp messed up?

If the above example is copy/pasted, your regexp is only interpreted as a
substring match (strange that it should match so many articles then...?
Not copy/pasted correctly...?)

> How would I combine that rule with a more compound rule including some
> other header?

> If I go back into the group select a spam message and attempt to
> create a filter on Subject (L s r p) after adjusting the regex and
> press enter... the rule seems to just disappear.  Opening the SCORE
> file I see the same `From:' rule and nothing else.

Hm... you should see the new rule added.  That's what happens for me.






* Re: filtering nntp messages
  2009-10-24 20:12               ` Adam Sjøgren
@ 2009-10-24 22:07                 ` Harry Putnam
  2009-10-24 22:59                   ` Harry Putnam
  0 siblings, 1 reply; 19+ messages in thread
From: Harry Putnam @ 2009-10-24 22:07 UTC (permalink / raw)
  To: ding

asjo@koldfront.dk (Adam Sjøgren) writes:

> On Sat, 24 Oct 2009 14:55:52 -0500, Harry wrote:
>
>> On re-enter I see my scoring has marked most of it read... but its
>> still there. I still see it when opening the group. I have to do M-g
>> (`gnus-summary-rescan-group') before they slither away in hiding.
>
> As far as I understand it "read" (in this context) means you see them in
> the list, but they are marked read (and skipped if you use e.g. 'n' to
> read the next article). If you don't want to see them in the summary at
> all, you need "expunge", i.e. something like setting:
>
> ,----[ C-h v gnus-summary-expunge-below RET ]
> | `gnus-summary-expunge-below' is a variable declared in Lisp.
> |   -- loaded from "gnus-sum"
> | 
> | Value: -9999
> | 
> | Documentation:
> | All articles that have a score less than this variable will be expunged.
> | This variable is local to the summary buffers.
> `----

Nice... I guess I'm happy enough just marking them read, since they
disappear with a rescan... and for now I might expunge some stuff I want,
so I think I'll wait till I'm a little more confident.

Here is a good example of why I'm not so confident just yet.

I've set a regex to match subjects that I thought would match:
  Any two or more tildes in a row OR any 4 Uppercase letters in a row.

(("subject"
  ("~~\\|[A-Z]\\{4\\}" -101 nil r)))

But apparently I'm misreading the regular expressions section in the
emacs manual or making some other blunder, because the regex above
matches every last message in comp.programming.

Then I tried `~~\\|\\[A-Z\\]\\{4\\}'
And it worked as planned





* Re: filtering nntp messages
  2009-10-24 22:05           ` Steinar Bang
@ 2009-10-24 22:09             ` Steinar Bang
  0 siblings, 0 replies; 19+ messages in thread
From: Steinar Bang @ 2009-10-24 22:09 UTC (permalink / raw)
  To: ding

>>>>> Steinar Bang <sb@dod.no>:

>  2. -1 is very little if you want to really zap it.  nil here would mean
>     the default negative score, which (I think) would be -1000

Correcting myself: nil is the default positive score.  If you want a
negative score, use a negative number.

And noting from the other articles in the thread: if you want to zap the
articles, don't go for score values of -1 or even -100, go for <-1000 at
least. 


* Re: filtering nntp messages
  2009-10-24 22:07                 ` Harry Putnam
@ 2009-10-24 22:59                   ` Harry Putnam
  0 siblings, 0 replies; 19+ messages in thread
From: Harry Putnam @ 2009-10-24 22:59 UTC (permalink / raw)
  To: ding

Harry Putnam <reader@newsguy.com> writes:

> I've set a regex to match subjects that I thought would match:
>   Any two or more tildes in a row OR any 4 Uppercase letters in a row.
>
> (("subject"
>   ("~~\\|[A-Z]\\{4\\}" -101 nil r)))
>
> But apparently I'd miss-reading the regular expressions section in
> emacs manual or making some other blunder because the regex above
> matches every last message in comp.programming
>
> Then I tried `~~\\|\\[A-Z\\]\\{4\\}'
> And it worked as planned

Ahh, scratch that... it doesn't match 4 uppercase letters either.

This one below seems to match most closely what the regexp syntax
part of the emacs manual actually says.

   `[A-Z][A-Z]' (BUT: it matches everything)

Is it a case problem... is emacs using the regex case insensitively? 

This does not match either.
   \\[A-Z\\]\\[A-Z\\] 

I do have this in .emacs
(setq case-fold-search nil)

But still in gnus-summary-mode C-h v case-fold-search
shows it with the value t

So setting it manually with M-x set-variable.. I then see the value is:
,----
| case-fold-search is a variable defined in `C source code'.
| Its value is nil
| Local in buffer 
|   *Summary nntp+enews.newsguy.com:comp.infosystems*; global value is t
`----
but still the regex [A-Z][A-Z] affects ALL subject lines.

A little more testing shows that emacs insists on using the regex case
insensitively.

I even tried:
(defun case-fold-nil ()
  (setq case-fold-search nil))
(add-hook 'gnus-summary-mode-hook 'case-fold-nil)

But when I evaluate that it returns:
(case-fold-nil gnus-agent-mode)

And in summary-buffer C-h v case-fold-search returns:
,----
| case-fold-search is a variable defined in `C source code'.
| Its value is nil
| Local in buffer *Summary nntp+enews.newsguy.com:comp.infosystems*; global value is t
`----

But still when I run V R with the score file:

(("subject"
  ("winning" -101 nil r)))

It still marks a subject with WINNING in it.

What am I doing wrong here...?  case-fold-search does need to be nil
right? 
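
A possible explanation, offered as an assumption to check against
gnus-score.el rather than as a confirmed answer: Gnus seems to bind
case-fold-search itself while scoring, based on the match type, so the
buffer-local value would not matter.  If that reading is right, an
uppercase match type such as R asks for a case-sensitive regexp match:

```elisp
;; Sketch only; assumes uppercase match types (R, S, E) are case-sensitive.
(("subject"
  ;; With lowercase r the match is case-insensitive, which would explain
  ;; why "winning" also hits WINNING and [A-Z] hits every subject.
  ;; With uppercase R only lowercase "winning" would match here:
  ("winning" -101 nil R)
  ;; four uppercase letters in a row, matched case-sensitively
  ("[A-Z]\\{4\\}" -101 nil R)))
```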
