Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
@ 2017-05-27 14:58 Harry Putnam
  2017-05-29 10:34 ` Ben Bacarisse
  0 siblings, 1 reply; 3+ messages in thread
From: Harry Putnam @ 2017-05-27 14:58 UTC
  To: info-gnus-english


all.SCORE:

((mark -100)
 ("from"
  ("nikolys@gmail" -101 nil r)
  ("sina\.com" -101 nil r)
  ("@aol\\.com" -101 nil r)
  ("@[0-9]+\\.com>" -101 nil r)
  ("harry504@gmail" -101 nil r)
  ("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
 ("subject"
  ("~~" -101 nil r)
  ("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
  ("!!\\|free\\|discount\\|wholesale" -101 nil r)))

I've forgotten how that was generated but would like to hand edit it.

You can see the term `free' in two places... in the last `from' element
and the last `subject' element.

I want the `free' at the last `from' element to be more restrictive as
it is hitting quite a few false positives due to network names with
various combinations of free with a dot like: `free.', `.free' and
`.free.'

This is happening in groups with thousands and thousands of messages
so I don't want to get it wrong... not sure how to re-run it.

So something like (please ignore the elisions (`[...]')):

        [...] |[^\.]free[^\.]\\|[...]

But does it need the double slashes like:

        [...] |\\[^\.\\]free\\[^\.\\] [...]
                      ^^    ^^     ^^      
Will that even accomplish what I am after; to allow `free' in any
combination of: `.free', `free.' or `.free.' to not be down scored?

Is there a handy way to test the regex?

Is there a handy way to rerun all those messages thru `all.SCORE'?




* Re: Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
  2017-05-27 14:58 Using `all.SCORE' @ ~/News/all.SCORE [regex syntax] Harry Putnam
@ 2017-05-29 10:34 ` Ben Bacarisse
  2017-05-30  0:51   ` Harry Putnam
  0 siblings, 1 reply; 3+ messages in thread
From: Ben Bacarisse @ 2017-05-29 10:34 UTC
  To: Harry Putnam; +Cc: info-gnus-english

Harry Putnam <reader@newsguy.com> writes:

> all.SCORE:
>
> ((mark -100)
>  ("from"
>   ("nikolys@gmail" -101 nil r)
>   ("sina\.com" -101 nil r)
>   ("@aol\\.com" -101 nil r)
>   ("@[0-9]+\\.com>" -101 nil r)
>   ("harry504@gmail" -101 nil r)
>   ("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
>  ("subject"
>   ("~~" -101 nil r)
>   ("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
>   ("!!\\|free\\|discount\\|wholesale" -101 nil r)))
<snip>
> I want the `free' at the last `from' element to be more restrictive as
> it is hitting quite a few false positives due to network names with
> various combinations of free with a dot like: `free.', `.free' and
> `.free.'
>
> This is happening in groups with thousands and thousands of messages
> so I don't want to get it wrong... not sure how to re-run it.
>
> So something like (please ignore the elisions (`[...]')):
>
>         [...] |[^\.]free[^\.]\\|[...]

It's simpler than you think because . does not need \ inside []s.  All
you need to add is [^.] on either side.
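
Applied to the last `from' entry, that might look like the line below.
This is only a sketch of the edit, not a tested score file.  Note that
[^.] has to match an actual character, so as written a `free' at the
very start or very end of the header would no longer match either.

```elisp
;; Sketch: last "from" entry, with `free' required to have a non-dot
;; character on both sides.  No backslash is needed for . inside [].
("s[ea]l[el]\\|discount\\|[^.]free[^.]\\|wholesale\\|paypal" -101 nil r)
```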

> But does it need the double slashes like:
>
>         [...] |\\[^\.\\]free\\[^\.\\] [...]
>                       ^^    ^^     ^^      
> Will that even accomplish what I am after; to allow `free' in any
> combination of: `.free', `free.' or `.free.' to not be down scored?

You've added \s only where not needed!  There are two things going on
that require \s.  First, some elements in a regexp only mean what you
want when preceded by \.  So | is just | unless you write \| to mean an
alternative.  But then the regexp is being put into a string, and \s
need to be doubled inside a string so that they remain \s.  So, if you
did need [^\.] (you don't) you'd have to write "... [^\\.] ..." in a
string.
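
A small illustration of the two layers, which can be evaluated with
M-: or in *scratch*.  The sample strings are made up for the example.

```elisp
;; Layer 1, regexp syntax: \| means "or"; a bare | is a literal bar.
;; Layer 2, Lisp string syntax: each \ has to be written as \\.
(string-match-p "free\\|discount" "big discount")  ; => 4
(string-match-p "free|discount" "big discount")    ; => nil, looks for a literal "free|discount"
(string-match-p "[^.]free" "a.free")               ; => nil, a dot precedes `free'
```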

> Is there a handy way to test the regex?

I use highlight mode.  Text matching your regexp gets highlighted in
real time.  Remember, get the regexp working, then double every \ to put
it into a string.
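
Another way to check a candidate before touching the score file is to
run it over a few sample headers in Lisp.  The helper below is
hypothetical (it is not part of Gnus) and the addresses are invented;
binding case-fold-search to t is an assumption about how the `from'
header ends up being matched.  M-x re-builder is a further interactive
option.

```elisp
;; Hypothetical helper, not part of Gnus: report which sample From
;; headers a regexp would hit.
(defun my/score-regexp-hits (regexp samples)
  "Return an alist of (SAMPLE . MATCHED-P) for REGEXP over SAMPLES."
  (let ((case-fold-search t))   ; assumption: matching ignores case
    (mapcar (lambda (s) (cons s (and (string-match-p regexp s) t)))
            samples)))

(my/score-regexp-hits
 "[^.]free[^.]"
 '("Joe <joe@free.example>"          ; free.   -> not matched
   "Joe <joe@mail.free.example>"     ; .free.  -> not matched
   "Get free stuff <x@example.com>"  ; plain   -> matched
   "Free stuff <x@example.com>"))    ; at start of header -> not matched
```

The last sample shows the cost of requiring a character on both sides:
a header that begins with `Free' slips through.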

> Is there a handy way to rerun all those messages thru `all.SCORE'?

I think just exiting and re-entering the group does that, though I'm
sure there will be some more direct way.
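
One candidate for the more direct way, going by the summary score
commands in the Gnus manual, is `V R' (gnus-summary-rescore), which
runs the current summary through the scoring process again.  A sketch
of the steps, assuming the edited file has already been saved:

```elisp
;; In the *Summary* buffer, after saving the edited all.SCORE:
;;   V R                        ; key binding
;;   M-x gnus-summary-rescore   ; the same command by name
```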

-- 
Ben.



* Re: Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
  2017-05-29 10:34 ` Ben Bacarisse
@ 2017-05-30  0:51   ` Harry Putnam
  0 siblings, 0 replies; 3+ messages in thread
From: Harry Putnam @ 2017-05-30  0:51 UTC
  To: info-gnus-english

Ben Bacarisse <ben.lists@bsb.me.uk> writes:

> It's simpler than you think because . does not need \ inside []s.  All
> you need to add is [^.] on either side.

Thanks for the well aimed tutorial... a great help



