* Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
From: Harry Putnam @ 2017-05-27 14:58 UTC
To: info-gnus-english
all.SCORE:
((mark -100)
 ("from"
  ("nikolys@gmail" -101 nil r)
  ("sina\.com" -101 nil r)
  ("@aol\\.com" -101 nil r)
  ("@[0-9]+\\.com>" -101 nil r)
  ("harry504@gmail" -101 nil r)
  ("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
 ("subject"
  ("~~" -101 nil r)
  ("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
  ("!!\\|free\\|discount\\|wholesale" -101 nil r)))
I've forgotten how that was generated but would like to hand edit it.
You can see the term `free' in two places... in the last `from' element
and the last `subject' element.
I want the `free' in the last `from' element to be more restrictive, as
it is hitting quite a few false positives from network names that
combine free with a dot, like: `free.', `.free' and `.free.'
This is happening in groups with thousands and thousands of messages
so I don't want to get it wrong... not sure how to re-run it.
So something like (please ignore the elisions (`[...]')):
[...] |[^\.]free[^\.]\\|[...]
But does it need the double slashes like:
[...] |\\[^\.\\]free\\[^\.\\] [...]
^^ ^^ ^^
Will that even accomplish what I am after: keeping `free' in any of
the combinations `.free', `free.' or `.free.' from being down-scored?
Is there a handy way to test the regex?
Is there a handy way to rerun all those messages thru `all.SCORE'?
* Re: Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
From: Ben Bacarisse @ 2017-05-29 10:34 UTC
To: Harry Putnam; +Cc: info-gnus-english
Harry Putnam <reader@newsguy.com> writes:
> all.SCORE:
>
> ((mark -100)
> ("from"
> ("nikolys@gmail" -101 nil r)
> ("sina\.com" -101 nil r)
> ("@aol\\.com" -101 nil r)
> ("@[0-9]+\\.com>" -101 nil r)
> ("harry504@gmail" -101 nil r)
> ("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
> ("subject"
> ("~~" -101 nil r)
> ("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
> ("!!\\|free\\|discount\\|wholesale" -101 nil r)))
<snip>
> I want the `free' in the last `from' element to be more restrictive, as
> it is hitting quite a few false positives from network names that
> combine free with a dot, like: `free.', `.free' and `.free.'
>
> This is happening in groups with thousands and thousands of messages
> so I don't want to get it wrong... not sure how to re-run it.
>
> So something like (please ignore the elisions (`[...]')):
>
> [...] |[^\.]free[^\.]\\|[...]
It's simpler than you think because . does not need \ inside []s. All
you need to add is [^.] on either side.
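To make that concrete, here is a minimal sketch of what the edited entry might look like and how it behaves on a few made-up From headers. The names `my-from-spam-re` and `my-from-scored-p` and the sample addresses are hypothetical, not part of Gnus or of the original score file. One thing the sketch brings out: [^.] needs an actual character on each side, so a bare "free" at the very start or end of the header no longer matches.

```elisp
;; Assumed replacement for the last "from" entry in all.SCORE:
;;   ("s[ea]l[el]\\|discount\\|[^.]free[^.]\\|wholesale\\|paypal" -101 nil r)

;; The same regexp as a Lisp string, under a hypothetical name.
(defconst my-from-spam-re
  "s[ea]l[el]\\|discount\\|[^.]free[^.]\\|wholesale\\|paypal")

;; Hypothetical helper: t when FROM would hit the entry above.
(defun my-from-scored-p (from)
  (let ((case-fold-search t))
    (and (string-match-p my-from-spam-re from) t)))

(my-from-scored-p "get free stuff <deals@example.com>")  ; t
(my-from-scored-p "someone <user@free.example>")         ; nil
(my-from-scored-p "someone <user@mail.free.example>")    ; nil
(my-from-scored-p "someone <user@example.free>")         ; nil
(my-from-scored-p "free")                                ; nil, no neighbours
```

Evaluating the forms in *scratch* (C-x C-e after each one) is enough to try other sample headers.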
> But does it need the double slashes like:
>
> [...] |\\[^\.\\]free\\[^\.\\] [...]
> ^^ ^^ ^^
> Will that even accomplish what I am after: keeping `free' in any of
> the combinations `.free', `free.' or `.free.' from being down-scored?
You've added \s only where they're not needed! There are two things
going on that require \s. First, some elements in a regexp only mean
what you want when preceded by \. So | is just a literal | unless you
write \| to mean an alternative. Second, the regexp is being put into a
string, and \s need to be doubled inside a string so that they remain
\s. So, if you did need [^\.] (you don't) you'd have to write
"... [^\\.] ..." in a string.
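The two layers can be seen separately with a few forms evaluated in *scratch*. This is only a sketch: `my-matches-p` is a hypothetical helper, and the foo/bar patterns are made up for illustration.

```elisp
;; Hypothetical helper: t when REGEXP matches anywhere in TEXT.
(defun my-matches-p (regexp text)
  (and (string-match-p regexp text) t))

;; Layer 1, the string reader: "\\" in the source is ONE backslash.
(length "\\|")                      ; 2, a backslash followed by |

;; Layer 2, the regexp engine: \| is alternation, bare | is literal.
(my-matches-p "foo\\|bar" "bar")    ; t
(my-matches-p "foo|bar" "bar")      ; nil
(my-matches-p "foo|bar" "foo|bar")  ; t

;; Inside [] the dot is already literal, so no backslash is involved.
(my-matches-p "[^.]free" "a free")  ; t
(my-matches-p "[^.]free" ".free")   ; nil
```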
> Is there a handy way to test the regex?
I use highlight mode. Text matching your regexp gets highlighted in
real time. Remember, get the regexp working, then double every \ to put
it into a string.
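If doubling by hand feels error-prone, one possible shortcut is to let Emacs print the string form. The sketch below assumes the working regexp is already held in a Lisp value (the variable name `my-working-re` is hypothetical); `prin1-to-string` returns the text of a string literal, quotes included, with every backslash doubled.

```elisp
;; The regexp as the engine sees it: discount\|[^.]free[^.]
;; (written here as a literal only so the sketch is self-contained).
(defvar my-working-re "discount\\|[^.]free[^.]")

;; Text that could be pasted into the score file, quotes included.
(prin1-to-string my-working-re)

;; Or insert it at point in the current buffer:
;; (insert (prin1-to-string my-working-re))
```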
> Is there a handy way to rerun all those messages thru `all.SCORE'?
I think just exiting and re-entering the group does that, though I'm
sure there will be some more direct way.
--
Ben.
* Re: Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
From: Harry Putnam @ 2017-05-30 0:51 UTC
To: info-gnus-english
Ben Bacarisse <ben.lists@bsb.me.uk> writes:
> It's simpler than you think because . does not need \ inside []s. All
> you need to add is [^.] on either side.
Thanks for the well aimed tutorial... a great help