Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
From: Ben Bacarisse <ben.lists@bsb.me.uk>
To: Harry Putnam <reader@newsguy.com>
Cc: info-gnus-english@gnu.org
Subject: Re: Using `all.SCORE' @ ~/News/all.SCORE [regex syntax]
Date: Mon, 29 May 2017 11:34:07 +0100
Message-ID: <8737bo3qao.fsf@bsb.me.uk>
In-Reply-To: <8660gmgxdx.fsf@local.lan> (Harry Putnam's message of "Sat, 27 May 2017 10:58:02 -0400")

Harry Putnam <reader@newsguy.com> writes:

> all.SCORE:
>
> ((mark -100)
>  ("from"
>   ("nikolys@gmail" -101 nil r)
>   ("sina\.com" -101 nil r)
>   ("@aol\\.com" -101 nil r)
>   ("@[0-9]+\\.com>" -101 nil r)
>   ("harry504@gmail" -101 nil r)
>   ("s[ea]l[el]\\|discount\\|free\\|wholesale\\|paypal" -101 nil r))
>  ("subject"
>   ("~~" -101 nil r)
>   ("~~\\|>>>\\|\\[A-Z\\]\\{4\\}" -101 nil r)
>   ("!!\\|free\\|discount\\|wholesale" -101 nil r)))
<snip>
> I want the `free' at the last `from' element to be more restrictive as
> it is hitting quite a few false positives due to network name with
> various combinations of free with a dot like: `free.', `.free' and
> `.free.'
>
> This is happening in groups with thousands and thousands of messages
> so I don't want to get it wrong... not sure how to re-run it.
>
> So something like (please ignore the elisions (`[...]')):
>
>         [...] |[^\.]free[^\.]\\|[...]

It's simpler than you think because . does not need \ inside []s.  All
you need to add is [^.] on either side.
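
As a sketch only (not tested against your groups), the last `from'
entry might then look like the line below.  One caveat: [^.] has to
match an actual character, so a `free' at the very start or very end of
the header would no longer match at all.

```elisp
;; Sketch of the revised entry; only the `free' alternative has changed.
("s[ea]l[el]\\|discount\\|[^.]free[^.]\\|wholesale\\|paypal" -101 nil r)
```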

> But does it need the double slashes like:
>
>         [...] |\\[^\.\\]free\\[^\.\\] [...]
>                       ^^    ^^     ^^      
> Will that even accomplish what I am after; to allow `free' in any
> combination of: `.free', `free.' or `.free.' to not be down scored?

You've added \s only where not needed!  There are two things going on
that require \s.  First, some elements in a regexp only mean what you
want when preceded by \.  So | is just | unless you write \| to mean an
alternative.  But then the regexp is being put into a string, and \s
need to be doubled inside a string so that they remain \s.  So, if you
did need [^\.] (you don't) you'd have to write "... [^\\.] ..." in a
string.
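
The two layers can be checked by evaluating a few forms, for example in
the *scratch* buffer.  This is only an illustration; the sample strings
"a", "b" and "a|b" are made up for the purpose.

```elisp
;; The Lisp reader turns each "\\" in a string into one backslash.
(length "\\|")                   ; 2: a backslash and a |
(length "[^\\.]")                ; 5: [ ^ \ . ]

;; The regexp engine then sees \| and treats it as alternation.
(string-match-p "a\\|b" "b")     ; 0, matched via the second alternative
;; Without the backslash, | is an ordinary character.
(string-match-p "a|b" "b")       ; nil, only the literal text a|b matches
(string-match-p "a|b" "a|b")     ; 0
```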

> Is there a handy way to test the regex?

I use highlight mode.  Text matching your regexp gets highlighted in
real time.  Remember, get the regexp working, then double every \ to put
it into a string.
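
Another way to check, sketched here with a made-up helper and made-up
From lines (neither is part of Gnus), is to run the regexp over a few
sample strings with string-match-p:

```elisp
;; Hypothetical helper, not part of Gnus: return t or nil for each
;; sample, depending on whether REGEXP matches it.
(defun my-regexp-hits (regexp samples)
  (mapcar (lambda (s) (and (string-match-p regexp s) t)) samples))

;; Invented sample From lines.
(my-regexp-hits "[^.]free[^.]"
                '("Joe <joe@free.fr>"               ; free.   -> no match
                  "someone@mail.free.example"       ; .free.  -> no match
                  "Get free stuff <x@example.com>")) ; plain free -> match
;; => (nil nil t)
```

The regexp here contains no backslashes, so it reads the same in a
string as it does when typed interactively.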

> Is there a handy way to rerun all those messages thru `all.SCORE'?

I think just exiting and re-entering the group does that, though I'm
sure there will be some more direct way.

-- 
Ben.


