Gnus development mailing list
* scoring based on a number of matches
@ 2019-07-09  1:49 Sam Steingold
  2019-07-09 15:39 ` Dave Marquardt
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Sam Steingold @ 2019-07-09  1:49 UTC (permalink / raw)
  To: ding

Hi,
I want to down-score articles with many all-upper case words in the
subject.
Ideally, I would like a "multiplier", e.g.:

--8<---------------cut here---------------start------------->8---
(("subject"
 ("!" (0 -10 -3) nil s)))
--8<---------------cut here---------------end--------------->8---

would ignore the 1st "!", reduce the score by 10 for the second "!",
and by 3 for each after that.
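
To make the intended arithmetic concrete, here is a minimal sketch in plain Emacs Lisp. It is not a Gnus feature: `my/multiplier-score` is a hypothetical helper, and the sample subject is made up. It only shows how a penalty list like (0 -10 -3) would be applied per match, with the last element reused for every further match.

```elisp
;; Hypothetical helper, not part of Gnus: sum one penalty per match.
(defun my/multiplier-score (regexp string penalties)
  "Return the summed penalty for every match of REGEXP in STRING.
The Nth match is charged the Nth element of PENALTIES; matches
beyond the end of the list reuse its last element."
  (let ((case-fold-search nil)
        (start 0)
        (n 0)
        (total 0))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (setq total (+ total (or (nth n penalties)
                               (car (last penalties))
                               0)))
      (setq n (1+ n))
      ;; Advance by at least one character so an empty match cannot loop.
      (setq start (max (match-end 0) (1+ start))))
    total))

;; Four "!" in the subject: 0 + -10 + -3 + -3
(my/multiplier-score "!" "WOW! BUY NOW!! FREE!" '(0 -10 -3))
;; ==> -16
```

Wiring such a function into the scoring machinery is a separate question that this sketch does not address.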

In the meantime, I would settle for a regexp that would match long
subject lines without any lowercase characters (see comp.lang.lisp):

--8<---------------cut here---------------start------------->8---
Subject: E' PEDOFILO ED ASSASSINO: PAOLO CARDENÀ (FACEBOOK & TWITTER)! DI CRIMINALISSIMO BLOG VINCITORI E VINTI ( VEDRA' COME LO FAREMO DIVENIRE PARTE DELLA SECONDA CATEGORIA E NON PRIMA, CHE RICICLI DA SEMPRE SOLDI DI MAFIA, CAMORRA E NDRANGHETA O MENO..."O
--8<---------------cut here---------------end--------------->8---

I tried

--8<---------------cut here---------------start------------->8---
(("subject"
  ("[^a-z]\\{100\\}" -100 nil r)))
--8<---------------cut here---------------end--------------->8---

to no avail.

Thanks.

-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://thereligionofpeace.com http://think-israel.org
Democracy is like a car: you can ride it or you can run people over with it.





* Re: scoring based on a number of matches
  2019-07-09  1:49 scoring based on a number of matches Sam Steingold
@ 2019-07-09 15:39 ` Dave Marquardt
  2019-07-09 15:47   ` Sam Steingold
  2019-07-09 15:52 ` Lars Ingebrigtsen
  2019-07-09 17:45 ` Eric Abrahamsen
  2 siblings, 1 reply; 9+ messages in thread
From: Dave Marquardt @ 2019-07-09 15:39 UTC (permalink / raw)
  To: Sam Steingold; +Cc: ding

Did you look at the Advanced Scoring section of the Gnus manual?  Try
https://www.gnus.org/manual/gnus_105.html#Advanced-Scoring






* Re: scoring based on a number of matches
  2019-07-09 15:39 ` Dave Marquardt
@ 2019-07-09 15:47   ` Sam Steingold
  0 siblings, 0 replies; 9+ messages in thread
From: Sam Steingold @ 2019-07-09 15:47 UTC (permalink / raw)
  To: ding

> * Dave Marquardt <qnirzned@yvahk.iarg.voz.pbz> [2019-07-09 10:39:31 -0500]:
>
> Did you look at the Advanced Scoring section of the Gnus manual?  Try
> https://www.gnus.org/manual/gnus_105.html#Advanced-Scoring

I did read it.
How is that supposed to help, specifically?
Thanks.

-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://memri.org http://iris.org.il http://www.memritv.org http://camera.org
I may be getting older, but I refuse to grow up!





* Re: scoring based on a number of matches
  2019-07-09  1:49 scoring based on a number of matches Sam Steingold
  2019-07-09 15:39 ` Dave Marquardt
@ 2019-07-09 15:52 ` Lars Ingebrigtsen
  2019-07-09 17:24   ` Sam Steingold
  2019-07-09 20:10   ` Andreas Schwab
  2019-07-09 17:45 ` Eric Abrahamsen
  2 siblings, 2 replies; 9+ messages in thread
From: Lars Ingebrigtsen @ 2019-07-09 15:52 UTC (permalink / raw)
  To: Sam Steingold; +Cc: ding

Sam Steingold <sds@gnu.org> writes:

> In the meantime, I would settle for a regexp that would match long
> subject lines without any lowercase characters (see comp.lang.lisp):
>
> Subject: E' PEDOFILO ED ASSASSINO: PAOLO CARDENÀ (FACEBOOK & TWITTER)!
> DI CRIMINALISSIMO BLOG VINCITORI E VINTI ( VEDRA' COME LO FAREMO
> DIVENIRE PARTE DELLA SECONDA CATEGORIA E NON PRIMA, CHE RICICLI DA
> SEMPRE SOLDI DI MAFIA, CAMORRA E NDRANGHETA O MENO..."O
>
> I tried
>
> (("subject"
>   ("[^a-z]\\{100\\}" -100 nil r)))

{100} isn't valid in Emacs regexps, I think?  (Unless that's something
that's happened while I wasn't looking; which is always possible.)

So you have to [^a-z][^a-z][^a-z][^a-z][^a-z]... it a lot.
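
For what it is worth, the repeated form does not have to be typed by hand. The sketch below builds it with `make-list` and `concat` and compares it with the interval form on a made-up string; a count of 5 is used only to keep the example short.

```elisp
;; Compare a generated run of "[^a-z]" with the interval form "\\{5\\}".
(let ((repeated (apply #'concat (make-list 5 "[^a-z]")))
      (interval "[^a-z]\\{5\\}")
      (case-fold-search nil))   ; keep the match case-sensitive
  (list (string-match repeated "abcDEFGHij")
        (string-match interval "abcDEFGHij")))
;; ==> (3 3)
```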

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: scoring based on a number of matches
  2019-07-09 15:52 ` Lars Ingebrigtsen
@ 2019-07-09 17:24   ` Sam Steingold
  2019-07-09 20:10   ` Andreas Schwab
  1 sibling, 0 replies; 9+ messages in thread
From: Sam Steingold @ 2019-07-09 17:24 UTC (permalink / raw)
  To: ding

> * Lars Ingebrigtsen <ynefv@tahf.bet> [2019-07-09 17:52:20 +0200]:
>
> Sam Steingold <sds@gnu.org> writes:
>
>> In the meantime, I would settle for a regexp that would match long
>> subject lines without any lowercase characters (see comp.lang.lisp):
>>
>> Subject: E' PEDOFILO ED ASSASSINO: PAOLO CARDENÀ (FACEBOOK & TWITTER)!
>> DI CRIMINALISSIMO BLOG VINCITORI E VINTI ( VEDRA' COME LO FAREMO
>> DIVENIRE PARTE DELLA SECONDA CATEGORIA E NON PRIMA, CHE RICICLI DA
>> SEMPRE SOLDI DI MAFIA, CAMORRA E NDRANGHETA O MENO..."O
>>
>> I tried
>>
>> (("subject"
>>   ("[^a-z]\\{100\\}" -100 nil r)))
>
> {100} isn't valid in Emacs regexps, I think?  (Unless that's something
> that's happened while I wasn't looking; which is always possible.)

--8<---------------cut here---------------start------------->8---
(string-match "b\\{1,\\}" "abcbbd")
==> 1
(string-match "b\\{2\\}" "abcbbd")
==> 3
(string-match "b\\{3\\}" "abcbbd")
==> nil
--8<---------------cut here---------------end--------------->8---


-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://thereligionofpeace.com http://islamexposedonline.com
If you're being passed on the right, you're in the wrong lane.





* Re: scoring based on a number of matches
  2019-07-09  1:49 scoring based on a number of matches Sam Steingold
  2019-07-09 15:39 ` Dave Marquardt
  2019-07-09 15:52 ` Lars Ingebrigtsen
@ 2019-07-09 17:45 ` Eric Abrahamsen
  2019-07-11 19:46   ` Sam Steingold
  2 siblings, 1 reply; 9+ messages in thread
From: Eric Abrahamsen @ 2019-07-09 17:45 UTC (permalink / raw)
  To: ding

Sam Steingold <sds@gnu.org> writes:

> Hi,
> I want to down-score articles with many all-upper case words in the
> subject.
> Ideally, I would like a "multiplier", e.g.:

Not the answer you wanted, but it sounds like you're asking Gnus to play
the role of a Bayesian spam filter a la spamassassin or rspamd. If you
have access to your mail server, I'd prefer using those kinds of tools
for this job, adding a spam-score header to messages that Gnus can then
score on. Perhaps that's infeasible/undesirable, but it seems to me like
you're going to end up kind of fighting with Gnus on this one.





* Re: scoring based on a number of matches
  2019-07-09 15:52 ` Lars Ingebrigtsen
  2019-07-09 17:24   ` Sam Steingold
@ 2019-07-09 20:10   ` Andreas Schwab
  1 sibling, 0 replies; 9+ messages in thread
From: Andreas Schwab @ 2019-07-09 20:10 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Sam Steingold, ding

On Jul 09 2019, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> {100} isn't valid in Emacs regexps, I think?

But \{100\} is.

> (Unless that's something that's happened while I wasn't looking; which
> is always possible.)

About 19 years ago.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




* Re: scoring based on a number of matches
  2019-07-09 17:45 ` Eric Abrahamsen
@ 2019-07-11 19:46   ` Sam Steingold
  2019-07-11 20:10     ` Eric Abrahamsen
  0 siblings, 1 reply; 9+ messages in thread
From: Sam Steingold @ 2019-07-11 19:46 UTC (permalink / raw)
  To: ding

> * Eric Abrahamsen <revp@revpnoenunzfra.arg> [2019-07-09 10:45:39 -0700]:
>
> Sam Steingold <sds@gnu.org> writes:
>
>> I want to down-score articles with many all-upper case words in the
>> subject.
>> Ideally, I would like a "multiplier", e.g.:
>
> Not the answer you wanted, but it sounds like you're asking Gnus to play
> the role of a Bayesian spam filter a la spamassassin or rspamd.

This is for the bona fide comp.lang.lisp newsgroup. I cannot do that.

However, the following works for me now:

--8<---------------cut here---------------start------------->8---
(("subject"
  ("[^a-z]\\{40\\}" -100 nil R)
  ("[^a-z]\\{100\\}" -300 nil R)
  ("[^a-z]\\{150\\}" -1000 nil R)
  ))
--8<---------------cut here---------------end--------------->8---

(note `R` instead of `r`!)
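
A likely explanation, offered as an assumption rather than something checked against the Gnus source: the lower-case match types are matched with case folding, and under case folding an Emacs character set such as [^a-z] also excludes A-Z, so an all-caps subject never yields a long enough run. The effect can be reproduced with plain `string-match` on a made-up subject:

```elisp
;; The same regexp with and without case folding.
(let ((subject "BUY NOW!!! CHEAP PILLS"))
  (list
   ;; Case folding on: [^a-z] rejects upper-case letters as well,
   ;; so there is no run of 10 qualifying characters.
   (let ((case-fold-search t))
     (string-match "[^a-z]\\{10\\}" subject))
   ;; Case folding off: upper-case letters count as "not a-z".
   (let ((case-fold-search nil))
     (string-match "[^a-z]\\{10\\}" subject))))
;; ==> (nil 0)
```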

-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://islamexposedonline.com https://ffii.org http://iris.org.il
Things that cannot be programmed in assembler have to be soldered.





* Re: scoring based on a number of matches
  2019-07-11 19:46   ` Sam Steingold
@ 2019-07-11 20:10     ` Eric Abrahamsen
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Abrahamsen @ 2019-07-11 20:10 UTC (permalink / raw)
  To: ding

Sam Steingold <sds@gnu.org> writes:

>> * Eric Abrahamsen <revp@revpnoenunzfra.arg> [2019-07-09 10:45:39 -0700]:
>>
>> Sam Steingold <sds@gnu.org> writes:
>>
>>> I want to down-score articles with many all-upper case words in the
>>> subject.
>>> Ideally, I would like a "multiplier", e.g.:
>>
>> Not the answer you wanted, but it sounds like you're asking Gnus to play
>> the role of a Bayesian spam filter a la spamassassin or rspamd.
>
> This is for the bona fide comp.lang.lisp newsgroup. I cannot do that.

Yeah, that wasn't terribly helpful advice.

> However, the following works for me now:
>
> (("subject"
>   ("[^a-z]\\{40\\}" -100 nil R)
>   ("[^a-z]\\{100\\}" -300 nil R)
>   ("[^a-z]\\{150\\}" -1000 nil R)
>   ))
>
> (note `R` instead of `r`!)

Glad you sorted it out! I'm still impressed Gnus can be a Bayesian
filter if you try.




