* scoring based on a number of matches
From: Sam Steingold @ 2019-07-09 1:49 UTC (permalink / raw)
To: ding
Hi,
I want to down-score articles with many all-uppercase words in the
subject.
Ideally, I would like a "multiplier", e.g.:
--8<---------------cut here---------------start------------->8---
(("subject"
  ("!" (0 -10 -3) nil s)))
--8<---------------cut here---------------end--------------->8---
would ignore the 1st "!", reduce the score by 10 for the second "!",
and by 3 for each after that.
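Nothing in this thread shows Gnus accepting a per-occurrence form like the one above, but the intended arithmetic can be sketched in plain Emacs Lisp. The name `my/multiplier-score` is hypothetical and not part of Gnus. It counts non-overlapping literal occurrences and applies the (0 -10 -3) schedule.

```elisp
;; Hypothetical helper, not part of Gnus: illustrates the (0 -10 -3) schedule.
(defun my/multiplier-score (needle subject)
  "Return the score adjustment for SUBJECT given occurrences of NEEDLE.
The first occurrence costs 0, the second -10, each later one -3."
  (if (string= needle "")
      0                                 ; avoid looping on an empty needle
    (let ((regexp (regexp-quote needle))
          (case-fold-search nil)        ; count literal, case-sensitive matches
          (count 0)
          (start 0))
      ;; Count non-overlapping occurrences of NEEDLE in SUBJECT.
      (while (string-match regexp subject start)
        (setq count (1+ count)
              start (match-end 0)))
      (if (< count 2)
          0
        (+ -10 (* -3 (- count 2)))))))

;; Four "!" in the subject: 0 + -10 + -3 + -3 = -16
(my/multiplier-score "!" "BUY NOW!!! REALLY!")
```

Wiring such a function into Gnus scoring is a separate question that this sketch does not address.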
In the meantime, I would settle for a regexp that would match long
subject lines without any lowercase characters (see comp.lang.lisp):
--8<---------------cut here---------------start------------->8---
Subject: E' PEDOFILO ED ASSASSINO: PAOLO CARDENÀ (FACEBOOK & TWITTER)! DI CRIMINALISSIMO BLOG VINCITORI E VINTI ( VEDRA' COME LO FAREMO DIVENIRE PARTE DELLA SECONDA CATEGORIA E NON PRIMA, CHE RICICLI DA SEMPRE SOLDI DI MAFIA, CAMORRA E NDRANGHETA O MENO..."O
--8<---------------cut here---------------end--------------->8---
I tried
--8<---------------cut here---------------start------------->8---
(("subject"
  ("[^a-z]\\{100\\}" -100 nil r)))
--8<---------------cut here---------------end--------------->8---
to no avail.
Thanks.
--
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://thereligionofpeace.com http://think-israel.org
Democracy is like a car: you can ride it or you can run people over with it.
* Re: scoring based on a number of matches
From: Dave Marquardt @ 2019-07-09 15:39 UTC (permalink / raw)
To: Sam Steingold; +Cc: ding
Did you look at the Advanced Scoring section of the Gnus manual? Try
https://www.gnus.org/manual/gnus_105.html#Advanced-Scoring
* Re: scoring based on a number of matches
From: Sam Steingold @ 2019-07-09 15:47 UTC (permalink / raw)
To: ding
> * Dave Marquardt <qnirzned@yvahk.iarg.voz.pbz> [2019-07-09 10:39:31 -0500]:
>
> Did you look at the Advanced Scoring section of the Gnus manual? Try
> https://www.gnus.org/manual/gnus_105.html#Advanced-Scoring
I did read it.
How is that supposed to help, specifically?
Thanks.
--
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://memri.org http://iris.org.il http://www.memritv.org http://camera.org
I may be getting older, but I refuse to grow up!
* Re: scoring based on a number of matches
From: Lars Ingebrigtsen @ 2019-07-09 15:52 UTC (permalink / raw)
To: Sam Steingold; +Cc: ding
Sam Steingold <sds@gnu.org> writes:
> In the meantime, I would settle for a regexp that would match long
> subjects lines without any lower case characters (see comp.lang.lisp):
>
> Subject: E' PEDOFILO ED ASSASSINO: PAOLO CARDENÀ (FACEBOOK & TWITTER)!
> DI CRIMINALISSIMO BLOG VINCITORI E VINTI ( VEDRA' COME LO FAREMO
> DIVENIRE PARTE DELLA SECONDA CATEGORIA E NON PRIMA, CHE RICICLI DA
> SEMPRE SOLDI DI MAFIA, CAMORRA E NDRANGHETA O MENO..."O
>
> I tried
>
> (("subject"
> ("[^a-z]\\{100\\}" -100 nil r)))
{100} isn't valid in Emacs regexps, I think? (Unless that's something
that's happened while I wasn't looking; which is always possible.)
So you have to [^a-z][^a-z][^a-z][^a-z][^a-z]... it a lot.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
* Re: scoring based on a number of matches
From: Sam Steingold @ 2019-07-09 17:24 UTC (permalink / raw)
To: ding
> * Lars Ingebrigtsen <ynefv@tahf.bet> [2019-07-09 17:52:20 +0200]:
>
> Sam Steingold <sds@gnu.org> writes:
>
>> In the meantime, I would settle for a regexp that would match long
>> subjects lines without any lower case characters (see comp.lang.lisp):
>>
>> Subject: E' PEDOFILO ED ASSASSINO: PAOLO CARDENÀ (FACEBOOK & TWITTER)!
>> DI CRIMINALISSIMO BLOG VINCITORI E VINTI ( VEDRA' COME LO FAREMO
>> DIVENIRE PARTE DELLA SECONDA CATEGORIA E NON PRIMA, CHE RICICLI DA
>> SEMPRE SOLDI DI MAFIA, CAMORRA E NDRANGHETA O MENO..."O
>>
>> I tried
>>
>> (("subject"
>> ("[^a-z]\\{100\\}" -100 nil r)))
>
> {100} isn't valid in Emacs regexps, I think? (Unless that's something
> that's happened while I wasn't looking; which is always possible.)
--8<---------------cut here---------------start------------->8---
(string-match "b\\{1,\\}" "abcbbd")
==> 1
(string-match "b\\{2\\}" "abcbbd")
==> 3
(string-match "b\\{3\\}" "abcbbd")
==> nil
--8<---------------cut here---------------end--------------->8---
--
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://thereligionofpeace.com http://islamexposedonline.com
If you're being passed on the right, you're in the wrong lane.
* Re: scoring based on a number of matches
From: Eric Abrahamsen @ 2019-07-09 17:45 UTC (permalink / raw)
To: ding
Sam Steingold <sds@gnu.org> writes:
> Hi,
> I want to down-score articles with many all-upper case words in the
> subject.
> Ideally, I would like a "multiplier", e.g.:
Not the answer you wanted, but it sounds like you're asking Gnus to play
the role of a Bayesian spam filter a la spamassassin or rspamd. If you
have access to your mail server, I'd prefer using those kinds of tools
for this job, adding a spam-score header to messages that Gnus can then
score on. Perhaps that's infeasible/undesirable, but it seems to me like
you're going to end up kind of fighting with Gnus on this one.
* Re: scoring based on a number of matches
From: Andreas Schwab @ 2019-07-09 20:10 UTC (permalink / raw)
To: Lars Ingebrigtsen; +Cc: Sam Steingold, ding
On Jul 09 2019, Lars Ingebrigtsen <larsi@gnus.org> wrote:
> {100} isn't valid in Emacs regexps, I think?
But \{100\} is.
> (Unless that's something that's happened while I wasn't looking; which
> is always possible.)
About 19 years ago.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
* Re: scoring based on a number of matches
From: Sam Steingold @ 2019-07-11 19:46 UTC (permalink / raw)
To: ding
> * Eric Abrahamsen <revp@revpnoenunzfra.arg> [2019-07-09 10:45:39 -0700]:
>
> Sam Steingold <sds@gnu.org> writes:
>
>> I want to down-score articles with many all-upper case words in the
>> subject.
>> Ideally, I would like a "multiplier", e.g.:
>
> Not the answer you wanted, but it sounds like you're asking Gnus to play
> the role of a Bayesian spam filter a la spamassassin or rspamd.
This is for a bona fide newsgroup, comp.lang.lisp, so I cannot do that.
However, the following works for me now:
--8<---------------cut here---------------start------------->8---
(("subject"
  ("[^a-z]\\{40\\}" -100 nil R)
  ("[^a-z]\\{100\\}" -300 nil R)
  ("[^a-z]\\{150\\}" -1000 nil R)))
--8<---------------cut here---------------end--------------->8---
(note `R` instead of `r`!)
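A likely explanation, going by the Gnus manual's description of match types: lowercase `r` matches case-insensitively, while `R` is case-sensitive. Under case folding, Emacs treats `[^a-z]` as excluding uppercase letters as well, so an all-caps subject never yields a long enough run. The snippet below only illustrates that regexp behavior with `string-match`. Binding `case-fold-search` to stand in for the two match types is an assumption about how Gnus applies them, not something stated in this thread, and the sample subject is made up.

```elisp
;; Illustration only: `case-fold-search' stands in for the r/R distinction.
(let ((subject "FREE MONEY NOW"))       ; made-up all-caps subject, 14 chars
  (list
   ;; Case-insensitive, as assumed for `r': uppercase letters count as
   ;; members of a-z, so only the two spaces match [^a-z].
   (let ((case-fold-search t))
     (string-match "[^a-z]\\{10\\}" subject))
   ;; Case-sensitive, as assumed for `R': every character matches [^a-z].
   (let ((case-fold-search nil))
     (string-match "[^a-z]\\{10\\}" subject))))
;; Should evaluate to (nil 0)
```

If that reading is right, it would explain why the original `r` entry never fired even though `\\{100\\}` is valid syntax.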
--
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1671
http://childpsy.net http://calmchildstories.com http://steingoldpsychology.com
http://islamexposedonline.com https://ffii.org http://iris.org.il
Things that cannot be programmed in assembler have to be soldered.
* Re: scoring based on a number of matches
From: Eric Abrahamsen @ 2019-07-11 20:10 UTC (permalink / raw)
To: ding
Sam Steingold <sds@gnu.org> writes:
>> * Eric Abrahamsen <revp@revpnoenunzfra.arg> [2019-07-09 10:45:39 -0700]:
>>
>> Sam Steingold <sds@gnu.org> writes:
>>
>>> I want to down-score articles with many all-upper case words in the
>>> subject.
>>> Ideally, I would like a "multiplier", e.g.:
>>
>> Not the answer you wanted, but it sounds like you're asking Gnus to play
>> the role of a Bayesian spam filter a la spamassassin or rspamd.
>
> This is for a bona fide comp.lang.lisp newsgroup. Cannot do that.
Yeah, that wasn't terribly helpful advice.
> However, the following works for me now:
>
> (("subject"
> ("[^a-z]\\{40\\}" -100 nil R)
> ("[^a-z]\\{100\\}" -300 nil R)
> ("[^a-z]\\{150\\}" -1000 nil R)
> ))
>
> (note `R` instead of `r`!)
Glad you sorted it out! I'm still impressed Gnus can be a Bayesian
filter if you try.