Re: adaptive word scoring

Gnus development mailing list

From: Holger Franz <hfranz@physik.rwth-aachen.de>
Subject: Re: adaptive word scoring
Date: 03 Dec 1996 14:51:58 +0100
Message-ID: <vcg21npwg1.fsf@ac3a50.physik.rwth-aachen.de>
In-Reply-To: Robert Bihlmeyer's message of Mon, 2 Dec 1996 16:08:05 +0100

Just to toss in my two cents: How about adding the exponent feature
from the procmail scoring mechanism? It provides a flexible method to
keep scores from frequent matches low.

From `man procmailsc`:
---8<---
Weighted regular expression conditions
       The  first  time  the regular expression is found, it will
       add w to the score.  The second time it is found, w*x will
       be  added.   The  third  time  it  is found, w*x*x will be
       added.  The fourth time w*x*x*x will  be  added.   And  so
       forth.

       This can be described by the following concise formula:

                                   n
                   n   k-1        x - 1
              w * Sum x    = w * -------
                  k=1             x - 1

       It  represents the total added score for this condition if
       n matches are found.

       Note that the following case distinctions can be made:

       x=0     Only the first match  will  contribute  w  to  the
               score.  Any subsequent matches are ignored.

       x=1     Every  match  will  contribute  the  same w to the
               score.  The score grows linearly with  the  number
               of matches found.

       0<x<1   Every match will contribute less to the score than
               the previous one.  The score  will  asymptotically
               approach  a  certain  value (see the NOTES section
               below).

       1<x     Every match will contribute more to the score than
               the  previous  one.   The score will grow
               exponentially.

       x<0     Can be utilised to favour odd or  even  number  of
               matches.
---8<---
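To make the arithmetic above concrete, here is a minimal Python sketch of it. This is my own illustration, not procmail's code, and the function names are made up. It sums the per-match contributions w*x^(k-1) directly, checks that against the closed form (which divides by x-1, so the x=1 case is handled separately as plain w*n), and computes the limit w/(1-x) that the total approaches when |x| < 1. For 0<x<1 that limit is the asymptotic value the man page refers to.

```python
def weighted_total(w: float, x: float, n: int) -> float:
    """Total score added by n matches: w + w*x + w*x**2 + ... (n terms)."""
    # Note: 0 ** 0 == 1 in Python, so for x == 0 the first match still adds w.
    return sum(w * x ** (k - 1) for k in range(1, n + 1))


def weighted_total_closed(w: float, x: float, n: int) -> float:
    """Closed form w * (x**n - 1) / (x - 1); for x == 1 the sum is simply w * n."""
    if x == 1:
        return w * n
    return w * (x ** n - 1) / (x - 1)


def asymptote(w: float, x: float) -> float:
    """Limit of the total as n grows without bound; it only exists for -1 < x < 1."""
    if not -1 < x < 1:
        raise ValueError("the total only converges for -1 < x < 1")
    return w / (1 - x)
```

For example, with w=10 and x=0.5, three matches add 10 + 5 + 2.5 = 17.5 points, and no number of matches can push the total past asymptote(10, 0.5) = 20. With x=-1 the total alternates between w and 0, which is the odd/even case.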

I think that could be implemented easily by adding to each rule a
counter that records how often the rule has already been adapted.
There is one major problem, though: with an exponent 0<x<1 the first
adaptation is likely to dominate all later ones. So there may have to
be separate counters for positive and negative adaptations. If
positive and negative adaptations were tuned so that they had roughly
the same asymptotic value, 'meaningless' words that are raised and
lowered at random would end up with a comparatively low net score.
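The two-counter idea could look roughly like this in Python. This is a hypothetical sketch, not how Gnus actually stores adaptive scores: AdaptiveRule, w_pos, w_neg, raise_score and lower_score are invented names, and the sketch assumes 0<x<1. Each direction keeps its own counter, so the n-th raise adds w_pos*x^(n-1) and the n-th lowering subtracts w_neg*x^(n-1). With w_pos == w_neg both directions share the same asymptote w/(1-x), so an early run of raises cannot swamp later lowerings.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRule:
    """Hypothetical adaptive word-scoring rule (not Gnus's real data structure).

    Raises and lowerings are damped separately, each with its own counter.
    """
    word: str
    w_pos: float      # size of the first upward adaptation
    w_neg: float      # size of the first downward adaptation (given as a positive number)
    x: float          # damping exponent, assumed 0 < x < 1
    n_pos: int = 0    # how many times the rule has been raised so far
    n_neg: int = 0    # how many times the rule has been lowered so far
    score: float = 0.0

    def raise_score(self) -> None:
        # The n-th raise (n_pos == n - 1 before this call) adds w_pos * x**(n - 1).
        self.score += self.w_pos * self.x ** self.n_pos
        self.n_pos += 1

    def lower_score(self) -> None:
        # The n-th lowering subtracts w_neg * x**(n - 1), independent of the raises.
        self.score -= self.w_neg * self.x ** self.n_neg
        self.n_neg += 1

    def limits(self) -> tuple[float, float]:
        """Asymptotic total of all raises and of all lowerings: w / (1 - x)."""
        return self.w_pos / (1 - self.x), self.w_neg / (1 - self.x)
```

With w_pos = w_neg = 1 and x = 0.5, a word that is raised and lowered 30 times each ends at a net score of 0, while a word that is only ever raised stays below 2, its asymptote.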

All this sounds very alpha, but may be worth some more thought.

Holger 

-- 
Holger Franz <hfranz@physik.rwth-aachen.de>                             

Caution: feeding Gnus to your XEmacs will make it fat.


Thread overview: 30+ messages
1996-11-29  5:25 Felix Lee
1996-11-29  8:09 ` Kai Grossjohann
1996-11-29 22:48   ` Felix Lee
1996-11-30 13:18     ` Lars Magne Ingebrigtsen
1996-12-01  8:39       ` Felix Lee
1996-11-29 15:45 ` Jan Vroonhof
1996-11-30  2:28   ` Felix Lee
1996-12-02  9:37   ` Steinar Bang
1996-12-02  9:40 ` Wesley.Hardaker
1996-12-05 18:49   ` Lars Magne Ingebrigtsen
1996-12-06  8:18     ` Wesley.Hardaker
1996-12-02 11:46 ` Hans de Graaff
1996-12-02 15:08   ` Robert Bihlmeyer
1996-12-05 18:50     ` Lars Magne Ingebrigtsen
1996-12-05 21:21       ` Sean Lynch
1996-12-06 10:39         ` Lars Magne Ingebrigtsen
1996-12-08 22:19           ` Sean Lynch
1996-12-11  0:44             ` Lars Magne Ingebrigtsen
1996-12-06 21:02         ` Janne Sinkkonen
1996-12-08 22:48           ` Sean Lynch
1996-12-10 22:25             ` nnspool virtual server shows funny numbers of articles C. R. Oldham
1996-12-11  0:42               ` Lars Magne Ingebrigtsen
     [not found]   ` <vcn2vvixpz.fsf@totally-fudged-out-message-id>
1996-12-03 13:51     ` Holger Franz [this message]
  -- strict thread matches above, loose matches on Subject: below --
1996-10-31  1:34 Adaptive word scoring Sten Drescher
1996-11-05 15:51 ` Robert Bihlmeyer
1996-11-05 17:16   ` Per Abrahamsen
1996-11-05 21:24   ` Lars Magne Ingebrigtsen
1996-11-05 21:25 ` Lars Magne Ingebrigtsen
1996-08-04  2:57 Lars Magne Ingebrigtsen
1996-08-04 17:19 ` François Pinard
