From: Holger Franz <hfranz@physik.rwth-aachen.de>
Subject: Re: adaptive word scoring
Date: 03 Dec 1996 14:51:58 +0100
Message-ID: <vcg21npwg1.fsf@ac3a50.physik.rwth-aachen.de>
In-Reply-To: Robert Bihlmeyer's message of Mon, 2 Dec 1996 16:08:05 +0100
Just to toss in my two cents: How about adding the exponent feature
from the procmail scoring mechanism? It provides a flexible method to
keep scores from frequent matches low.
From `man procmailsc`:
---8<---
Weighted regular expression conditions
The first time the regular expression is found, it will
add w to the score. The second time it is found, w*x will
be added. The third time it is found, w*x*x will be
added. The fourth time w*x*x*x will be added. And so
forth.
This can be described by the following concise formula:
    w \sum_{k=1}^{n} x^{k-1} = w \frac{x^n - 1}{x - 1}
It represents the total added score for this condition if
n matches are found.
Note that the following case distinctions can be made:
x=0 Only the first match will contribute w to the
score. Any subsequent matches are ignored.
x=1 Every match will contribute the same w to the
score. The score grows linearly with the number
of matches found.
0<x<1 Every match will contribute less to the score than
the previous one. The score will asymptotically
approach a certain value (see the NOTES section
below).
1<x Every match will contribute more to the score than
the previous one. The score will grow exponentially.
x<0 Can be utilised to favour odd or even number of
matches.
---8<---
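To make the quoted formula concrete, here is a minimal Python sketch of the total score after n matches. The function names are mine, not procmail's, and the x=1 branch is my own handling of the case where the closed form would divide by zero.

```python
def total_score(w: float, x: float, n: int) -> float:
    """Total added score after n matches: w + w*x + ... + w*x**(n-1)."""
    if n <= 0:
        return 0.0
    if x == 1:
        # The closed form divides by zero here; the sum is simply n copies of w.
        return float(w * n)
    return w * (x ** n - 1) / (x - 1)


def total_score_by_sum(w: float, x: float, n: int) -> float:
    """Same quantity, accumulated term by term as the man page describes."""
    return float(sum(w * x ** k for k in range(n)))


def asymptote(w: float, x: float) -> float:
    """Limit of the total score for |x| < 1 as n grows without bound."""
    if not abs(x) < 1:
        raise ValueError("the score only converges for |x| < 1")
    return w / (1 - x)
```

With w=10 and x=0.5, three matches give 10 + 5 + 2.5 = 17.5, and the score can never exceed asymptote(10, 0.5) = 20.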
I think that could be implemented easily if one added to each rule a
counter that reflects how often the rule has already been 'adapted'.
There is one major problem though: with an exponent 0<x<1 the first
adaptation is likely to dominate over later ones. So maybe there will
have to be separate counters for positive and negative adaptations. If
positive and negative adaptations were tuned in such a way that they had
roughly the same asymptotic value, 'meaningless' words that are raised
and lowered randomly would gain a comparatively low net score.
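A rough Python sketch of the two-counter idea, just to illustrate the arithmetic. Everything in it (the class name AdaptiveRule, the fields, the default weights and exponent) is hypothetical and not taken from Gnus or procmail.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRule:
    """Hypothetical score rule with separate counters per direction."""
    w_pos: float = 4.0   # assumed increment for the first positive adaptation
    w_neg: float = -4.0  # assumed increment for the first negative adaptation
    x: float = 0.5       # assumed exponent, 0 < x < 1
    n_pos: int = 0       # number of positive adaptations so far
    n_neg: int = 0       # number of negative adaptations so far
    score: float = 0.0   # accumulated net score

    def adapt(self, positive: bool) -> float:
        """Apply one adaptation and return the increment it contributed."""
        if positive:
            delta = self.w_pos * self.x ** self.n_pos
            self.n_pos += 1
        else:
            delta = self.w_neg * self.x ** self.n_neg
            self.n_neg += 1
        self.score += delta
        return delta


# A word that is raised and lowered equally often nets out to zero ...
noise = AdaptiveRule()
for _ in range(5):
    noise.adapt(True)
    noise.adapt(False)

# ... while a word that is only ever raised approaches w_pos / (1 - x) = 8.
signal = AdaptiveRule()
for _ in range(5):
    signal.adapt(True)
```

Because both directions use the same exponent and weights of equal size, they share the same asymptotic magnitude, so only a consistent bias in one direction builds up a large score. With a single shared counter, the first adaptation would get the full weight regardless of direction and later ones could not cancel it.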
All this sounds very alpha, but may be worth some more thought.
Holger
--
Holger Franz <hfranz@physik.rwth-aachen.de>
Caution: feeding Gnus to your XEmacs will make it fat.