Gnus development mailing list
From: Sean Lynch <seanl@Internex.NET>
Cc: ding@ifi.uio.no
Subject: Re: adaptive word scoring
Date: 05 Dec 1996 13:21:45 -0800
Message-ID: <rhsafrsn0uu.fsf@internex.net>
In-Reply-To: Lars Magne Ingebrigtsen's message of 05 Dec 1996 19:50:56 +0100

I remember reading earlier in this thread about the possibility of
rating words based on interestingness, and I think this is probably
the way to go.  Information theory tells us that the less probable a
piece of information is, the more it is worth (its information content
is log(1/p)).  Therefore, we should keep some sort
of history of the number of occurrences of each word in the adaptive
scoring criteria (i.e. the subject lines) and estimate the probability
of each word's occurrence, weighting the effect of each word on the
final score by the inverse of the probability.
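
As a rough sketch of the bookkeeping described above, the following
Python estimates each word's probability as the fraction of subject
lines that contain it, then takes the inverse as its weight.  The
function name, the lowercase whitespace tokenization, and counting a
word at most once per line are all assumptions made for illustration,
not anything Gnus does.

```python
from collections import Counter

def word_probabilities(subjects):
    """Estimate, per word, the fraction of subject lines containing it."""
    counts = Counter()
    for subject in subjects:
        # set() counts a word at most once per subject line (assumption)
        counts.update(set(subject.lower().split()))
    total = len(subjects)
    return {word: n / total for word, n in counts.items()}

subjects = [
    "free money now",
    "gnus scoring question",
    "free free offer",
    "re: gnus scoring",
]
probs = word_probabilities(subjects)
# Rare words get a large weight, common words a small one.
weights = {word: 1.0 / p for word, p in probs.items()}
```

Here "money" appears in one subject out of four, so it gets four times
the base weight, while "free" appears in two and gets only twice.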

Multiple occurrences of the same word in a given line probably should
not be counted, because I can't think of any situation where we'd want
to score an article differently just because a word occurred more than
once in the subject line.  However, I guess maybe people would want to
score an article lower if it had the word "free" more than once in the
subject line.

The final score given to a word would be, using the example Robert
gave, -50*(inverse probability of this word)/(sum of the inverse
probabilities of all distinct words in this line).
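
A minimal sketch of how a line score might be split among the words of
one subject, reading the weighting as inverse probability so that
rarer words take a larger share.  The function name and the sample
probabilities are hypothetical; the -50 is the line score from
Robert's example.

```python
def word_scores(line_score, words, prob):
    """Split line_score among the distinct words, weighted by 1/probability."""
    distinct = set(words)  # repeated words in one line count once
    inverse = {w: 1.0 / prob[w] for w in distinct}
    total = sum(inverse.values())
    return {w: line_score * inverse[w] / total for w in distinct}

# Hypothetical probabilities for three words.
prob = {"free": 0.5, "gnus": 0.25, "scoring": 0.25}
scores = word_scores(-50, ["free", "gnus", "scoring"], prob)
```

The shares always add up to the line score, so for a single subject
the word scores together reproduce the effect of the line score.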

The score of the word in the database would be adjusted by adding
(new score - old score)/c to it, where c is a damping constant.  c
could increase over time so that scores would stabilize, though this
would cause scores to stop adapting eventually.
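
The update can be sketched as below, assuming the adjustment is a step
from the stored score toward the newly computed one, (new - old)/c,
and treating c as a plain damping constant of at least 1.  The
function name and the sample values are made up for illustration.

```python
def update_score(old, new, c):
    """Move the stored score a fraction 1/c of the way toward the new score."""
    return old + (new - old) / c

# With a fixed c of 2, each update closes half of the remaining gap.
history = []
score = 0.0
for _ in range(3):
    score = update_score(score, -20.0, 2.0)
    history.append(score)
```

A larger c makes each step smaller, which is why letting c grow would
settle the scores and, eventually, freeze them.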

Obviously, there would be some sort of thresholding function to drop
words with a large probability of occurrence.
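
One simple form such a threshold could take is a hard cutoff on the
estimated probability.  The function name and the cutoff of 0.3 are
arbitrary assumptions; a real cutoff would have to be tuned.

```python
def drop_common_words(prob, max_prob=0.3):
    """Keep only the words whose estimated probability is at most max_prob."""
    return {word: p for word, p in prob.items() if p <= max_prob}

kept = drop_common_words({"re:": 0.6, "gnus": 0.1, "scoring": 0.3})
```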

>>>>> Sometime around 05 Dec 1996 19:50:56 +0100,
>>>>> in article <m2ral4rfjj.fsf@proletcult.slip.ifi.uio.no>,
>>>>> someone posing as Lars Magne Ingebrigtsen <larsi@ifi.uio.no> wrote:

Lars> Robert Bihlmeyer <e9426626@student.tuwien.ac.at> writes:
>> (I think I've already suggested this once:) IMHO word-scoring
>> should derive its scores from the normal adaptive (line-)scores
>> (by dividing the line-score by the number of words in the
>> subject). So if killed articles would get -50 for their subject,
>> words from the subject should be scored with -50/(number of words)
>> each (ignored words are not included in this count). In this way,
>> word-scoring mimics the effects of line-scoring for single
>> subjects, but has the intended side-effects on others. Furthermore,
>> the scores would be a magnitude smaller than the line-scores,
>> giving the latter priority.

Lars> I think that sounds reasonable...  Anybody have any thoughts on
Lars> this?

Sean Lynch,  Internex Network Operations                 <noc@internex.net>
Voice: +1 408 327 2200  Fax: +1 408 496 5484  <URL:http://www.internex.net>
Technical support: <support@internex.net> <URL:http://support.internex.net>

