>>>>> Sometime around 06 Dec 1996 11:39:51 +0100,
>>>>> in article <m2ral40xe0.fsf@proletcult.slip.ifi.uio.no>,
>>>>> someone posing as Lars Magne Ingebrigtsen <larsi@ifi.uio.no> wrote:

Lars> Sean Lynch <seanl@Internex.NET> writes:
>> I remember reading earlier in this thread about the possibility of
>> rating words based on interestingness, and I think this is probably
>> the way to go.  The fundamental theorem of information theory tells
>> us that the value of any piece of information is inversely
>> proportional to its probability of occurrence.  Therefore, we
>> should keep some sort of history of the number of occurrences of
>> each word in the adaptive scoring criteria (i.e. the subject lines)
>> and estimate the probability of each word's occurrence, weighting
>> the effect of each word on the final score by the inverse of the
>> probability.

Lars> Would it suffice to calculate this on the fly (from the articles
Lars> currently in the summary buffer), or does this have to be stored
Lars> in a database?

If we only use the articles in the current summary buffer, our
estimates of word probabilities would vary wildly depending on the
number of articles, day of the week, direction of the wind, etc.  So I
think that we should probably keep the counts in some sort of
database, unless someone has a better idea.
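
To make the idea concrete, here is a rough sketch of what I have in
mind, written in Python rather than Emacs Lisp just to keep it short.
Everything in it is an assumption for illustration: the table name
word_counts, the function names, splitting subjects on whitespace, and
the add-one smoothing (so that a word we have never seen still gets a
finite weight).  It is not how Gnus does anything today.

```python
import sqlite3


def open_counts(path=":memory:"):
    """Open (or create) the persistent word-count store.

    Hypothetical schema: one row per word with its running count.
    Pass a file name instead of ":memory:" to keep counts across sessions.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        "word TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    return db


def record_subject(db, subject):
    """Add every word of one subject line to the running counts."""
    for word in subject.lower().split():
        # Create the row on first sight, then bump its count.
        db.execute(
            "INSERT OR IGNORE INTO word_counts (word, n) VALUES (?, 0)",
            (word,),
        )
        db.execute("UPDATE word_counts SET n = n + 1 WHERE word = ?", (word,))
    db.commit()


def word_weight(db, word):
    """Return the inverse of the word's estimated probability.

    Assumption: add-one smoothing, p = (count + 1) / (total + vocab + 1),
    where the extra 1 in the denominator stands for "any unseen word".
    """
    total = db.execute(
        "SELECT COALESCE(SUM(n), 0) FROM word_counts"
    ).fetchone()[0]
    vocab = db.execute("SELECT COUNT(*) FROM word_counts").fetchone()[0]
    row = db.execute(
        "SELECT n FROM word_counts WHERE word = ?", (word.lower(),)
    ).fetchone()
    count = row[0] if row else 0
    return (total + vocab + 1) / (count + 1)
```

You would call record_subject on each subject line as it is seen, and
multiply a word's adaptive score adjustment by word_weight, so that
rare words move the final score more than common ones.  Because the
counts live in a file rather than in the summary buffer, the estimates
no longer depend on how many articles happen to be loaded.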