From: Felix Lee
Sender: flee@teleport.com
Newsgroups: gmane.emacs.gnus.general
Subject: Re: adaptive word scoring
Date: Fri, 29 Nov 1996 14:48:47 -0800
Message-ID: <199611292248.OAA17024@kim.teleport.com>
Original-To: ding@ifi.uio.no
In-reply-to: Your message of 29 Nov 1996 09:09:22 +0100.

Kai Grossjohann:
> IR people deal with this problem by using tf*idf (i == inverse);
> actually it's log(n/N) where n is the term frequency and N the
> document frequency.

hmm.  something like that might help.  lemme think about it.  but
offhand, it's still going to overemphasize null words (like
"version") that are medium-low frequency.  and there's still the
problem that adaptive scoring in general tends to let scores grow
without bound in a meaningless way.

> I have no idea, though, how to estimate the idf for a newsgroup.  Any

just count frequency over subjects being scored.  it's accurate
enough for this purpose.
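
a rough sketch of what "just count frequency over subjects being scored" could look like, in Python rather than elisp. nothing here is Gnus code: the function names, the tokenizer, and the sample subjects are all made up for illustration, and it assumes the common textbook form idf = log(N / df), with N the number of subjects and df the number of subjects containing the word (which is not quite the log(n/N) quoted above).

```python
import math
import re
from collections import Counter


def tokenize(subject):
    """Lowercase a subject line and return its distinct words as a set."""
    return set(re.findall(r"[a-z0-9]+", subject.lower()))


def document_frequencies(subjects):
    """Count, for each word, how many subjects contain it at least once."""
    df = Counter()
    for subject in subjects:
        df.update(tokenize(subject))
    return df


def idf_weights(subjects):
    """Return log(N / df) per word, where N is the number of subjects.

    Assumption: this is the usual textbook idf, estimated only from the
    subjects currently being scored, not from the whole newsgroup.
    """
    n_subjects = len(subjects)
    df = document_frequencies(subjects)
    return {word: math.log(n_subjects / count) for word, count in df.items()}


# Hypothetical subject lines, invented for this example.
subjects = [
    "Re: adaptive word scoring",
    "Re: adaptive word scoring",
    "new version released",
    "Re: new version released",
    "scoring on version numbers",
]

weights = idf_weights(subjects)
for word in sorted(weights, key=weights.get):
    print(f"{word:10s} {weights[word]:.3f}")
```

words that show up in most subjects ("re", "version" in this sample) get the smallest weights, and words seen once get the largest. this only handles the weighting. it does nothing about scores growing without bound.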