From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/9030
Path: main.gmane.org!not-for-mail
From: Felix Lee
Newsgroups: gmane.emacs.gnus.general
Subject: adaptive word scoring
Date: Thu, 28 Nov 1996 21:25:08 -0800
Sender: flee@teleport.com
Message-ID: <199611290525.VAA00464@kim.teleport.com>
NNTP-Posting-Host: coloc-standby.netfonds.no
X-Trace: main.gmane.org 1035149117 16053 80.91.224.250 (20 Oct 2002 21:25:17 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Sun, 20 Oct 2002 21:25:17 +0000 (UTC)
Return-Path:
Original-Received: (qmail 17849 invoked from smtpd); 29 Nov 1996 08:15:36 -0000
Original-Received: from ifi.uio.no (0@129.240.64.2) by deanna.miranova.com with SMTP; 29 Nov 1996 08:15:34 -0000
Original-Received: from kim.teleport.com (kim.teleport.com [192.108.254.26]) by ifi.uio.no with ESMTP (8.6.11/ifi2.4) id for ; Fri, 29 Nov 1996 08:57:33 +0100
Original-Received: from teleport.com (ip-pdx18-12.teleport.com [206.163.125.119]) by kim.teleport.com (8.8.3/8.7.3) with ESMTP id VAA00464 for ; Thu, 28 Nov 1996 21:25:45 -0800 (PST)
Original-To: ding@ifi.uio.no
Xref: main.gmane.org gmane.emacs.gnus.general:9030
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:9030

so after using adaptive word scoring for a while, I've decided that it's mostly useless.

say you're an avid fan of alt.sex.pictures.emacs. the word "gif" is fairly common and mostly neutral: you can't tell if an article is interesting based on the word "gif".

however, adaptive scoring treats "gif" as significant in an odd way. if you kill a massive series of "vi pinup gif"s, then adaptive scoring is going to reduce the score of "gif" by an amount proportional to the number of articles you've killed. this significantly affects the score of those really sexy emacs gifs.

ok, you could add "gif" to the ignored-word list, but this is just one instance of a more general problem.
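
a toy model of the problem, for illustration only. this is not Gnus's code: the function names, the per-article adjustment of -1, and the count of 50 killed articles are all assumptions made up for the sketch. it only shows how a fixed per-article adjustment lets a neutral word like "gif" pile up a large negative score.

```python
from collections import defaultdict

# Hypothetical per-article adjustment applied to every word of a killed article.
KILL_DELTA = -1


def subject_words(subject):
    """Split a subject line into a set of lowercase words."""
    return set(subject.lower().split())


def adapt(killed_subjects, delta=KILL_DELTA):
    """Naive adaptive word scoring: add `delta` to each word, once per killed article."""
    scores = defaultdict(int)
    for subject in killed_subjects:
        for word in subject_words(subject):
            scores[word] += delta
    return scores


def score(subject, scores):
    """Score a subject as the sum of its word scores (unknown words count as 0)."""
    return sum(scores.get(word, 0) for word in subject_words(subject))


# Kill a long series of "vi pinup gif" articles.
killed = ["vi pinup gif"] * 50
scores = adapt(killed)

# "gif" is now as negative as "vi", so an unrelated "emacs gif" article
# is dragged down purely by the number of articles killed.
emacs_score = score("emacs gif", scores)
```

in this model the penalty on "emacs gif" grows linearly with the number of killed articles, even though "gif" says nothing about whether an article is interesting.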

my current thoughts are:

- adaptive scoring should try to discover _useful_ discriminants by comparing interesting v. uninteresting articles. the ignored-word list should be unnecessary.

- rather than adjusting score by N for every article marked, marked articles should be assigned a score target, and adaptive-scoring elements should be adjusted to try to hit the target.

comments? I'm not sure how to implement this, yet.
--
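
one possible reading of the score-target idea, as a rough sketch and not a worked-out design. the post leaves the implementation open, so everything here is an assumption: the update rule is a normalized least-mean-squares (delta rule) step swapped in for illustration, and the names, targets of +10 and -10, learning rate, and epoch count are hypothetical.

```python
def subject_words(subject):
    """Split a subject line into a set of lowercase words."""
    return set(subject.lower().split())


def score(subject, weights):
    """Score a subject as the sum of its word weights (unknown words count as 0)."""
    return sum(weights.get(word, 0.0) for word in subject_words(subject))


def train(articles, rate=0.5, epochs=20):
    """Nudge word weights so each marked article's score approaches its target.

    `articles` is a list of (subject, target_score) pairs. For each article,
    the gap between target and current score is split evenly over its words
    and a fraction `rate` of it is applied (a normalized delta-rule step).
    """
    weights = {}
    for _ in range(epochs):
        for subject, target in articles:
            words = subject_words(subject)
            error = target - score(subject, weights)
            step = rate * error / len(words)
            for word in words:
                weights[word] = weights.get(word, 0.0) + step
    return weights


# Hypothetical reading history: many killed "vi" articles, a few liked "emacs" ones.
articles = [("vi pinup gif", -10.0)] * 50 + [("emacs pinup gif", 10.0)] * 5
weights = train(articles)

vi_score = score("vi pinup gif", weights)
emacs_score = score("emacs pinup gif", weights)
```

because an article that already sits at its target produces no further adjustment, killing 50 copies instead of 5 does not keep pushing "gif" down. words shared by both interesting and uninteresting articles get pulled in both directions, so the words that separate the two groups ("vi", "emacs") end up carrying most of the weight, without an ignored-word list.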