* Recognizing repeats in RSS feeds
From: Desmond Rivet @ 2009-01-16 18:12 UTC
To: info-gnus-english

Hi all,

In addition to reading news and email, I use Gnus to keep track of various RSS feeds.

For some of these feeds, certain articles will, over time, show up repeatedly in my summary list. I'm not sure why, but I assume it has something to do with updates to the article itself. Or maybe it happens when someone posts a new comment on the article. I don't know.

I have threading enabled on these RSS groups, so the repeated articles at least get put under one thread, which is good. However, they still show up as "new" articles in my group. I have to go in the group and verify that the article is a repeat. It's a bit of a pain.

Is there any way to score a repeated (updated) article down, so that they wouldn't show up in my group unless I asked? I have no idea where to even start with this; a simple push in the right direction would be appreciated.

Thanks in advance.

--
Desmond Rivet
Pain is weakness leaving the body.

* Re: Recognizing repeats in RSS feeds
From: Robert D. Crawford @ 2009-01-16 21:08 UTC
To: info-gnus-english

Desmond Rivet <desmond_news@videotron.ca> writes:

> For some of these feeds, certain articles will, over time, show up
> repeatedly in my summary list. I'm not sure why, but I assume it has
> something to do with updates to the article itself. Or maybe it happens
> when someone posts a new comment on the article. I don't know.

I'm not sure it is any of these, but it could be. It happens to me on _many_ feeds.

> Is there any way to score a repeated (updated) article down, so that
> they wouldn't show up in my group unless I asked? I have no idea where
> to even start with this; a simple push in the right direction would be
> appreciated.

What I did was to change my reading habit in these groups a bit. Instead of just letting the article be marked as read I kill the article. In my SCORE file I have it set to drop the score whenever I kill the article and have mark-and-expunge set to -1.

There might be a better or cleaner way to do this but it works.

rdc
--
Robert D. Crawford
rdc1x@comcast.net

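Robert does not show his score file, so the following is only a guess at what such a setup might look like. It assumes adaptive scoring is what lowers the score on a kill; the penalty of -10 and the choice of the subject header are illustrative values, not taken from his message.

```elisp
;; In .gnus.el: turn on adaptive scoring (assumed to be the mechanism
;; behind "drop the score whenever I kill the article").
(setq gnus-use-adaptive-scoring t)

;; In the score file of the RSS group (a Lisp data file, not code):
;; - `adapt' gives the adaptive rules for this group; here a killed
;;   article lowers the score of its subject by 10 (illustrative value).
;; - `mark-and-expunge' hides every article scoring below -1.
((adapt (gnus-killed-mark (subject -10)))
 (mark-and-expunge -1))
```

With rules like these, a repeat that arrives with the same subject as an article killed earlier would start below the expunge threshold and stay out of the summary buffer.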
* Re: Recognizing repeats in RSS feeds
From: Ted Zlatanov @ 2009-01-16 22:05 UTC
To: info-gnus-english

On Fri, 16 Jan 2009 13:12:37 -0500 Desmond Rivet <desmond_news@videotron.ca> wrote:

DR> In addition to reading news and email, I use Gnus to keep track of
DR> various RSS feeds.

DR> For some of these feeds, certain articles will, over time, show up
DR> repeatedly in my summary list. I'm not sure why, but I assume it has
DR> something to do with updates to the article itself. Or maybe it happens
DR> when someone posts a new comment on the article. I don't know.
...
DR> Is there any way to score a repeated (updated) article down, so that
DR> they wouldn't show up in my group unless I asked? I have no idea where
DR> to even start with this; a simple push in the right direction would be
DR> appreciated.

You want to ignore updates which only affect irrelevant fields. Here's how I do it:

(setq nnrss-ignore-article-fields '(description slash:comments slash:hit_parade))

This works for me to eliminate duplicates completely; "description" changes very frequently on some sites for instance. nnrss finds unique articles by taking all their fields that are not ignored and hashing the content.

To find out exactly what's happening, set gnus-verbose to 10 and refresh a nnrss group. You have to have a recent CVS Gnus to use this. I added it fairly recently. In *Messages* you'll see a full dump of the RSS segment that describes each article, and from that you can easily figure out what's causing duplicates.

For example, here's one entry from the Dilbert Blog:

nnrss: Making hash index of (item nil "
" (title nil "From Blog to Reality: Three Interesting Things") "
" (link nil "http://dilbert.com/blog/entry/from_blog_to_reality_three_things/") "
" (description nil "...cut because it's too much text...") "
" (pubDate nil "Fri, 16 Jan 2009 01:00:01 PST") "
" (guid ((isPermaLink . "false")) "http://dilbert.com/blog/entry/203/") "
")

So the fields here are guid, pubDate, title, link, and description.

If you need more help, tell us what feeds specifically are causing the problem and I can take a look.

Ted

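To make the "hash everything that is not ignored" idea concrete, here is a minimal sketch. It is not the nnrss source: the helper name my/rss-item-digest and the two sample items are made up for illustration, and the item shape is simply copied from the dump format shown in this thread.

```elisp
(require 'seq)

(defun my/rss-item-digest (item ignored-fields)
  "Return an MD5 digest of ITEM after dropping IGNORED-FIELDS.
ITEM has the shape (item ATTRIBUTES CHILD...), where each CHILD is
either a string or a list whose car is the field name.
Hypothetical helper for illustration, not part of nnrss."
  (let ((kept (seq-remove
               (lambda (child)
                 ;; Drop only child elements whose tag is ignored.
                 (and (consp child)
                      (memq (car child) ignored-fields)))
               item)))
    ;; Hash the printed form of whatever is left.
    (md5 (prin1-to-string kept))))

;; Two versions of one article that differ only in the comment count.
(let ((old '(item nil (title nil "Some story") (slash:comments nil "757")))
      (new '(item nil (title nil "Some story") (slash:comments nil "770"))))
  (list
   ;; Nothing ignored: the digests differ, so the update looks new.
   (string= (my/rss-item-digest old nil)
            (my/rss-item-digest new nil))
   ;; slash:comments ignored: the digests match, so it is a repeat.
   (string= (my/rss-item-digest old '(slash:comments))
            (my/rss-item-digest new '(slash:comments)))))
```

Evaluating the last form gives (nil t): the same article is treated as new until the changing field is left out of the hash.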
* Re: Recognizing repeats in RSS feeds
From: Desmond Rivet @ 2009-01-21 1:22 UTC
To: info-gnus-english

Ted Zlatanov <tzz@lifelogs.com> writes:

> On Fri, 16 Jan 2009 13:12:37 -0500 Desmond Rivet
> <desmond_news@videotron.ca> wrote:
>
> DR> In addition to reading news and email, I use Gnus to keep track of
> DR> various RSS feeds.
>
> DR> For some of these feeds, certain articles will, over time, show up
> DR> repeatedly in my summary list. I'm not sure why, but I assume it has
> DR> something to do with updates to the article itself. Or maybe it happens
> DR> when someone posts a new comment on the article. I don't know.
> ...
> DR> Is there any way to score a repeated (updated) article down, so that
> DR> they wouldn't show up in my group unless I asked? I have no idea where
> DR> to even start with this; a simple push in the right direction would be
> DR> appreciated.
>
> You want to ignore updates which only affect irrelevant fields. Here's
> how I do it:
>
> (setq nnrss-ignore-article-fields '(description slash:comments slash:hit_parade))
>
> This works for me to eliminate duplicates completely; "description"
> changes very frequently on some sites for instance. nnrss finds unique
> articles by taking all their fields that are not ignored and hashing the
> content.
>
> To find out exactly what's happening, set gnus-verbose to 10 and refresh
> a nnrss group. You have to have a recent CVS Gnus to use this. I added
> it fairly recently. In *Messages* you'll see a full dump of the RSS
> segment that describes each article, and from that you can easily figure
> out what's causing duplicates.
>
> For example, here's one entry from the Dilbert Blog:
>
> nnrss: Making hash index of (item nil "
> " (title nil "From Blog to Reality: Three Interesting Things") "
> " (link nil "http://dilbert.com/blog/entry/from_blog_to_reality_three_things/") "
> " (description nil "...cut because it's too much text...") "
> " (pubDate nil "Fri, 16 Jan 2009 01:00:01 PST") "
> " (guid ((isPermaLink . "false")) "http://dilbert.com/blog/entry/203/") "
> ")
>
> So the fields here are guid, pubDate, title, link, and description.
>
> If you need more help, tell us what feeds specifically are causing the
> problem and I can take a look.

Thanks for the reply. However, I'm somewhat confused (not by your directions, but rather by what I'm seeing).

So, I've started examining my RSS feeds. I'll use Slashdot as an example since a lot of people read it. What I did was the following:

1. made a backup of the directory that stores my downloaded rss feeds.
2. waited until my Slashdot group was updated and I got a repeated item.
3. compared a selected item from the saved backup Slashdot rss file to a selected item from the current Slashdot rss file.

If I understand how this works, there should be some sort of textual difference between the old item and the new, yes? (this is all very low tech, bear with me)

So, I picked an item at random from the current rss file, pasted the xml fragment into a buffer, did the same with the saved rss file, and did a diff. I get the following:

11,12c11,12
< <slash:comments>770</slash:comments>
< <slash:hit_parade>770,762,595,490,138,86,71</slash:hit_parade>
---
> <slash:comments>757</slash:comments>
> <slash:hit_parade>757,749,587,482,133,83,69</slash:hit_parade>

So far, so good. This tells me that the slash:comments and slash:hit_parade fields are the culprits, right? So I do this in my .gnus.el:

(setq nnrss-ignore-article-fields '(slash:comments slash:hit_parade))

And restart emacs. However, I *still* get spurious updates of the same article in Slashdot.

So I take your advice and do this:

(setq gnus-verbose 10)

And hit M-g in Slashdot. Picking another article at random, I see this:

nnrss: Making hash index of (item ((rdf:about . "http://it.slashdot.org/article.pl?sid=09/01/20/1930252&from=rss")) "
" (title nil "Largest Data Breach Disclosed During Inauguration") "
" (link nil "http://rss.slashdot.org/~r/Slashdot/slashdot/~3/iHBmFGKE504/article.pl") "
" (description nil "rmogull writes \"Brian Krebs over at <snip>") "
" (dc:creator nil "kdawson") "
" (dc:date nil "2009-01-20T19:44:00+00:00") "
" (dc:subject nil "security") "
" (slash:department nil "debit-cards-at-risk") "
" (slash:section nil "it") "
" (slash:comments nil "121") "
" (slash:hit_parade nil "121,117,99,80,24,16,13") "
" (feedburner:origLink nil "http://it.slashdot.org/article.pl?sid=09%2F01%2F20%2F1930252&from=rss"))

Note the presence of slash:comments and slash:hit_parade. Am I to understand that the slash:comments and slash:hit_parade fields are still contributing to the hash?

I should mention I'm using GNU Emacs 23.0.60.1.

Thanks in advance for any insight!

--
Desmond Rivet
Pain is weakness leaving the body.

* Re: Recognizing repeats in RSS feeds
From: Adam Sjøgren @ 2009-01-21 7:21 UTC
To: info-gnus-english

On Tue, 20 Jan 2009 20:22:16 -0500, Desmond wrote:

> Note the presence of slash:comments and slash:hit_parade. Am I to
> understand that the slash:comments and slash:hit_parade fields are still
> contributing to the hash?

What is shown is all fields, before the ignored fields are removed; see:

http://article.gmane.org/gmane.emacs.gnus.general/67806/

Best regards,

--
"Remember, Robert, in life anything can happen."
Adam Sjøgren
asjo@koldfront.dk

* Re: Recognizing repeats in RSS feeds
From: Desmond Rivet @ 2009-01-21 19:38 UTC
To: info-gnus-english

Desmond Rivet <desmond_news@videotron.ca> writes:

> Thanks for the reply. However, I'm somewhat confused (not by your
> directions, but rather by what I'm seeing)

Errr...I think I found the problem. I was doing this:

(setq nnrss-ignore-article-field '(slash:comments slash:hit_parade))

Note the missing 's'. It should be this:

(setq nnrss-ignore-article-fields '(slash:comments slash:hit_parade))

I am very embarrassed. Sorry for wasting everyone's time. Things appear to be working better now :)

--
Desmond Rivet
Pain is weakness leaving the body.

* Re: Recognizing repeats in RSS feeds
From: Ted Zlatanov @ 2009-01-21 21:46 UTC
To: info-gnus-english

On Tue, 20 Jan 2009 20:22:16 -0500 Desmond Rivet <desmond_news@videotron.ca> wrote:

DR> Note the presence of slash:comments and slash:hit_parade. Am I to
DR> understand that the slash:comments and slash:hit_parade fields are still
DR> contributing to the hash?

Adam answered, but I just wanted to explain the reasoning. I debated this, but decided to show the article before removing those fields. They don't contribute to the hash, but could be important to the user, especially if the user wants to know if they can be re-enabled.

On Wed, 21 Jan 2009 14:38:29 -0500 Desmond Rivet <desmond_news@videotron.ca> wrote:

DR> Errr...I think I found the problem. I was doing this:
DR> (setq nnrss-ignore-article-field '(slash:comments slash:hit_parade))
DR> Note the missing 's'. It should be this:
DR> (setq nnrss-ignore-article-fields '(slash:comments slash:hit_parade))
DR> I am very embarrassed. Sorry for wasting everyone's time. Things appear
DR> to be working better now :)

I'm glad you found the issue. This is why I always recommend using Customize at first--you would have noticed there was no such variable.

More importantly, things work now :) I know it's frustrating to have duplicate RSS entries.

Ted

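The check Customize performs can also be done by hand before trusting a setq. The sketch below assumes, as Ted's remark implies, that nnrss-ignore-article-fields is defined as a user option once nnrss is loaded. Note that boundp would not catch the typo, because setq on a misspelled name silently creates a new variable.

```elisp
(require 'nnrss)

;; Should be non-nil, assuming nnrss defines this name as a user option.
(custom-variable-p 'nnrss-ignore-article-fields)

;; The misspelled name is not a user option, so this returns nil even
;; after a stray `setq' has created a variable with that name.
(custom-variable-p 'nnrss-ignore-article-field)
```

Interactively, M-x customize-variable offers the same protection: completion only proposes names that exist as options.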
* Re: Recognizing repeats in RSS feeds
From: Mark Plaksin @ 2009-01-22 3:15 UTC
To: info-gnus-english

Desmond Rivet <desmond_news@videotron.ca> writes:

> Hi all,
>
> In addition to reading news and email, I use Gnus to keep track of
> various RSS feeds.
>
> For some of these feeds, certain articles will, over time, show up
> repeatedly in my summary list. I'm not sure why, but I assume it has
> something to do with updates to the article itself. Or maybe it happens
> when someone posts a new comment on the article. I don't know.

FWIW, I recently switched from nnrss to nnshimbun (part of emacs-w3m) and this problem has essentially disappeared. I'm very happy with the switch.

Here's a blog entry by the guy who recently added shimbun-use-local which allows you to fetch feeds (and other shimbuns) via an external script:

http://www.randomsample.de/dru5/node/45

Thread overview: 8+ messages

2009-01-16 18:12 Recognizing repeats in RSS feeds Desmond Rivet
2009-01-16 21:08 ` Robert D. Crawford
2009-01-16 22:05 ` Ted Zlatanov
2009-01-21  1:22   ` Desmond Rivet
2009-01-21  7:21     ` Adam Sjøgren
2009-01-21 19:38     ` Desmond Rivet
2009-01-21 21:46     ` Ted Zlatanov
2009-01-22  3:15 ` Mark Plaksin