From: Tim Landscheidt <tim@tim-landscheidt.de>
To: Nasser Alkmim <nasser.alkmim@gmail.com>
Cc: ding@gnus.org
Subject: Re: Why am I getting duplicate messages on RSS groups?
Date: Wed, 17 Apr 2024 15:08:42 +0000
Message-ID: <87ttk0do1x.fsf@vagabond.tim-landscheidt.de>
In-Reply-To: <86le5cv0b0.fsf@gmail.com> (Nasser Alkmim's message of "Wed, 17 Apr 2024 10:52:03 +0200")

Nasser Alkmim <nasser.alkmim@gmail.com> wrote:
> Not sure how to debug this situation, but some RSS feeds that I have in groups end up with duplicate messages.
> I use this "five filters full-text RSS" to extract the full text from some RSS feeds, and it has a limit of 3 items per feed and a 12-hour refresh rate.
> Maybe after these 12 hours, the messages are fetched again.
> The duplicate messages have different "Message-ID"s, but the same subject/date and everything else.
> Any ideas?

I'm not sure the /internal/ dates are actually the same. If I
write the stored data for six such duplicate entries to disk (*1):
| ;; Write each article's stored item data to /tmp/<number>.el.
| (dolist (i '(58302 58461 58609 58757 58905 59053))
|   (with-temp-file (format "/tmp/%d.el" i)
|     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))
and diff them, some fields change from file to file
(pubDate, author, URL, etc.). For example, the pubDate values are:
| $ grep -i date /tmp/*.el
| /tmp/58302.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
| /tmp/58461.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58609.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58757.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58905.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
| /tmp/59053.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
| $

But for all six messages, Gnus shows the same Date header:
| Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)
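The stored strings are not just spelled differently; they denote different moments. A quick way to check this from Lisp is to parse them with the built-in date-to-time and count the distinct instants. This is only a sketch: my/distinct-instants is a hypothetical helper, and the sample strings are copied from the grep output above.

```elisp
;; Sketch: how many different moments do the stored pubDate strings denote?
;; `my/distinct-instants' is a hypothetical helper, not part of Gnus.
(require 'cl-lib)
(require 'time-date)

(defun my/distinct-instants (dates)
  "Return the number of distinct instants among the date strings DATES."
  (length (cl-remove-duplicates (mapcar #'date-to-time dates)
                                :test #'time-equal-p)))

;; Same wall-clock time, four different zone designations from the dump.
(my/distinct-instants
 '("Thu, 10 Dec 2020 22:01:00 GMT"
   "Thu, 10 Dec 2020 22:01:00 -0400"
   "Thu, 10 Dec 2020 22:01:00 +0000"
   "Thu, 10 Dec 2020 22:01:00 +0100"))
;; => 3 (GMT and +0000 coincide)

;; The -0400 variant is actually 02:01 UTC on the following day:
(format-time-string "%F %T" (date-to-time "Thu, 10 Dec 2020 22:01:00 -0400") t)
;; => "2020-12-11 02:01:00"
```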

Now, if I understand nnrss.el correctly, it considers two
entries the same if they differ only in fields listed in
nnrss-ignore-article-fields (which defaults to
'(slash:comments)), so any other change to an RSS feed entry
creates a new nnrss article in Gnus. What appears to be
missing is treating an unchanged guid as an indicator that
the entry is still the same article.

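To make the difference concrete, here is a minimal sketch (not the code in nnrss.el) of two ways to decide whether an incoming item has been seen before: a hash over all fields except the ignored ones, roughly as described above, and a key that prefers the item's guid. The item shape, a list of (TAG ATTRS VALUE) elements, is assumed from the dump above; my/ignored-fields, my/item-content-hash and my/item-key are hypothetical names.

```elisp
;; Sketch only: hypothetical helpers, not the code in nnrss.el.
;; Items are assumed to be lists of (TAG ATTRS VALUE) elements.
(require 'cl-lib)

(defvar my/ignored-fields '(slash:comments)
  "Stand-in for `nnrss-ignore-article-fields', with the same default.")

(defun my/item-content-hash (item)
  "MD5 of ITEM's printed form, leaving out fields in `my/ignored-fields'.
Any other change (pubDate, author, link, ...) yields a new hash."
  (md5 (prin1-to-string
        (cl-remove-if (lambda (field)
                        (and (consp field)
                             (memq (car field) my/ignored-fields)))
                      item))))

(defun my/item-key (item)
  "Identify ITEM by its guid when it has one, else by its content hash."
  (let ((guid (nth 2 (assq 'guid item))))
    (if (stringp guid)
        (concat "guid:" guid)
      (concat "md5:" (my/item-content-hash item)))))

;; Two fetches of the same post whose pubDate zone changed:
(let ((old '((title nil "Post") (guid nil "urn:example:1")
             (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")))
      (new '((title nil "Post") (guid nil "urn:example:1")
             (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400"))))
  (list (equal (my/item-content-hash old) (my/item-content-hash new))
        (equal (my/item-key old) (my/item-key new))))
;; => (nil t): the hash sees a new article, the guid key does not.
```

Keying on guid assumes the feed provides one and keeps it stable; an item without a guid falls back to the content hash. A cruder, untested stopgap with the existing variable would be adding the fields that change (e.g. pubDate, assuming the symbols match those in the dump) to nnrss-ignore-article-fields, at the cost of no longer noticing genuine updates to those fields.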
Tim
(*1) There is probably also a way to do this in Emacs
(Lisp) itself.
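As for the footnote, one in-Emacs alternative to writing files and running diff is to compare two stored items field by field. A minimal sketch, assuming (as the dump suggests) that (cddr (assoc N nnrss-group-data)) is a list of (TAG ATTRS VALUE) elements; my/nnrss-changed-fields is a hypothetical helper:

```elisp
;; Sketch: list the tags whose elements differ between two stored items.
;; `my/nnrss-changed-fields' is hypothetical; only the first element per
;; tag is compared, so repeated tags (e.g. several categories) are coarse.
(require 'cl-lib)

(defun my/nnrss-changed-fields (old new)
  "Return tags whose (TAG ATTRS VALUE) elements differ between OLD and NEW."
  (let ((tags (cl-remove-duplicates
               ;; Skip non-list children such as stray whitespace strings.
               (mapcar #'car (cl-remove-if-not #'consp (append old new)))
               :from-end t)))
    (cl-remove-if (lambda (tag) (equal (assq tag old) (assq tag new)))
                  tags)))

;; In the session where the dump above worked, something like
;; (my/nnrss-changed-fields (cddr (assoc 58302 nnrss-group-data))
;;                          (cddr (assoc 58461 nnrss-group-data)))
;; should list pubDate among the changed fields, if the shape holds.
(my/nnrss-changed-fields
 '((title nil "Post") (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT") (author nil "A"))
 '((title nil "Post") (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400") (author nil "B")))
;; => (pubDate author)
```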