Gnus development mailing list
From: Tim Landscheidt <tim@tim-landscheidt.de>
To: Nasser Alkmim <nasser.alkmim@gmail.com>
Cc: ding@gnus.org
Subject: Re: Why am I getting duplicate messages on RSS groups?
Date: Wed, 17 Apr 2024 15:08:42 +0000
Message-ID: <87ttk0do1x.fsf@vagabond.tim-landscheidt.de>
In-Reply-To: <86le5cv0b0.fsf@gmail.com> (Nasser Alkmim's message of "Wed, 17 Apr 2024 10:52:03 +0200")

Nasser Alkmim <nasser.alkmim@gmail.com> wrote:

> Not sure how to debug this situation, but some RSS feeds that I have in groups end up with duplicate messages.

> I use this "five filters full-text RSS" to extract the full text from some RSS feeds, and it has a limit of 3 items per feed and 12-hours refresh rate.
> Maybe after this 12-hours, the messages are obtained again.

> The duplicate messages have different "Message-ID", but same subject/date and everything else.

> Any ideas?

I'm not sure the /internal/ dates are actually the same: If
I write the data for such duplicate entries to disk (*1):

| (dolist (i '(58302 58461 58609 58757 58905 59053))
|   (with-temp-file (format "/tmp/%d.el" i)
|     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))

and diff them, some fields differ from file to file
(pubDate, author, URL, etc.).  For example, pubDate is:

| $ grep -i date /tmp/*.el
| /tmp/58302.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
| /tmp/58461.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58609.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58757.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58905.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
| /tmp/59053.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
| $

But for all six messages, Gnus says:

| Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)
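
Whether those pubDate strings denote the same instant can be
checked from Lisp.  A minimal sketch, assuming date-to-time
(from time-date.el) parses these RFC 822 style strings;
my/pubdate-offsets is a hypothetical helper, not part of Gnus:

```elisp
(require 'time-date)

(defun my/pubdate-offsets (dates)
  "Return each of DATES paired with its distance in seconds from the first.
DATES is a list of RFC 822 style date strings."
  (let ((base (float-time (date-to-time (car dates)))))
    (mapcar (lambda (d)
              ;; Positive means D is later than the first date.
              (cons d (- (float-time (date-to-time d)) base)))
            dates)))

;; The distinct pubDate values from the grep output above:
(my/pubdate-offsets
 '("Thu, 10 Dec 2020 22:01:00 GMT"
   "Thu, 10 Dec 2020 22:01:00 -0400"
   "Thu, 10 Dec 2020 22:01:00 +0000"
   "Thu, 10 Dec 2020 22:01:00 +0100"))
```

Any non-zero distance in the result means the stored dates
differ even where the displayed Date header does not.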

Now if I understand nnrss.el correctly, it considers two
entries the same if they only differ in fields listed in
nnrss-ignore-article-fields (which is 'slash:comments by
default), so any other change to an RSS feed entry will
create a new Gnus nnrss message.  What appears to be missing
is treating guid as an indicator that an entry has not
changed.
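
The two notions of "same entry" can be contrasted in a rough
sketch.  This is not nnrss.el's actual code: entries are
modelled here as plain alists, and the my/ functions and the
guid value "item-1" are hypothetical.

```elisp
(require 'cl-lib)

(defun my/entry-without (entry ignored)
  "Return ENTRY (an alist) without the keys listed in IGNORED."
  (cl-remove-if (lambda (cell) (memq (car cell) ignored)) entry))

(defun my/same-entry-p (a b ignored)
  "Non-nil if alists A and B are equal apart from keys in IGNORED."
  (equal (my/entry-without a ignored)
         (my/entry-without b ignored)))

(defun my/same-guid-p (a b)
  "Non-nil if alists A and B carry the same non-nil guid."
  (let ((ga (alist-get 'guid a))
        (gb (alist-get 'guid b)))
    (and ga gb (equal ga gb))))

;; Two fetches of one item whose pubDate changed in between:
(let ((old '((guid . "item-1")
             (pubDate . "Thu, 10 Dec 2020 22:01:00 GMT")))
      (new '((guid . "item-1")
             (pubDate . "Thu, 10 Dec 2020 22:01:00 -0400"))))
  (list (my/same-entry-p old new '(slash:comments))
        (my/same-guid-p old new)))
```

With field-by-field comparison the changed pubDate makes the
two fetches look like different entries, while the guid
comparison treats them as one.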

Tim

(*1)   There is probably also a way to do this in Emacs
       (Lisp) itself.
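
       One possible way, as a sketch: pretty-print two forms
       and keep the lines that occur in only one of them.
       my/pp-line-diff is a hypothetical helper; the commented
       call assumes nnrss-group-data is set as in the snippet
       above.

```elisp
(require 'cl-lib)
(require 'pp)

(defun my/pp-line-diff (a b)
  "Return (ONLY-A ONLY-B): pretty-printed lines unique to A and to B."
  (let ((la (split-string (pp-to-string a) "\n" t))
        (lb (split-string (pp-to-string b) "\n" t)))
    (list (cl-set-difference la lb :test #'string=)
          (cl-set-difference lb la :test #'string=))))

;; Hypothetical usage with two of the article numbers from above:
;; (my/pp-line-diff (cddr (assoc 58302 nnrss-group-data))
;;                  (cddr (assoc 58461 nnrss-group-data)))
```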


