From: James Thomas <jimjoe@gmx.net>
To: ding@gnus.org
Subject: Re: Why am I getting duplicate messages on RSS groups?
Date: Fri, 26 Apr 2024 20:46:35 +0530
Message-ID: <87jzkk6tnw.fsf@gmx.net>
In-Reply-To: <87ttk0do1x.fsf@vagabond.tim-landscheidt.de> (Tim Landscheidt's message of "Wed, 17 Apr 2024 15:08:42 +0000")
Tim Landscheidt wrote:
> Nasser Alkmim <nasser.alkmim@gmail.com> wrote:
>
>> Not sure how to debug this situation, but some RSS feeds that I have
>> in groups end up with duplicate messages.
>
>> I use this "five filters full-text RSS" to extract the full text
>> from some RSS feeds, and it has a limit of 3 items per feed and
>> a 12-hour refresh rate.
>> Maybe after these 12 hours, the messages are fetched again.
>
>> The duplicate messages have different "Message-ID", but same subject/date and everything else.
>
>> Any ideas?
>
> I'm not sure the /internal/ dates are actually the same: If
> I write the data for such duplicate entries to disk (*1):
>
> | (dolist (i '(58302 58461 58609 58757 58905 59053))
> |   (with-temp-file (format "/tmp/%d.el" i)
> |     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))
>
> and diff them, some fields change from file to file
> (pubDate, author, URL, etc.). For example, pubDate is:
>
> | $ grep -i date /tmp/*.el
> | /tmp/58302.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
> | /tmp/58461.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58609.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58757.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58905.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
> | /tmp/59053.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
> | $
>
> But for all six messages, Gnus says:
>
> | Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)
>
> Now if I understand nnrss.el correctly, it considers two
> entries the same if they only differ in fields listed in
> nnrss-ignore-article-fields (which is 'slash:comments by
> default), so any other change to an RSS feed entry will
> create a new Gnus nnrss message. What appears to be missing
> is treating guid as an indicator that an entry has not changed.
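
The comparison Tim describes can be illustrated with a small standalone sketch. It is not the code in nnrss.el: the `my-` functions and the sample items are hypothetical, and items are assumed to be xml.el-style nodes of the form (TAG ATTRS . CHILDREN), matching the shape of the dump above. It shows why a pubDate that differs only in its time zone makes the entry look new unless pubDate is ignored, and what a guid-based check would look like instead.

```elisp
;;; -*- lexical-binding: t -*-
;; Illustrative sketch only; not the implementation in nnrss.el.
;; Items are assumed to be xml.el-style nodes: (TAG ATTRS . CHILDREN).
(require 'cl-lib)

(defun my-strip-ignored (item ignored)
  "Return ITEM's children, minus elements whose tag is in IGNORED."
  (cl-remove-if (lambda (child)
                  (and (consp child) (memq (car child) ignored)))
                (cddr item)))

(defun my-same-entry-p (a b ignored)
  "Non-nil if items A and B differ only in fields listed in IGNORED."
  (equal (my-strip-ignored a ignored)
         (my-strip-ignored b ignored)))

(defun my-same-guid-p (a b)
  "Non-nil if items A and B both carry the same guid."
  ;; `assq' skips non-cons children such as whitespace strings.
  (let ((ga (nth 2 (assq 'guid (cddr a))))
        (gb (nth 2 (assq 'guid (cddr b)))))
    (and ga (equal ga gb))))

;; Two copies of one post whose pubDate differs only in time zone,
;; modelled on the dump above (hypothetical sample data).
(defvar my-item-a
  '(item nil (title nil "Post") (guid nil "id-1")
         (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")))
(defvar my-item-b
  '(item nil (title nil "Post") (guid nil "id-1")
         (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")))

(my-same-entry-p my-item-a my-item-b '(slash:comments))         ; => nil
(my-same-entry-p my-item-a my-item-b '(slash:comments pubDate)) ; => t
(my-same-guid-p my-item-a my-item-b)                            ; => t
```

With guid as the key, the two sample items count as one article no matter which other fields the feed rewrites; per Tim's reading, nnrss does not currently make that check.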

Nasser,

Have you tried adding 'pubDate (or, better, every field other than
'guid) to nnrss-ignore-article-fields?
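
A minimal sketch of the first suggestion, for ~/.gnus.el or the init file. It assumes the field names are the bare element symbols seen in Tim's dump (e.g. pubDate); the broader variant would list every field the feed actually uses except guid, and which fields those are depends on the feed.

```elisp
;; Keep the default `slash:comments' and additionally ignore pubDate,
;; so a feed that only rewrites the date's time zone should no longer
;; be treated as carrying a new article.  Assumes the element name
;; matches the `pubDate' symbol shown in the nnrss-group-data dump.
(setq nnrss-ignore-article-fields '(slash:comments pubDate))
```

If the diff also shows author or URL changes, those element symbols would need to be added to the list in the same way.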
--
Thread overview: 12+ messages
2024-04-17 8:52 Nasser Alkmim
2024-04-17 9:28 ` Emanuel Berg
2024-04-17 10:47 ` Nasser Alkmim
2024-04-26 5:42 ` Nasser Alkmim
2024-04-17 15:08 ` Tim Landscheidt
2024-04-26 15:16 ` James Thomas [this message]
2024-04-26 15:32 ` Eric S Fraga
2024-04-26 21:36 ` James Thomas
2024-04-28 10:02 ` Eric S Fraga
2024-04-27 6:54 ` Nasser Alkmim
2024-04-27 9:33 ` James Thomas
2024-04-27 13:07 ` Nasser Alkmim