Gnus development mailing list
From: James Thomas <jimjoe@gmx.net>
To: ding@gnus.org
Subject: Re: Why am I getting duplicate messages on RSS groups?
Date: Fri, 26 Apr 2024 20:46:35 +0530
Message-ID: <87jzkk6tnw.fsf@gmx.net>
In-Reply-To: <87ttk0do1x.fsf@vagabond.tim-landscheidt.de> (Tim Landscheidt's message of "Wed, 17 Apr 2024 15:08:42 +0000")

Tim Landscheidt wrote:

> Nasser Alkmim <nasser.alkmim@gmail.com> wrote:
>
>> Not sure how to debug this situation, but some RSS feeds that I have
>> in groups end up with duplicate messages.
>
>> I use "Five Filters Full-Text RSS" to extract the full text
>> from some RSS feeds; it has a limit of 3 items per feed and a
>> 12-hour refresh rate.
>> Maybe after those 12 hours, the messages are fetched again.
>
>> The duplicate messages have different Message-IDs, but the same subject, date and everything else.
>
>> Any ideas?
>
> I'm not sure the /internal/ dates are actually the same: If
> I write the data for such duplicate entries to disk (*1):
>
> | (dolist (i '(58302 58461 58609 58757 58905 59053))
> |   (with-temp-file (format "/tmp/%d.el" i)
> |     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))
>
> and diff them, some fields change from file to file
> (pubDate, author, URL, etc.).  For example, pubDate is:
>
> | $ grep -i date /tmp/*.el
> | /tmp/58302.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
> | /tmp/58461.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58609.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58757.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58905.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
> | /tmp/59053.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
> | $
>
> But for all six messages, Gnus says:
>
> | Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)
>
> Now if I understand nnrss.el correctly, it considers two
> entries the same only if they differ solely in fields listed
> in nnrss-ignore-article-fields (which is 'slash:comments by
> default), so a change to any other field of an RSS feed entry
> will create a new Gnus nnrss message.  What appears to be
> missing is treating guid as an indicator that an entry has not
> changed.

Nasser,

If you haven't already, try adding 'pubDate (or better: every field
other than 'guid) to nnrss-ignore-article-fields.
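A minimal sketch of that setting, e.g. for ~/.gnus.el.  It assumes, as
Tim's dump suggests, that the variable holds element-name symbols such
as pubDate, and that its default is '(slash:comments).  The longer
commented-out list is a hypothetical approximation of "everything other
than guid", built from the standard RSS 2.0 <item> elements plus
dc:creator, so check which elements your feed actually sends:

```elisp
;; Keep the default ignored field and also ignore pubDate, so that (per
;; the comparison described above) an item whose only change is its
;; pubDate/timezone should no longer be treated as a new article.
(setq nnrss-ignore-article-fields '(slash:comments pubDate))

;; Hypothetical broader variant approximating "everything but guid".
;; These names are the usual RSS 2.0 <item> children plus dc:creator;
;; a given feed may use other (namespaced) elements as well.
;; (setq nnrss-ignore-article-fields
;;       '(slash:comments title link description author category
;;         comments enclosure pubDate source dc:creator))
```

Only the first form is needed to test the pubDate theory.  Ignoring
title or description as well means later edits to an item will no
longer show up as new articles.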

--


Thread overview: 12+ messages
2024-04-17  8:52 Nasser Alkmim
2024-04-17  9:28 ` Emanuel Berg
2024-04-17 10:47   ` Nasser Alkmim
2024-04-26  5:42     ` Nasser Alkmim
2024-04-17 15:08 ` Tim Landscheidt
2024-04-26 15:16   ` James Thomas [this message]
2024-04-26 15:32     ` Eric S Fraga
2024-04-26 21:36       ` James Thomas
2024-04-28 10:02         ` Eric S Fraga
2024-04-27  6:54     ` Nasser Alkmim
2024-04-27  9:33       ` James Thomas
2024-04-27 13:07         ` Nasser Alkmim
