Gnus development mailing list
* Why am I getting duplicate messages on RSS groups?
@ 2024-04-17  8:52 Nasser Alkmim
  2024-04-17  9:28 ` Emanuel Berg
  2024-04-17 15:08 ` Tim Landscheidt
  0 siblings, 2 replies; 12+ messages in thread
From: Nasser Alkmim @ 2024-04-17  8:52 UTC (permalink / raw)
  To: ding

Hi,

Not sure how to debug this situation, but some RSS feeds that I have in groups end up with duplicate messages.

I use this "five filters full-text RSS" to extract the full text from some RSS feeds, and it has a limit of 3 items per feed and a 12-hour refresh rate.
Maybe after those 12 hours, the messages are fetched again.

The duplicate messages have different "Message-ID", but same subject/date and everything else.

Any ideas?

-- 
Nasser Alkmim 



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-17  8:52 Why am I getting duplicate messages on RSS groups? Nasser Alkmim
@ 2024-04-17  9:28 ` Emanuel Berg
  2024-04-17 10:47   ` Nasser Alkmim
  2024-04-17 15:08 ` Tim Landscheidt
  1 sibling, 1 reply; 12+ messages in thread
From: Emanuel Berg @ 2024-04-17  9:28 UTC (permalink / raw)
  To: ding

Nasser Alkmim wrote:

> The duplicate messages have different "Message-ID", but same
> subject/date and everything else.
>
> Any ideas?

See if this works - 

(setq gnus-suppress-duplicates t)

-- 
underground experts united
https://dataswamp.org/~incal




* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-17  9:28 ` Emanuel Berg
@ 2024-04-17 10:47   ` Nasser Alkmim
  2024-04-26  5:42     ` Nasser Alkmim
  0 siblings, 1 reply; 12+ messages in thread
From: Nasser Alkmim @ 2024-04-17 10:47 UTC (permalink / raw)
  To: ding


Emanuel Berg <incal@dataswamp.org> writes:

> See if this works - 
>
> (setq gnus-suppress-duplicates t)

I will give it a try. Thanks!

-- 
Nasser Alkmim 



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-17  8:52 Why am I getting duplicate messages on RSS groups? Nasser Alkmim
  2024-04-17  9:28 ` Emanuel Berg
@ 2024-04-17 15:08 ` Tim Landscheidt
  2024-04-26 15:16   ` James Thomas
  1 sibling, 1 reply; 12+ messages in thread
From: Tim Landscheidt @ 2024-04-17 15:08 UTC (permalink / raw)
  To: Nasser Alkmim; +Cc: ding

Nasser Alkmim <nasser.alkmim@gmail.com> wrote:

> Not sure how to debug this situation, but some RSS feeds that I have in groups end up with duplicate messages.

> I use this "five filters full-text RSS" to extract the full text from some RSS feeds, and it has a limit of 3 items per feed and a 12-hour refresh rate.
> Maybe after those 12 hours, the messages are fetched again.

> The duplicate messages have different "Message-ID", but same subject/date and everything else.

> Any ideas?

I'm not sure the /internal/ dates are actually the same: If
I write the data for such duplicate entries to disk (*1):

| (dolist (i '(58302 58461 58609 58757 58905 59053))
|   (with-temp-file (format "/tmp/%d.el" i)
|     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))

and diff them, some entries change from file to file
(pubDate, author, URL, etc.).  For example, pubDate is:

| $ grep -i date /tmp/*.el
| /tmp/58302.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
| /tmp/58461.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58609.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58757.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58905.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
| /tmp/59053.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
| $

But for all six messages, Gnus says:

| Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)

Now if I understand nnrss.el correctly, it considers two entries
the same if they only differ in fields listed in
nnrss-ignore-article-fields (which is 'slash:comments by default),
so any changes to an RSS feed entry will create a new Gnus nnrss
message.  What appears to be missing is treating guid as an
indicator that an entry has not changed.
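
To illustrate the comparison described above, here is a minimal
sketch.  It is not the code in nnrss.el: the my/ functions, the
title and the guid URL are made up for illustration, and the
(FIELD ATTRS VALUE) shape of an entry is only assumed from the
grep output.  It shows two entries that differ in pubDate alone
comparing as different by default, as equal once pubDate is
ignored, and as equal when only guid is considered.

```elisp
(require 'seq)

(defun my/rss-strip-fields (entry ignored)
  "Return ENTRY without the fields whose names are in IGNORED.
ENTRY is assumed to be a list of (FIELD ATTRS VALUE) triples."
  (seq-remove (lambda (field) (memq (car field) ignored)) entry))

(defun my/rss-same-entry-p (a b ignored)
  "Non-nil if A and B are equal once the IGNORED fields are dropped."
  (equal (my/rss-strip-fields a ignored)
         (my/rss-strip-fields b ignored)))

(defun my/rss-same-guid-p (a b)
  "Non-nil if A and B both carry a guid and the two guids are equal."
  (let ((guid-a (nth 2 (assq 'guid a)))
        (guid-b (nth 2 (assq 'guid b))))
    (and guid-a guid-b (equal guid-a guid-b))))

;; Hypothetical sample entries; only the pubDate strings are taken
;; from the output above.
(defvar my/entry-a
  '((title nil "Some post")
    (guid nil "https://example.org/post/1")
    (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")))

(defvar my/entry-b
  '((title nil "Some post")
    (guid nil "https://example.org/post/1")
    (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")))

;; Differ in pubDate only, so they count as two entries by default:
(my/rss-same-entry-p my/entry-a my/entry-b '(slash:comments))
;; Ignoring pubDate as well makes them compare equal:
(my/rss-same-entry-p my/entry-a my/entry-b '(slash:comments pubDate))
;; A guid-based check would not care about the other fields at all:
(my/rss-same-guid-p my/entry-a my/entry-b)
```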

Tim

(*1)   There is probably also a way to do this in Emacs
       (Lisp) itself.
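
For the footnote above, one possible way to stay inside Emacs
Lisp is sketched here.  my/collect-fields is a made-up helper, and
it assumes nothing about the layout of the data beyond it being
nested lists that contain (FIELD ATTRS VALUE) sublists.

```elisp
(defun my/collect-fields (tree field)
  "Return every sublist nested in TREE whose car is the symbol FIELD.
Matches are returned in the order in which they occur."
  (let (found)
    (while (consp tree)
      (let ((child (car tree)))
        (when (consp child)
          (when (eq (car child) field)
            (push child found))
          ;; Recurse into the child.  FOUND is kept in reverse order,
          ;; so the child's matches are reversed before being added.
          (setq found
                (nconc (nreverse (my/collect-fields child field))
                       found))))
      (setq tree (cdr tree)))
    (nreverse found)))

;; Hypothetical sample data, shaped like the pp output above:
(my/collect-fields
 '(item nil
        (title nil "Some post")
        (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT"))
 'pubDate)
```

With an nnrss group loaded, something like
(my/collect-fields (assoc 58302 nnrss-group-data) 'pubDate) could
then replace the write-to-disk-and-grep step, assuming the article
number exists in that group.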



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-17 10:47   ` Nasser Alkmim
@ 2024-04-26  5:42     ` Nasser Alkmim
  0 siblings, 0 replies; 12+ messages in thread
From: Nasser Alkmim @ 2024-04-26  5:42 UTC (permalink / raw)
  To: ding

Nasser Alkmim <nasser.alkmim@gmail.com> writes:

> Emanuel Berg <incal@dataswamp.org> writes:
>
>> See if this works - 
>>
>> (setq gnus-suppress-duplicates t)
>
> I will give it a try. Thanks!

Unfortunately, this setting did not solve the problem.
I'm still getting duplicates on the rss groups.

-- 
Nasser Alkmim 
 +43 677 6408 9171



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-17 15:08 ` Tim Landscheidt
@ 2024-04-26 15:16   ` James Thomas
  2024-04-26 15:32     ` Eric S Fraga
  2024-04-27  6:54     ` Nasser Alkmim
  0 siblings, 2 replies; 12+ messages in thread
From: James Thomas @ 2024-04-26 15:16 UTC (permalink / raw)
  To: ding

Tim Landscheidt wrote:

> Nasser Alkmim <nasser.alkmim@gmail.com> wrote:
>
>> Not sure how to debug this situation, but some RSS feeds that I have
>> in groups end up with duplicate messages.
>
>> I use this "five filters full-text RSS" to extract the full text
>> from some RSS feeds, and it has a limit of 3 items per feed and
>> a 12-hour refresh rate.
>> Maybe after those 12 hours, the messages are fetched again.
>
>> The duplicate messages have different "Message-ID", but same subject/date and everything else.
>
>> Any ideas?
>
> I'm not sure the /internal/ dates are actually the same: If
> I write the data for such duplicate entries to disk (*1):
>
> | (dolist (i '(58302 58461 58609 58757 58905 59053))
> |   (with-temp-file (format "/tmp/%d.el" i)
> |     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))
>
> and diff them, some entries change from file to file
> (pubDate, author, URL, etc.).  For example, pubDate is:
>
> | $ grep -i date /tmp/*.el
> | /tmp/58302.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
> | /tmp/58461.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58609.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58757.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
> | /tmp/58905.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
> | /tmp/59053.el:       (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
> | $
>
> But for all six messages, Gnus says:
>
> | Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)
>
> Now if I understand nnrss.el correctly, it considers two entries
> the same if they only differ in fields listed in
> nnrss-ignore-article-fields (which is 'slash:comments by default),
> so any changes to an RSS feed entry will create a new Gnus nnrss
> message.  What appears to be missing is treating guid as an
> indicator that an entry has not changed.

Nasser,

Maybe you haven't tried adding 'pubDate (or better: everything other
than 'guid) to nnrss-ignore-article-fields.

--



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-26 15:16   ` James Thomas
@ 2024-04-26 15:32     ` Eric S Fraga
  2024-04-26 21:36       ` James Thomas
  2024-04-27  6:54     ` Nasser Alkmim
  1 sibling, 1 reply; 12+ messages in thread
From: Eric S Fraga @ 2024-04-26 15:32 UTC (permalink / raw)
  To: ding

James,

On Friday, 26 Apr 2024 at 20:46, James Thomas wrote:
> Maybe you haven't tried adding 'pubDate (or better: everything other
> than 'guid) to nnrss-ignore-article-fields.

Is there a list of fields to be added to nnrss-ignore-article-fields
that one can find somewhere or should I guess them from the header of an
rss entry?

Thank you,
eric
-- 
Eric S Fraga via gnus (Emacs 30.0.50 2024-04-17) on Debian bookworm/sid




* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-26 15:32     ` Eric S Fraga
@ 2024-04-26 21:36       ` James Thomas
  2024-04-28 10:02         ` Eric S Fraga
  0 siblings, 1 reply; 12+ messages in thread
From: James Thomas @ 2024-04-26 21:36 UTC (permalink / raw)
  To: ding

Eric S Fraga wrote:

> Is there a list of fields to be added to nnrss-ignore-article-fields
> that one can find somewhere or should I guess them from the header of an
> rss entry?

The latter, I suppose, since it could vary by site. I had the following:

'(slash:comments slash:hit_parade num_comments ups)

...for slashdot.org (before I switched to the new nnatom backend).

--



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-26 15:16   ` James Thomas
  2024-04-26 15:32     ` Eric S Fraga
@ 2024-04-27  6:54     ` Nasser Alkmim
  2024-04-27  9:33       ` James Thomas
  1 sibling, 1 reply; 12+ messages in thread
From: Nasser Alkmim @ 2024-04-27  6:54 UTC (permalink / raw)
  To: James Thomas; +Cc: ding

James Thomas <jimjoe@gmx.net> writes:

>
> Nasser,
>
> Maybe you haven't tried adding 'pubDate (or better: everything other
> than 'guid) to nnrss-ignore-article-fields.

Hi James,

I tried (add-to-list 'nnrss-ignore-article-fields 'pubDate), and after
scanning an rss group with gnus-group-get-new-news-this-group, it still
fetches a repeated article.

What I don't understand is that, after scanning the rss group, I check
the variable nnrss-ignore-article-fields again and it is reset to its
default value (slash:comments).

I'm able to reproduce the behavior by

1. deleting the duplicated messages (two in this case)
2. closing and reopening Gnus
3. scanning the group again

Then the duplicated messages reappear.
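
One way to find out what puts nnrss-ignore-article-fields back to
its default might be a variable watcher (available since Emacs
26).  This is only a debugging sketch; my/report-variable-change
is a made-up name, and the watcher merely logs each change to
*Messages*.

```elisp
(defun my/report-variable-change (symbol newval operation where)
  "Report a change to SYMBOL in the *Messages* buffer.
NEWVAL is the value about to be stored, OPERATION is how the
variable is being changed (e.g. `set' or `let'), and WHERE is the
buffer for a buffer-local change, or nil."
  (message "%S: %S -> %S (buffer: %S)" symbol operation newval where))

(add-variable-watcher 'nnrss-ignore-article-fields
                      #'my/report-variable-change)

;; Alternatively, enter the debugger at the point of change:
;; (debug-on-variable-change 'nnrss-ignore-article-fields)
```

If nothing is logged during a scan, the variable was probably
never changed in that session in the first place, which would
point at the customization not being applied.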

-- 
Nasser Alkmim 
 +43 677 6408 9171



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-27  6:54     ` Nasser Alkmim
@ 2024-04-27  9:33       ` James Thomas
  2024-04-27 13:07         ` Nasser Alkmim
  0 siblings, 1 reply; 12+ messages in thread
From: James Thomas @ 2024-04-27  9:33 UTC (permalink / raw)
  To: ding

Nasser Alkmim wrote:

> I tried (add-to-list 'nnrss-ignore-article-fields 'pubDate), and after

Are you sure it was done after Gnus and nnrss were loaded? If not, try
putting this in ~/.gnus.el:

(setq nnrss-ignore-article-fields '(slash:comments pubDate))

--



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-27  9:33       ` James Thomas
@ 2024-04-27 13:07         ` Nasser Alkmim
  0 siblings, 0 replies; 12+ messages in thread
From: Nasser Alkmim @ 2024-04-27 13:07 UTC (permalink / raw)
  To: James Thomas; +Cc: ding

James Thomas <jimjoe@gmx.net> writes:

> Nasser Alkmim wrote:
>
>> I tried (add-to-list 'nnrss-ignore-article-fields 'pubDate), and after
>
> Are you sure it was done after Gnus and nnrss were loaded? If not, try
> putting this in ~/.gnus.el:
>
> (setq nnrss-ignore-article-fields '(slash:comments pubDate))

I have it in a use-package declaration that expands to:

(eval-after-load 'nrss '(progn (add-to-list 'nnrss-ignore-article-fields 'pubDate) t))

I also tried it in ~/.gnus.el, but the same behavior persists.
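
One thing that may be worth checking, offered as an observation
about the expansion quoted above and not as a diagnosis: it names
the feature 'nrss, while nnrss.el provides the feature 'nnrss.  If
the real configuration is spelled that way, and this is not just a
typo in the mail, the form would never run.  A minimal sketch with
the feature name spelled nnrss:

```elisp
;; Runs once nnrss.el has been loaded (or immediately if it already is).
(with-eval-after-load 'nnrss
  (add-to-list 'nnrss-ignore-article-fields 'pubDate))
```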

-- 
Nasser Alkmim 
 +43 677 6408 9171



* Re: Why am I getting duplicate messages on RSS groups?
  2024-04-26 21:36       ` James Thomas
@ 2024-04-28 10:02         ` Eric S Fraga
  0 siblings, 0 replies; 12+ messages in thread
From: Eric S Fraga @ 2024-04-28 10:02 UTC (permalink / raw)
  To: ding

On Saturday, 27 Apr 2024 at 03:06, James Thomas wrote:
> The latter, I suppose, since it could vary by site. I had the following:
>
> '(slash:comments slash:hit_parade num_comments ups)
>
> ...for slashdot.org (before I switched to the new nnatom backend).

Thank you.  None of these appear in the headers for my RSS feeds, most
of which follow Mastodon tags or individuals.  Unfortunately, Mastodon
RSS feeds carry quite minimal information.

-- 
Eric S Fraga via gnus (Emacs 30.0.50 2024-04-18) on Debian 12.5



