From: Tim Landscheidt
To: Nasser Alkmim
Cc: ding@gnus.org
Subject: Re: Why am I getting duplicate messages on RSS groups?
In-Reply-To: <86le5cv0b0.fsf@gmail.com> (Nasser Alkmim's message of "Wed, 17 Apr 2024 10:52:03 +0200")
Organization: https://www.tim-landscheidt.de/
References: <86le5cv0b0.fsf@gmail.com>
Date: Wed, 17 Apr 2024 15:08:42 +0000
Message-ID: <87ttk0do1x.fsf@vagabond.tim-landscheidt.de>
User-Agent: Gnus/5.13 (Gnus v5.13)

Nasser Alkmim wrote:

> Not sure how to debug this situation, but some RSS feeds that I have in groups end up with duplicate messages.
> I use this "five filters full-text RSS" to extract the full text from some RSS feeds, and it has a limit of 3 items per feed and 12-hours refresh rate.
> Maybe after this 12-hours, the messages are obtained again.
> The duplicate messages have different "Message-ID", but same subject/date and everything else.

> Any ideas?

I'm not sure the /internal/ dates are actually the same.  If I
write the data for such duplicate entries to disk (*1):

| (dolist (i '(58302 58461 58609 58757 58905 59053))
|   (with-temp-file (format "/tmp/%d.el" i)
|     (pp (cddr (assoc i nnrss-group-data)) (current-buffer))))

and diff the resulting files, several fields change from file
to file (pubDate, author, URL, etc.).
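Along the lines of footnote (*1), the differing fields can also
be listed directly in Emacs instead of going through temporary
files and diff.  The following is only a rough sketch: the name
my-nnrss-differing-fields is hypothetical, and it assumes each
entry is a flat list of xml.el-style nodes such as
(pubDate nil "..."), which is what the pp output suggests but
which I have not checked against nnrss.el.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun my-nnrss-differing-fields (a b)
  "Return the tags whose nodes differ between entries A and B.
A and B are assumed to be lists of xml.el-style nodes such as
\(pubDate nil \"...\"); non-list elements are ignored."
  (let (tags)
    ;; Collect every tag that occurs in either entry, without duplicates.
    (dolist (node (append a b))
      (when (consp node)
        (cl-pushnew (car node) tags)))
    ;; Keep the tags (in order of first occurrence) whose first node
    ;; is not `equal' in both entries; a tag missing on one side counts.
    (cl-remove-if (lambda (tag)
                    (equal (assq tag a) (assq tag b)))
                  (nreverse tags))))

;; Small self-contained check:
(my-nnrss-differing-fields
 '((title nil "Foo") (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT"))
 '((title nil "Foo") (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")))
;; => (pubDate)

;; On real data (only meaningful if the entries have the assumed layout):
;; (my-nnrss-differing-fields (cddr (assoc 58302 nnrss-group-data))
;;                            (cddr (assoc 58461 nnrss-group-data)))
```

If the entries do have that layout, running it on pairs of the
six entries should report the same fields the diff shows
changing, such as pubDate.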
For example, pubDate is:

| $ grep -i date /tmp/*.el
| /tmp/58302.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT")
| /tmp/58461.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58609.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58757.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")
| /tmp/58905.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0000")
| /tmp/59053.el: (pubDate nil "Thu, 10 Dec 2020 22:01:00 +0100")
| $

But for all six messages, Gnus says:

| Date: Thu, 10 Dec 2020 22:01:00 +0000 (3 years, 18 weeks ago)

Now, if I understand nnrss.el correctly, it considers two
entries the same only if they differ solely in the fields listed
in nnrss-ignore-article-fields (which is 'slash:comments by
default), so any other change to an RSS feed entry will create a
new nnrss message in Gnus.  What appears to be missing is
treating the guid as an indicator that an entry has not changed.

Tim

(*1) There is probably also a way to do this in Emacs (Lisp)
     itself.
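To make the guid idea concrete, here is a rough Emacs Lisp sketch
of a comparison that trusts the guid when both entries carry one
and otherwise falls back to comparing all fields except the
ignored ones.  This is not nnrss.el's actual code: the function
names, the variable my-nnrss-ignored-fields (a stand-in for
nnrss-ignore-article-fields), and the assumption that an entry is
a list of xml.el-style nodes such as (guid nil "...") are all
hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defvar my-nnrss-ignored-fields '(slash:comments)
  "Hypothetical stand-in for `nnrss-ignore-article-fields'.")

(defun my-nnrss-node-text (tag item)
  "Return the string content of the first TAG node in ITEM, or nil."
  (let ((node (assq tag item)))
    ;; An xml.el node is (NAME ATTRIBUTES CHILD...); take the first child.
    (and node (stringp (nth 2 node)) (nth 2 node))))

(defun my-nnrss-same-entry-p (a b)
  "Return non-nil if feed entries A and B should count as one article.
Hypothetical rule: if both have a guid, compare only the guids;
otherwise compare all nodes except those in `my-nnrss-ignored-fields'."
  (let ((guid-a (my-nnrss-node-text 'guid a))
        (guid-b (my-nnrss-node-text 'guid b)))
    (if (and guid-a guid-b)
        (string= guid-a guid-b)
      ;; No usable guid: drop ignored fields and compare the rest.
      (let ((strip (lambda (item)
                     (cl-remove-if
                      (lambda (node)
                        (and (consp node)
                             (memq (car node) my-nnrss-ignored-fields)))
                      item))))
        (equal (funcall strip a) (funcall strip b))))))

;; Same guid, different pubDate offset:
(my-nnrss-same-entry-p
 '((guid nil "id-1") (pubDate nil "Thu, 10 Dec 2020 22:01:00 GMT"))
 '((guid nil "id-1") (pubDate nil "Thu, 10 Dec 2020 22:01:00 -0400")))
;; => t
```

Whether a feed's guid actually stays stable across refreshes
depends on whatever generates the feed, and a guid-only rule
would also hide genuine updates to an entry, so this trades one
kind of error for another.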