From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Abrahamsen
Newsgroups: gmane.emacs.gnus.general
Subject: function to mark duplicate messages
Date: Sat, 17 Nov 2012 11:45:01 +0800
Message-ID: <87mwyg3nb6.fsf@ericabrahamsen.net>
To: ding@gnus.org
User-Agent: Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (gnu/linux)

We've already got the `nnmail-treat-duplicates' setting for handling
duplicate messages on their way in, but there are times -- for instance
when you've, um, destroyed a bunch of nnml groups and, erm, have no
backups and are reassembling them from a variety of sources -- that you
end up with lots of duplicate messages.

I'm writing a function to go through a group and mark duplicates (likely
for deletion), and just want to make sure my approach makes sense.
Message IDs are the thing to use, right? At the moment the plan is to:

1. Turn off threading temporarily.
2. Sort by message ID with a predicate that uses `string<', and
   furthermore sort messages with Gnus-Warning headers so they come
   after those without.
3. Loop through the messages and add a mark to any message whose message
   ID is `string=' to the ID of the message before it.
4. Leave it at that -- the user can delete or otherwise process the
   marked messages.

Anyway, if anyone has any general comments on this, I'd love to hear
them. I'll post the function when I'm done, if anyone thinks it would be
useful.

Thanks,
Eric
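
To make steps 2 and 3 of the plan concrete, here is a minimal sketch of the sort-then-compare idea, written in Python rather than Emacs Lisp. It is not Gnus code and calls no Gnus functions: the `Message` record, its fields, and `mark_duplicates` are hypothetical names that stand in for article data and for whatever mark the real function would set. It also assumes that an exact string match on the message ID is the right test for a duplicate.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for one article in a group."""
    number: int                     # article number within the group
    message_id: str                 # value of the Message-ID header
    has_gnus_warning: bool = False  # True if a Gnus-Warning header is present
    marked: bool = False            # set on articles flagged as duplicates


def mark_duplicates(messages):
    """Sort by message ID, then mark each message that repeats the previous ID.

    Within a run of equal IDs, messages without a Gnus-Warning header sort
    first, so the unmarked survivor is a copy without the warning if one exists.
    """
    # False sorts before True, so the warning flag works as a tie-breaker.
    ordered = sorted(messages, key=lambda m: (m.message_id, m.has_gnus_warning))
    previous_id = None
    for msg in ordered:
        if msg.message_id == previous_id:
            msg.marked = True
        previous_id = msg.message_id
    return ordered


group = [
    Message(1, "<b@example.org>"),
    Message(2, "<a@example.org>", has_gnus_warning=True),
    Message(3, "<a@example.org>"),
    Message(4, "<b@example.org>"),
]
ordered = mark_duplicates(group)
```

Because the sort brings equal IDs together, a single pass that compares each message with its predecessor finds every duplicate. In this example articles 2 and 4 end up marked, and article 3, the copy without the warning, is the one left unmarked for its ID.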