From: Alex Schroeder
Newsgroups: gmane.emacs.gnus.user
Subject: Re: Deleting duplicates from nnml:mail.misc
Date: Sun, 13 Oct 2013 20:44:49 +0200
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (darwin)

Alex Schroeder writes:

> Is my nnml:mail.misc of about 60000 messages full of duplicates?

I think I found a way of doing this within Emacs. I'm sure this could be
made into a nice package, but I hope to use it just once. It also takes
many seconds to build up the data structures. Anyway, here's what I do:

    ;; cl-remove-if (used further down) is provided by cl-lib
    (require 'cl-lib)

    ;; get all the headers from the overview file
    (setq asc:headers
          (nnheader-parse-overview-file
           "/Volumes/Extern/Archives/Mail/mail/misc/.overview"))

    ;; how many mails in total?
    (length asc:headers)

    ;; create an alist with key Message-ID and value being a list of
    ;; article numbers sharing this Message-ID (in other words, if there
    ;; is more than one article number, these are potential duplicates)
    ;; Example list item: ("" 62335 62329)
    (setq asc:duplicates nil)
    (dolist (hdr asc:headers)
      ;; slot 4 of a header vector is the Message-ID, slot 0 the article number
      (let ((cell (assoc (aref hdr 4) asc:duplicates)))
        (if cell
            (setcdr cell (cons (aref hdr 0) (cdr cell)))
          (setq asc:duplicates
                (cons (list (aref hdr 4) (aref hdr 0))
                      asc:duplicates)))))

    ;; how many unique Message-IDs?
    (length asc:duplicates)

    ;; look at an example entry
    (car asc:duplicates)

    ;; check how many entries refer to more than one article number
    (let ((count 0))
      (dolist (item asc:duplicates)
        (when (> (length item) 2)
          (setq count (1+ count))))
      count)

    ;; our todo list is the Message-IDs with more than one article number
    (setq asc:todo
          (cl-remove-if (lambda (item) (= (length item) 2))
                        asc:duplicates))

    ;; look at the first todo item
    (car asc:todo)

    ;; if you want to check upon those numbers...
    ;; (gnus-summary-goto-article 62335)
    ;; (gnus-summary-goto-article 62329)

    ;; how many todo items?
    (length asc:todo)

    ;; let's get a flat list of article numbers -- using nconc means
    ;; trashing asc:duplicates!
    (setq asc:articles
          (apply 'nconc
                 (mapcar 'cdr
                         (cl-remove-if (lambda (item) (= (length item) 2))
                                       asc:duplicates))))

    ;; how many articles to look at?
    (length asc:articles)

    ;; read them all
    (gnus-group-read-group t t "nnml+mail:mail.misc" asc:articles)
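
The slow part of the recipe above is most likely the `assoc' lookup, which scans the whole alist once per header. A minimal sketch of the same grouping with a hash table follows, in case the build time matters. The function name `asc:group-by-message-id' is hypothetical and not part of Gnus. It assumes the header layout used in the post (slot 0 is the article number, slot 4 the Message-ID), and it returns only the entries with more than one article number, in no particular order.

```elisp
;; Sketch only: `asc:group-by-message-id' is a hypothetical helper,
;; not a Gnus function.
(defun asc:group-by-message-id (headers)
  "Return a list of (MESSAGE-ID NUMBER...) entries having several NUMBERs.
HEADERS is a list of vectors laid out as in the post: slot 0 holds the
article number and slot 4 the Message-ID."
  (let ((table (make-hash-table :test 'equal))
        (result nil))
    ;; One hash lookup per header instead of an `assoc' scan of the alist.
    (dolist (hdr headers)
      (let ((id (aref hdr 4)))
        (puthash id (cons (aref hdr 0) (gethash id table)) table)))
    ;; Keep only the Message-IDs that occur more than once.
    (maphash (lambda (id numbers)
               (when (cdr numbers)
                 (push (cons id numbers) result)))
             table)
    result))

;; Usage with made-up header vectors (only slots 0 and 4 matter here):
(asc:group-by-message-id
 (list (vector 1 "s" "f" "d" "<a@example.org>")
       (vector 2 "s" "f" "d" "<b@example.org>")
       (vector 3 "s" "f" "d" "<a@example.org>")))
;; => (("<a@example.org>" 3 1))
```

With real data, the result of (asc:group-by-message-id asc:headers) would play the role of asc:todo, and the flat list of article numbers could then be built from it as before.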