From: Karl Kleinpaste
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Major splitting problem ... Advice please
Date: Thu, 11 Oct 2001 08:02:20 -0400
Original-To: ding@gnus.org
In-Reply-To: (Harry Putnam's message of "Wed, 10 Oct 2001 22:26:28 -0700")
User-Agent: Gnus/5.090004 (Oort Gnus v0.04) XEmacs/21.4 (Artificial Intelligence)

Harry Putnam writes:
> To cut to the chase here, I'm thinking of splitting this up into
> groups that contain one month/yr of a specific group.

    for year in 1995 1996 1997 1998 1999 2000 2001
    do
      for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
      do
        newdir=NewArchive/$year/$month
        mkdir -p "$newdir"
        # -l lists the names of files whose Date: line mentions this month and year
        grep -isl "^Date:.*$month.*$year" message/* |
        while read -r article
        do
          mv "$article" "$newdir"
        done
      done
    done

Embellish to taste, e.g. if the messages do not have unique names
across a set of directories.

It's too bad that xargs(1) can't be used directly following the grep;
the inner "while" loop could be disposed of entirely if so, but that's
not how mv(1) works: xargs appends the file names at the end of the
command line, and mv wants the destination directory there.

Er...well, it's icky, but put this in a small executable script named
newmv somewhere on your PATH (xargs runs programs, not shell
functions):

    #!/bin/sh
    # newmv: like mv, but the destination directory comes first
    destdir=$1
    shift
    mv "$@" "$destdir"

Then the "while" is replaced by

    grep -isl ... | xargs newmv "$newdir"

which perhaps isn't all that icky after all.

> However, there are enough different date styles to make that kind of
> split pretty hard to program.

If enough messages have odd (broken) date formats that this doesn't
catch them, then after this is run, go back and work out new variants
of the "for" loops for whatever is left. Repeat "for" with ever newer
and weirder date discriminants until there's nothing left to move.

> Also the problem of some messages that
> came late to a thread, landing in a different group arises. Keeping
> all thread members in one group may not even be possible, except by
> hand. I'm not sure.

As soon as you decide to use date-based storage, you either break that
storage mechanism or you break border-crossing threads. Pick one or
the other.
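
One way to pick the next date discriminant for those repeated passes
is to look at what the leftover Date: headers actually look like. The
following is a minimal sketch, not part of the original message:
leftover_date_shapes is a hypothetical helper name, and it assumes one
message per file, with the headers ending at the first blank line. It
collapses every run of digits to "N" and counts how often each
resulting shape occurs.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the original post):
# summarise the shapes of the Date: headers still left in a directory,
# most common shape first.
leftover_date_shapes() {
    # $1 = directory of leftover messages, one message per file
    set -- "$1"/*
    [ -e "$1" ] || return 0            # directory is empty: nothing left to inspect

    awk '
        FNR == 1 { in_header = 1 }     # every file starts inside its header block
        /^$/     { in_header = 0 }     # the first blank line ends the headers
        in_header && tolower($0) ~ /^date:/ {
            gsub(/[0-9]+/, "N")        # collapse digit runs so similar dates group together
            print
        }
    ' "$@" | sort | uniq -c | sort -rn
}
```

Running "leftover_date_shapes message" after each pass shows which
formats remain; a shape such as "Date: N/N/N" would suggest adding a
numeric-date pattern to the grep for the next round.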

OTOH -- and I know we've been over this ground before -- I've become
so attached to nnir & swish++ that I would leave the groups in
whatever huge collections you've got and simply never enter them
directly, but rather do nnir queries to pick up what I need. swish++
is _fast_.

Periodically run nnml-generate-nov-databases to keep the overviews
current, if you continue to add messages to these archives.

--karl