From: Harry Putnam
Newsgroups: gmane.emacs.gnus.general
Subject: Re: Major splitting problem ... Advice please
Date: Fri, 12 Oct 2001 21:28:43 -0700
References: <878zegyfip.fsf@raven.i.defaultvalue.org>
In-Reply-To: <878zegyfip.fsf@raven.i.defaultvalue.org> (Rob Browning's message of "Fri, 12 Oct 2001 11:44:30 -0500")

Rob Browning writes:

>> However, there are enough different date styles to make that kind of
>> split pretty hard to program. Also the problem of some messages
>> that came late to a thread, landing in a different group arises.
>> Keeping all thread members in one group may not even be possible,
>> except by hand. I'm not sure.
>
> Hmm. I had just been planning to use gnus date functions. I hadn't
> considered that those might not be sufficient.

My comments may have been a little misleading. They were directed at
the idea of splitting messages by date with tools such as awk and
perl. What I was getting at was the difficulty of writing regular
expressions that match all possible date formulations like these
(taken from a sample of headers on comp.unix.solaris):

  Date: 24 Sep 2001 09:07:45 GMT
  Date: Mon, 8 Oct 2001 15:30:18 +0100
  Date: 8 Oct 2001 14:30:26 GMT
  Date: 08 Oct 2001 16:42:08 +0200
  Date: Sun, 7 Oct 2001 20:02:06 +0200
  Date: Sun, 07 Oct 2001 17:45:17 GMT

There are some even odder formulations to be found. It is probably
not impossible to write a regexp that will work for them all, but it
is just a pita.

If you plan to use the date functions that do limiting, like these:

  `/ t' and `C-u / t'

it may not be a problem.

I wanted to do the splitting outside gnus because it is such a large
archive (approx. 250,000 messages, from about a dozen groups).

I haven't tried this, but I suspect one could do it by first setting
up an nnmail split method that splits by date into mnemonically named
groups.
Then enter the monster groups and split them with `M P a B r',
adjusting the split method rules for each group. But here again, I
would expect some extensive experimentation to get the date regexp
right. And it would be very time intensive to do that inside gnus, I
think, assuming the groups hold more than 25,000 messages or so.
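
As a rough sketch of handling the dates outside gnus without a
hand-written regexp: an RFC 2822 date parser accepts all six sample
headers quoted above. The example uses Python's standard
email.utils.parsedate_to_datetime rather than the awk or perl
mentioned in the message. The function name group_for_date and the
"archive" prefix are hypothetical, chosen only for illustration.

```python
from email.utils import parsedate_to_datetime

# Date: header values taken from the samples in the message.
SAMPLES = [
    "24 Sep 2001 09:07:45 GMT",
    "Mon, 8 Oct 2001 15:30:18 +0100",
    "8 Oct 2001 14:30:26 GMT",
    "08 Oct 2001 16:42:08 +0200",
    "Sun, 7 Oct 2001 20:02:06 +0200",
    "Sun, 07 Oct 2001 17:45:17 GMT",
]


def group_for_date(date_header, prefix="archive"):
    """Map a Date: header value to a hypothetical group name.

    The name has the form '<prefix>.YYYY-MM', using the year and month
    exactly as written in the header (no timezone normalization).
    Returns None when the value cannot be parsed, so the caller can
    send such messages to a catch-all group and sort them by hand.
    """
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on unparsable input,
        # newer ones raise ValueError.
        return None
    return f"{prefix}.{parsed.year:04d}-{parsed.month:02d}"


for value in SAMPLES:
    print(f"{value!r:40} -> {group_for_date(value)}")
```

The odder formulations mentioned in the message may still fail to
parse, which is why the sketch returns None instead of guessing. It
also does nothing about late replies landing in a different group
from the rest of their thread.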