From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.io/gmane.emacs.gnus.general/39230
Path: main.gmane.org!not-for-mail
From: Harry Putnam
Newsgroups: gmane.emacs.gnus.general
Subject: Major splitting problem ... Advice please
Date: Wed, 10 Oct 2001 22:26:28 -0700
Sender: owner-ding@hpc.uh.edu
Message-ID:
NNTP-Posting-Host: coloc-standby.netfonds.no
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: main.gmane.org 1035174970 26790 80.91.224.250 (21 Oct 2002 04:36:10 GMT)
X-Complaints-To: usenet@main.gmane.org
NNTP-Posting-Date: Mon, 21 Oct 2002 04:36:10 +0000 (UTC)
Return-Path:
Original-Received: (qmail 15476 invoked from network); 11 Oct 2001 05:33:20 -0000
Original-Received: from malifon.math.uh.edu (mail@129.7.128.13) by mastaler.com with SMTP; 11 Oct 2001 05:33:20 -0000
Original-Received: from sina.hpc.uh.edu ([129.7.128.10] ident=lists) by malifon.math.uh.edu with esmtp (Exim 3.20 #1) id 15rYRP-0000gN-00; Thu, 11 Oct 2001 00:31:31 -0500
Original-Received: by sina.hpc.uh.edu (TLB v0.09a (1.20 tibbs 1996/10/09 22:03:07)); Thu, 11 Oct 2001 00:31:07 -0500 (CDT)
Original-Received: from sclp3.sclp.com (qmailr@sclp3.sclp.com [209.196.61.66]) by sina.hpc.uh.edu (8.9.3/8.9.3) with SMTP id AAA03419 for ; Thu, 11 Oct 2001 00:30:55 -0500 (CDT)
Original-Received: (qmail 15409 invoked by alias); 11 Oct 2001 05:31:06 -0000
Original-Received: (qmail 15404 invoked from network); 11 Oct 2001 05:31:06 -0000
Original-Received: from mail.networkone.net (209.144.112.246) by gnus.org with SMTP; 11 Oct 2001 05:31:06 -0000
Original-Received: (qmail 24872 invoked from network); 11 Oct 2001 05:30:24 -0000
Original-Received: from unknown (HELO reader.local.lan) (66.51.210.228) by mail.networkone.net with SMTP; 11 Oct 2001 05:30:24 -0000
Original-Received: (from reader@localhost) by reader.local.lan (8.11.2/8.11.0) id f9B5UIM31469; Wed, 10 Oct 2001 22:30:18 -0700
X-Authentication-Warning: reader.local.lan: reader set sender to reader@newsguy.com using -f
Original-To: ding@gnus.org
User-Agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.0.106
Original-Lines: 47
Precedence: list
X-Majordomo: 1.94.jlt7
Xref: main.gmane.org gmane.emacs.gnus.general:39230
X-Report-Spam: http://spam.gmane.org/gmane.emacs.gnus.general:39230

I've become something of an amateur archivist over the past few years,
mainly due to a desire to have a big pile of usenet available for
searching, and for running various programming experiments in such
languages as awk and perl.

That said, I've decided my archive of 250,000+ messages from a dozen or
so newsgroups is just too unwieldy for easy use in its present form (the
way it came off the nntp servers). Groups are too vast to be really
usable inside of gnus unless most of the nifty formatting, threading,
etc. is forgone, and then why bother, really.

I currently use command line tools or homemade scripting to extract info
from this pile, but it would be nice to be able to access it easily with
gnus at times too. By `access' here, I don't mean nndir or the like, but
handy smallish groups that handle well inside gnus, where all manner of
highlighting or other special treatment/sorting wouldn't be a major time
drag. Maybe a series of nnml groups for each main newsgroup or
something.

To cut to the chase, I'm thinking of splitting this up into groups that
each contain one month/yr of a specific newsgroup. However, there are
enough different date styles to make that kind of split pretty hard to
program. There is also the problem of messages that came late to a
thread landing in a different group. Keeping all thread members in one
group may not even be possible, except by hand. I'm not sure.

Splitting on year would be easy enough but would still result in groups
too big for handy use. I'm thinking maybe something based on file names?
These messages have their original file names as they came from the
server (for the most part).
Or maybe just split it up into groups of 2000 or fewer under each
newsgroup, not paying attention to date or worrying about split threads.
(Is `A T' capable of pulling messages from other `nnml' groups?)

I'm wondering if some of the card-carrying archivists here, like maybe
Karl K., could outline how they would do something like this. Any
suggestions from anyone who has seen various setups or has some
experience with a problem like this are welcome too. I mean just off
the top of respective heads.
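
The two splitting schemes floated above (one group per month/yr, or fixed chunks of 2000) could be prototyped outside gnus along these lines. This is only a rough sketch, not a tested archiving tool. It assumes each message is available as a raw string with an RFC-822-style Date header, and it leans on Python's standard `email.utils.parsedate_to_datetime` to absorb the common date styles. The names `month_key`, `split_by_month`, `chunk` and the "undated" bucket are made up for illustration. Nothing here touches nnml or keeps threads together.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def month_key(raw_message):
    """Return 'YYYY-MM' taken from the Date header, or None if unusable."""
    date_header = message_from_string(raw_message).get("Date")
    if not date_header:
        return None
    try:
        # Handles the usual variants: with or without day name,
        # numeric or named time zones, and so on.
        return parsedate_to_datetime(str(date_header)).strftime("%Y-%m")
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return None


def split_by_month(raw_messages):
    """Bucket messages by month; unparseable dates go to 'undated'."""
    groups = defaultdict(list)
    for raw in raw_messages:
        groups[month_key(raw) or "undated"].append(raw)
    return dict(groups)


def chunk(items, size=2000):
    """Date-blind fallback: consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Messages that end up in the hypothetical "undated" bucket are the ones whose date style would need hand inspection or a custom parser. `chunk` could also be applied to any monthly bucket that is still too large to handle comfortably.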