From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harry Putnam
Newsgroups: gmane.emacs.gnus.general
Subject: Definitive archiving method sought - Big prize money for best entrant
Date: Wed, 04 Sep 2002 19:17:25 -0700
User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.3.50 (i686-pc-linux-gnu)

Ok, so the big prize money was a lie.... But I know there are some
card-carrying archivists here:

I've long wanted a smooth-working archiving technique, but can't really
come up with something that isn't either horribly complex or prone to
unwanted overlap.

I want to keep using my current method, which is rsync, but add another
technique that separates my archives into chronological chunks and
allows no overlap.

Let me explain:

I currently download a number of groups with the agent, and
periodically run the agent expiry routine. The set of groups selected
for download changes over time, and has changed quite a few times over
the years. So I don't really want something like the `expiry to target'
stuff available for mail. Too much futzing around as the selected
groups change.

I use rsync to grab new stuff from the agentized messages, adding it to
an archive from a daily cron job. Anyone who knows rsync will know that
it will keep adding any new messages to the archive.

Ok, fine so far. But given enough time, that archive will grow unusably
large. Or at least large enough to be a pain to search etc.

I want to break it up into chunks of some kind. I think calendar
quarters would be good. I don't mean that the messages in a chunk have
to be dated inside a certain quarter, only that a quarter must not hold
any dups from the one before or after. So I'm not concerned about
message dates, although they would by and large fall in place. Just a
user-imposed quarter of collected messages.
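To pin down what a `user imposed quarter' with no dups would mean in
practice, here is a rough sketch done outside rsync. It is only an
illustration under assumptions: the names quarter_label and archive_new
are made up, and it assumes a message's path relative to the agent
directory identifies it for good (which fails if a server ever
renumbers its articles).

```python
import shutil
from datetime import date
from pathlib import Path


def quarter_label(day: date) -> str:
    """Return a label such as '2002-Q3' for the quarter containing day."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def archive_new(source: Path, archive_root: Path, today: date) -> int:
    """Copy files under source that no quarter directory holds yet.

    Files land in archive_root/<quarter of today>/<relative path>, so the
    quarter reflects when a message was collected, not its Date header.
    Returns the number of files copied.
    """
    # Assumption: the relative path is a stable identity for a message.
    seen = set()
    if archive_root.is_dir():
        for quarter_dir in archive_root.iterdir():
            if quarter_dir.is_dir():
                seen.update(
                    p.relative_to(quarter_dir)
                    for p in quarter_dir.rglob("*")
                    if p.is_file()
                )

    target = archive_root / quarter_label(today)
    copied = 0
    for path in sorted(source.rglob("*")):
        rel = path.relative_to(source)
        if path.is_file() and rel not in seen:
            dest = target / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            copied += 1
    return copied
```

Run daily as something like archive_new(Path.home() / "News/agent",
Path("/some/archive"), date.today()), where the archive path is a
placeholder. Rescanning every quarter directory on each run would get
slow with hundreds of thousands of files; a saved list of
already-archived paths would probably be needed instead.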
I can't think of a way to do this directly with rsync, such as renaming
an accumulated archive at a certain point and beginning a new one.
Rsync lacks anything like the `-newer' test in the `find' command, so
that kind of approach is out too.

Two things that cause overlap come up. The way rsync works, it would
start grabbing messages from ~/News/agent that had already been
archived, since they wouldn't be in the new archive. The only way I can
see to prevent that would be to reduce the agentized stock to zero at
the same time.

If one did that, the agent itself would redownload some already
archived stuff (I think, but haven't tested to make sure). Even if the
agent knows not to redownload, it would be inconvenient to reduce the
agentized messages to zero, leaving no backlog to work with in Gnus.

Scripting to separate the messages by date is not too big a deal, but
I'm talking about 500,000+ messages having to be visited individually
and sorted according to date, while generating a directory structure to
house them.

That may be the only way. I wondered if anyone has a slicker, faster,
smarter way?
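
For reference, the date-sorting fallback described above could look
roughly like the following. This is a sketch under assumptions, not a
tested tool: the function names are made up, it assumes one message per
file under the source directory, and anything without a parsable Date
header (including non-message files) is filed under `undated'.

```python
import shutil
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime
from pathlib import Path


def message_quarter(path: Path) -> str:
    """Return 'YYYY-Qn' from the file's Date header, or 'undated'."""
    with path.open("rb") as handle:
        # Headers only: the body is not parsed into MIME parts.
        headers = BytesHeaderParser().parse(handle)
    raw = headers.get("Date")
    if raw is None:
        return "undated"
    try:
        when = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        # Malformed date; the exception type depends on the Python version.
        return "undated"
    return f"{when.year}-Q{(when.month - 1) // 3 + 1}"


def sort_by_date(source: Path, dest_root: Path) -> int:
    """Copy every file under source into dest_root/<quarter>/<relative path>.

    dest_root is assumed to lie outside source. Returns the file count.
    """
    count = 0
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        dest = dest_root / message_quarter(path) / path.relative_to(source)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        count += 1
    return count
```

Every file still gets opened once, so this does nothing about the cost
of visiting 500,000+ messages. It only shows that the sorting logic
itself is short.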