From mboxrd@z Thu Jan 1 00:00:00 1970
From: Harry Putnam
Newsgroups: gmane.emacs.gnus.general
Subject: Re: OT [Archive techniques] What to do when it gets massive
Date: Thu, 12 Aug 2004 20:59:59 -0500
Organization: Still searching...
References: <4nacx0bi2b.fsf@lifelogs.com>
User-Agent: Gnus/5.110003 (No Gnus v0.3) Emacs/21.3.50 (gnu/linux)

"Ted Zlatanov" writes:

>> I've wondered if I just removed all the numbered files but left the
>> .agentview files in place if the agent will just continue with only
>> new messages it hasn't seen. If that is the case, then that would be
>> one way to do it.
>
> Look at rsnapshot, it automates this by doing a hard-link copy and
> then rsync over the copy. That way you only replace the files that
> have changed but your disk usage is not significantly increased.
>
> You can do this manually on the command line, but rsnapshot automates
> it.

I'm probably overlooking something fundamental, but I don't see how
this is really any different than normal rsync, except for the disk
space savings due to hardlinks.

I didn't see a way to avoid major overlap between monthly archives.
That is, say on 8/30/04 my rsnapshot setup starts writing to a new
archive. It's still based on the files under ~/News/agent/nntp, right?
So whatever is in there will be copied over to the new archive,
including many if not all of the files that were there for the
previous month, even assuming a full gnus-agent-expire-all.

Short of actually deleting all messages under ~/News/agent/nntp and
preventing the agent from redownloading any, there would be the same
overlap problem... it seems.

What seems to be missing in both rsync and rsnapshot is a way to
compare the files to be updated against a second source collection (in
this case the previous month's archive), so that only files that ARE
under ~/News/agent/nntp and are NOT in newsarch_072004 will be copied
to newsarch_082004.

That would leave one month in which the agent could expire its pile of
files down to only what came in that month. So the only thing that
wouldn't be readily scriptable would be the gnus-agent-expire part.

Does any of this make sense, or am I really missing the boat
somewhere?

Another way to go at this might be to generate an `exclude-from' list
from the files found `uptodate' and those being moved in this run.
That compiled list would be added automatically (by user scripting) to
any static exclude list, and would become the `exclude-from' list for
the next run.

Seems like that might work. When the destination target is suddenly
changed to an empty directory at the beginning of a month, rsync would
have the last list to exclude by, and would not rely only on what it
finds (or does not find) in the new directory.

The more I discuss this here ... I'm beginning to think I may have hit
on something. Then again, having such lengthy exclude lists may really
increase processing time, or may even make it unusable.
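
A minimal sketch of the two ideas above, written in Python rather than
as an rsync invocation. It assumes files are compared by relative path
only (no size or checksum check), and the function names
relative_files, archive_new_files and write_exclude_list are
hypothetical helpers made up for illustration. rsync's --compare-dest
option may already cover the first case; that has not been tried
against this setup.

```python
import os
import shutil


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def archive_new_files(source, previous_archive, new_archive):
    """Copy files that are under source but not in previous_archive.

    Assumption: two files are "the same" if their relative paths match.
    Returns the sorted list of relative paths that were copied.
    """
    already_archived = relative_files(previous_archive)
    to_copy = sorted(relative_files(source) - already_archived)
    for rel in to_copy:
        target = os.path.join(new_archive, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(source, rel), target)
    return to_copy


def write_exclude_list(previous_archive, list_path):
    """Write the previous archive's relative paths, one per line.

    Each path gets a leading slash so that, when the file is handed to
    rsync as an exclude-from list, the pattern is anchored at the root
    of the transfer. Returns the sorted relative paths written.
    """
    paths = sorted(relative_files(previous_archive))
    with open(list_path, "w") as handle:
        for rel in paths:
            handle.write("/" + rel.replace(os.sep, "/") + "\n")
    return paths
```

With the directories from this thread, the first helper would be
called with ~/News/agent/nntp as source, newsarch_072004 as
previous_archive and newsarch_082004 as new_archive (paths expanded by
the caller). The second helper produces one line per archived article,
so the list grows with the archive, which is exactly the
processing-time worry raised above.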