Gnus development mailing list
From: Harry Putnam <reader@newsguy.com>
Subject: Re: OT [Archive techniques] What to do when it gets massive
Date: Thu, 12 Aug 2004 20:59:59 -0500
Message-ID: <m33c2rztk0.fsf@newsguy.com>
In-Reply-To: <4nacx0bi2b.fsf@lifelogs.com>

"Ted Zlatanov" <tzz@lifelogs.com> writes:

>> I've wondered if I just removed all the numbered files but left the
>> .agentview files in place if the agent will just continue with only
>> new messages it hasn't seen.  If that is the case, then that would be
>> one way to do it.
>
> Look at rsnapshot, it automates this by doing a hard-link copy and
> then rsync over the copy.  That way you only replace the files that
> have changed but your disk usage is not significantly increased.
>
> You can do this manually on the command line, but rsnapshot automates
> it.

I'm probably overlooking something fundamental, but I don't see how
this is really any different from normal rsync, apart from the disk
space saved by the hard links.
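For reference, the hard-link-copy-then-rsync idea Ted describes can be sketched roughly as below. This is not rsnapshot's code; the function name is hypothetical, and it assumes plain regular files in a fresh destination directory, ignoring permissions, symlinks and deletions (rsync's `--delete` would handle the latter). It shows why disk usage barely grows: unchanged files in the new snapshot share an inode with the previous one.

```python
import filecmp
import os
import shutil


def hardlink_snapshot(prev_snapshot: str, source: str, new_snapshot: str) -> None:
    """Hypothetical sketch of a hard-link snapshot followed by an update pass.

    new_snapshot must not exist yet (os.link refuses to overwrite files).
    Files deleted from source are NOT removed from the new snapshot.
    """
    # Step 1: rebuild prev_snapshot as hard links; no file data is copied.
    for root, _dirs, files in os.walk(prev_snapshot):
        rel = os.path.relpath(root, prev_snapshot)
        os.makedirs(os.path.join(new_snapshot, rel), exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(new_snapshot, rel, name))

    # Step 2: like rsync over the copy, replace only files whose contents differ.
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(new_snapshot, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(new_snapshot, rel, name)
            if os.path.exists(dst):
                if filecmp.cmp(src, dst, shallow=False):
                    continue  # unchanged: keep the shared hard link
                os.unlink(dst)  # break the link so the old snapshot is untouched
            shutil.copy2(src, dst)
```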

I didn't see a way to avoid major overlap between monthly archives.
Say that on 8/30/04 my rsnapshot setup starts writing to a new
archive.  It's still based on the files under ~/News/agent/nntp,
right?  So whatever is in there will be copied over to the new
archive, including many, if not all, of the files that were there for
the previous month.

Even assuming a full expiry with `gnus-agent-expire-all', short of
actually deleting all messages under ~/News/agent/nntp and preventing
the agent from redownloading any, the same overlap problem would
remain, it seems.

What seems to be missing in both rsync and rsnapshot is a way to
compare the files to be updated against a second source collection
(in this case the previous month's archive), so that only files that
ARE under ~/News/agent/nntp and are NOT in newsarch_072004 will be
copied to newsarch_082004.
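A minimal sketch of that "copy only what the previous archive lacks" step, assuming files are matched by relative path alone (a file that exists in last month's archive is skipped even if its contents changed). The function name is hypothetical. rsync's `--compare-dest=DIR` option is documented as skipping files identical to ones found in DIR, so it may already cover this and is worth checking in the rsync man page before scripting it by hand.

```python
import os
import shutil


def copy_new_only(source: str, prev_archive: str, new_archive: str) -> list[str]:
    """Copy files under source that are absent from prev_archive into new_archive.

    Hypothetical helper: matching is by relative path only.
    Returns the sorted relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(source):
        rel_dir = os.path.relpath(root, source)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if os.path.exists(os.path.join(prev_archive, rel)):
                continue  # already archived last month
            dst = os.path.join(new_archive, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(os.path.join(root, name), dst)
            copied.append(rel)
    return sorted(copied)
```

Called as something like `copy_new_only(os.path.expanduser("~/News/agent/nntp"), "newsarch_072004", "newsarch_082004")`, it would leave only the new month's articles in the new archive.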

That would leave one month in which the agent could expire its pile of
files down to only what came in that month.

So the only thing that wouldn't be readily scriptable would be the
gnus-agent-expire part.

Does any of this make sense or am I really missing the boat somewhere?

Another way to go at this might be to generate an `exclude-from' list
from the files rsync reports as `uptodate' plus those transferred in
this run.

That compiled list would be appended automatically (by user
scripting) to any static exclude list and would become the
`exclude-from' list for the next run.

That seems like it might work: when the destination is switched to an
empty directory at the beginning of a month, rsync would still have
the last list to exclude by, rather than relying only on what it does
(or does not) find in the new directory.
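A rough sketch of compiling such a list. Rather than parsing rsync's verbose output, it assumes the files present in the current archive after a run are exactly those found up to date or transferred. It also folds in the previously generated list, since otherwise entries excluded in one run would drop off the list in the next. The function and file names are hypothetical; the output is meant for something like `rsync -a --exclude-from=next.excl ~/News/agent/nntp/ ~/newsarch_082004/`.

```python
import os


def build_exclude_list(static_file: str, prev_generated: str, archive_dir: str,
                       out_file: str) -> int:
    """Write an exclude-from file for the next rsync run; return its line count.

    Hypothetical helper: merges a static exclude list, the previously
    generated list (if any) and every file currently in archive_dir.
    """
    patterns: set[str] = set()
    for path in (static_file, prev_generated):
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                patterns.update(line.rstrip("\n") for line in fh if line.strip())
    for root, _dirs, files in os.walk(archive_dir):
        rel_dir = os.path.relpath(root, archive_dir)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            # A leading "/" anchors the pattern at the root of the transfer.
            patterns.add("/" + rel.replace(os.sep, "/"))
    with open(out_file, "w", encoding="utf-8") as fh:
        fh.writelines(p + "\n" for p in sorted(patterns))
    return len(patterns)
```

Because the list only ever grows, its length tracks the total number of archived articles, which is exactly where the processing-time concern comes in.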

The more I discuss this here, the more I think I may have hit on
something.

On the other hand, such lengthy exclude lists may greatly increase
processing time, or even make the approach unusable.



