Gnus development mailing list
* OT [Archive techniques] What to do when it gets massive
@ 2004-08-12  1:34 Harry Putnam
  2004-08-12 13:29 ` Ted Zlatanov
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Harry Putnam @ 2004-08-12  1:34 UTC (permalink / raw)


I've been archiving a changing list of nntp and mail messages for a
very long time, partly to have a hefty amount of data to test various
search techniques against.

I've never really hit on a good method for doing this.  I started
with rsync and still use it like this:

  Run rsync against ~/News/agent/nntp using an exclude file that keeps
  out anything but the directories and messages, into a mirror of
  those directories.  The result is that as new messages come in and
  old ones are expired from ~/News, they accumulate on /arch/news.
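
A rough sketch of that rsync run, written as a small Python helper
that only builds the command line.  The exclude-file path in the
usage line is a made-up placeholder, and the exact options I use may
differ; the point is that `-a' without `--delete' lets expired
messages pile up on the mirror instead of being removed.

```python
from pathlib import Path
import shlex


def build_rsync_command(source, mirror, exclude_file):
    """Return the argv list for an accumulating rsync run.

    No --delete is passed, so files that vanish from the source
    (expired articles) stay behind in the mirror.
    """
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    src = str(Path(source).expanduser()) + "/"
    dest = str(Path(mirror).expanduser()) + "/"
    return [
        "rsync",
        "-a",  # archive mode: recurse and preserve times and permissions
        f"--exclude-from={Path(exclude_file).expanduser()}",
        src,
        dest,
    ]


if __name__ == "__main__":
    # The exclude-file name is hypothetical; substitute the real one.
    cmd = build_rsync_command(
        "~/News/agent/nntp", "/arch/news", "~/rsync-exclude.txt"
    )
    print(shlex.join(cmd))  # inspect before passing to subprocess.run
```

Printing the command first makes it easy to check before handing the
list to subprocess.run.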

At some point the heap of data gets so large that any command run
against it takes a long time.  I'd like to break this pile up
somehow, but will work on that later.

Right now I'd like to start rsyncing to dated mirrors one month at a
time.  However, I see no way to do this without major overlap.

Example: the Agent downloads for a month and I have a large
accumulation under News/agent/nntp.  These have been getting rsynced
to this month's mirror.

Now when I change over to a new month and start feeding a new, empty
mirror, all the messages under News...nntp are copied there unless I
empty out News/agent/nntp.  Even then, without some hand work, the
agent will download whatever is still on the server in its initial
run, and the vast majority of that will be overlap.

Rsync seems to have no kind of `newer' type thing like find has.
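
One way to get a `newer'-style filter without rsync is a small
script that walks the tree and copies only files modified since a
cutoff into a per-month directory.  This is a sketch under
assumptions: the function name and the month label are made up, and
it treats a file's modification time as the time the agent fetched
it, which I haven't verified holds in every case.

```python
import os
import shutil
from datetime import datetime
from pathlib import Path


def copy_newer(source, mirror_root, cutoff, label):
    """Copy files under source modified at or after cutoff.

    Files land in mirror_root/label with the same relative paths.
    Returns the list of destination paths that were written.
    """
    source = Path(source)
    dest_root = Path(mirror_root) / label
    cutoff_ts = cutoff.timestamp()
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = Path(dirpath) / name
            if src.stat().st_mtime < cutoff_ts:
                continue  # older than the cutoff: skip it
            dest = dest_root / src.relative_to(source)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 keeps the modification time
            copied.append(dest)
    return copied
```

Called with the first of the month as the cutoff and something like
"2004-08" as the label, each month's mirror would hold only what
arrived during that month.  It does not help if the agent directory
is emptied and everything is downloaded again, since the re-fetched
files would all carry new timestamps.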

I've wondered whether, if I just removed all the numbered files but
left the .agentview files in place, the agent would continue with
only new messages it hasn't seen.  If so, that would be one way to
do it.
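
The file-removal half of that idea could look like the sketch below.
It only selects files whose names are all digits and leaves
everything else, .agentview included, alone.  The function names are
made up, it defaults to a dry run, and whether the agent then really
fetches only unseen messages is exactly the part I don't know.

```python
from pathlib import Path


def numbered_articles(group_dir):
    """Return files in group_dir whose names are purely numeric."""
    return sorted(
        (
            p
            for p in Path(group_dir).iterdir()
            if p.is_file() and p.name.isdecimal()
        ),
        key=lambda p: int(p.name),
    )


def remove_numbered(group_dir, dry_run=True):
    """Delete numbered article files from one group directory.

    With dry_run=True (the default) nothing is deleted; the list of
    files that would go is returned so it can be checked first.
    """
    victims = numbered_articles(group_dir)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Trying this on a single throwaway group first would show how the
agent reacts before touching the whole tree.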

That way would leave only one major inconvenience: I'd have no
backlog of messages in any group for a while, in case I wanted to
A T a thread or do a search or something.

I'm betting some of the seasoned troopers here have some much better
ways of doing this.  Answers of `use google instead' or `use
search.gmane.org instead' are not accepted... hehe.





Thread overview: 8+ messages
2004-08-12  1:34 OT [Archive techniques] What to do when it gets massive Harry Putnam
2004-08-12 13:29 ` Ted Zlatanov
2004-08-13  1:59   ` Harry Putnam
2004-08-16 17:35     ` Ted Zlatanov
2004-08-16 18:02       ` Harry Putnam
2004-09-02 13:07 ` Kai Grossjohann
2004-09-04 19:37   ` Harry Putnam
2004-09-07 11:12 ` Kai Grossjohann
