Gnus development mailing list
* Definitive archiving method sought - Big prize money for best entrant
@ 2002-09-05  2:17 Harry Putnam
  2002-09-06 15:50 ` Kai Großjohann
From: Harry Putnam @ 2002-09-05  2:17 UTC


Ok, so the big prize money was a lie....

But I know there are some card-carrying archivists here:

I've long wanted a smooth-working archiving technique but can't come
up with anything that isn't either horribly complex or prone to
unwanted overlap.

I want to keep my current method, which is rsync, but add another
technique that separates my archives into chronological chunks and
allows no overlap.

Let me explain:
I currently download a number of groups with the agent and
periodically run the agent expiry routine.  Over time the set of
groups I download may change, and it has quite a few times over the
years.  So I don't really want something like the `expiry to target'
feature available for mail groups.  Too much futzing around as the
selected groups change.

I use rsync on a daily cron job to grab new messages from the
agentized store and add them to an archive.  Anyone who knows rsync
will know that it keeps adding any new messages to the archive.
OK, fine so far.  But given enough time, that archive will grow
unusably large, or at least large enough to be a pain to search.

I want to break it up into chunks of some kind.  I think calendar
quarters would be good.  I don't mean that the messages in each
archive have to fall inside a certain quarter by date, only that a
quarter must not hold any dups from the one before or after.  So I'm
not concerned about message dates, although by and large they would
fall in place.  Just a user-imposed quarter of collected messages.
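Labelling a "user-imposed quarter" by collection date is simple
arithmetic.  A minimal sketch in Python; `quarter_label` and
`quarter_start` are hypothetical helper names, not part of Gnus or
rsync:

```python
from datetime import date


def quarter_label(d: date) -> str:
    """Hypothetical helper: label such as '2002-Q3' for the quarter containing d."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return f"{d.year}-Q{quarter}"


def quarter_start(d: date) -> date:
    """Hypothetical helper: first day of the calendar quarter containing d."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)
```

For example, `quarter_label(date(2002, 9, 5))` gives `'2002-Q3'` and
`quarter_start` of the same date gives `date(2002, 7, 1)`; a daily job
could use the label as the name of the archive directory it writes to.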

I can't think of a way to do this directly with rsync, such as
renaming the accumulated archive at a certain point and beginning a
new one.  Rsync lacks anything like the `-newer' test of the `find'
command, so that kind of approach is out too.
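A `find -newer' style selection is easy to get outside rsync.  The
sketch below (Python; `archive_window` is a hypothetical name) copies
only files whose modification time falls in a half-open window
[start, end).  It assumes the agent's article files keep the mtime
they got when downloaded, so each file belongs to exactly one
quarter's window and nothing has to be removed from the agent:

```python
from __future__ import annotations

import shutil
from datetime import datetime
from pathlib import Path


def archive_window(src: Path, dest: Path, start: datetime,
                   end: datetime | None = None) -> list[Path]:
    """Hypothetical helper: copy files under src whose mtime is in [start, end)
    into dest, keeping relative paths. Files already in dest are skipped.
    Returns the relative paths copied on this run."""
    lo = start.timestamp()
    hi = end.timestamp() if end is not None else float("inf")
    copied: list[Path] = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if not (lo <= mtime < hi):
            continue  # downloaded outside this window: belongs to another archive
        rel = path.relative_to(src)
        target = dest / rel
        if target.exists():
            continue  # already archived by an earlier run in this window
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the mtime
        copied.append(rel)
    return copied
```

On each daily run it would be called with the current quarter's start
and no end.  Right after a quarter boundary, one last run into the
previous quarter's directory, with `end` set to the new quarter's
start, picks up anything fetched after that quarter's final daily run
without touching the new quarter.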

Two things that cause overlap come up.  Because of the way rsync
works, it would start grabbing messages from ~/News/agent that had
already been archived, since they wouldn't be in the new archive.

Only way I can see to prevent that would be to reduce the agentized
stock to zero at the same time.
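Another way to avoid both the overlap and zeroing the agent is to
treat the earlier quarters' archives as a manifest of what has already
been archived.  A sketch in Python, assuming each message is
identified by its path relative to the agent directory (group
directory plus article number), which only holds as long as the
server doesn't renumber articles; `load_manifest` and
`archive_unseen` are hypothetical names:

```python
from __future__ import annotations

import shutil
from pathlib import Path


def load_manifest(archive_root: Path) -> set[str]:
    """Hypothetical helper: relative paths of every message already stored in
    any quarter directory under archive_root (e.g. archive_root/2002-Q2/...)."""
    seen: set[str] = set()
    if not archive_root.is_dir():
        return seen
    for quarter_dir in archive_root.iterdir():
        if not quarter_dir.is_dir():
            continue
        for f in quarter_dir.rglob("*"):
            if f.is_file():
                seen.add(f.relative_to(quarter_dir).as_posix())
    return seen


def archive_unseen(src: Path, archive_root: Path, quarter: str) -> list[str]:
    """Hypothetical helper: copy into archive_root/<quarter> every file under src
    that no quarter (including this one) holds yet. Returns what was copied."""
    seen = load_manifest(archive_root)
    dest = archive_root / quarter
    copied: list[str] = []
    for f in sorted(src.rglob("*")):
        if not f.is_file():
            continue
        rel = f.relative_to(src).as_posix()
        if rel in seen:
            continue  # archived in this or an earlier quarter: no overlap
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)
        copied.append(rel)
    return copied
```

Rescanning every quarter on each run is slow at 500,000+ files; saving
the manifest to a file and appending to it would be the obvious
refinement.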

If one did that, then the agent itself would redownload some
already-archived messages (I think, but I haven't tested to make sure).

Even if the agent knows not to redownload them, it would be
inconvenient to reduce the agentized messages to zero, leaving no
backlog to work with in Gnus.

Scripting a separation of the messages by date is not too big a deal,
but I'm talking about 500,000+ messages that would each have to be
visited individually and sorted by date, while generating a directory
structure to house them.
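For the after-the-fact split by date, Python's standard `email`
package can read just the headers of each message.  A sketch, assuming
one message per file as in the agent's store, that dot-files (such as
the agent's `.overview` files) are bookkeeping rather than messages,
and with `message_quarter` and `sort_by_quarter` as hypothetical
names; messages with a missing or unparsable Date: header go into an
`undated` directory:

```python
from __future__ import annotations

import shutil
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime
from pathlib import Path


def message_quarter(path: Path) -> str:
    """Hypothetical helper: 'YYYY-Qn' from the message's Date: header, or 'undated'."""
    with path.open("rb") as fh:
        headers = BytesHeaderParser().parse(fh)  # parses headers only, not the body
    raw = headers.get("Date")
    if raw is None:
        return "undated"
    try:
        dt = parsedate_to_datetime(str(raw))
    except (TypeError, ValueError):
        return "undated"
    return f"{dt.year}-Q{(dt.month - 1) // 3 + 1}"


def sort_by_quarter(src: Path, dest: Path) -> dict[str, int]:
    """Hypothetical helper: copy each message under src to dest/<quarter>/<relative path>.
    Returns how many messages landed in each quarter directory."""
    counts: dict[str, int] = {}
    for f in sorted(src.rglob("*")):
        if not f.is_file() or f.name.startswith("."):
            continue  # skip directories and (assumed) bookkeeping dot-files
        label = message_quarter(f)
        target = dest / label / f.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Each file is still opened once, so a 500,000-message run is bound by
disk I/O, but the directory structure comes for free from the
relative paths.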

That may be the only way.  I wondered if anyone has a slicker, faster,
smarter way?





Thread overview: 4+ messages
2002-09-05  2:17 Definitive archiving method sought - Big prize money for best entrant Harry Putnam
2002-09-06 15:50 ` Kai Großjohann
2002-09-06 21:42   ` Harry Putnam
2002-09-07 19:04     ` Kai Großjohann
