Gnus development mailing list
From: Harry Putnam <reader@newsguy.com>
Subject: Definitive archiving method sought - Big prize money for best entrant
Date: Wed, 04 Sep 2002 19:17:25 -0700
Message-ID: <m3heh5kwne.fsf@newsguy.com>

Ok, so the big prize money was a lie....

But I know there are some card-carrying archivists here:

I've long wanted a smoothly working archiving technique but can't
really come up with something that isn't either horribly complex or
prone to some unwanted overlap.

I want to keep my current method, which is rsync, but add another
technique that separates my archives into chronological chunks and
allows no overlap.

Let me explain:
I currently download a number of groups with the agent, and
periodically run the agent expiry routine.  Over time the selection
of groups to download may change, and it has quite a few times over
the years.  So I don't really want something like the `expiry to
target' stuff available for mail; that's too much futzing around as
the selected groups change.

I use rsync to grab new stuff from the agentized messages, adding
it to an archive from a daily cron job.  Anyone who knows rsync
will know that it will keep adding any new messages to the archive.
Ok, fine so far.  But given enough time, that archive will grow
unusably large, or at least large enough to be a pain to search, etc.
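
Roughly, the cron job amounts to something like this (the schedule and
the archive path here are just placeholders, not my exact setup):

    # hypothetical daily cron entry; plain `rsync -a' with no --delete
    # simply keeps piling new article files from the agent spool into
    # the archive
    30 3 * * *  rsync -a $HOME/News/agent/ $HOME/News/archive/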

I want to break it up into chunks of some kind.  I think calendar
quarters would be good.  I don't mean here that the messages in the
archive have to fall inside a certain quarter, but only that the
quarter not hold any dups from the one before or after.  So I'm not
concerned about message dates, although they would, by and large,
fall in place.  Just a user-imposed quarter of collected messages.

I can't think of a way to do this directly with rsync, like renaming
an accumulated archive at a certain point and beginning a new one.
Rsync lacks anything like the `newer' operator in the `find' command,
so that kind of approach is out too.
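
For reference, this is the kind of test I mean, which rsync has no
switch for (the stamp file is made up):

    # select only messages that appeared since some cut-off point
    find ~/News/agent -type f -newer ~/.archive-cutoff-stamp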

Two things come up that cause overlap.  The way rsync works, it would
start grabbing messages from ~/News/agent that had already been
archived, since they wouldn't be in the new archive.
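
In other words, if I rotated the archive by hand, the next cron run
would undo the split.  Something like this (directory names made up):

    # close out a quarter and start fresh...
    mv ~/News/archive ~/News/archive-2002Q3
    mkdir ~/News/archive
    # ...but the next run re-copies everything still sitting in the
    # agent spool, so the old and new archives end up overlapping
    rsync -a $HOME/News/agent/ $HOME/News/archive/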

The only way I can see to prevent that would be to reduce the
agentized stock to zero at the same time.

If one did that, then the agent itself would redownload some already
archived stuff (I think, but haven't tested to make sure).

Even if the agent knows not to redownload, it would be inconvenient
to reduce the agentized messages to zero, leaving no backlog to work
with in gnus.

Scripting to separate the messages by date is not too big a deal, but
I'm talking about 500,000+ messages having to be visited individually
and sorted according to date, while generating a directory structure
to house them.
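
Something along these lines is what I have in mind, and it means
opening every one of those files (this assumes GNU date and one
article per file, and the paths are placeholders):

    # crude sketch: read each message's Date: header and file it under
    # a per-quarter tree such as 2002-Q3/, keeping the group
    # subdirectories so numbered article files don't collide
    src=$HOME/News/archive
    dst=$HOME/News/archive-by-quarter
    find "$src" -type f | while read -r msg; do
        d=$(sed -n 's/^Date: //p;/^$/q' "$msg" | head -n 1)
        [ -n "$d" ] || continue             # no Date: header, skip
        y=$(date -d "$d" +%Y) || continue   # skip unparsable dates
        q=$(( ( $(date -d "$d" +%-m) - 1 ) / 3 + 1 ))
        rel=${msg#"$src"/}
        mkdir -p "$dst/$y-Q$q/$(dirname "$rel")"
        mv "$msg" "$dst/$y-Q$q/$rel"
    done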

That may be the only way.  I wondered if anyone has a slicker,
faster, smarter way?



Thread overview: 4+ messages
2002-09-05  2:17 Harry Putnam [this message]
2002-09-06 15:50 ` Kai Großjohann
2002-09-06 21:42   ` Harry Putnam
2002-09-07 19:04     ` Kai Großjohann
