From: Karl Kleinpaste <karl@charcoal.com>
Subject: Re: Major splitting problem ... Advice please
Date: Thu, 11 Oct 2001 08:02:20 -0400
Message-ID: <vxkr8sa8jw3.fsf@cinnamon.vanillaknot.com>
In-Reply-To: <m1wv22d9x7.fsf@reader.newsguy.com> (Harry Putnam's message of "Wed, 10 Oct 2001 22:26:28 -0700")
Harry Putnam <reader@newsguy.com> writes:
> To cut to the chase here, I'm thinking of splitting this up into
> groups that contain one month/yr of a specific group.
for year in 1995 1996 1997 1998 1999 2000 2001
do
    for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    do
        newdir=NewArchive/$year/$month
        mkdir -p "$newdir"
        # List every message whose Date: header names this month and year.
        grep -isl "^Date:.*$month.*$year" message/* |
        while read -r article
        do
            mv "$article" "$newdir"
        done
    done
done
Embellish to taste, if e.g. the messages do not have unique names
across a set of directories.
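One possible embellishment for the name-collision case, as a rough sketch only: move_renumbered is a made-up helper name, and the renumbering policy (fall back to the lowest unused integer) is an assumption chosen because nnml stores each article under a numeric file name.

```shell
# Hypothetical helper: move a message into destdir without clobbering
# an existing file of the same name.
move_renumbered() {
    src=$1
    destdir=$2
    # Keep the original name if it is still free in the destination.
    target=$destdir/$(basename "$src")
    n=1
    # Otherwise try 1, 2, 3, ... until an unused name turns up.
    while [ -e "$target" ]; do
        target=$destdir/$n
        n=$((n + 1))
    done
    mv "$src" "$target"
}
```

Calling move_renumbered "$article" "$newdir" in place of the plain mv would be the intended use. The linear search is slow for very large directories, so treat it as a starting point.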
It's too bad that xargs(1) can't be used directly following the grep:
xargs appends the file names at the end of the command line, but
mv(1) wants the destination directory last. Otherwise the inner
"while" loop could be disposed of entirely.
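Er...well, it's icky, but... save this as an executable script named
newmv somewhere on your PATH (xargs can only run real programs, so a
shell function won't do):
    #!/bin/sh
    # First argument is the destination; the rest are files to move.
    destdir=$1
    shift
    # Nothing to do if xargs passed no file names.
    [ $# -gt 0 ] || exit 0
    mv "$@" "$destdir"
Then the "while" is replaced by
    grep -isl ... | xargs newmv $newdir
which perhaps isn't all that icky after all.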
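If the GNU tools are available, the same effect may be had without any wrapper. The sketch below assumes GNU grep (-Z), GNU xargs (-0, -r) and GNU mv (-t, which names the destination first); move_matching is a hypothetical name, not anything from the script above.

```shell
# Hypothetical helper, GNU tools assumed: move every file in srcdir
# whose contents match pattern into destdir.
move_matching() {
    pattern=$1
    srcdir=$2
    destdir=$3
    mkdir -p "$destdir"
    # -Z / -0 pass NUL-terminated names, so odd file names survive;
    # -r skips running mv when grep finds nothing;
    # -t puts the destination before the file list xargs appends.
    grep -islZ -e "$pattern" "$srcdir"/* |
    xargs -0 -r mv -t "$destdir" --
}
```

Inside the loops this would be called as move_matching "^Date:.*$month.*$year" message "$newdir". On systems without GNU mv, the wrapper approach is still the way to go.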
> However, there are enough different date styles to make that kind of
> split pretty hard to program.
If there are enough odd (broken) date formats so as not to be caught
by this, then after this is run, go back and work out new variants for
the "for" loops. Repeat "for" with ever newer and weirder date
discriminants until there's nothing left to move.
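To see which weird formats are left after a pass, something like the following might help. It is only a sketch: date_shapes is an invented name, and the idea of collapsing letters to A and digits to N is my own shorthand for grouping similar headers, not an established convention.

```shell
# Hypothetical helper: summarize the "shape" of the Date: headers
# still sitting in srcdir, most common shape first.
date_shapes() {
    srcdir=$1
    for f in "$srcdir"/*; do
        [ -f "$f" ] || continue
        # Print only the first Date:/date: line; later ones are
        # probably quoted text in the body.
        sed -n '/^[Dd]ate:/{p;q;}' "$f"
    done |
    # Strip the header name, then collapse each run of letters to A
    # and each run of digits to N (letters first, so N is not eaten).
    sed -e 's/^[Dd]ate:[[:space:]]*//' \
        -e 's/[A-Za-z][A-Za-z]*/A/g' \
        -e 's/[0-9][0-9]*/N/g' |
    sort | uniq -c | sort -rn
}
```

Running date_shapes message after each round shows which formats still need a grep pattern of their own.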
> Also the problem of some messages that
> came late to a thread, landing in a different group arises. Keeping
> all thread members in one group may not even be possible, except by
> hand. I'm not sure.
As soon as you decide to use date-based storage, you have to break
either that storage scheme or the threads that cross a month
boundary. Pick one or the other.
OTOH -- and I know we've been over this ground before -- I've become
so attached to nnir & swish++ that I would leave the groups in
whatever huge collections you've got and simply never enter them
directly, but rather do nnir queries to pick up what I need. swish++
is _fast_. Periodically run nnml-generate-nov-databases to keep the
overviews current, if you continue to add messages to these archives.
--karl