Gnus development mailing list
From: Karl Kleinpaste <karl@charcoal.com>
Subject: Re: Major splitting problem ... Advice please
Date: Thu, 11 Oct 2001 08:02:20 -0400
Message-ID: <vxkr8sa8jw3.fsf@cinnamon.vanillaknot.com>
In-Reply-To: <m1wv22d9x7.fsf@reader.newsguy.com> (Harry Putnam's message of "Wed, 10 Oct 2001 22:26:28 -0700")

Harry Putnam <reader@newsguy.com> writes:
> To cut to the chase here, I'm thinking of splitting this up into
> groups that contain one month/yr of a specific group.  

for year in 1995 1996 1997 1998 1999 2000 2001
do
  for month in Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
  do
    newdir=NewArchive/$year/$month
    mkdir -p "$newdir"
    # -i: ignore case, -s: suppress file errors, -l: list matching file names only
    grep -isl "^Date:.*$month.*$year" message/* |
    while read -r article
    do
      mv "$article" "$newdir"
    done
  done
done

Embellish to taste, if e.g. the messages do not have unique names
across a set of directories.
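
One possible embellishment for the non-unique-names case, as a rough
sketch only: move_unique is a made-up helper name, and the .1, .2, ...
suffix scheme is an arbitrary choice, not anything the mail backend
requires.

```shell
#!/bin/sh
# move_unique FILE DESTDIR  (hypothetical helper, not from the original post)
# Moves FILE into DESTDIR.  If a file of the same name is already there,
# tries NAME.1, NAME.2, ... until an unused name is found.
move_unique() {
  src=$1
  destdir=$2
  base=$(basename "$src")
  target=$destdir/$base
  n=1
  while [ -e "$target" ]; do
    target=$destdir/$base.$n
    n=$((n + 1))
  done
  mv "$src" "$target"
}
```

Inside the "while read" loop, the mv line would then become
move_unique "$article" "$newdir".  Adapt the naming if the files must
keep a particular form in the new archive.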

It's too bad that xargs(1) can't be used following the grep; the inner
"while" loop could be disposed of entirely if so, but xargs appends the
file names at the end of the command line, and mv(1) wants the
destination directory there.

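Er...well, it's icky, but...

#!/bin/sh
# newmv: like mv, but takes the destination directory as the first argument
destdir=$1
shift
mv "$@" "$destdir"

Save that as an executable script named newmv somewhere on your PATH
(it has to be a script, because xargs runs programs, not shell
functions).  Then the "while" is replaced by
     grep -isl ... | xargs newmv $newdir
which perhaps isn't all that icky after all.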
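
Where GNU tools are available, the wrapper may not be needed at all:
GNU mv accepts the destination first via -t, so xargs can feed it
directly.  The following is a sketch under that assumption (GNU grep
for -Z, GNU xargs for -0 and -r, GNU mv for -t); move_matching is a
hypothetical name chosen for illustration.

```shell
#!/bin/sh
# move_matching PATTERN DESTDIR FILE...  (hypothetical helper name)
# Moves every FILE containing a line that matches PATTERN (ignoring case)
# into DESTDIR.  Assumes GNU grep, GNU xargs and GNU mv.
move_matching() {
  pattern=$1
  destdir=$2
  shift 2
  mkdir -p "$destdir"
  # -Z ends each file name with a NUL byte, so names with spaces survive;
  # -r keeps xargs from running mv at all when nothing matched.
  grep -islZ -e "$pattern" -- "$@" |
    xargs -0 -r mv -t "$destdir" --
}
```

The body of the inner "for" loop would then shrink to something like
move_matching "^Date:.*$month.*$year" "NewArchive/$year/$month" message/*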

> However, there are enough different date styles to make that kind of
> split pretty hard to program.

If there are enough odd (broken) date formats so as not to be caught
by this, then after this is run, go back and work out new variants for
the "for" loops.  Repeat "for" with ever newer and weirder date
discriminants until there's nothing left to move.
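
As an illustration of what such variants might look like, here is a
sketch that generates a few candidate patterns per month.  The formats
are guesses (a two-digit year, a US-style numeric date), not formats
known to occur in the archive, and date_patterns is a hypothetical
helper name.

```shell
#!/bin/sh
# date_patterns MON NUM YEAR  (hypothetical helper; the formats are guesses)
# Prints one grep pattern per line, starting with the original pattern
# and following it with looser variants.
date_patterns() {
  mon=$1
  num=$2
  year=$3
  yy=${year#??}    # two-digit year, e.g. 2001 -> 01
  printf '%s\n' \
    "^Date:.*$mon.*$year" \
    "^Date:.*[0-9] $mon $yy[^0-9]" \
    "^Date: *0*$num/[0-9]*/$year"
}
```

Each pattern can be tried in turn in place of the single grep pattern,
for example with
date_patterns Oct 10 2001 | while read -r pat; do ...; done
and whatever is still left in message/ afterwards shows which formats
remain uncovered.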

> Also the problem of some messages that
> came late to a thread, landing in a different group arises.  Keeping
> all thread members in one group may not even be possible, except by
> hand.  I'm not sure. 

As soon as you decide to use date-based storage, something has to
give: either you break the date-based scheme or you break the threads
that cross its borders.  Pick one or the other.

OTOH -- and I know we've been over this ground before -- I've become
so attached to nnir & swish++ that I would leave the groups in
whatever huge collections you've got and simply never enter them
directly, but rather do nnir queries to pick up what I need.  swish++
is _fast_.  Periodically run nnml-generate-nov-databases to keep the
overviews current, if you continue to add messages to these archives.

--karl



Thread overview: 9+ messages
2001-10-11  5:26 Harry Putnam
2001-10-11  7:40 ` Kai Großjohann
2001-10-12  4:01   ` Harry Putnam
2001-10-11 12:02 ` Karl Kleinpaste [this message]
2001-10-11 15:54   ` Paul Jarc
2001-10-11 16:25     ` Paul Jarc
2001-10-11 16:37       ` Kai Großjohann
2001-10-12 16:44 ` Rob Browning
2001-10-13  4:28   ` Harry Putnam