Gnus development mailing list
From: Harry Putnam <reader@newsguy.com>
Subject: Major splitting problem ... Advice please
Date: Wed, 10 Oct 2001 22:26:28 -0700
Message-ID: <m1wv22d9x7.fsf@reader.newsguy.com>


I've become something of an amateur archivist over the past few years,
mainly because I wanted a big pile of Usenet available for searching
and for running various programming experiments in languages such as
awk and Perl.

That said, I've decided that my archive of 250,000+ messages from a
dozen or so newsgroups is just too unwieldy for easy use in its present
form (the way it came off the NNTP servers).

The groups are too big to be really usable inside Gnus unless most of
the nifty formatting, threading, etc. is given up, and then why bother,
really?

I currently use command-line tools or homemade scripts to extract info
from this pile, but it would be nice to be able to access it easily
with Gnus at times too. By `access' here, I don't mean nndir or the
like, but handy, smallish groups that behave well inside Gnus, where
all manner of highlighting or other special treatment/sorting wouldn't
be a major time drag. Maybe a series of nnml groups for each main
newsgroup or something.

To cut to the chase, I'm thinking of splitting this up into groups that
each contain one month/year of a specific newsgroup.
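One way to make a month/year split less painful to program is to let a library parser deal with the date styles instead of matching them by hand. The sketch below is a minimal illustration in Python (our choice, not the poster's); `month_key` is a hypothetical helper name. It relies on the standard library's `email.utils.parsedate_to_datetime`, which accepts RFC 2822 dates and, in CPython, also appears to tolerate older forms such as two-digit years and RFC 850 `10-Oct-01` dates. Anything it cannot parse yields `None`, so those messages could go to a catch-all group for inspection by hand.

```python
import email.utils
from typing import Optional


def month_key(date_header: Optional[str]) -> Optional[str]:
    """Return 'YYYY-MM' for a Date: header value, or None if unparseable.

    Hypothetical helper. The month is taken from the sender's own local
    date as written in the header; it is not converted to UTC.
    """
    if not date_header or not date_header.strip():
        return None
    try:
        dt = email.utils.parsedate_to_datetime(date_header.strip())
    except (TypeError, ValueError, IndexError, OverflowError):
        # Older Pythons raise TypeError on unparseable input, newer ones
        # raise ValueError; out-of-range fields also surface as ValueError.
        return None
    return f"{dt.year:04d}-{dt.month:02d}"


if __name__ == "__main__":
    for d in ("Wed, 10 Oct 2001 22:26:28 -0700",
              "Wednesday, 10-Oct-01 22:26:28 GMT",
              "not a date"):
        print(repr(d), "->", month_key(d))
```

A group name such as `archive.<newsgroup>.2001-10` could then be built from the key; that naming scheme is likewise only an assumption.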

However, there are enough different date styles to make that kind of
split pretty hard to program. There is also the problem of messages
that came late to a thread landing in a different group. Keeping all
thread members in one group may not even be possible, except by hand.
I'm not sure.

Splitting on year would be easy enough but would still result in groups
too big for handy use. I'm thinking maybe of something based on file
names? These messages still have their original file names as they
came from the server (for the most part).

Or maybe just split it up into groups of 2000 or fewer under each
newsgroup, not paying attention to date or worrying about split
threads. (Is `A T' capable of pulling messages from other `nnml'
groups?)
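The fixed-size option is the easiest to script. Spool file names are usually article numbers, so sorting them numerically roughly approximates arrival order. The sketch below (Python, hypothetical function names) cuts one newsgroup's file list into batches of at most 2000. The `archive.<newsgroup>.NNN` group names are an assumed naming scheme, not anything Gnus requires. Names that are not purely numeric are sorted after the numbered ones instead of raising an error.

```python
from typing import Dict, Iterable, List, Tuple


def article_sort_key(name: str) -> Tuple[int, int, str]:
    """Numeric names sort by number; other names sort after them, by name."""
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


def chunk_articles(newsgroup: str, filenames: Iterable[str],
                   size: int = 2000) -> Dict[str, List[str]]:
    """Map hypothetical group names to batches of at most `size` files."""
    if size <= 0:
        raise ValueError("size must be positive")
    ordered = sorted(filenames, key=article_sort_key)
    chunks: Dict[str, List[str]] = {}
    for start in range(0, len(ordered), size):
        group = f"archive.{newsgroup}.{start // size + 1:03d}"
        chunks[group] = ordered[start:start + size]
    return chunks


if __name__ == "__main__":
    # Stand-in for os.listdir() on one newsgroup's spool directory.
    names = [str(n) for n in range(4500, 0, -1)]
    for group, files in chunk_articles("comp.lang.perl", names).items():
        print(group, len(files), files[0], files[-1])
```

Each batch could then be moved or copied into its own nnml directory. Threads that straddle a batch boundary will be split, as the paragraph above anticipates.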

I'm wondering if some of the card-carrying archivists here, like maybe
Karl K., could outline how they would do something like this. Or any
suggestions from anyone who has seen various setups or has experience
with a problem like this would be welcome.

I mean just off the top of your heads.
