From: Harry Putnam <reader@newsguy.com>
Subject: Major splitting problem ... Advice please
Date: Wed, 10 Oct 2001 22:26:28 -0700
Message-ID: <m1wv22d9x7.fsf@reader.newsguy.com>
I've become something of an amateur archivist over the past few years
mainly due to a desire to have a big pile of usenet available for
searching, and for running various programming experiments in such
languages as awk and perl.
That said, I've decided my archive of 250,000+ messages from some
dozen or so newsgroups is just too unwieldy for easy use in its
present form (the way it came off the NNTP servers).
The groups are too vast to be really usable inside gnus unless most of
the nifty formatting, threading, etc. is forgone, and then why bother,
really?
I currently use command line tools or homemade scripting to extract
info from this pile, but it would be nice to be easily able to access
it with gnus at times too. By `access' here, I don't mean nndir or
the like, but handy smallish groups that handle well inside gnus,
where all manner of highlighting or other special treatment/sorting
wouldn't be a major time drag. Maybe a series of nnml groups for each
main newsgroup or something.
To cut to the chase here, I'm thinking of splitting this up into
groups that contain one month/yr of a specific group.
However, there are enough different date styles to make that kind of
split pretty hard to program. There is also the problem of messages
that came late to a thread landing in a different group. Keeping
all thread members in one group may not even be possible, except by
hand. I'm not sure.
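
As a rough illustration of the month/year split described above, here is a
minimal Python sketch. It assumes the standard library's RFC 2822 date
parser copes with most of the date styles in the archive (anything it
rejects goes to an "undated" bucket), and it assumes that late replies can
be kept with their thread by filing each message under the month of the
first Message-ID in its References header, when that root is in the
archive. The names month_bucket, thread_root and assign_buckets are
hypothetical, not part of gnus or any existing tool.

```python
from email.utils import parsedate_to_datetime


def month_bucket(date_header, fallback="undated"):
    """Return 'YYYY-MM' for a Date header, or `fallback` if it cannot be parsed."""
    try:
        dt = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad dates.
        return fallback
    if dt is None:
        return fallback
    # Uses the date as written by the sender (no conversion to UTC).
    return f"{dt.year:04d}-{dt.month:02d}"


def thread_root(message_id, references):
    """Assumption: the first ID in References is the thread root."""
    refs = references.split()
    return refs[0] if refs else message_id


def assign_buckets(messages):
    """Map each Message-ID to a bucket.

    `messages` is an iterable of (message_id, date_header, references_header).
    A reply is filed under its root's month when the root is in the archive,
    otherwise under its own month.
    """
    messages = list(messages)
    own = {mid: month_bucket(date) for mid, date, _ in messages}
    return {
        mid: own.get(thread_root(mid, refs), own[mid])
        for mid, _, refs in messages
    }


if __name__ == "__main__":
    sample = [
        ("<a@x>", "Mon, 29 Oct 2001 10:00:00 +0000", ""),
        ("<b@x>", "Fri, 2 Nov 2001 09:00:00 +0000", "<a@x>"),
        ("<c@x>", "garbage", ""),
    ]
    for mid, bucket in assign_buckets(sample).items():
        print(mid, bucket)
```

Threads whose root message is missing from the archive would still be
split across months under this scheme, so it reduces the problem rather
than removing it.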
Splitting on year would be easy enough but would still result in
groups too big for handy use. I'm thinking maybe something based on
file names? These messages have their original file names as they
came from the server (for the most part).
Or maybe just split it up into groups of 2000 or less under each
newsgroup, not paying attention to date, or worrying about split
threads. (Is `A T' capable of pulling messages from other `nnml'
groups?)
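
The fixed-size split could be sketched in Python along these lines. It
assumes, purely for illustration, that most file names are the numeric
article numbers assigned by the server, so numeric names are ordered by
value and any others are placed at the end. The function name
chunk_articles is hypothetical.

```python
def chunk_articles(names, size=2000):
    """Order file names and cut them into lists of at most `size` entries.

    Assumption: names made only of digits are article numbers and sort
    numerically; all other names sort after them, alphabetically.
    """
    def sort_key(name):
        if name.isdecimal():
            return (0, int(name), name)
        return (1, 0, name)

    ordered = sorted(names, key=sort_key)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


if __name__ == "__main__":
    for number, group in enumerate(chunk_articles(["10", "2", "1", "abc"], size=2), 1):
        print(number, group)
```

Each resulting list would then become one group under the newsgroup it
came from; threads may straddle a boundary, as noted above.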
I'm wondering if some of the card-carrying archivists here, like maybe
Karl K., could outline how they would do something like this. I'd also
welcome suggestions from anyone who has seen various setups or has
experience of some kind with a problem like this.
I mean just off the top of respective heads.