Gnus development mailing list
* Major splitting problem ... Advice please
@ 2001-10-11  5:26 Harry Putnam
  2001-10-11  7:40 ` Kai Großjohann
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Harry Putnam @ 2001-10-11  5:26 UTC (permalink / raw)



I've become something of an amateur archivist over the past few years
mainly due to a desire to have a big pile of usenet available for
searching, and for running various programming experiments in such
languages as awk and perl.

That said, I've decided my archive of 250,000+ messages from some
dozen or so newsgroups is just too unwieldy for easy use in its
present form (the way it came off the nntp servers).

The groups are too vast to be really usable inside of gnus unless most
of the nifty formatting, threading, etc. is forgone, and then why
bother, really.

I currently use command line tools or homemade scripting to extract
info from this pile, but it would be nice to be easily able to access
it with gnus at times too.  By `access' here, I don't mean nndir or
the like, but handy smallish groups that handle well inside gnus,
where all manner of highlighting or other special treatment/sorting
wouldn't be a major time drag.  Maybe a series of nnml groups for each
main newsgroup or something.

To cut to the chase here, I'm thinking of splitting this up into
groups that contain one month/yr of a specific group.  

However, there are enough different date styles to make that kind of
split pretty hard to program.  There is also the problem of messages
that came late to a thread landing in a different group.  Keeping
all thread members in one group may not even be possible, except by
hand.  I'm not sure.
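
A rough sketch of the month/year idea, in Python rather than awk or
perl.  It assumes each message is an RFC 822 style file with a Date
header, and leans on the standard library's lenient date parser to
cope with the differing date styles.  The function name
`month_bucket' and the `newsgroup.YYYY-MM' naming are made up for
illustration; nothing here is a Gnus feature.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def month_bucket(raw_message, newsgroup):
    """Return a bucket name such as 'comp.lang.perl.2001-10'.

    Returns None when the Date header is missing or unparseable, so
    those messages can be set aside and sorted by hand.
    """
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    if not date_header:
        return None
    try:
        when = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return None
    return f"{newsgroup}.{when.year:04d}-{when.month:02d}"
```

This does nothing about late replies: a follow-up posted the next
month still lands in the next month's bucket.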

Splitting on year would be easy enough but would still result in
groups too big for handy use. I'm thinking maybe something based on
file names?  These messages have their original file names as they
came from the server (for the most part).

Or maybe just split it up into groups of 2000 or less under each
newsgroup, not paying attention to date, or worrying about split
threads.  (Is `A T' capable of pulling messages from other `nnml'
groups?)
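
The fixed-size idea is simple enough to sketch, again in Python.  This
assumes the original file names are plain article numbers, which the
post does not actually say; anything that is not a number is handed
back separately rather than guessed at.  The function name is
hypothetical.

```python
def chunk_by_article_number(file_names, size=2000):
    """Split file names into runs of at most `size`, in numeric order.

    Returns (chunks, leftovers): chunks is a list of lists of file
    names, leftovers holds the names that are not plain numbers.
    """
    numeric = [n for n in file_names if n.isascii() and n.isdigit()]
    leftovers = [n for n in file_names if not (n.isascii() and n.isdigit())]
    numeric.sort(key=int)  # numeric order, so "10" sorts after "9"
    chunks = [numeric[i:i + size] for i in range(0, len(numeric), size)]
    return chunks, leftovers
```

Each chunk could then be copied into its own nnml group.  Since
article numbers roughly follow arrival order, most threads would stay
together, but ones that straddle a chunk boundary would still be
split.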

Wondering if some of the card-carrying archivists here, like maybe
Karl K., could outline a summary of how they would do something like
this.  Or any suggestions from anyone who has seen various setups or
has experience of some kind with a problem like this.

I mean just off the top of respective heads. 



Thread overview: 9+ messages
2001-10-11  5:26 Major splitting problem ... Advice please Harry Putnam
2001-10-11  7:40 ` Kai Großjohann
2001-10-12  4:01   ` Harry Putnam
2001-10-11 12:02 ` Karl Kleinpaste
2001-10-11 15:54   ` Paul Jarc
2001-10-11 16:25     ` Paul Jarc
2001-10-11 16:37       ` Kai Großjohann
2001-10-12 16:44 ` Rob Browning
2001-10-13  4:28   ` Harry Putnam
