Gnus development mailing list
From: Harry Putnam <reader@newsguy.com>
Subject: Re: Major splitting problem ... Advice please
Date: Fri, 12 Oct 2001 21:28:43 -0700
Message-ID: <m14rp488pq.fsf@reader.newsguy.com>
In-Reply-To: <878zegyfip.fsf@raven.i.defaultvalue.org> (Rob Browning's message of "Fri, 12 Oct 2001 11:44:30 -0500")

Rob Browning <rlb@defaultvalue.org> writes:

>> However, there are enough different date styles to make that kind of
>> split pretty hard to program.  Also the problem of some messages
>> that came late to a thread, landing in a different group arises.
>> Keeping all thread members in one group may not even be possible,
>> except by hand.  I'm not sure.
>
> Hmm.  I had just been planning to use gnus date functions.  I hadn't
> considered that those might not be sufficient.

My comments may have been a little misleading.  They were directed at
the idea of splitting messages by date with tools such as awk and
perl.  What I was getting at was the difficulty of writing regular
expressions that match all possible date formulations like these
(taken from a sample of headers on comp.unix.solaris):

 Date: 24 Sep 2001 09:07:45 GMT
 Date: Mon, 8 Oct 2001 15:30:18 +0100
 Date: 8 Oct 2001 14:30:26 GMT
 Date: 08 Oct 2001 16:42:08 +0200
 Date: Sun, 7 Oct 2001 20:02:06 +0200
 Date: Sun, 07 Oct 2001 17:45:17 GMT


There are some even odder formulations to be found.  It is probably
not impossible to write a regexp that works for them all, but it is
just a pita.
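
For splitting outside gnus, one way to sidestep hand-written regexps
is to let a mail library parse the Date: header.  The sketch below is
only an illustration, not something from this thread: it assumes
Python and its standard email.utils module, and the helper name
month_key is made up.  It turns each of the sample headers above into
a year-month key that could serve as a group name.

```python
from email.utils import parsedate_to_datetime

# The six Date: header values quoted above.
SAMPLE_DATES = [
    "24 Sep 2001 09:07:45 GMT",
    "Mon, 8 Oct 2001 15:30:18 +0100",
    "8 Oct 2001 14:30:26 GMT",
    "08 Oct 2001 16:42:08 +0200",
    "Sun, 7 Oct 2001 20:02:06 +0200",
    "Sun, 07 Oct 2001 17:45:17 GMT",
]


def month_key(date_header):
    """Return 'YYYY-MM' for a Date: header value, or None if unparseable.

    month_key is a hypothetical helper name.  The month is taken in the
    sender's own time zone, exactly as written in the header.
    """
    try:
        return parsedate_to_datetime(date_header).strftime("%Y-%m")
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError.
        return None


keys = [month_key(d) for d in SAMPLE_DATES]
for header, key in zip(SAMPLE_DATES, keys):
    print(key, "<-", header)
```

The optional weekday, one- or two-digit day, and named or numeric
zones are all handled by the parser, so no per-format regexp is
needed.  The odder formulations would still have to be checked case
by case; anything the parser rejects comes back as None here.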

If you plan to use the date functions that do limiting, like
`/ t' and `C-u / t', it may not be a problem.  I wanted to do the
splitting outside gnus because it is such a large archive.

(approx. 250,000 messages, from about a dozen groups)
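
As a rough idea of what splitting by date outside gnus could look
like, here is a minimal sketch, again assuming Python's standard
email package.  The function name bucket_by_month, the "undated"
bucket and the three sample messages are invented for illustration;
a real run would read the messages from the archive files instead.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime

# Made-up sample messages; only the headers matter for this sketch.
SAMPLE_MESSAGES = [
    "Message-ID: <a@example.invalid>\nDate: 24 Sep 2001 09:07:45 GMT\n\nbody\n",
    "Message-ID: <b@example.invalid>\nDate: Mon, 8 Oct 2001 15:30:18 +0100\n\nbody\n",
    "Message-ID: <c@example.invalid>\nDate: sometime last week\n\nbody\n",
]


def bucket_by_month(raw_messages):
    """Group Message-IDs by the 'YYYY-MM' of each message's Date: header.

    Messages with a missing or unparseable date go to an 'undated'
    bucket so they can be sorted by hand later.
    """
    buckets = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        try:
            key = parsedate_to_datetime(msg["Date"]).strftime("%Y-%m")
        except (TypeError, ValueError):
            key = "undated"
        buckets[key].append(msg["Message-ID"])
    return dict(buckets)


buckets = bucket_by_month(SAMPLE_MESSAGES)
for group, ids in sorted(buckets.items()):
    print(group, ids)
```

This does nothing about late thread members landing in a different
group; it only shows the date bucketing.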

I haven't tried this, but I suspect one could do it by first setting
up an nnmail split method that splits by date to mnemonically named
groups, then entering the monster groups and splitting them with
`M P a <RET> B r <RET>', adjusting the split method rules for each
group.  But here again, I would expect some extensive experimentation
getting the date regexp right.  And it would be very time intensive to
do that inside gnus, I think, assuming the groups are above 25,000 or
so.




Thread overview: 9+ messages
2001-10-11  5:26 Harry Putnam
2001-10-11  7:40 ` Kai Großjohann
2001-10-12  4:01   ` Harry Putnam
2001-10-11 12:02 ` Karl Kleinpaste
2001-10-11 15:54   ` Paul Jarc
2001-10-11 16:25     ` Paul Jarc
2001-10-11 16:37       ` Kai Großjohann
2001-10-12 16:44 ` Rob Browning
2001-10-13  4:28   ` Harry Putnam [this message]
