Announcements and discussions for Gnus, the GNU Emacs Usenet newsreader
* Problem incorporating old mail file
@ 2006-02-19  1:40 Aaron Hsu
  2006-02-19 12:42 ` Aaron Hsu
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Hsu @ 2006-02-19  1:40 UTC (permalink / raw)


I have a huge, rather ugly mail file that I would like to
incorporate into my Gnus system. I have reasonable splitting commands
set up for all my old mail, but what I intend to do is this:

1. Get all the mail into nnml format for archival purposes.
2. Eliminate all the newsgroup-type entries, which are dead.
3. Sort all the mail based on a few splitting rules (I have this one
covered).
4. Take the most important pieces of mail (based on something like
group or some such) and upload them to my new IMAP mail server under
certain groups. (I'm doing this so that I have my important mail with
me wherever I am.)

What's the best way to do this?

-- 
Aaron Hsu <spam@sacrificumdeo.net> Jabber: arcfide@xmpp.us
<http://www.sacrificumdeo.net> "Extend beyond the Mortal . . . ."
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety." - Benjamin Franklin


* Re: Problem incorporating old mail file
  2006-02-19  1:40 Problem incorporating old mail file Aaron Hsu
@ 2006-02-19 12:42 ` Aaron Hsu
  2006-02-19 14:16   ` reader
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Hsu @ 2006-02-19 12:42 UTC (permalink / raw)


Aaron Hsu <spam@sacrificumdeo.net> writes:

> I have a huge, rather ugly mail file that I would like to
> incorporate into my Gnus system. I have reasonable splitting commands
> set up for all my old mail, but what I intend to do is this:
>
> 1. Get all the mail into nnml format for archival purposes.
> 2. Eliminate all the newsgroup-type entries, which are dead.
> 3. Sort all the mail based on a few splitting rules (I have this one
> covered).
> 4. Take the most important pieces of mail (based on something like
> group or some such) and upload them to my new IMAP mail server under
> certain groups. (I'm doing this so that I have my important mail with
> me wherever I am.)
>
> What's the best way to do this?

I'm sorry, I forgot to mention that one of my main problems is that
the file is so large, it is causing a buffer size exceeded error, and
I am also looking for a way to read in the single file without having
to split it up and manually import tons of files.



* Re: Problem incorporating old mail file
  2006-02-19 12:42 ` Aaron Hsu
@ 2006-02-19 14:16   ` reader
  2006-02-20  3:22     ` Aaron Hsu
  0 siblings, 1 reply; 6+ messages in thread
From: reader @ 2006-02-19 14:16 UTC (permalink / raw)


Aaron Hsu <spam@sacrificumdeo.net> writes:

> I'm sorry, I forgot to mention that one of my main problems is that
> the file is so large, it is causing a buffer size exceeded error, and
> I am also looking for a way to read in the single file without having
> to split it up and manually import tons of files.

I don't use IMAP, so I won't comment on your other issues.

I guess the buffer error is coming from Emacs?

It may fail at the command shell as well.

How huge is this file?

I would at least try the command line.  I use procmail for lots of
things.  If this huge file is in a format procmail can understand,
such as the Unix mbox format (nnfolder in Gnus), where each message
begins with a `From some@email.add DATE' line (no colon after From)
and is followed by a blank line, then procmail can read it with no
special settings.
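
One quick way to check whether a file follows this convention is to
count the lines that look like `From ' separators. The following is a
minimal Python sketch, not part of procmail or Gnus. The function name
`count_separators' and the loose pattern are assumptions made for
illustration. It reads one line at a time, so a very large file never
has to fit in memory at once.

```python
import re

# "From", a space, a sender, a space, then the date; no colon after "From".
# This loose pattern is an assumption; mbox variants differ in the details.
FROM_LINE = re.compile(rb"From \S+ +\S")


def count_separators(path):
    """Count lines that look like mbox `From ' separators."""
    count = 0
    with open(path, "rb") as handle:
        for line in handle:  # iterating a file yields one line at a time
            if FROM_LINE.match(line):
                count += 1
    return count
```

A count of zero suggests the file is not mbox at all. A plausible
count says nothing about whether body lines starting with `From ' were
escaped, so it is only a first check.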

If you want to try this and are not familiar with procmail, I can show
you how to set up a sandbox work area and spit all that mail out in a
way that Gnus can just slurp it right up into nnml groups of your
choice.

Procmail can split in literally any way you can dream up.  It is very
versatile.


* Re: Problem incorporating old mail file
  2006-02-19 14:16   ` reader
@ 2006-02-20  3:22     ` Aaron Hsu
  2006-02-20 13:04       ` reader
  0 siblings, 1 reply; 6+ messages in thread
From: Aaron Hsu @ 2006-02-20  3:22 UTC (permalink / raw)


reader@newsguy.com writes:

> Aaron Hsu <spam@sacrificumdeo.net> writes:
>
>> I'm sorry, I forgot to mention that one of my main problems is that
>> the file is so large, it is causing a buffer size exceeded error, and
>> I am also looking for a way to read in the single file without having
>> to split it up and manually import tons of files.
>
> I don't use IMAP, so I won't comment on your other issues.
>
> I guess the buffer error is coming from Emacs?
>
> It may fail at the command shell as well.
>
> How huge is this file?

This file is roughly 500 MB or more. I think it is in a Unix mail
format: the extension is .mbs, and it is definitely some form of mbox,
but it was exported from Opera and concatenated together.

> If you want to try this and are not familiar with procmail, I can show
> you how to set up a sandbox work area and spit all that mail out in a
> way that Gnus can just slurp it right up into nnml groups of your
> choice.
>
> Procmail can split in literally any way you can dream up.  It is very
> versatile.

I might be interested in taking this route if it is going to be the
fastest and the cleanest. Mainly, here was what I was thinking. Most of
my important mail can be determined by who sent it (some people use
multiple email addresses). With procmail, would it be possible to grab
all the mail, first remove all the empty newsgroup entries, and then put
all the mail into individual mboxes or what have you corresponding to
the email addresses (actually the full contents of the from header would
be best) of the sender?
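
Because some people use multiple addresses, grouping by sender needs a
step that maps each From header to one name per person. A minimal
Python sketch of that step follows. The `ALIASES' table, its example
addresses, and the function name `sender_key' are hypothetical; only
`email.utils.parseaddr' comes from the standard library.

```python
import re
from email.utils import parseaddr

# Hypothetical alias table: every address a person uses maps to one label.
ALIASES = {
    "alice@example.org": "alice",
    "alice@example.com": "alice",
}


def sender_key(from_header, aliases=ALIASES):
    """Turn a From: header value into a file-system-safe group name."""
    _name, address = parseaddr(from_header)
    address = address.lower()
    if not address:
        return "unknown"
    label = aliases.get(address, address)
    # Replace anything that would be awkward in a file or group name.
    return re.sub(r"[^a-z0-9@._-]", "_", label)
```

Keying on the address rather than the full header contents keeps mail
from one person together even when the display name changes.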



* Re: Problem incorporating old mail file
  2006-02-20  3:22     ` Aaron Hsu
@ 2006-02-20 13:04       ` reader
  2006-02-21  5:56         ` Aaron Hsu
  0 siblings, 1 reply; 6+ messages in thread
From: reader @ 2006-02-20 13:04 UTC (permalink / raw)


Aaron Hsu <spam@sacrificumdeo.net> writes:

>> Procmail can split in literally any way you can dream up.  It is very
>> versatile.
>
> I might be interested in taking this route if it is going to be the
> fastest and the cleanest. Mainly, here was what I was thinking. Most of
> my important mail can be determined by who sent it (some people use
> multiple email addresses). With procmail, would it be possible to grab
> all the mail, first remove all the empty newsgroup entries, and then put
> all the mail into individual mboxes or what have you corresponding to
> the email addresses (actually the full contents of the from header would
> be best) of the sender?

Yes.
I'm not sure it is the quickest or cleanest, though.  Be warned that
it may take a fair bit of experimentation, and so some time and
effort, to get the results you want.

However the big file itself will not be altered in any way so you can
always rerun the process until you get the desired result.

Procmail can rewrite or remove headers in just about any way you like
(by calling other tools such as sed), and that is transparent to the
rest of the processing.

I'm not sure it can handle 500 MB, but I think it probably can, since
it is used in some really hefty mail-handling situations.

Please post at least 2 full messages so we can see the format.

Try `head -200 largefile >first200' to get the first 200 lines.
That won't alter the big file in any way.

That might be more than 2 messages, but just post enough that we can
see the format of at least 2 messages: how they start and end, and
what separates them.
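
Since `head -200' cuts by line count, it can stop in the middle of a
message. As an alternative, here is a minimal Python sketch that
returns whole messages instead. The function name `first_messages' is
made up for illustration, and the sketch assumes that every line
starting with `From ' is a separator, which holds only if such lines
inside message bodies were escaped. It stops reading as soon as it has
enough, and it only reads the big file.

```python
def first_messages(path, limit=2):
    """Return the raw bytes of the first `limit` messages in an mbox file."""
    collected = []
    seen = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"From "):
                seen += 1
                if seen > limit:
                    break  # the next message has begun; stop reading
            if seen:  # skip anything before the first separator
                collected.append(line)
    return b"".join(collected)
```

For example, `open("first2", "wb").write(first_messages("largefile"))'
would save the first two messages to a new file.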


* Re: Problem incorporating old mail file
  2006-02-20 13:04       ` reader
@ 2006-02-21  5:56         ` Aaron Hsu
  0 siblings, 0 replies; 6+ messages in thread
From: Aaron Hsu @ 2006-02-21  5:56 UTC (permalink / raw)


reader@newsguy.com writes:

> Aaron Hsu <spam@sacrificumdeo.net> writes:
>
>>> Procmail can split in literally any way you can dream up.  It is very
>>> versatile.
>>
>> I might be interested in taking this route if it is going to be the
>> fastest and the cleanest. Mainly, here was what I was thinking. Most of
>> my important mail can be determined by who sent it (some people use
>> multiple email addresses). With procmail, would it be possible to grab
>> all the mail, first remove all the empty newsgroup entries, and then put
>> all the mail into individual mboxes or what have you corresponding to
>> the email addresses (actually the full contents of the from header would
>> be best) of the sender?
>
> Yes.
> I'm not sure it is the quickest or cleanest, though.  Be warned that
> it may take a fair bit of experimentation, and so some time and
> effort, to get the results you want.

It's older mail, so as long as I know I can get there within a
reasonable amount of time, I'm okay. :-)

> However the big file itself will not be altered in any way so you can
> always rerun the process until you get the desired result.

That's always a good thing.

> Procmail can rewrite or remove headers in just about any way you like
> (by calling other tools such as sed), and that is transparent to the
> rest of the processing.

I'm not really sure that I want to rewrite headers or anything. Really,
I just want to be able to do most of my processing by what is in the headers.

> Please post at least 2 full messages so we can see the format.
>
> Try `head -200 largefile >first200' to get the first 200 lines.
> That won't alter the big file in any way.
>
> That might be more than 2 messages, but just post enough that we can
> see the format of at least 2 messages: how they start and end, and
> what separates them.

Below is a sample of some of the first messages in the file. You'll
notice that it is in DOS format, with CRLF line endings. Apart from
that, these are all examples of messages I want to get rid of: they
are posts to a newsgroup. I want to get rid of the newsgroup posts
that are not from me, as well as anything sent to a mailing list. All
the other mail I want to sort based on sender.

Essentially, I'm trying to get rid of mailing lists and newsgroups, and
then sort by sender, that's it.

--8<---------------cut here---------------start------------->8---
From invalid@comcast.net Tue Mar 08 02:19:59 2005
X-Opera-Status: 040000000000002301422d0bcf000001a508400081000001a50000008400000001000001910000000000000172000000000000000000000000
X-Opera-Status: 0400000000000017e8422d0bcf000007e808100000000000c4000000e7000000080000011000000000000000f0000000960000000000000000
X-Opera-Location: <zJydnQxqe9tMlrDfRVn-sQ@comcast.com> news.giganews.com microsoft.public.win32:2
Subject: usb communications to uC
From: Eric <invalid@comcast.net>
Date: Mon, 07 Mar 2005 21:19:59 -0500
Message-ID: <zJydnQxqe9tMlrDfRVn-sQ@comcast.com>
Newsgroups: microsoft.public.win32




From jayant.m@gmail.com Tue Mar 01 09:20:45 2005
X-Opera-Status: 040000000000002302422433ed000001f208400081000001f20000008400000001000001bf0000000000000193000000000000000000000000
X-Opera-Status: 0400000000000017ed422433ed000007d008100000000000f000000108000000080000013e0000000000000111000000960000000000000000
X-Opera-Location: <1109668845.144187.187280@l41g2000cwc.googlegroups.com> news.giganews.com microsoft.public.win32.programmer:2691
Subject: Paint probelm..backgnd getting erased
From: Jayant <jayant.m@gmail.com>
Date: Tue, 01 Mar 2005 04:20:45 -0500
Message-ID: <1109668845.144187.187280@l41g2000cwc.googlegroups.com>
Newsgroups: microsoft.public.win32.programmer




--8<---------------cut here---------------end--------------->8---
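
The cleanup described here (drop newsgroup posts that are not mine,
drop mailing-list mail, sort the rest by sender) could also be
sketched in Python rather than procmail. What follows is only a
sketch under stated assumptions: the function name `split_by_sender',
the `my_address' parameter, and the choice of headers that mark list
traffic are all mine, and a real run would need checking against the
actual file. It holds one message in memory at a time, converts CRLF
to LF, and never writes to the source file.

```python
import os
import re
from email.parser import BytesHeaderParser
from email.utils import parseaddr

# Headers taken here as signs of mailing-list traffic (an assumption).
LIST_HEADERS = ("List-Id", "List-Post", "Mailing-List")


def split_by_sender(source, out_dir, my_address=""):
    """Stream an mbox file, skip list/news traffic, file the rest by sender.

    Returns a dict mapping each output file name to its message count.
    """
    os.makedirs(out_dir, exist_ok=True)
    counts = {}
    parser = BytesHeaderParser()
    mine = my_address.lower()

    def flush(lines):
        if not lines:
            return
        # lines[0] is the "From " separator; the headers follow it.
        headers = parser.parsebytes(b"".join(lines[1:]))
        sender = parseaddr(str(headers.get("From", "")))[1].lower()
        if any(name in headers for name in LIST_HEADERS):
            return
        if "Newsgroups" in headers and not (mine and sender == mine):
            return
        name = re.sub(r"[^a-z0-9@._-]", "_", sender or "unknown") + ".mbox"
        # Truncate on first use so that a rerun starts from scratch.
        mode = "ab" if name in counts else "wb"
        with open(os.path.join(out_dir, name), mode) as out:
            out.writelines(lines)
        counts[name] = counts.get(name, 0) + 1

    current = []
    with open(source, "rb") as handle:
        for line in handle:
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"  # DOS to Unix line ending
            if line.startswith(b"From "):
                flush(current)
                current = [line]
            elif current:  # ignore anything before the first separator
                current.append(line)
    flush(current)
    return counts
```

Each output file is itself an mbox, so the results can be inspected,
or the whole run repeated with different rules, before anything is
handed to Gnus.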


