The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: random832@fastmail.com (Random832)
Subject: [TUHS] ML archive
Date: Sat, 02 Jul 2016 17:07:04 -0400	[thread overview]
Message-ID: <1467493624.1873121.655162457.687FFA51@webmail.messagingengine.com> (raw)
In-Reply-To: <87inwo1f01.fsf@luisa.c0t0d0s0.de>

On Sat, Jul 2, 2016, at 07:00, Michael Welle wrote:
> Hello,
> 
> I want to complete my local ML archive (I deleted a few emails and I
> wasn't subscribed before 2001 or so I think). After downloading the
> archives and hitting them a few times to get somewhat importable mboxes,
> I ended with 8699 emails in a maildir (in theory that should be a
> superset of the 5027 emails in my regular TUHS maildir. I will merge
> them next.). Two dozens mails are obviously defective (can be repaired
> manually maybe) and some more might be defective (needs deeper
> checking). So, has anybody more ;)?

Gah, the archive files from before 2002 are a nightmare.

My best guess on the proper message count is 8675. This is the number of
blocks of non-blank lines which either start with "From " (the easy
case) or, for the hard case, meet the following conditions:

In a file dated 2001 or earlier
First line contains a colon
Contains at least one line starting with "Received:"

There might still be a handful of messages this fails to split out. I
would recommend interpreting files dated October 2001 or later as a
strict MBOXO archive, and only doing special processing to files dated
September 2001 or earlier (I haven't factored out my own script to be
able to do anything other than count messages)


  reply	other threads:[~2016-07-02 21:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-02 11:00 Michael Welle
2016-07-02 21:07 ` Random832 [this message]
2016-07-03  5:46   ` Michael Welle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1467493624.1873121.655162457.687FFA51@webmail.messagingengine.com \
    --to=random832@fastmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).