9fans - fans of the OS Plan 9 from Bell Labs
From: Richard Miller <9fans@hamnavoe.com>
To: 9fans@9fans.net
Subject: Re: [9fans] fossil caching venti errors
Date: Wed,  8 Apr 2009 15:44:00 +0100
Message-ID: <18a0a7d320f2c6799a99d3d3b64c2f4a@hamnavoe.com>
In-Reply-To: <9795467bbf361247c67dd50a1d03ac4f@terzarima.net>

I've been seeing corruption of fossil /archive data, at a rate of about
once a week.  The symptom is that venti dir entries for /archive copies
of some frequently referenced directories in my main fossil fs (generally
it's /usr, /usr/miller or /mail/box/miller) contain incorrect venti scores.
Sometimes the score is for a nonexistent block, and sometimes it's a valid
score but for a different block (e.g. a data or dir block when a meta block
is expected).

Almost always it's the dir entry for a metadata block which gets a bad
score, but occasionally it's the dir entry for a sub-dir block.  This
morning both the sub-dir and meta entries for one directory were written
with the scores of the corresponding entries for a different directory:

term% ls -lqd /n/dump/2009/0408/mail /n/dump/2009/0408/usr/miller/disk
(0000000000001646  9 80) d-rwxrwxr-x M 36 upas   upas   0 Jan  1  2007 /n/dump/2009/0408/mail
(000000000000694e 23 80) d-rwxr-xr-x M 36 miller miller 0 Dec 31  2003 /n/dump/2009/0408/usr/miller/disk
term% ls /n/dump/2009/0408/mail /n/dump/2009/0408/usr/miller/disk
/n/dump/2009/0408/mail/fdisk.c
/n/dump/2009/0408/mail/fdisk.c_ok
/n/dump/2009/0408/mail/fdisk.c_try
/n/dump/2009/0408/mail/mkfs.c
/n/dump/2009/0408/usr/miller/disk/fdisk.c
/n/dump/2009/0408/usr/miller/disk/fdisk.c_ok
/n/dump/2009/0408/usr/miller/disk/fdisk.c_try
/n/dump/2009/0408/usr/miller/disk/mkfs.c
term% diff -r /n/dump/2009/0408/mail /n/dump/2009/0408/usr/miller/disk
term%

Here are the pairs of VtEntry structures (sub-dir entry first, then meta
entry) for the two directories, as retrieved from venti:

/n/dump/2009/0408/mail:
0000000  00000000 1ff41fe0 03000000 00000000
0000010  00000280 0830e9ae 3023dce8 3ddf6138
0000020  4fe87d59 7257e3b7 00000000 1ff42000
0000030  01000000 00000000 0000039f 21ee2227
0000040  695a8ea2 fbbd136e dcb435e0 64fbcdfb
0000050
/n/dump/2009/0408/usr/miller/disk:
0000000  00000000 1ff41fe0 03000000 00000000
0000010  000000a0 0830e9ae 3023dce8 3ddf6138
0000020  4fe87d59 7257e3b7 00000000 1ff42000
0000030  01000000 00000000 00000296 21ee2227
0000040  695a8ea2 fbbd136e dcb435e0 64fbcdfb
0000050

Note that it's just the score fields for /mail which are wrong; the
size fields (280 and 39f) are correct, matching yesterday's dump:

/n/dump/2009/0407/mail:
0000000  00000000 1ff41fe0 03000000 00000000
0000010  00000280 ...
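
For anyone who wants to decode dumps like these rather than read the hex
by eye, here is a minimal sketch. It assumes the 40-byte packed VtEntry
layout (4-byte gen, 2-byte psize, 2-byte dsize, 1-byte flags, 5 zero
bytes, 6-byte size, 20-byte score, all big-endian) and assumes the words
in the dump are printed in on-disk byte order. The function names are
made up for illustration and are not part of fossil or venti.

```python
import struct

VT_ENTRY_SIZE = 40   # assumed packed size of one VtEntry
VT_SCORE_SIZE = 20   # length of a SHA-1 score

MAIL = """
0000000  00000000 1ff41fe0 03000000 00000000
0000010  00000280 0830e9ae 3023dce8 3ddf6138
0000020  4fe87d59 7257e3b7 00000000 1ff42000
0000030  01000000 00000000 0000039f 21ee2227
0000040  695a8ea2 fbbd136e dcb435e0 64fbcdfb
0000050
"""

DISK = """
0000000  00000000 1ff41fe0 03000000 00000000
0000010  000000a0 0830e9ae 3023dce8 3ddf6138
0000020  4fe87d59 7257e3b7 00000000 1ff42000
0000030  01000000 00000000 00000296 21ee2227
0000040  695a8ea2 fbbd136e dcb435e0 64fbcdfb
0000050
"""

def parse_dump(text):
    """Convert 'offset  word word ...' lines back into raw bytes."""
    out = bytearray()
    for line in text.strip().splitlines():
        # The first field is the offset; the rest are 4-byte hex words.
        for word in line.split()[1:]:
            out += bytes.fromhex(word)
    return bytes(out)

def unpack_entry(buf):
    """Decode one 40-byte entry under the layout assumed above."""
    gen, psize, dsize, flags = struct.unpack(">IHHB", buf[:9])
    size = int.from_bytes(buf[14:20], "big")      # 48-bit size field
    score = buf[20:20 + VT_SCORE_SIZE].hex()
    return {"gen": gen, "psize": psize, "dsize": dsize,
            "flags": flags, "size": size, "score": score}

def unpack_entries(raw):
    """Split a buffer into consecutive entries and decode each one."""
    return [unpack_entry(raw[i:i + VT_ENTRY_SIZE])
            for i in range(0, len(raw), VT_ENTRY_SIZE)]

if __name__ == "__main__":
    mail = unpack_entries(parse_dump(MAIL))
    disk = unpack_entries(parse_dump(DISK))
    for m, d in zip(mail, disk):
        same = "same score" if m["score"] == d["score"] else "different score"
        print(f"size {m['size']:#x} vs {d['size']:#x}: {same}")
```

Comparing the decoded entries this way shows whether two directories
share scores while keeping their own size fields.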

So, what's going on?  My intuition says a fossil bug - I can't think
of any disk hardware error which would lead to this kind of corruption.
I have of course studied /sys/src/cmd/fossil/archive.c looking for
races (my fossil+venti machine has two processors), but everything
appears to be protected by locks.

I would be interested to know if anyone else's archive is being quietly
messed up in this way - you wouldn't necessarily know until you tried
something like a history(1) command and found pieces missing.  This
is a quick test which may show if you have a similar problem:

  cd /n/dump/2009
  for (i in *) { test -d $i$home/tmp || ls -d $i$home/tmp }
  for (i in *) { test -f $i/mail/box/$user/mbox || ls $i/mail/box/$user/mbox }
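
The same check can be sketched in Python for a machine where the dump
tree is reachable but rc is not at hand. This is only an illustration:
missing_in_dumps and its parameters are hypothetical names, and the
paths in the usage note are examples to be replaced with your own.

```python
import os

def missing_in_dumps(dump_root, relpath, want_dir):
    """List the paths under each dump where relpath is absent or the wrong kind.

    dump_root is a year directory such as /n/dump/2009; relpath is the
    path to look for inside every daily dump below it.
    """
    check = os.path.isdir if want_dir else os.path.isfile
    bad = []
    for name in sorted(os.listdir(dump_root)):
        path = os.path.join(dump_root, name, relpath.lstrip("/"))
        if not check(path):
            bad.append(path)
    return bad
```

For example, missing_in_dumps("/n/dump/2009", "usr/miller/tmp", True)
would list the dumps in which that directory cannot be found, and an
empty list means every dump passed.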

In a post a while ago, Russ said
> The amazing thing to me about fossil is how indestructible
> it is when used with venti.
> ... Once you see the data in the archive
> tree, you can be very sure it's not going away.

I agree with this, but I'd like a way to be reassured that my daily
data has actually gone into the archive correctly.

-- Richard



