9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: Richard Miller <9fans@hamnavoe.com>
To: 9fans@9fans.net
Subject: [9fans] fossil archive corruption - solved
Date: Thu, 22 Mar 2012 16:30:17 +0000	[thread overview]
Message-ID: <a07e1fae051f4fb4d151126f370545fe@hamnavoe.com> (raw)

I've finally solved the mystery of data corruption in fossil archives.

As usual, the cause is obvious once you've seen it.  The clue is this
line and comment in /sys/src/cmd/fossil/cache.c:/^doRemoveLink:
	/*
	 * We're not really going to overwrite b, but if we're not
	 * going to look at its contents, there is no point in reading
	 * them from the disk.
	 */
	b = cacheLocalData(c, p->addr, p->type, p->tag, recurse ? OReadOnly : OOverWrite, 0);

The context is that b is a copy-on-write block from an earlier epoch
which has just been copied, and doRemoveLink needs only the block's
label (not its data) in order to remove it from the active file system
by setting its state to BsClosed.

The problem is that when cacheLocalData loads a block from disk with
mode OOverWrite, it sets its iostate to BioClean even though only the
block's label has been read.  In all other instances of OOverWrite, this
is benign because the caller immediately overwrites the block's data and
sets its iostate to BioDirty before unlocking it.  Even in the case of
doRemoveLink, usually the block won't have needed to be loaded from disk
because it has recently been copied and therefore is still in the cache
with valid data.  But if the block did need to be loaded again, the
cache entry will will now be marked BioClean but with invalid data (ie
the contents of whatever other block most recently used that buffer).

Now suppose a 'snap -a' is running concurrently, and the copied block in
question is from the just-finished epoch but hasn't been archived yet.
If the timing is right (wrong), the snapshot thread will find the block
in the cache and write its bad data permanently to venti.

The consequence will depend on the type of the invalid block.  If it's a
pointer or dir block, the archive operation may fail with "illegal block
address" errors when trying to walk to its descendents; if it's a meta
data block, some portion of the archived tree may become inaccessible
with "corrupted meta data" messages.  Both of these have been reported
from time to time in 9fans.  If it's a data block, the archive will
simply contain some bad data. I've observed this at least once, and who
knows how many times it has happened unnoticed?

The simplest fix would be to revert to the original fossil code in which
doRemoveLink always loaded the block with OReadOnly.  But in line with
the Principle of Least Astonishment (and to guard against anyone else
making a mistake with a similar "optimisation"), I think the right thing
is for cacheLocalData to set the block's iostate to BioLabel, not
BioClean, when only the label has been read.

Patch is fossil-archive-corruption.




             reply	other threads:[~2012-03-22 16:30 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-22 16:30 Richard Miller [this message]
2012-03-22 16:48 ` cinap_lenrek
2012-03-22 17:57 ` Steve Simon
2012-03-22 20:06   ` steve
2012-03-22 18:32 ` David du Colombier
2012-03-22 19:24   ` Richard Miller
2012-03-22 19:54     ` Nemo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a07e1fae051f4fb4d151126f370545fe@hamnavoe.com \
    --to=9fans@hamnavoe.com \
    --cc=9fans@9fans.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).