9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] fossil archive corruption - solved
@ 2012-03-22 16:30 Richard Miller
  2012-03-22 16:48 ` cinap_lenrek
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Richard Miller @ 2012-03-22 16:30 UTC (permalink / raw)
  To: 9fans

I've finally solved the mystery of data corruption in fossil archives.

As usual, the cause is obvious once you've seen it.  The clue is this
line and comment in /sys/src/cmd/fossil/cache.c:/^doRemoveLink:
	/*
	 * We're not really going to overwrite b, but if we're not
	 * going to look at its contents, there is no point in reading
	 * them from the disk.
	 */
	b = cacheLocalData(c, p->addr, p->type, p->tag, recurse ? OReadOnly : OOverWrite, 0);

The context is that b is a copy-on-write block from an earlier epoch
which has just been copied, and doRemoveLink needs only the block's
label (not its data) in order to remove it from the active file system
by setting its state to BsClosed.

The problem is that when cacheLocalData loads a block from disk with
mode OOverWrite, it sets its iostate to BioClean even though only the
block's label has been read.  In all other instances of OOverWrite, this
is benign because the caller immediately overwrites the block's data and
sets its iostate to BioDirty before unlocking it.  Even in the case of
doRemoveLink, usually the block won't have needed to be loaded from disk
because it has recently been copied and therefore is still in the cache
with valid data.  But if the block did need to be loaded again, the
cache entry will will now be marked BioClean but with invalid data (ie
the contents of whatever other block most recently used that buffer).

Now suppose a 'snap -a' is running concurrently, and the copied block in
question is from the just-finished epoch but hasn't been archived yet.
If the timing is right (wrong), the snapshot thread will find the block
in the cache and write its bad data permanently to venti.

The consequence will depend on the type of the invalid block.  If it's a
pointer or dir block, the archive operation may fail with "illegal block
address" errors when trying to walk to its descendents; if it's a meta
data block, some portion of the archived tree may become inaccessible
with "corrupted meta data" messages.  Both of these have been reported
from time to time in 9fans.  If it's a data block, the archive will
simply contain some bad data. I've observed this at least once, and who
knows how many times it has happened unnoticed?

The simplest fix would be to revert to the original fossil code in which
doRemoveLink always loaded the block with OReadOnly.  But in line with
the Principle of Least Astonishment (and to guard against anyone else
making a mistake with a similar "optimisation"), I think the right thing
is for cacheLocalData to set the block's iostate to BioLabel, not
BioClean, when only the label has been read.

Patch is fossil-archive-corruption.




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-03-22 20:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-22 16:30 [9fans] fossil archive corruption - solved Richard Miller
2012-03-22 16:48 ` cinap_lenrek
2012-03-22 17:57 ` Steve Simon
2012-03-22 20:06   ` steve
2012-03-22 18:32 ` David du Colombier
2012-03-22 19:24   ` Richard Miller
2012-03-22 19:54     ` Nemo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).