9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] fossil archive corruption - solved
@ 2012-03-22 16:30 Richard Miller
  2012-03-22 16:48 ` cinap_lenrek
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Richard Miller @ 2012-03-22 16:30 UTC (permalink / raw)
  To: 9fans

I've finally solved the mystery of data corruption in fossil archives.

As usual, the cause is obvious once you've seen it.  The clue is this
line and comment in /sys/src/cmd/fossil/cache.c:/^doRemoveLink:
	/*
	 * We're not really going to overwrite b, but if we're not
	 * going to look at its contents, there is no point in reading
	 * them from the disk.
	 */
	b = cacheLocalData(c, p->addr, p->type, p->tag, recurse ? OReadOnly : OOverWrite, 0);

The context is that b is a copy-on-write block from an earlier epoch
which has just been copied, and doRemoveLink needs only the block's
label (not its data) in order to remove it from the active file system
by setting its state to BsClosed.

The problem is that when cacheLocalData loads a block from disk with
mode OOverWrite, it sets its iostate to BioClean even though only the
block's label has been read.  In all other instances of OOverWrite, this
is benign because the caller immediately overwrites the block's data and
sets its iostate to BioDirty before unlocking it.  Even in the case of
doRemoveLink, usually the block won't have needed to be loaded from disk
because it has recently been copied and therefore is still in the cache
with valid data.  But if the block did need to be loaded again, the
cache entry will will now be marked BioClean but with invalid data (ie
the contents of whatever other block most recently used that buffer).

Now suppose a 'snap -a' is running concurrently, and the copied block in
question is from the just-finished epoch but hasn't been archived yet.
If the timing is right (wrong), the snapshot thread will find the block
in the cache and write its bad data permanently to venti.

The consequence will depend on the type of the invalid block.  If it's a
pointer or dir block, the archive operation may fail with "illegal block
address" errors when trying to walk to its descendents; if it's a meta
data block, some portion of the archived tree may become inaccessible
with "corrupted meta data" messages.  Both of these have been reported
from time to time in 9fans.  If it's a data block, the archive will
simply contain some bad data. I've observed this at least once, and who
knows how many times it has happened unnoticed?

The simplest fix would be to revert to the original fossil code in which
doRemoveLink always loaded the block with OReadOnly.  But in line with
the Principle of Least Astonishment (and to guard against anyone else
making a mistake with a similar "optimisation"), I think the right thing
is for cacheLocalData to set the block's iostate to BioLabel, not
BioClean, when only the label has been read.

Patch is fossil-archive-corruption.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9fans] fossil archive corruption - solved
  2012-03-22 16:30 [9fans] fossil archive corruption - solved Richard Miller
@ 2012-03-22 16:48 ` cinap_lenrek
  2012-03-22 17:57 ` Steve Simon
  2012-03-22 18:32 ` David du Colombier
  2 siblings, 0 replies; 7+ messages in thread
From: cinap_lenrek @ 2012-03-22 16:48 UTC (permalink / raw)
  To: 9fans

you rock!

--
cinap



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9fans] fossil archive corruption - solved
  2012-03-22 16:30 [9fans] fossil archive corruption - solved Richard Miller
  2012-03-22 16:48 ` cinap_lenrek
@ 2012-03-22 17:57 ` Steve Simon
  2012-03-22 20:06   ` steve
  2012-03-22 18:32 ` David du Colombier
  2 siblings, 1 reply; 7+ messages in thread
From: Steve Simon @ 2012-03-22 17:57 UTC (permalink / raw)
  To: 9fans

Kudos to Mr Miller,

-Steve



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9fans] fossil archive corruption - solved
  2012-03-22 16:30 [9fans] fossil archive corruption - solved Richard Miller
  2012-03-22 16:48 ` cinap_lenrek
  2012-03-22 17:57 ` Steve Simon
@ 2012-03-22 18:32 ` David du Colombier
  2012-03-22 19:24   ` Richard Miller
  2 siblings, 1 reply; 7+ messages in thread
From: David du Colombier @ 2012-03-22 18:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Good catch!

I've not been able to easily reproduce this problem.
How did you proceed?

--
David du Colombier



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9fans] fossil archive corruption - solved
  2012-03-22 18:32 ` David du Colombier
@ 2012-03-22 19:24   ` Richard Miller
  2012-03-22 19:54     ` Nemo
  0 siblings, 1 reply; 7+ messages in thread
From: Richard Miller @ 2012-03-22 19:24 UTC (permalink / raw)
  To: 9fans

> I've not been able to easily reproduce this problem.
> How did you proceed?

Not very reproducible, but I found it could be encouraged to happen
(on a sacrificial secondary fossil+venti) with this:
	tar c /n/xfossil/sys/src >/dev/null &
	while (sleep 60) echo snap -a >>/srv/xfscons

An easily recognisable consequence when it fails is a BtDir block with
invalid contents.  So I peppered the source with diagnostics checking
for that, until I was able to spot exactly where it happened.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9fans] fossil archive corruption - solved
  2012-03-22 19:24   ` Richard Miller
@ 2012-03-22 19:54     ` Nemo
  0 siblings, 0 replies; 7+ messages in thread
From: Nemo @ 2012-03-22 19:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

impressive. hats off.

--
iphone kbd. excuse typos :)


On Mar 22, 2012, at 8:24 PM, Richard Miller <9fans@hamnavoe.com> wrote:

>> I've not been able to easily reproduce this problem.
>> How did you proceed?
>
> Not very reproducible, but I found it could be encouraged to happen
> (on a sacrificial secondary fossil+venti) with this:
>    tar c /n/xfossil/sys/src >/dev/null &
>    while (sleep 60) echo snap -a >>/srv/xfscons
>
> An easily recognisable consequence when it fails is a BtDir block with
> invalid contents.  So I peppered the source with diagnostics checking
> for that, until I was able to spot exactly where it happened.
>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [9fans] fossil archive corruption - solved
  2012-03-22 17:57 ` Steve Simon
@ 2012-03-22 20:06   ` steve
  0 siblings, 0 replies; 7+ messages in thread
From: steve @ 2012-03-22 20:06 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

apologies to the others involved, i sent this before i read all of richard's email.

well done all, i have been hoping people with more skill than i might fix this.

-Steve


On 22 Mar 2012, at 05:57 PM, "Steve Simon" <steve@quintile.net> wrote:

> Kudos to Mr Miller,
> 
> -Steve



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-03-22 20:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-22 16:30 [9fans] fossil archive corruption - solved Richard Miller
2012-03-22 16:48 ` cinap_lenrek
2012-03-22 17:57 ` Steve Simon
2012-03-22 20:06   ` steve
2012-03-22 18:32 ` David du Colombier
2012-03-22 19:24   ` Richard Miller
2012-03-22 19:54     ` Nemo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).