Message-ID: <5a828837c6470dfc75b135b56c038ebe@plan9.bell-labs.com>
From: jmk@plan9.bell-labs.com
To: 9fans@cse.psu.edu
Subject: Re: [9fans] more fossil woes
In-Reply-To: <Pine.LNX.4.44.0310311701230.8457-100000@fbsd.cpsc.ucalgary.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Date: Fri, 31 Oct 2003 23:18:12 -0500

I'd say you had something more fundamental wrong, or else you're not telling
the whole story. If you run the second flfmt as described below, you should
get a message like
	fs header block already exists; are you sure? [y/n]:
unless you gave it the '-y' option.

On Fri Oct 31 19:25:36 EST 2003, mirtchov@cpsc.ucalgary.ca wrote:
> I never thought I'd get to that point, but here it is:
>
> 	Fossil is unable to initialize a partition with flfmt.
>
> Here's the whole story:
>
>
> This morning, after successfully checking my email from home, I arrived at
> school only to find that fossil had died with the familiar:
>
> 	 assert failed: b->nlock == 1
> 	fossil 44: suicide: sys: trap: fault read addr=0x0 pc=0x0002b6b7
>
> It was the first crash in a long time, but unfortunately I had no way of
> finding out who or what had caused it, because Plan 9 does not let me
> examine process activity by utilization of a particular resource.
> (Interestingly enough, when I suggested that such "features" be added to the
> system, there was an outcry, especially from people who never use Plan 9,
> telling me I'm just polluting the beautiful system :)...
>
> I didn't give the problem much thought and ran fossil/flchk, which
> surprisingly found far more errors than I had expected. Here's how many
> blocks it could no longer access (I run a 3-day epoch window) and
> suggested that I bfree:
>
> 	mirtchov@fbsd$ cat flchk | sed '/^[^b]/d' | wc -l
> 	 365357
> 	mirtchov@fbsd$
>
>
> That's 3 gigs of broken data... For comparison, my entire venti archive
> weighs in at 1.3GB.
>
> I examined the blocks for any obvious errors and then catted them to the
> fossil console, which immediately came back with this somewhat new error:
>
> 	cacheLocalData: addr=7840 type got 16 exp 8: tag got 0 exp 65afd613
> 	fossil 94: suicide: sys: trap: fault read addr=0x0 pc=0x0002b6b7
>
> A reboot or two later, I had a running system that was good enough for
> checking email. Only much later, when I needed to do some real work with
> Plan 9, did I find out that /acme/bin/* was corrupted! The binaries were
> listed as existing, but no file operations could be done on them. At this
> point I decided it was best to reinitialize fossil from the latest venti
> score and just forget about it, but fossil thought differently:
>
> 	plan9# fossil/flfmt -v ff96c3967c7815e15a8a4c09196221b01a8bba3d /dev/sdD0/fossil
> 	cacheLocalData: addr=7841 type got 16 exp 0: tag got 0 exp 6669fe74
> 	fossil 90: suicide: sys: trap: fault read addr=0x0 pc=0x0002b6b7
>
> Exactly the same thing happens if I just try to format the drive:
>
> 	plan9# fossil/flfmt /dev/sdD0/fossil
> 	cacheLocalData: addr=7841 type got 16 exp 0: tag got 0 exp 6669fe74
> 	fossil 89: suicide: sys: trap: fault read addr=0x0 pc=0x0002b807
>
> For what it's worth, reading from and writing to sdD0 work fine...
>
> Anyway, I have another fossil disk that I can boot from, and with venti's
> help I will reinitialize the system. Others may not be so lucky :)
>
> andrey
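As a rough cross-check of the "3 gigs" figure in the quoted report, the flchk block count can be converted to bytes. This is a minimal sketch under stated assumptions: it assumes the partition was formatted with flfmt's default 8 KB (8192-byte) block size, which the message does not actually say, and it reproduces what `sed '/^[^b]/d' | wc -l` counts, namely every line that sed does not delete (lines beginning with 'b', plus empty lines). The names `count_kept_lines`, `blocks_to_bytes` and `ASSUMED_BLOCK_SIZE` are hypothetical, chosen for illustration only.

```python
# Hypothetical helper: mirror `sed '/^[^b]/d' | wc -l` on flchk output and
# turn the resulting block count into an approximate byte total.
# ASSUMPTION: the fossil partition uses an 8 KB (8192-byte) block size.

ASSUMED_BLOCK_SIZE = 8192  # bytes; assumed default, not read from the disk


def count_kept_lines(text: str) -> int:
    """Count the lines that sed '/^[^b]/d' would keep.

    The sed address matches lines whose first character is anything other
    than 'b' and deletes them, so lines starting with 'b' survive, and so do
    empty lines (they have no first character to match).
    """
    return sum(1 for line in text.splitlines() if line == "" or line.startswith("b"))


def blocks_to_bytes(blocks: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Convert a block count into bytes for the given (assumed) block size."""
    return blocks * block_size


if __name__ == "__main__":
    blocks = 365357  # count reported in the quoted message
    total = blocks_to_bytes(blocks)
    print(f"{blocks} blocks x {ASSUMED_BLOCK_SIZE} B = {total} B "
          f"(~{total / 1e9:.2f} GB, ~{total / 2**30:.2f} GiB)")
```

With the quoted count, 365357 x 8192 = 2,993,004,544 bytes, about 2.99 GB (roughly 2.79 GiB), which agrees with the "3 gigs" estimate only if the block size really was 8 KB.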