From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <7d3530220906181010l25557061k774bb250a4a2e6dd@mail.gmail.com>
References: <7d3530220906180930p575fcb4bk473decb7d1a89c27@mail.gmail.com>
	<f579f769d816e06babfc511fdeea9d33@quanstro.net>
	<7d3530220906181010l25557061k774bb250a4a2e6dd@mail.gmail.com>
Date: Wed, 24 Jun 2009 10:06:22 -0700
Message-ID: <7d3530220906241006t7e9799f8r17f09f57c1c41831@mail.gmail.com>
From: John Floren <slawmaster@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] fossil/venti falling down?
Topicbox-Message-UUID: 0e94c898-ead5-11e9-9d60-3106f5b1d025

On Thu, Jun 18, 2009 at 10:10 AM, John Floren<slawmaster@gmail.com> wrote:
> On Thu, Jun 18, 2009 at 9:45 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>>
>> > It seems to only happen once per boot, but not necessarily when fossil
>> > starts responding--I've seen it a couple hours after booting, which
>> > the filesystem tends to go away at night.
>>
>> the failure is somewhere in blockWrite.  since blockWrite
>> calls diskWrite and diskWrite just queues up i/o to send
>> to the disk, it's not possible to get i/o errors directly from
>> blockWrite.
>>
>> there are two cases that do return errors.
>>
>> one is if the block can't be locked.  a runaway periodic function
>> would make that more likely, since we don't wait for the lock.
>> but it seems more likely in this case that some of fossil's data is
>> corrupted since this started after the double-failure.
>> see http://9fans.net/archive/2009/03/487
>>
>> the other case is a funny dependency.  there's a fprint there
>> that's commented out.
>>
>> - erik
>>
>
> Here's another message that may be of interest. I ran fshalt before
> rebooting (to test the periodicthread patch) and saw this:
>
> syncing.../srv/fscons...prompt: sourceRoot: fs->ehi = 5395, b->l =
> BtDir,3,Copied,e=5394,-1,tag=0x1
> venti...
> halting.../srv/fscons...archive vac:a9d9b0b9fe0db783fe618f680804a18df532a67a
>
> I don't remember seeing that "sourceRoot: ..." stuff before; as soon
> as the system comes back up I guess I'll take a look at source.
>

After replacing the problematic server and moving the fossil disk to
the new machine, we're not getting random hangs any more.

However, I've seen this a few times on the console:

/boot/fossil: cacheLocalData: addr=78989 type got 0 exp 0: tag got
e63eb942 exp 663eb942
archive(0, 0x1348d): cannot find block: block label mismatch

and

/boot/fossil: cacheLocalData: addr=134772 type got 0 exp 0: tag got
7795335e exp 7715335e
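
A quick way to compare the "got" and "exp" tags in the two messages
above is to XOR them and list the bit positions that differ. The sketch
below is only arithmetic on the logged values; the helper name
differing_bits is made up for illustration and is not part of fossil.

```python
def differing_bits(got: int, exp: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = got ^ exp
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

# (got, exp) tag pairs copied from the two cacheLocalData messages
pairs = [(0xe63eb942, 0x663eb942), (0x7795335e, 0x7715335e)]
for got, exp in pairs:
    print(f"got {got:08x} exp {exp:08x} xor {got ^ exp:08x} "
          f"bits {differing_bits(got, exp)}")
# prints:
# got e63eb942 exp 663eb942 xor 80000000 bits [31]
# got 7795335e exp 7715335e xor 00800000 bits [23]
```

In both cases the tags differ in exactly one bit. That would be
consistent with a flipped bit rather than an entirely wrong block, but
that is an inference from two samples, not something the logs state.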

Is this something to worry about?

John
-- 
"I've tried programming Ruby on Rails, following TechCrunch in my RSS
reader, and drinking absinthe. It doesn't work. I'm going back to C,
Hunter S. Thompson, and cheap whiskey." -- Ted Dziuba