From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <7074eaeba3db257fbc8ddf9c9d169cae@quanstro.net>
From: erik quanstrom <quanstro@quanstro.net>
Date: Sat, 28 Mar 2009 13:31:01 -0400
To: 9fans@9fans.net
In-Reply-To: <20090328162750.GG22497@masters6.cs.jhu.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] fossil caching venti errors
Topicbox-Message-UUID: cbb0a902-ead4-11e9-9d60-3106f5b1d025

> AFAIK the disk is doing just fine.  Moreover, even during the period when
> fossil is complaining, venti/read on 9fs's score works just fine.  So I
> don't believe the fault is venti's.

i don't believe that conclusion is warranted.
/sys/src/cmd/fossil/cache.c:683,684
is where this condition gets set.  so either
the read fails or the score or length is bad.
%r is not set (see a few lines down), so the
read itself doesn't seem to have failed.  when
that is combined with this report:

> This is likely too large a hammer, but when this happens I rebuild the venti index
> so that I can get past the issue.  I see this more under Plan 9 than p9p.  The
> block in error always exists in an arena and a checkarenas reports no errors.
> The problem usually persists across reboots until I reconstitute the index.

it's reasonable to guess that the block returned
might not be the right one.
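for illustration, here is a minimal sketch (not fossil's actual code) of
the three ways a fetched block can be rejected: the read fails outright,
the length is wrong, or the bytes don't hash to the requested score.
check_block, enum check and the bundled sha1 routine are hypothetical
names; the only venti fact assumed is that a score is the sha1 of the
block's contents, and expected_len just stands in for whatever size the
caller expects.  a block that arrives intact but is the wrong block
(say, via a bad index entry) lands in BAD_SCORE, not READ_FAILED.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* sha1 compression function over one 64-byte chunk (FIPS 180). */
static void sha1_block(uint32_t h[5], const uint8_t p[64])
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
		       (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
	for (; i < 80; i++)
		w[i] = ROTL(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = ROTL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROTL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* one-shot sha1 of buf[0..n) into out[20]. */
static void sha1(const uint8_t *buf, size_t n, uint8_t out[20])
{
	uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
	uint8_t tail[128];
	uint64_t bits = (uint64_t)n * 8;
	size_t i, rem, tl;

	for (i = 0; i + 64 <= n; i += 64)
		sha1_block(h, buf + i);
	rem = n - i;
	memset(tail, 0, sizeof tail);
	memcpy(tail, buf + i, rem);
	tail[rem] = 0x80;			/* padding marker */
	tl = rem < 56 ? 64 : 128;		/* room for the 8-byte length? */
	for (i = 0; i < 8; i++)			/* big-endian bit count */
		tail[tl - 1 - i] = (uint8_t)(bits >> (8 * i));
	sha1_block(h, tail);
	if (tl == 128)
		sha1_block(h, tail + 64);
	for (i = 0; i < 20; i++)
		out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* hypothetical outcome codes for one block fetched from venti. */
enum check { BLOCK_OK, READ_FAILED, BAD_LENGTH, BAD_SCORE };

/*
 * n < 0 models a failed read (the case where %r would be set);
 * expected_len is a stand-in for the size the caller expects.
 */
static enum check check_block(const uint8_t score[20], const uint8_t *buf,
	long n, long expected_len)
{
	uint8_t got[20];

	if (n < 0)
		return READ_FAILED;
	if (n != expected_len)
		return BAD_LENGTH;
	/* the read "worked": only the content address can catch a wrong block */
	sha1(buf, (size_t)n, got);
	if (memcmp(got, score, 20) != 0)
		return BAD_SCORE;
	return BLOCK_OK;
}
```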

in principle, this could be a drive failure,
bad memory or a venti bug. i don't have a
lot of venti experience, but i think this
/sys/src/cmd/venti/srv/lump.c:226,230
is where venti reads, and it seems to ensure
that the initial read from disk double-checks
the score.  it would be 1e-80 hard for a drive
error to sneak by that, so that leaves us with
memory errors or venti cache bugs.

it's hard to see how reindexing would fix
a cache bug, though.  so maybe i'm all wet.

it would be interesting to know if the score
of the block returned by venti/read is correct.
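one hedged way to do that check outside fossil: save the bytes venti/read
prints for the score in question and compare their sha1 with that score.
the sketch below assumes the score is written as 40 hex digits, the way
venti prints scores; hexval, parse_score and score_matches are made-up
names, and the small driver that reads the saved bytes is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* sha1 compression function over one 64-byte chunk (FIPS 180). */
static void sha1_block(uint32_t h[5], const uint8_t p[64])
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
		       (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
	for (; i < 80; i++)
		w[i] = ROTL(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = ROTL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROTL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* one-shot sha1 of buf[0..n) into out[20]. */
static void sha1(const uint8_t *buf, size_t n, uint8_t out[20])
{
	uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
	uint8_t tail[128];
	uint64_t bits = (uint64_t)n * 8;
	size_t i, rem, tl;

	for (i = 0; i + 64 <= n; i += 64)
		sha1_block(h, buf + i);
	rem = n - i;
	memset(tail, 0, sizeof tail);
	memcpy(tail, buf + i, rem);
	tail[rem] = 0x80;			/* padding marker */
	tl = rem < 56 ? 64 : 128;		/* room for the 8-byte length? */
	for (i = 0; i < 8; i++)			/* big-endian bit count */
		tail[tl - 1 - i] = (uint8_t)(bits >> (8 * i));
	sha1_block(h, tail);
	if (tl == 128)
		sha1_block(h, tail + 64);
	for (i = 0; i < 20; i++)
		out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* value of one hex digit, or -1. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* parse a 40-digit hex score into out[20]; 0 on success, -1 if malformed. */
static int parse_score(const char *hex, uint8_t out[20])
{
	int i, hi, lo;

	if (strlen(hex) != 40)
		return -1;
	for (i = 0; i < 20; i++) {
		hi = hexval((unsigned char)hex[2 * i]);
		lo = hexval((unsigned char)hex[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)(hi << 4 | lo);
	}
	return 0;
}

/* 1 if buf[0..n) hashes to the score, 0 if not, -1 if the score is malformed. */
static int score_matches(const char *hex, const uint8_t *buf, size_t n)
{
	uint8_t want[20], got[20];

	if (parse_score(hex, want) < 0)
		return -1;
	sha1(buf, n, got);
	return memcmp(want, got, 20) == 0;
}
```

a 0 from score_matches on the bytes venti/read printed would support the
guess that venti is handing back the wrong block for that score.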

- erik