From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7074eaeba3db257fbc8ddf9c9d169cae@quanstro.net>
From: erik quanstrom
Date: Sat, 28 Mar 2009 13:31:01 -0400
To: 9fans@9fans.net
In-Reply-To: <20090328162750.GG22497@masters6.cs.jhu.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] fossil caching venti errors
Topicbox-Message-UUID: cbb0a902-ead4-11e9-9d60-3106f5b1d025

> AFAIK the disk is doing just fine. Moreover, even during the period when
> fossil is complaining, venti/read on 9fs's score works just fine. So I
> don't believe the fault is venti's.

i don't believe that conclusion is warranted.
/sys/src/cmd/fossil/cache.c:683,684 is where this
condition gets set. so either the read fails or the score
or length is bad. %r is not set (see a few lines down),
so when combined with this report:

> This is likely too large a hammer, but when this happens I rebuild the venti index
> so that I can get past the issue. I see this more under Plan 9 than p9p. The
> block in error always exists in an arena and a checkarenas reports no errors.
> The problem usually persists across reboots until I reconstitute the index.

it's reasonable to guess that the block returned might not
be the right one. in principle, this could be a drive failure,
bad memory or a venti bug.

i don't have a lot of venti experience, but i think
/sys/src/cmd/venti/srv/lump.c:226,230 is where venti reads,
and it seems to ensure that the initial read double-checks
scores. it would be 1e-80 hard for a drive error to sneak by,
so that leaves us with memory errors or venti cache bugs.

it's hard to see how reindexing would fix a cache bug, though.
so maybe i'm all wet.

it would be interesting to know if the score of the block
returned by venti/read is correct.

- erik
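
The check suggested at the end of the message (is the score of the block returned by venti/read correct?) can be sketched as follows. A venti score is the SHA-1 hash of the block's contents, so hashing the returned bytes and comparing the result with the score that was requested shows whether venti handed back the right block. This is only a minimal sketch in Python, not code from fossil or venti: the function names venti_score and score_matches are hypothetical, and it assumes the bytes being hashed are exactly the block as venti stored it, with no padding or truncation added by whatever saved the output.

```python
import hashlib


def venti_score(block: bytes) -> str:
    """Return the SHA-1 of the block as 40 lowercase hex digits.

    Hypothetical helper: assumes `block` is the raw block exactly
    as venti stored it.
    """
    return hashlib.sha1(block).hexdigest()


def score_matches(block: bytes, expected: str) -> bool:
    """Compare the block's SHA-1 with the score that was requested.

    Hypothetical helper: `expected` is the score as 40 hex digits;
    surrounding whitespace and letter case are ignored.
    """
    wanted = expected.strip().lower()
    if len(wanted) != 40:
        # a SHA-1 score is always 20 bytes, i.e. 40 hex digits
        return False
    return venti_score(block) == wanted
```

One way to use it, under the same assumptions: save the output of venti/read to a file, read that file in binary mode, and pass the bytes together with the requested score to score_matches. A False result would point at the wrong block being returned rather than at a failed read.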