From: Christopher Nielsen
To: 9fans@cse.psu.edu
Subject: Re: [9fans] venti+fossil woes
Date: Tue, 18 Nov 2003 04:40:23 -0800
Message-ID: <20031118124023.GF65844@cassie.foobarbaz.net>
In-Reply-To: <20031116013757.GO834@cassie.foobarbaz.net>

Here's an update for anyone interested, since I can't seem to get to sleep for some reason.

I bought some better-quality ATA cables yesterday. That helped to the point that I thought my troubles were over. No such luck.

Now, whenever a venti arena fills up and is in the process of being sealed, the screen fills with "IBsy+" repeated ad infinitum, which I know comes from the ATA driver. Eventually, fossil reports an error from diskReadRaw() along the lines of:

	archive(0, ): cannot find block: i/o error

followed by a dump that I presume could be useful for diagnostics.

My guess is that there is so much contention on the controller that reads, and sometimes writes, are timing out. Eventually fossil just falls over dead. At that point, I reboot from a CD, run venti/checkarenas -vf on the arena partition, and then reboot again so that fossil can resume the snapshot where it left off. Wash, rinse, repeat.

Anyway, the saga continues. We'll see whether I end up losing data; I'm still guessing not. My only comment is that it would be nice if fossil handled such error conditions more gracefully.

Regardless, I am going to dig around for another ATA controller so I can spread the disks across two controllers.

On Sat, Nov 15, 2003 at 05:37:57PM -0800, Christopher Nielsen wrote:
> this is looking more and more like it was a hardware
> problem. reseating all the connections eliminated most
> of the errors i was seeing. now i am getting errors
> from diskRawWrite, which leads me to believe that one
> of the disks is going bad. i can't really tell which
> one, though. the error message from diskRawWrite gives
> some diagnostic info, but i don't know how to interpret
> it. admittedly, i haven't dived into the source as much
> as i could, but maybe someone can provide some insight
> before i go ahead and do that.
>
> thanks to everyone that has provided input so far.
>
> i have to say, it doesn't look like i'm going to lose
> any data. it's not certain yet, but it's looking good.
> the paranoia in fossil and venti are good.
>
> On Fri, Nov 14, 2003 at 03:18:42PM -0800, Christopher Nielsen wrote:
> > fossil crashed in the middle of an archival snapshot.
> > now, i'm getting
> >
> > err 4: no space left in arenas
> > failed to write lump for : no space left in arenas
> >
> > there's plenty of space left in the arenas. a whole other
> > 167G disc, in fact.
> >
> > i've run venti/checkarenas and venti/checkindex to fix any
> > inconsistencies. they were both successful according to the
> > output.
> >
> > any ideas about what is going on and how to fix it?
> >
> > also, is there any way to tell fossil to stop trying to do
> > the snapshot?
> --
> Christopher Nielsen
> "They who can give up essential liberty for temporary
> safety, deserve neither liberty nor safety." --Benjamin Franklin

-- 
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin