From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <509071940709261232t1046ecv2ce6800d549c180c@mail.gmail.com>
Date: Wed, 26 Sep 2007 15:32:32 -0400
From: "Anthony Sorace"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@cse.psu.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Subject: [9fans] More venti sync woes.
Topicbox-Message-UUID: c5be6ed2-ead2-11e9-9d60-3106f5b1d025

I've had a cpu server running off a non-venti-backed fossil for a few weeks now. the same machine has also been running venti (but the fossil wasn't talking to it, intentionally). I'd confirmed the venti was working by doing direct dumps and mounting the results from vacfs. All was well.

Yesterday I modified my fossil config to use the venti. Edited the config with fossil/conf, rebooted, and all was well. At boot time, the "sync..." message stayed for about 10 seconds (I didn't time it, but that's the right order), as it had been on every previous reboot (before fossil was using it), and then it moved on and booted as normal.

Last night something outside my house got struck by lightning and we lost power for a few seconds. On boot, it hung at the "sync..." message. It's now been double-digit hours. The disk is slowish, and lacks supported DMA, but that still seems ridiculous, especially on a system with now one day's worth of dumps (with less than 50MB data beyond the stock plan9 install).

On the up side, my microwave, which has been broken for months, is now working properly again. Go figure.

So I've got questions. First, I was under the impression that venti's structure made it more or less immune to abrupt shutdown. In that case, assuming no damage to the actual hardware, is it safe to factor the power outage out of the equation and just treat this as a reboot?

And the big one: what's going on? I've had this sync issue in a couple different setups.
In the earlier ones, I wrote it off to having re-used oventi partitions and that confusing nventi. But this has been all nventi throughout. A handful of folks on IRC have observed indefinite stalls at the same place. Aside from the clock time theory proposed just a little bit ago (which is not the case for me; I checked), I've not heard any good working theories.

My next step is going to be to try booting off some other medium and rebuild the index partitions, assuming the actual arenas are unharmed. Any bets on whether that's likely to pay off?

Anthony