From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
From: erik quanstrom
Date: Mon, 11 May 2009 09:53:48 -0400
To: 9fans@9fans.net
In-Reply-To: <20090510.043848.7230.0@webmail04.dca.untd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] RE; MP Interrupt
Topicbox-Message-UUID: f6e46abe-ead4-11e9-9d60-3106f5b1d025

> I have seen the situation you've described in other operating systems
> and it's often been H/W related and not due to the OS itself. In the
> situations I've seen such problems were caused by the way the bios
> assigns irq's. Though seemingly unnecessary, I have solved similar problem
> by simply moving a PCI card to another slot.

russ set me on the right track with this suggestion:

suppose you had a driver that, when it got a spurious interrupt,
would trigger a real interrupt for itself next. that would work
fine in isolation and even with other, correct drivers: every time
one of those drivers got an interrupt, the buggy one would see it
as spurious and trigger a real one, but then the system would
calm down.

it turned out that there weren't any spurious interrupts, but
the mv50xx had unhandled interrupts.

the story starts with sd(3):

	Units are not accessed before the first attach. Units may be
	individually attached using the attach specifier, [...]

the mv50xx driver had interpreted this to mean that ports should
not be fully configured before sdev->verify is called. however,
sdev->enable is called at boot time, so until (all) the ports were
first accessed, there was a window where the irq could scream
because it could not be serviced; accessing the drives calmed
the interrupt.

the solution is to fully configure the drives in the pnp fn.

so the remaining question is why the red herring of multiple
sd devices on the same irq? that is actually easy to explain.
on a cpu server, readnvram(2) scans sd devices until it finds
a proper nvram.
on my machine, sda0 was scanned first and has an acceptable
nvram. thus sdF (the mv50xx controller) was not scanned. if i
disable the orion controller, then there is no nvram and all
the sd devices including sdF are scanned.

- erik
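
for anyone who wants the sequence spelled out, here is a toy model
in python (not plan 9 c, and not taken from the sd or mv50xx source).
Dev, boot, scan_nvram and the lazy flag are all made-up names. the
only things it assumes are what is described above: enable runs at
boot, the old driver fully configured its ports only on first access,
and the nvram scan stops at the first device with a usable nvram.

```python
from dataclasses import dataclass


@dataclass
class Dev:
    name: str
    has_nvram: bool = False
    lazy: bool = False        # hypothetical: True models the old mv50xx behaviour
    enabled: bool = False
    configured: bool = False

    def enable(self):
        # stands in for sdev->enable at boot: the irq is now live
        self.enabled = True
        if not self.lazy:
            # the fix: ports are fully configured before anyone touches them
            self.configured = True

    def access(self):
        # stands in for the first attach/verify, which finishes port setup
        self.configured = True

    def irq_unserviceable(self):
        # the window: irq enabled, but the ports are not set up to service it
        return self.enabled and not self.configured


def scan_nvram(devs):
    """Access devices in order, stopping at the first with a usable nvram."""
    probed = []
    for d in devs:
        d.access()
        probed.append(d.name)
        if d.has_nvram:
            break
    return probed


def boot(devs):
    """Enable everything, run the nvram scan, report who is still in the window."""
    for d in devs:
        d.enable()
    probed = scan_nvram(devs)
    stuck = [d.name for d in devs if d.irq_unserviceable()]
    return probed, stuck


if __name__ == "__main__":
    # sda0 has nvram, so the scan stops before it reaches sdF
    print(boot([Dev("sda0", has_nvram=True), Dev("sdF", lazy=True)]))
    # first controller disabled: no nvram anywhere, so sdF gets accessed
    print(boot([Dev("sdF", lazy=True)]))
    # with the fix, scan order no longer matters
    print(boot([Dev("sda0", has_nvram=True), Dev("sdF")]))
```

in the first case sdF is left enabled but never accessed, which is
the window. in the other two it is either accessed by the scan or
already configured, so nothing is left unserviceable.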