From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
Date: Fri, 23 Jan 2009 18:08:51 -0500
Message-ID: <509071940901231508y6569a875s99008e8c3bb6b74@mail.gmail.com>
From: Anthony Sorace
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: [9fans] sick cpu server; 9load hates my SATA controller
Topicbox-Message-UUID: 860eccb2-ead4-11e9-9d60-3106f5b1d025

About a month ago, the motherboard in my CPU server went bad (visibly bulging capacitors!). I finally got the replacement part on RMA from the manufacturer and tried getting things going again yesterday. No joy, and the problems are strange. The symptoms differ depending on whether I have drives on sdE[0,1,3] (as was the case before) or sdE[0,1,2].

When I have drives on sdE[0,1,3], 9load starts, and proceeds normally until half-way through probing my SATA drives. The lines are:

	sb600: sata-II with 4 ports
	sdiahci: drive 0 in state ready after 0 resets
	sdiahci: drive 1 in state ready after 0 resets
	sdiahci: drive 2 in state missing after 0 resets
	sdiahci: drive 3 in state ready after 0 resets
	sdE3: i/o error 50 @0
	sdE3: i/o error 50 @1

but (as best I can tell) after the "state missing" line all I/O becomes dog slow. Characters print at what looks like maybe 300 baud, and newlines take a few seconds to redraw the screen. Despite the extreme slowness of printing, it prints the 9load menu I had set up, responds to the menu entry, and loads the kernel. It prints the "cpu0:" and "apm" lines as expected (but, again, very slowly), and then "sdE3: i/o error 50 @0" three times. It then finds the kernel, and I get the expected ".886899....." and so on, with the .'s printing very slowly (less than 1/sec, suggesting that there's a more general I/O problem, not just printing). Once the kernel has finished loading, it prints "entry: 0xf0100020" and becomes totally unresponsive (no ^t^tp, random typing produces no characters).

I've disabled what peripherals I can in the BIOS, tried different BIOS settings for the SATA mode (although I'm sure it was running AHCI before), and tried different kernels in my boot menu; no substantial change (loading a gzip'd kernel seems to print the "..." faster per dot, but hangs after the "=>"). I've tried booting off an ISO downloaded about two weeks ago, and get similar results: things seem okay until it probes the SATA controller, when it reports "sb600: sata-II with 4 ports" and then hangs (although this does respond to ^p). Note that the part is indeed a 4-port sb600 and there are indeed three disks attached (although the BIOS and 9load disagree on whether the second or third is missing).

If I have drives on sdE[0,1,2], the case for the CD is the same, but the on-disk kernel gets through asking where root is from, and then yields "panic: fault: 0x11c" as it probes the drives. All the on-disk kernels perform the same way.