From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <293d086081456bfda9777d45dfa18ba1@plan9.bell-labs.com> To: 9fans@cse.psu.edu From: "Russ Cox" MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Subject: [9fans] exception/interrupt 14 bug Date: Mon, 13 May 2002 15:58:25 -0400 Topicbox-Message-UUID: 902d74c8-eaca-11e9-9e20-41e7f4b1d025 The short version is that I think this bug is fixed; at least, I think the bug that cause the two reports with register dumps are fixed. I don't know about the others that were mentioned without full panic messages, but it's worth trying again. Thanks to Marc Rocas for reporting the first instance of the bug and to Kevin Adams of VMware for reporting the fix five days ago (though I didn't realize it then). The long version is the story of the actual bug, just because it's fun. Here's the first transcript I got of the bug in action: PBS... Plan 9 from Bell Labs ELCR: 0E00 apm ax=f000 cx=f000 dx=40 di=ffff ebx=57c7 esi=-1 using fd0!dos!plan9.ini dev A0 port 1F0 config 045A capabilities 2F00 mwdma 0007 udma 0407 dev A0 port 170 config 80A0 capabilities 0E00 mwdam 0000 found 9PCFLOP .GZ attr 0x0 start 0xf0 len 1162928 .....gz..... 1162928 -> 637473+992156+63892 = 1693521 entry:80100020 FLAGS=10286 TRAP=e ECODE=0 PC=8001c50b AX d3c04c57 BX 000006e0 CX 80317718 DX 00000000 SI 8003b580 DI 8003b55c BP 0000000a CS 0010 DS 0008 ES 0008 FS 0008 GS 0008 CR0 80000011 CR2 d3c04c6b CR3 000c000 panic: exception/interrupt 14 After printing the entry line and before jumping to the loaded kernel, 9load calls _sddetach to turn off any devices that might otherwise interrupt the nascent kernel. static void _sddetach(void) { SDev *sdev; for(sdev = sdlist; sdev != nil; sdev = sdev->next){ if(sdev->enabled == 0) continue; if(sdev->ifc->disable) sdev->ifc->disable(sdev); sdev->enabled = 0; } } Unfortunately, sddetach has died because sdev->ifc does not contain &sdataifc as expected, but rather d3c04c57 (AX). We know that sdev is the right pointer because it's in CX and booting another 9load with some debugging prints in malloc shows that the first sdev always ends up at 80317718. The interesting question now is what d3c04c57 is. It's not a pointer or a small integer. It really looks like random noise; it could conceivably be a clock tick count except that the clock runs at 100Hz and is initialized to zero. Still, it's distinctive enough to look for. It's not in the 9load binary nor in 9pcflop.gz, but it _is_ in the uncompressed 9pcflop, at location 125159 (hex). The kernel size is (decimal) 637473+992156+63892, so that puts it in the data segment. The kernel file layout looks like: 00000 exec header 00020 text 9ba41 data while the layout in memory will eventually look like 80100000 exec header 80100020 text 8019c000 data with the data shifted to begin on the next page boundary after the end of the text. That means our magic value d3c04c57 should have ended up at 80225718, but we found it at 80317718 instead. The identical page alignment of the expected and actual locations of the value is very suspicious. Their difference is f2000, which doesn't ring any bells. Meanwhile, we have this puzzle. if(gunzip((uchar*)PADDR(entry)-sizeof(Exec), sizeof(Exec)+text+data, (uchar*)b->bp, b->wp-b->bp) < sizeof(Exec)+text+data) { print("error uncompressing kernel\n"); return FAIL; } /* relocate data to start at page boundary */ memmove((void*)PGROUND(entry+text), (void*)(entry+text), data); print("entry: %lux\n", entry); warp9(PADDR(entry)); The magic value came into existence in memory as a result of the gunzip call above, but it would have been in an unshifted text segment, so it would be at 80225159 rather than 80225718. The memmove is the only way the data could have been shifted into the right alignment, but how did it end up at 80317718? But first, a different story. Five days ago Kevin Adams of VMware pointed out that our protected mode entry code looked like: 0x1000:0x0066 | 0f 01 16 6d 01 LGDT (0x16d) 0x1000:0x006b | b8 01 00 MOV %ax,0x1 0x1000:0x006e | 00 00 ADD 0x0(%bx+%si),%al 0x1000:0x0070 | 0f 01 f0 LMSW %cr0,%ax 0x1000:0x0073 | eb 00 JMP 0x0 ;0x75 the weird part being the ADD instruction. Of course, this is because the mov %eax, 0x1 that precedes it was assembled with a 32-bit immediate rather than a 16-bit immediate. This is a fairly new bug. I introduced it while fiddling with the VMware port a few months ago. The effect is to increment the byte at (BX+SI), whatever that may be. Maybe, just maybe, some bit in a page table is getting whacked just right to double map one page at both 80225000 and 80317000. I put a check in warp9 to look for page table corruption and sent out a new boot disk, but the check didn't fire; the crash remained the same. So the page tables are okay. Of course, if I could do decimal/hex conversion a little better in my head I would have noticed that f2000 (the difference twixt 80225000 and 80317000) is almost exactly the size of the data segment itself: 992156 = f239c. If the memmove call being executed were really memmove((void*)PGROUND(entry+text+data), (void*)(entry+text), data); then that would break things in exactly the manner we observed. But clearly the memmove isn't that, and further it works on most people's machines. The disassembly looks perfectly fine: bootpass+0x210 0x80015798 MOVL entry+0x28(SP),DX bootpass+0x214 0x8001579c MOVL data+0x24(SP),CX bootpass+0x218 0x800157a0 MOVL text+0x20(SP),BX bootpass+0x21c 0x800157a4 MOVL CX,BP bootpass+0x21e 0x800157a6 ADDL BX,CX bootpass+0x220 0x800157a8 ADDL $0x20,CX bootpass+0x223 0x800157ab CMPL CX,AX bootpass+0x225 0x800157ad JCC bootpass+0x23d(SB) ... bootpass+0x23d 0x800157c5 MOVL BX,AX bootpass+0x23f 0x800157c7 ADDL DX,AX bootpass+0x241 0x800157c9 ADDL $0xfff,AX bootpass+0x246 0x800157ce ANDL $0xfffff000,AX bootpass+0x24b 0x800157d3 MOVL AX,0x0(SP) bootpass+0x24e 0x800157d6 LEA 0x0(BX)(DX*1),AX bootpass+0x251 0x800157d9 MOVL AX,0x4(SP) bootpass+0x255 0x800157dd MOVL BP,0x8(SP) bootpass+0x259 0x800157e1 CALL memmove(SB) The first bit is doing the comparison of the gunzip return value, and then we set the arguments for memmove. But wait. We don't have to guess at what byte got incremented by the other bug. The last thing before the jump to protected mode is the BIOS call to initialize APM. The registers upon return were saved for the kernel, and even printed in the transcript: apm ax=f000 cx=f000 dx=40 di=ffff ebx=57c7 esi=-1 And there it is. BX+SI = 57c6, and the data segment at the time of the increment is 1000, making the byte 157c6 aka 80157c6. The MOVL BX,AX at 80157c5 gets assembled as 89 D8; after the second byte is incremented it becomes 89 D9, which is MOVL BX, CX. So in fact, the memmove call that was being executed was memmove((void*)PGROUND(entry+text+data+sizeof(Exec)), (void*)(entry+text), data); which ends up being close enough. The fix is in /sys/src/boot/pc/l.s. In the section marked /* * goto protected mode */ change MOVL $1, AX into LWI(1, rAX) I put up a fixed boot disk, although the distribution archive still contains a bad 9load. Hopefully we can get a new distribution archive together soon. That takes a bit longer to test. Russ