From mboxrd@z Thu Jan  1 00:00:00 1970
From: Artem Letko <aletko@iname.com>
Subject: Re: [9fans] exception/interrupt 14 bug
To: 9fans@cse.psu.edu
Message-id: <000701c1fad8$71fef9e0$1c742544@rosele01.nj.comcast.net>
MIME-version: 1.0
Content-type: text/plain; charset=koi8-r
Content-transfer-encoding: 7BIT
References: <293d086081456bfda9777d45dfa18ba1@plan9.bell-labs.com>
Date: Mon, 13 May 2002 19:46:46 -0400
Topicbox-Message-UUID: 90639170-eaca-11e9-9e20-41e7f4b1d025

The fix doesn't seem to be related to my problem though -
exception/interrupt 14 happens much earlier in my case, before the kernel is
fully loaded. The new floppy still panics.

Art.

----- Original Message -----
From: "Russ Cox" <rsc@plan9.bell-labs.com>
To: <9fans@cse.psu.edu>
Sent: Monday, May 13, 2002 3:58 PM
Subject: [9fans] exception/interrupt 14 bug


> The short version is that I think this bug is fixed; at
> least, I think the bug that cause the two reports with
> register dumps are fixed.  I don't know about the others
> that were mentioned without full panic messages, but it's
> worth trying again.  Thanks to Marc Rocas for reporting the
> first instance of the bug and to Kevin Adams of VMware for
> reporting the fix five days ago (though I didn't realize it
> then).
>
> The long version is the story of the actual bug, just
> because it's fun.
>
> Here's the first transcript I got of the bug in action:
>
> PBS... Plan 9 from Bell Labs
> ELCR: 0E00
> apm ax=f000 cx=f000 dx=40 di=ffff ebx=57c7 esi=-1
> using fd0!dos!plan9.ini
> dev A0 port 1F0 config 045A capabilities 2F00 mwdma 0007 udma 0407
> dev A0 port 170 config 80A0 capabilities 0E00 mwdam 0000
>
> found 9PCFLOP .GZ attr 0x0 start 0xf0 len 1162928
> ...gz..... 1162928 -> 637473+992156+63892 = 1693521
> entry:80100020
> FLAGS=10286 TRAP=e ECODE=0 PC=8001c50b
> AX d3c04c57 BX 000006e0 CX 80317718 DX 00000000
> SI 8003b580 DI 8003b55c BP 0000000a
> CS 0010 DS 0008 ES 0008 FS 0008 GS 0008
> CR0 80000011 CR2 d3c04c6b CR3 000c000
> panic: exception/interrupt 14
>
> After printing the entry line and before jumping to the
> loaded kernel, 9load calls _sddetach to turn off any devices
> that might otherwise interrupt the nascent kernel.
>
> static void
> _sddetach(void)
> {
> SDev *sdev;
>
> for(sdev = sdlist; sdev != nil; sdev = sdev->next){
> if(sdev->enabled == 0)
> continue;
> if(sdev->ifc->disable)
> sdev->ifc->disable(sdev);
> sdev->enabled = 0;
> }
> }
>
> Unfortunately, sddetach has died because sdev->ifc does not
> contain &sdataifc as expected, but rather d3c04c57 (AX).  We
> know that sdev is the right pointer because it's in CX and
> booting another 9load with some debugging prints in malloc
> shows that the first sdev always ends up at 80317718.
>
> The interesting question now is what d3c04c57 is.  It's not
> a pointer or a small integer.  It really looks like random
> noise; it could conceivably be a clock tick count except
> that the clock runs at 100Hz and is initialized to zero.
> Still, it's distinctive enough to look for.  It's not in the
> 9load binary nor in 9pcflop.gz, but it _is_ in the
> uncompressed 9pcflop, at location 125159 (hex).  The
> kernel size is (decimal) 637473+992156+63892, so
> that puts it in the data segment.  The kernel file layout
> looks like:
>
> 00000  exec header
> 00020  text
> 9ba41  data
>
> while the layout in memory will eventually look like
>
> 80100000  exec header
> 80100020  text
> 8019c000  data
>
> with the data shifted to begin on the next page boundary
> after the end of the text.  That means our magic value
> d3c04c57 should have ended up at 80225718, but we
> found it at 80317718 instead.  The identical page alignment
> of the expected and actual locations of the value is very
> suspicious.  Their difference is f2000, which doesn't ring
> any bells.
>
> Meanwhile, we have this puzzle.
>
> if(gunzip((uchar*)PADDR(entry)-sizeof(Exec), sizeof(Exec)+text+data,
>      (uchar*)b->bp, b->wp-b->bp) < sizeof(Exec)+text+data) {
> print("error uncompressing kernel\n");
> return FAIL;
> }
>
> /* relocate data to start at page boundary */
> memmove((void*)PGROUND(entry+text), (void*)(entry+text), data);
>
> print("entry: %lux\n", entry);
> warp9(PADDR(entry));
>
> The magic value came into existence in memory as a result
> of the gunzip call above, but it would have been in an unshifted
> text segment, so it would be at 80225159 rather than 80225718.
> The memmove is the only way the data could have been shifted
> into the right alignment, but how did it end up at 80317718?
>
> But first, a different story.  Five days ago Kevin Adams of
> VMware pointed out that our protected mode entry code
> looked like:
>
>            0x1000:0x0066 | 0f 01 16 6d 01                 LGDT  (0x16d)
>            0x1000:0x006b | b8 01 00                       MOV   %ax,0x1
>            0x1000:0x006e | 00 00                          ADD
0x0(%bx+%si),%al
>            0x1000:0x0070 | 0f 01 f0                       LMSW  %cr0,%ax
>            0x1000:0x0073 | eb 00                          JMP   0x0 ;0x75
>
> the weird part being the ADD instruction.  Of course, this
> is because the mov %eax, 0x1 that precedes it was
> assembled with a 32-bit immediate rather than a 16-bit
> immediate.
>
> This is a fairly new bug.  I introduced it while fiddling with the
> VMware port a few months ago.  The effect is to increment
> the byte at (BX+SI), whatever that may be.  Maybe, just maybe,
> some bit in a page table is getting whacked just right to double
> map one page at both 80225000 and 80317000.
>
> I put a check in warp9 to look for page table corruption
> and sent out a new boot disk, but the check didn't fire;
> the crash remained the same.  So the page tables are okay.
>
> Of course, if I could do decimal/hex conversion a little
> better in my head I would have noticed that f2000 (the difference
> twixt 80225000 and 80317000) is almost exactly the size of
> the data segment itself: 992156 = f239c.
>
> If the memmove call being executed were really
>
> memmove((void*)PGROUND(entry+text+data), (void*)(entry+text), data);
>
> then that would break things in exactly the manner we
> observed.  But clearly the memmove isn't that, and further
> it works on most people's machines.
>
> The disassembly looks perfectly fine:
>
> bootpass+0x210 0x80015798 MOVL entry+0x28(SP),DX
> bootpass+0x214 0x8001579c MOVL data+0x24(SP),CX
> bootpass+0x218 0x800157a0 MOVL text+0x20(SP),BX
> bootpass+0x21c 0x800157a4 MOVL CX,BP
> bootpass+0x21e 0x800157a6 ADDL BX,CX
> bootpass+0x220 0x800157a8 ADDL $0x20,CX
> bootpass+0x223 0x800157ab CMPL CX,AX
> bootpass+0x225 0x800157ad JCC bootpass+0x23d(SB)
> ...
> bootpass+0x23d 0x800157c5 MOVL BX,AX
> bootpass+0x23f 0x800157c7 ADDL DX,AX
> bootpass+0x241 0x800157c9 ADDL $0xfff,AX
> bootpass+0x246 0x800157ce ANDL $0xfffff000,AX
> bootpass+0x24b 0x800157d3 MOVL AX,0x0(SP)
> bootpass+0x24e 0x800157d6 LEA 0x0(BX)(DX*1),AX
> bootpass+0x251 0x800157d9 MOVL AX,0x4(SP)
> bootpass+0x255 0x800157dd MOVL BP,0x8(SP)
> bootpass+0x259 0x800157e1 CALL memmove(SB)
>
> The first bit is doing the comparison of the gunzip return
> value, and then we set the arguments for memmove.
>
> But wait.  We don't have to guess at what byte got incremented
> by the other bug.  The last thing before the jump to protected
> mode is the BIOS call to initialize APM.  The registers upon
> return were saved for the kernel, and even printed in the
> transcript:
>
> apm ax=f000 cx=f000 dx=40 di=ffff ebx=57c7 esi=-1
>
> And there it is.  BX+SI = 57c6, and the data segment
> at the time of the increment is 1000, making the byte
> 157c6 aka 80157c6.  The MOVL BX,AX at 80157c5 gets
> assembled as 89 D8; after the second byte is incremented
> it becomes 89 D9, which is MOVL  BX, CX.
>
> So in fact, the memmove call that was being executed was
>
> memmove((void*)PGROUND(entry+text+data+sizeof(Exec)),
> (void*)(entry+text), data);
>
> which ends up being close enough.
>
> The fix is in /sys/src/boot/pc/l.s.  In the section marked
>
> /*
>  * goto protected mode
>  */
>
> change
>
> MOVL $1, AX
>
> into
>
> LWI(1, rAX)
>
> I put up a fixed boot disk, although the distribution archive
> still contains a bad 9load.  Hopefully we can get a new
> distribution archive together soon.  That takes a bit longer
> to test.
>
> Russ
>
>