9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Missing interrupts in 9pxeload?
From: Dan Cross @ 2009-06-04 20:38 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Another odd thing, in my current battle to build a home plan 9
network.  Has anyone else seen this?

My terminal is a Mini-ITX machine with an Intel Atom processor, 1GB
of RAM, and a Realtek RTL8111-series ethernet controller (I haven't
cracked open the case to see what the actual chip number is; but see
below).  The machine has no mass storage device of any kind, nor an
optical drive.  It does have a VGA interface and is connected to a
keyboard and mouse by the onboard PS/2 connectors.  It is not using
USB at all.  I have disabled pretty much everything except the
graphics adapter and ethernet in the BIOS.

I am attempting to PXE boot it from my file/auth/boot/cpu server (the
aforementioned machine that is having some problems).  The machine
DHCP's fine, and will load 9pxeload via TFTP, but then hangs.  I
started playing around with 9pxeload to see what was going on,
including updating the driver in /sys/src/boot/pc using Erik's latest
from his directory on sources, but still no go.  I finally traced
through the code far enough to see that it is getting stuck in the
wait() routine in ether.c; that is defined in l.s, and just calls
'HLT' and 'RET'.  I.e., do nothing until receipt of an interrupt and
return.  However, no interrupts ever arrive; modifying wait() to
comment out the call to idle() and then printing m->ticks every
million or so iterations through the loop shows that m->ticks doesn't
change.  It's as if all interrupts somehow got turned off prior to the
call to wait().  Has anyone else seen this?  Could there be something
somewhere that's disabling interrupts that I should look into?  Could
things be being routed weirdly on an Atom processor?
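
The experiment described above can be sketched as a small, hosted C simulation. This is not the 9pxeload source: Mach here is a hypothetical stand-in with only a ticks field, and watchticks() and faketick() are made-up names. The idea is simply to spin, sample the tick counter every so often, and count how many times it was seen to change; zero changes suggests that no clock interrupt was serviced while watching.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the loader's per-processor Mach structure. */
typedef struct Mach Mach;
struct Mach {
	volatile unsigned long ticks;	/* bumped by the clock interrupt handler */
};

/*
 * Spin for maxspins iterations, sampling m->ticks every `every`
 * iterations.  Returns how many times the counter was seen to change;
 * 0 means no clock interrupt was observed while we watched.
 * step is a simulation hook standing in for the hardware clock.
 */
int
watchticks(Mach *m, unsigned long maxspins, unsigned long every, void (*step)(Mach*))
{
	unsigned long i, last;
	int changes;

	assert(every > 0);
	last = m->ticks;
	changes = 0;
	for(i = 1; i <= maxspins; i++){
		if(step != NULL)
			step(m);
		if(i % every == 0){
			printf("spin %lu: ticks=%lu\n", i, m->ticks);
			if(m->ticks != last){
				changes++;
				last = m->ticks;
			}
		}
	}
	return changes;
}

/* Simulated clock interrupt: advances the counter on every call. */
void
faketick(Mach *m)
{
	m->ticks++;
}
```

With a counter that never moves, watchticks() reports no changes, which matches the symptom here; passing faketick as the hook shows what a live clock would look like.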

Even better, has anyone seen this and fixed it already?

        - Dan C.




* Re: [9fans] Missing interrupts in 9pxeload?
From: erik quanstrom @ 2009-06-05 18:28 UTC (permalink / raw)
  To: 9fans

> below).  The machine has no mass storage device of any kind, nor an
> optical drive.  It does have a VGA interface and is connected to a
> keyboard and mouse by the onboard PS/2 connectors.  It is not using
> USB at all.  I have disabled pretty much everything except the
> graphics adapter and ethernet in the BIOS.
>
> I am attempting to PXE boot it from my file/auth/boot/cpu server (the
> aforementioned machine that is having some problems).  The machine
> DHCP's fine, and will load 9pxeload via TFTP, but then hangs.  I
> started playing around with 9pxeload to see what was going on,
> including updating the driver in /sys/src/boot/pc using Erik's latest
> from his directory on sources, but still no go.  I finally traced
> through the code far enough to see that it is getting stuck in the
> wait() routine in ether.c; that is defined in l.s, and just calls
> 'HLT' and 'RET'.  I.e., do nothing until receipt of an interrupt and
> return.  However, no interrupts ever arrive; modifying wait() to
> comment out the call to idle() and then printing m->ticks every
> million or so iterations through the loop shows that m->ticks doesn't
> change.  It's as if all interrupts somehow got turned off prior to the
> call to wait().  Has anyone else seen this?  Could there be something
> somewhere that's disabling interrupts that I should look into?  Could
> things be being routed weirdly on an Atom processor?

how are you verifying that this machine isn't getting any
interrupts?  the wait loop will loop if an interrupt is rx'd
unless ring->owner != owner or it times out.  are you saying
that wait doesn't even timeout?
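
A rough model of that loop, as a hosted C simulation rather than the actual ether.c code. Sim, simidle(), simwait(), clockalive and doneat are all made-up names; ticks and owner stand in for m->ticks and ring->owner. The sketch assumes the timeout is measured in clock ticks, so a dead clock means the loop can neither finish nor time out.

```c
#include <assert.h>

enum { Host = 0, Interface = 1 };	/* hypothetical ring-owner values */
enum { Done, Timedout, Hung };		/* outcomes reported by simwait */

typedef struct Sim Sim;
struct Sim {
	unsigned long ticks;	/* stand-in for m->ticks */
	int owner;		/* stand-in for ring->owner */
	int clockalive;		/* does the clock interrupt fire at all? */
	unsigned long doneat;	/* tick at which the ring is handed back; 0 = never */
};

/* Stand-in for idle(): one HLT, which returns only if an interrupt arrives. */
static int
simidle(Sim *s)
{
	if(!s->clockalive)
		return 0;	/* HLT never wakes up */
	s->ticks++;
	if(s->doneat != 0 && s->ticks >= s->doneat)
		s->owner = Host;
	return 1;
}

/* Loop until the ring changes hands or timeout ticks have elapsed. */
int
simwait(Sim *s, int owner, unsigned long timeout)
{
	unsigned long start;

	assert(timeout > 0);
	start = s->ticks;
	while(s->ticks - start < timeout){
		if(s->owner != owner)
			return Done;
		if(!simidle(s))
			return Hung;	/* the real loop would simply never return */
	}
	return Timedout;
}
```

The Hung outcome exists only so the simulation terminates; on real hardware that case is a HLT that never resumes.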

or do you mean that it's not getting any ethernet interrupts?
what irq is being enabled by ether8169.c?

- erik




* Re: [9fans] Missing interrupts in 9pxeload?
From: Dan Cross @ 2009-06-05 20:49 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 5, 2009 at 2:28 PM, erik quanstrom<quanstro@quanstro.net> wrote:
>> I am attempting to PXE boot it from my file/auth/boot/cpu server (the
>> aforementioned machine that is having some problems).  The machine
>> DHCP's fine, and will load 9pxeload via TFTP, but then hangs.  I
>> started playing around with 9pxeload to see what was going on,
>> including updating the driver in /sys/src/boot/pc using Erik's latest
>> from his directory on sources, but still no go.  I finally traced
>> through the code far enough to see that it is getting stuck in the
>> wait() routine in ether.c; that is defined in l.s, and just calls

(Argh; it looks like I cut a line out of my original message.  Mea
culpa; the function I referred to in l.s is the idle() routine).

>> 'HLT' and 'RET'.  I.e., do nothing until receipt of an interrupt and
>> return.  However, no interrupts ever arrive; modifying wait() to
>> comment out the call to idle() and then printing m->ticks every
>> million or so iterations through the loop shows that m->ticks doesn't
>> change.  It's as if all interrupts somehow got turned off prior to the
>> call to wait().  Has anyone else seen this?  Could there be something
>> somewhere that's disabling interrupts that I should look into?  Could
>> things be being routed weirdly on an Atom processor?
>
> how are you verifying that this machine isn't getting any
> interrupts?  the wait loop will loop if an interrupt is rx'd
> unless ring->owner != owner or it times out.

I verified by modifying wait() and performing several experiments.
For instance, manually inspecting (via a print() statement) the value
of, e.g., m->ticks and noting that it doesn't change.  Since it is
incremented in clockintr(), I'd expect it to if the machine was
servicing clock interrupts, but it stays as either '2' or '3'
depending on what else happens on the machine before it gets into
wait() (e.g., what debugging statements I add).  Further, the call to
idle() never returns, no matter what happens that *should* be
generating an interrupt: entering key strokes on the keyboard, mouse
clicks, sending packets directly to the Ethernet interface by mucking
with the arp tables on a different machine and running ping from there
or sending to the broadcast address, etc.  If nothing else, I'd expect
clock interrupts to disrupt the HLT and thus make idle() return.  But
none of that happens.

If I comment out the call to idle() (which is how I see that m->ticks
never changes) then wait() just loops forever.

>  are you saying that wait doesn't even timeout?

That's exactly what I'm saying.

> or do you mean that it's not getting any ethernet interrupts?

It ain't getting any interrupts period: none from the ethernet, and
not even from the clock.  Or if it is, they're doing really strange
things that I cannot understand.  Or my understanding of these things
is even worse than I thought that it was.

> what irq is being enabled by ether8169.c?

According to the status messages, IRQ 11.

I should note that the machine can successfully boot (and run) OpenBSD
via PXE (by first loading the OpenBSD PXE loader and then loading and
booting an OpenBSD miniroot), so I don't think it's a hardware
problem.  That the clock doesn't seem to be interrupting at all and
that I pulled a new RTL8169 driver out of your directory on sources
and wedged it into 9pxeload with the same results makes me think that
it's not an ethernet driver issue.  I suspect that either the
interrupt vector is being incorrectly set or corrupted, or that
interrupts are somehow being disabled and never re-enabled.  The
latter doesn't seem particularly likely to me.

        - Dan C.




* Re: [9fans] Missing interrupts in 9pxeload?
From: balaji @ 2009-06-05 21:30 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I have had the same problems... With a Dell Precision 470.
In my case it was a SATA controller that was enabled/connected.

The pxeboot will load the boot agent however after that there
will be no network activity. What this means is the PXE agent
on the NIC is good and can bring down the initial agent. However
once it gets going, it has trouble communicating over the network.

It has to be an interrupt issue, and I could never figure it out. I pulled
the SATA disk out, disabled the controller via BIOS and all was good.

Something similar is happening in yours... Check what peripherals
you have and see if anything that is not necessary can be disabled
till you get past this boot process.

HTH





* Re: [9fans] Missing interrupts in 9pxeload?
From: Dan Cross @ 2009-06-05 22:18 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 5, 2009 at 5:30 PM, balaji<balaji.srinivasa+plan9@gmail.com> wrote:
> I have had the same problems... With a Dell Precision 470.
> In my case it was a SATA controller that was enabled/connected.
>
> The pxeboot will load the boot agent however after that there
> will be no network activity. What this means is the PXE agent
> on the NIC is good and can bring down the initial agent. However
> once it gets going, it has trouble communicating over the network.
>
> It has to be an interrupt issue, and I could never figure it out. I pulled
> the SATA disk out, disabled the controller via BIOS and all was good.
>
> Something similar is happening in yours... Check what peripherals
> you have and see if anything that is not necessary can be disabled
> till you get past this boot process.

Hmm, I don't think so.  There basically are no other peripherals in
the machine, and I disabled everything except the Ethernet and video
in the BIOS with the same results.  My suspicion is that the interrupt
vector isn't being initialized properly.

        - Dan C.




* Re: [9fans] Missing interrupts in 9pxeload?
From: erik quanstrom @ 2009-06-06  2:18 UTC (permalink / raw)
  To: 9fans

> Hmm, I don't think so.  There basically are no other peripherals in
> the machine, and I disabled everything except the Ethernet and video
> in the BIOS with the same results.  My suspicion is that the interrupt
> vector isn't being initialized properly.

i think there are two general possibilities.
(a) the southbridge is not recognized and irq routing is not
properly initialized
(b) 9load is stepping on low memory that bios tells us
is not available.  in general x86 platforms have been shrinking
the amount of low memory available.

to tell what is going on, i think it would make sense to strip
everything out of 9load and just get to main() and set up
a clock interrupt at the default 86Hz and print something each
time through.  then call HLT.

i can't see how this could fail (i.e. print nothing), but if it does
at least we know where to start.
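
a userspace analogue of that experiment, as a sketch only: it assumes a POSIX system, with setitimer() standing in for programming the clock and pause() standing in for HLT.  runclock() and its parameters are made-up names, and none of this is 9load code.  the point is the shape of the test: arm a periodic tick, halt, print on every wakeup.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks;

/* Stand-in for the clock interrupt handler: just count. */
static void
clockintr(int sig)
{
	(void)sig;
	ticks++;
}

/*
 * Arm a periodic timer at hz, then sleep in pause() until at least
 * nticks have been counted, printing on every wakeup.
 * Returns the final tick count, or -1 on setup failure.
 */
int
runclock(int hz, int nticks)
{
	struct sigaction sa;
	struct itimerval it;

	assert(hz >= 2 && hz <= 1000000);
	memset(&sa, 0, sizeof sa);
	sa.sa_handler = clockintr;
	sigemptyset(&sa.sa_mask);
	if(sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	memset(&it, 0, sizeof it);
	it.it_interval.tv_usec = 1000000 / hz;
	it.it_value = it.it_interval;
	ticks = 0;
	if(setitimer(ITIMER_REAL, &it, NULL) < 0)
		return -1;

	while(ticks < nticks){
		pause();	/* analogue of HLT: sleep until a signal arrives */
		printf("tick %d\n", (int)ticks);
	}

	memset(&it, 0, sizeof it);
	setitimer(ITIMER_REAL, &it, NULL);	/* disarm the timer */
	return (int)ticks;
}
```

if the loop never prints, the tick source itself is dead; if it prints, the fault lies in whatever gets added back afterwards.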

if it works, then we can start investigating other possibilities.

- erik




* Re: [9fans] Missing interrupts in 9pxeload?
From: erik quanstrom @ 2009-06-06  2:19 UTC (permalink / raw)
  To: 9fans

> I should note that the machine can successfully boot (and run) OpenBSD
> via PXE (by first loading the OpenBSD PXE loader and then loading and
> booting an OpenBSD miniroot), so I don't think it's a hardware
> problem.  That the clock doesn't seem to be interrupting at all and
> that I pulled a new RTL8169 driver out of your directory on sources
> and wedged it into 9pxeload with the same results makes me think that
> it's not an ethernet driver issue.  I suspect that either the
> interrupt vector is being incorrectly set or corrupted, or that
> interrupts are somehow being disabled and never re-enabled.  The
> latter doesn't seem particularly likely to me.

consider smm mode.  and the possibility for stepping on smm memory
between 512k and 640k.
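
one way to check for that kind of overlap, sketched as plain c.  Range, overlaps() and clobbers() are made-up names, and the map a caller passes in would have to come from the bios (e.g. an e820-style list); the type values follow the e820 convention that 1 is usable ram and 2 is reserved.  this is an illustration, not code from 9load.

```c
#include <assert.h>

typedef unsigned long long uvlong;

enum { Usable = 1, Reserved = 2 };	/* e820 types: 1 = usable ram, 2 = reserved */

typedef struct Range Range;
struct Range {
	uvlong base;
	uvlong len;
	int type;
};

/* Half-open intervals [b, b+l) overlap iff each starts before the other ends. */
static int
overlaps(uvlong b0, uvlong l0, uvlong b1, uvlong l1)
{
	return b0 < b1 + l1 && b1 < b0 + l0;
}

/*
 * Return the index of the first non-usable map entry that
 * [base, base+len) touches, or -1 if it touches none.
 */
int
clobbers(const Range *map, int n, uvlong base, uvlong len)
{
	int i;

	assert(len > 0);
	for(i = 0; i < n; i++)
		if(map[i].type != Usable && overlaps(base, len, map[i].base, map[i].len))
			return i;
	return -1;
}
```

with a hypothetical map that marks 512k to 640k as reserved, any buffer the loader places across 0x80000 would be reported.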

- erik




* Re: [9fans] Missing interrupts in 9pxeload?
From: Dan Cross @ 2009-06-06  6:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 5, 2009 at 10:18 PM, erik quanstrom<quanstro@quanstro.net> wrote:
> i think there are two general possibilities.
> (a) the southbridge is not recognized and irq routing is not
> properly initialized
> (b) 9load is stepping on low memory that bios tells us
> is not available.  in general x86 platforms have been shrinking
> the amount of low memory available.
>
> to tell what is going on, i think it would make sense to strip
> everything out of 9load and just get to main() and set up
> a clock interrupt at the default 86Hz and print something each
> time through.  then call HLT.
>
> i can't see how this could fail (i.e. print nothing), but if it does
> at least we know where to start.
>
> if it works, then we can start investigating other possibilities.

Short summary: I got it working.  Short analysis: Well....  That was a
hell of a thing.

Longer analysis: Based on your advice, I started playing around in
9pxeload to disable things and see if I could run with just the clock
enabled.  That worked.  Then, I started to put things back in; I got
to the point where I realized that the rtl8169init() function wasn't
returning correctly.  Looking at it, I saw that the likely culprit was
the switch statement testing ctlr->macv; sure enough, my macv
(0x24800000 - I guess I was a little wrong about exactly what chipset
it is) didn't have a case associated with it.  So, I added one (that
just did break;).  Voila the kernel loaded (and so did the plan9.ini
from /cfg/pxe).  My next step was to add the same case to the
corresponding driver in the kernel.  Now, I got to the 'boot from'
prompt, selected tcp and was able to login.  The VESA video started
up, and I was good to go.  The terminal is running fine.  Hooray.

So that was it; a one line change to add a case statement to a switch
(two if you count the definition of a symbolic constant; three if you
count adding a print() before returning from rtl8169init()).  I knew
it would end up being something like that.

I guess when rtl8169init() returned, it had left interrupts off or
something; certainly, the adapter was only partially set up.  I was a
bit surprised that there wasn't more in the way of an error in that
case.  Oh well.
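
For anyone hitting the same thing, the shape of the change is roughly as follows. This is a standalone sketch, not the driver source: MacvOld and MacvAtom are made-up names, MacvOld's value is a placeholder, and only 0x24800000 is the value my hardware actually reported. It assumes the default arm of the switch bails out of init early, which is consistent with what I observed.

```c
#include <assert.h>

enum {
	MacvOld = 0x00000000,	/* placeholder for a version the driver already knew */
	MacvAtom = 0x24800000,	/* value seen on this machine; the name is made up */
};

/* Before: an unlisted MAC version hits default and init gives up early. */
int
initbefore(unsigned long macv)
{
	switch(macv){
	default:
		return -1;
	case MacvOld:
		break;
	}
	/* ...the rest of controller setup would run here... */
	return 0;
}

/* After: one extra case lets init run to completion. */
int
initafter(unsigned long macv)
{
	switch(macv){
	default:
		return -1;
	case MacvOld:
	case MacvAtom:
		break;
	}
	/* ...the rest of controller setup would run here... */
	return 0;
}
```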

Now, my next big challenge is to get the VESA video modes to map up to
what the monitor that's running this expects (the monitor can do
1680x1050, but VESA doesn't support that).  I don't suppose anybody
has a "real" driver for the Intel 945?  Or some other way to force it
to change resolution?

        - Dan C.




* Re: [9fans] Missing interrupts in 9pxeload?
From: erik quanstrom @ 2009-06-06 15:00 UTC (permalink / raw)
  To: 9fans

> Longer analysis: Based on your advice, I started playing around in
> 9pxeload to disable things and see if I could run with just the clock
> enabled.  That worked.  Then, I started to put things back in; I got
> to the point where I realized that the rtl8169init() function wasn't
> returning correctly.  Looking at it, I saw that the likely culprit was
> the switch statement testing ctlr->macv; sure enough, my macv
> (0x24800000 - I guess I was a little wrong about exactly what chipset
> it is) didn't have a case associated with it.  So, I added one (that
> just did break;).  Voila the kernel loaded (and so did the plan9.ini
> from /cfg/pxe).

that's good work.  i put out a 9load-e820 with a revised rtl8169
driver that should complain if the mac is not recognized but the
vid/did are rather than putting the 8169 into this state.
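
the idea, as a standalone sketch rather than the revised driver itself: probe(), knownmacv and the status names are made up for illustration.  0x10EC is realtek's pci vendor id and 0x8168/0x8169 are realtek device ids; the only mac version listed is the one from this thread.

```c
#include <assert.h>
#include <stdio.h>

enum {
	Realtek = 0x10EC,	/* pci vendor id */
	Did8168 = 0x8168,
	Did8169 = 0x8169,
};

enum { Ok, Notours, Unknownmac };

/* Illustrative list; a real driver would know many more versions. */
static const unsigned long knownmacv[] = {
	0x24800000,	/* the version reported in this thread */
};

/*
 * Accept the controller only if both the vid/did and the MAC version
 * are recognized.  A recognized vid/did with an unknown MAC version
 * gets an explicit complaint instead of a silent early return.
 */
int
probe(unsigned vid, unsigned did, unsigned long macv)
{
	unsigned i;

	if(vid != Realtek || (did != Did8168 && did != Did8169))
		return Notours;
	for(i = 0; i < sizeof knownmacv / sizeof knownmacv[0]; i++)
		if(knownmacv[i] == macv)
			return Ok;
	fprintf(stderr, "rtl8169: %04x/%04x: unsupported mac version %#lx\n",
		vid, did, macv);
	return Unknownmac;
}
```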

if this works, i'll put similar changes into an 8169 kernel driver.

i believe several other people have seen this problem with
similar hardware.

- erik


