* [9fans] /net panic
@ 2008-02-15 9:20 sqweek
2008-02-15 15:19 ` Iruata Souza
2008-02-15 16:23 ` erik quanstrom
0 siblings, 2 replies; 6+ messages in thread
From: sqweek @ 2008-02-15 9:20 UTC
To: 9fans
muzgo (from irc) was playing around with /net in qemu and came across this gem:
on drawterm:
cpu% cd /net/tcp
cpu% cat clone
23cpu% cd 23
cpu% echo connect 10.0.2.1!12345 >ctl
cpu% cat status
Finwait2 qin 0 qout 0 srtt 0 mdev 0 cwin 1461 swin 32850>>0 rwin
65535>>0 timer.start 10 timer.count 10 rerecv 0 katimer.start 200
katimer.count 159
cpu% echo connect 10.0.2.1!12345 >ctl
cpu%
this causes CPU server to reboot with:
panic: timerstate1
panic: timerstate1
dumpstack disabled
cpu0 exiting
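For readers who would rather see the steps outside a shell session, here is a minimal sketch in portable C of what the commands above do: read a conversation number from tcp/clone, then write the same connect message twice to that conversation's ctl file. The function name double_connect and its root parameter are hypothetical (root exists only so the sketch can be pointed at a scratch directory instead of a live /net), and a native Plan 9 program would more likely use open, read and fprint from libc. Append mode is used so that both writes stay visible when ctl is an ordinary file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Mirror the shell session: read a conversation number from
 * <root>/tcp/clone, then write the same connect message twice
 * to <root>/tcp/<n>/ctl. Returns 0 on success, -1 on error.
 * `root` is a hypothetical parameter, not part of any Plan 9 API.
 */
int
double_connect(const char *root, const char *addr)
{
	char path[512], num[32];
	FILE *f;
	size_t n;
	int i;

	snprintf(path, sizeof path, "%s/tcp/clone", root);
	f = fopen(path, "r");
	if(f == NULL)
		return -1;
	n = fread(num, 1, sizeof num - 1, f);
	fclose(f);
	if(n == 0)
		return -1;
	num[n] = '\0';
	/* the number may or may not be followed by whitespace */
	num[strcspn(num, " \t\r\n")] = '\0';

	snprintf(path, sizeof path, "%s/tcp/%s/ctl", root, num);
	for(i = 0; i < 2; i++){
		/* each `echo ... >ctl` is its own open, write, close */
		f = fopen(path, "a");
		if(f == NULL)
			return -1;
		fprintf(f, "connect %s\n", addr);
		if(fclose(f) != 0)
			return -1;
	}
	return 0;
}
```

Calling double_connect("/net", "10.0.2.1!12345") on an unpatched cpu server should be expected to behave like the session above, so it is only worth trying on a machine you can afford to reboot.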
The usage of /net is invalid, but you wouldn't really expect that to
reboot the machine (or maybe it's a holdover from before /dev/reboot
existed? ;) ).
Just tried it on my cpu server and got the same panic, so we can rule
qemu out. I adjusted the ip!port to something that would accept my
connection, and my status was something like Timedwait rather than
Finwait2. Second time around I skipped the cat status and still hit
the panic so the status read isn't affecting things (which is probably
blindingly obvious to anyone familiar with the /net code, but oh
well).
I appear to be running a realtek 8169 nic:
#l0: rtl8169: 100Mbps port 0xE400 irq 11: 000aeb2ff32c
Don't know what muzgo was using in qemu, but let me know if I can
provide any useful information.
-sqweek
* Re: [9fans] /net panic
2008-02-15 9:20 [9fans] /net panic sqweek
@ 2008-02-15 15:19 ` Iruata Souza
2008-02-15 16:10 ` Iruata Souza
2008-02-15 16:23 ` erik quanstrom
1 sibling, 1 reply; 6+ messages in thread
From: Iruata Souza @ 2008-02-15 15:19 UTC
To: Fans of the OS Plan 9 from Bell Labs
On Fri, Feb 15, 2008 at 7:20 AM, sqweek <sqweek@gmail.com> wrote:
> muzgo (from irc) was playing around with /net in qemu and came across this gem:
>
> on drawterm:
> cpu% cd /net/tcp
> cpu% cat clone
> 23cpu% cd 23
> cpu% echo connect 10.0.2.1!12345 >ctl
> cpu% cat status
> Finwait2 qin 0 qout 0 srtt 0 mdev 0 cwin 1461 swin 32850>>0 rwin
> 65535>>0 timer.start 10 timer.count 10 rerecv 0 katimer.start 200
> katimer.count 159
> cpu% echo connect 10.0.2.1!12345 >ctl
> cpu%
>
> this causes CPU server to reboot with:
> panic: timerstate1
> panic: timerstate1
> dumpstack disabled
> cpu0 exiting
>
>
> The usage of /net is invalid, but you wouldn't really expect that to
> reboot the machine (or maybe it's a holdover from before /dev/reboot
> existed? ;) ).
> Just tried it on my cpu server and got the same panic, so we can rule
> qemu out. I adjusted the ip!port to something that would accept my
> connection, and my status was something like Timedwait rather than
> Finwait2. Second time around I skipped the cat status and still hit
> the panic so the status read isn't affecting things (which is probably
> blindingly obvious to anyone familiar with the /net code, but oh
> well).
>
> I appear to be running a realtek 8169 nic:
> #l0: rtl8169: 100Mbps port 0xE400 irq 11: 000aeb2ff32c
>
> Don't know what muzgo was using in qemu, but let me know if I can
> provide any useful information.
> -sqweek
>
no *strange* things running, i guess.
iru
* Re: [9fans] /net panic
2008-02-15 15:19 ` Iruata Souza
@ 2008-02-15 16:10 ` Iruata Souza
0 siblings, 0 replies; 6+ messages in thread
From: Iruata Souza @ 2008-02-15 16:10 UTC
To: Fans of the OS Plan 9 from Bell Labs
On Fri, Feb 15, 2008 at 1:19 PM, Iruata Souza <iru.muzgo@gmail.com> wrote:
>
> On Fri, Feb 15, 2008 at 7:20 AM, sqweek <sqweek@gmail.com> wrote:
> > muzgo (from irc) was playing around with /net in qemu and came across this gem:
> >
> > on drawterm:
> > cpu% cd /net/tcp
> > cpu% cat clone
> > 23cpu% cd 23
> > cpu% echo connect 10.0.2.1!12345 >ctl
> > cpu% cat status
> > Finwait2 qin 0 qout 0 srtt 0 mdev 0 cwin 1461 swin 32850>>0 rwin
> > 65535>>0 timer.start 10 timer.count 10 rerecv 0 katimer.start 200
> > katimer.count 159
> > cpu% echo connect 10.0.2.1!12345 >ctl
> > cpu%
> >
> > this causes CPU server to reboot with:
> > panic: timerstate1
> > panic: timerstate1
> > dumpstack disabled
> > cpu0 exiting
> >
> >
> > The usage of /net is invalid, but you wouldn't really expect that to
> > reboot the machine (or maybe it's a holdover from before /dev/reboot
> > existed? ;) ).
> > Just tried it on my cpu server and got the same panic, so we can rule
> > qemu out. I adjusted the ip!port to something that would accept my
> > connection, and my status was something like Timedwait rather than
> > Finwait2. Second time around I skipped the cat status and still hit
> > the panic so the status read isn't affecting things (which is probably
> > blindingly obvious to anyone familiar with the /net code, but oh
> > well).
> >
> > I appear to be running a realtek 8169 nic:
> > #l0: rtl8169: 100Mbps port 0xE400 irq 11: 000aeb2ff32c
> >
> > Don't know what muzgo was using in qemu, but let me know if I can
> > provide any useful information.
> > -sqweek
> >
>
> no *strange* things running, i guess.
>
> iru
>
some info about my environment is at http://iru.oitobits.net/9netpanic/ where:
QEMU_Plan9 - sh script to run the emulated Plan 9
cpuemu_config.tgz - the CPU server's /cfg/cpuemu
qemu-ifup - sh script to bring up the host<->guest tun interfaces; Plan 9 gets tun0
ifconfig.tun0 - tun interface configuration
listen.c - listener running on the host
panic - complete transcript of the snippet pasted by sqweek
iru
* Re: [9fans] /net panic
2008-02-15 9:20 [9fans] /net panic sqweek
2008-02-15 15:19 ` Iruata Souza
@ 2008-02-15 16:23 ` erik quanstrom
2008-02-15 17:29 ` Iruata Souza
1 sibling, 1 reply; 6+ messages in thread
From: erik quanstrom @ 2008-02-15 16:23 UTC
To: 9fans
i'm not sure this is a perfect solution. i just don't have enough
of the plan 9 ip stack loaded into cache to be sure nothing's
been forgotten. but give this patch a whirl. basically, i think
the problem is that inittcpctl() was stepping on timers that might
have been active. these timers need to be shut down. unfortunately,
tcpclose() and localclose() are too aggressive. cleanupconnection()
is a chopped-down version of localclose.
- erik
/n/sources/plan9//sys/src/9/ip/tcp.c:782,787 - tcp.c:782,813
  	return mtu;
  }
+ static void
+ cleanupconnection(Conv *s)
+ {
+ 	Tcpctl *tcb;
+ 	Reseq *rp,*rp1;
+ 	Tcppriv *tpriv;
+
+ 	tpriv = s->p->priv;
+ 	tcb = (Tcpctl*)s->ptcl;
+
+ 	iphtrem(&tpriv->ht, s);
+
+ 	tcphalt(tpriv, &tcb->timer);
+ 	tcphalt(tpriv, &tcb->rtt_timer);
+ 	tcphalt(tpriv, &tcb->acktimer);
+ 	tcphalt(tpriv, &tcb->katimer);
+
+ 	/* Flush reassembly queue; nothing more can arrive */
+ 	for(rp = tcb->reseq; rp != nil; rp = rp1) {
+ 		rp1 = rp->next;
+ 		freeblist(rp->bp);
+ 		free(rp);
+ 	}
+ 	tcb->reseq = nil;
+ }
+
  void
  inittcpctl(Conv *s, int mode)
  {
/n/sources/plan9//sys/src/9/ip/tcp.c:792,798 - tcp.c:818,827
  	tcb = (Tcpctl*)s->ptcl;
- 	memset(tcb, 0, sizeof(Tcpctl));
+ 	if(tcb->timer.arg)	// c->state != Idle?
+ 		cleanupconnection(s);
+ 	else
+ 		memset(tcb, 0, sizeof(Tcpctl));
  	tcb->ssthresh = 65535;
  	tcb->srtt = tcp_irtt<<LOGAGAIN;
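The diagnosis in this message (a control block being zeroed while one of its timers is still linked into the active list) can be illustrated with a toy model. This is a sketch in portable C, not the kernel code: the names Timer, Ctl, timer_start, timer_halt, list_ok, reinit_unsafe and reinit_safe are all hypothetical, and the list handling is only assumed to resemble what ip/tcp.c does. Unlike the patch, which skips the memset when a timer is armed, reinit_safe here simply halts the timer and then zeroes the block.

```c
#include <stddef.h>
#include <string.h>

typedef struct Timer Timer;
struct Timer {
	Timer *next, *prev;
	int on;		/* nonzero while linked into the active list */
};

typedef struct Ctl Ctl;
struct Ctl {
	Timer timer;
	int ssthresh;
};

Timer *active;		/* head of the active-timer list */

/* Link t at the head of the active list unless it is already on. */
void
timer_start(Timer *t)
{
	if(t->on)
		return;
	t->prev = NULL;
	t->next = active;
	if(t->next != NULL)
		t->next->prev = t;
	active = t;
	t->on = 1;
}

/* Unlink t from the active list if it is on. */
void
timer_halt(Timer *t)
{
	if(!t->on)
		return;
	if(active == t)
		active = t->next;
	if(t->next != NULL)
		t->next->prev = t->prev;
	if(t->prev != NULL)
		t->prev->next = t->next;
	t->next = t->prev = NULL;
	t->on = 0;
}

/* Invariant: the head has no predecessor and neighbours agree. */
int
list_ok(void)
{
	Timer *t;
	int n;

	if(active != NULL && active->prev != NULL)
		return 0;
	for(t = active, n = 0; t != NULL; t = t->next){
		if(t->next != NULL && t->next->prev != t)
			return 0;
		if(++n > 1000)	/* guard against a cycle */
			return 0;
	}
	return 1;
}

/* Zero the block without unlinking: the list still points at it. */
void
reinit_unsafe(Ctl *c)
{
	memset(c, 0, sizeof *c);
	c->ssthresh = 65535;
}

/* Unlink any active timer first, then zero the block. */
void
reinit_safe(Ctl *c)
{
	timer_halt(&c->timer);
	memset(c, 0, sizeof *c);
	c->ssthresh = 65535;
}
```

In this model, starting a timer, calling reinit_unsafe and starting the timer again leaves the timer linked to itself, so list_ok() returns 0; using reinit_safe instead keeps the list consistent. Whether this is exactly the condition behind panic: timerstate1 is not confirmed anywhere in the thread.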
* Re: [9fans] /net panic
2008-02-15 16:23 ` erik quanstrom
@ 2008-02-15 17:29 ` Iruata Souza
2008-02-15 18:34 ` erik quanstrom
0 siblings, 1 reply; 6+ messages in thread
From: Iruata Souza @ 2008-02-15 17:29 UTC
To: Fans of the OS Plan 9 from Bell Labs
On Fri, Feb 15, 2008 at 2:23 PM, erik quanstrom <quanstro@quanstro.net> wrote:
>
>i'm not sure this is a perfect solution. i just don't have enough
>of the plan 9 ip stack loaded into cache to be sure nothing's
>been forgotten. but give this patch a whirl. basically, i think
>the problem is that inittcpctl() was stepping on timers that might
>have been active. these timers need to be shut down. unfortunately,
>tcpclose() and localclose() are too aggressive. cleanupconnection()
>is a chopped-down version of localclose.
>
>- erik
>
>
>/n/sources/plan9//sys/src/9/ip/tcp.c:782,787 - tcp.c:782,813
>  	return mtu;
>  }
>
>+ static void
>+ cleanupconnection(Conv *s)
>+ {
>+ 	Tcpctl *tcb;
>+ 	Reseq *rp,*rp1;
>+ 	Tcppriv *tpriv;
>+
>+ 	tpriv = s->p->priv;
>+ 	tcb = (Tcpctl*)s->ptcl;
>+
>+ 	iphtrem(&tpriv->ht, s);
>+
>+ 	tcphalt(tpriv, &tcb->timer);
>+ 	tcphalt(tpriv, &tcb->rtt_timer);
>+ 	tcphalt(tpriv, &tcb->acktimer);
>+ 	tcphalt(tpriv, &tcb->katimer);
>+
>+ 	/* Flush reassembly queue; nothing more can arrive */
>+ 	for(rp = tcb->reseq; rp != nil; rp = rp1) {
>+ 		rp1 = rp->next;
>+ 		freeblist(rp->bp);
>+ 		free(rp);
>+ 	}
>+ 	tcb->reseq = nil;
>+ }
>+
>  void
>  inittcpctl(Conv *s, int mode)
>  {
>/n/sources/plan9//sys/src/9/ip/tcp.c:792,798 - tcp.c:818,827
>
>  	tcb = (Tcpctl*)s->ptcl;
>
>- 	memset(tcb, 0, sizeof(Tcpctl));
>+ 	if(tcb->timer.arg)	// c->state != Idle?
>+ 		cleanupconnection(s);
>+ 	else
>+ 		memset(tcb, 0, sizeof(Tcpctl));
>
>  	tcb->ssthresh = 65535;
>  	tcb->srtt = tcp_irtt<<LOGAGAIN;
>
works for me.
I don't know the internal workings of the plan 9 ip stack, so at the
risk of being silly: could it be that the bug is not specific to tcp?
iru
* Re: [9fans] /net panic
2008-02-15 17:29 ` Iruata Souza
@ 2008-02-15 18:34 ` erik quanstrom
0 siblings, 0 replies; 6+ messages in thread
From: erik quanstrom @ 2008-02-15 18:34 UTC
To: 9fans
> works for me.
> I don't know the internal workings of the plan 9 ip stack, so at the
> risk of being silly: could it be that the bug is not specific to tcp?
> iru
no. the problem is that active tcp timers are overwritten.
all the tcp timer code is contained within ip/tcp.c
- erik