9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] infrequent panics
@ 2008-03-12 20:08 erik quanstrom
  2008-03-13  0:33 ` Russ Cox
  0 siblings, 1 reply; 5+ messages in thread
From: erik quanstrom @ 2008-03-12 20:08 UTC (permalink / raw)
  To: 9fans

on machines that serve dns, i get infrequently get
panics like the one appeneded.  there doesn't appear
to be any pattern to the backtrace.  but perhaps i
am missing something.

- erik

----

src(0xf0108bbc); // dumpstack+0x10
src(0xf0145f4c); // panic+0xfc
src(0xf0186e59); // ppanic+0xb4
src(0xf01a3db8); // panicblock+0x49
src(0xf01a403a); // blockcheck+0x27e
src(0xf01a4b1a); // poolfreel+0x36
src(0xf01a5037); // poolfree+0x41
src(0xf0187208); // free+0x23
src(0xf018c815); // devattach+0xc8
src(0xf013ae0b); // procattach+0x18
src(0xf018c218); // namec+0xd50
src(0xf019a17a); // sysopen+0xa9
src(0xf010903b); // syscall+0x1b5
src(0xf0100ca3); // _syscallintr+0x18
//didn't find pc at sp=0xf36a4140, last pc found at sp=0xf36a413c


Wed Mar 12 13:20:01: panic: corrupt tail magic0
Wed Mar 12 13:20:01: pool Main block f36fa3a0
Wed Mar 12 13:20:01: hdr 0a110c09 00000020 f018c7ce cafebabe 20737023 7064755b
Wed Mar 12 13:20:01: tail 0a110c09 00000020 f018c7ce cafebabe 20737023 7064755b | 72657320 20726576
Wed Mar 12 13:20:01: pool panic
Wed Mar 12 13:20:01: panic: corrupt tdil magic0
Wed Mar 12 13:20:02: pool Main blouck f36fa3a0
Wed Mar 12 13:20:02: hdr 0a110c09 00000020 f018c7ce cafebabe 20737023 7064755b
Wed Mar 12 13:20:02: tail 0a110c09 00000020 f018c7ce cafebabe 20737023 7064755b | 72657320 20726576
Wed Mar 12 13:20:02: pool panic
Wed Mar 12 13:20:02: pstack
Wed Mar 12 13:20:02: ktrace /kernel/path f0108bbc f36a3cbc <<EOF
Wed Mar 12 13:20:02: estackx f36a4190
Wed Mar 12 13:20:02: f36a3c5c=f0108944 f36a3c64=f0196a16 f36a3c90=f0145c0e f36a3ca4=f0108bbc
Wed Mar 12 13:20:02: f36a3cb8=f0108bbc f36a3cbc=f0108948 f36a3cc4=f0145f4c f36a3dac=f01a5b6b
Wed Mar 12 13:20:02: f36a3dd0=f019f76b f36a3de0=f0192039 f36a3df0=f0186e59 f36a3f18=f01a3db8
Wed Mar 12 13:20:02: f36a3f2c=f01a403a f36a3f3c=f019ef8d f36a3f40=f019efa3 f36a3f50=f019f41e
Wed Mar 12 13:20:02: f36a3f64=f01a4b1a f36a3f70=f0186e76 f36a3f78=f0186e82 f36a3f84=f01893b0
Wed Mar 12 13:20:02: f36a3f8c=f01a5037 f36a3fa4=f0187208 f36a3fb4=f018c815 f36a3fdc=f013ae0b
Wed Mar 12 13:20:02: f36a3fe8=f018b502 f36a3fec=f018c218 f36a3ffc=f01a2ff0 f36a4008=f01a40ab
Wed Mar 12 13:20:02: f36a4014=f019f33f f36a402c=f019f33f f36a4040=f01a4bc7 f36a404c=f0186e76
Wed Mar 12 13:20:02: f36a4054=f0186e82 f36a4060=f019ef8d f36a4068=f01a5077 f36a4070=f019ef8d
Wed Mar 12 13:20:03: f36a4074=f019efa3 f36a407c=f019f0e7 f36a4090=f019f63f f36a4098=f0188f2e
Wed Mar 12 13:20:03: f36a40a0=f0188f48 f36a40b0=f018f578 f36a40cc=f019a1cb f36a40d0=f018f608
Wed Mar 12 13:20:03: f36a40e4=f019a17a f36a4108=f010903b f36a411c=f010065d f36a413c=f0100ca3
Wed Mar 12 13:20:03: f36a4170=f010001b f36a4174=00000040 f36a4178=f01007ee f36a417c=00017bec
Wed Mar 12 13:20:03: f36a4180=00000023 f36a4184=00000286 f36a4188=dfffe3d4 f36a418c=0000001b
Wed Mar 12 13:20:03: EOF


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [9fans] infrequent panics
  2008-03-12 20:08 [9fans] infrequent panics erik quanstrom
@ 2008-03-13  0:33 ` Russ Cox
  2008-03-13 10:44   ` roger peppe
  0 siblings, 1 reply; 5+ messages in thread
From: Russ Cox @ 2008-03-13  0:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

This is a cool bug; thanks for the report.
I love a good concurrency bug.

>  Wed Mar 12 13:20:01: hdr 0a110c09 00000020 f018c7ce cafebabe 20737023 7064755b
>  Wed Mar 12 13:20:01: tail 0a110c09 00000020 f018c7ce cafebabe 20737023 7064755b | 72657320 20726576

This says "#ps [udp server]" (I'm guessing about the last few bytes),
which is the name of the dns proc as it shows up in ps.

The stack trace showed this buffer being corrupt in devattach,
which executes:

	buf = smalloc(4+strlen(spec)+1);
	sprint(buf, "#%C%s", tc, spec);
	c->path = newpath(buf);
	free(buf);

There's already something wrong here -- if spec contains bad
UTF then buf will overflow all the time.  Really it should be
snprint or sprint(buf, "#%C") followed by strcat.  Let's ignore that,
since there's no bad UTF in your string.

If you poke around at how this gets called on a file like #p/1/blah,
you'll find that at this point in the code, spec == up->genbuf+2.
So if something else was overwriting up->genbuf at just the
wrong time (between the strlen and the execution of %s)
with the process name as it shows up in ps, then you'd see
the corruption in the stack trace.

Since up->genbuf is private to up (i.e., the current process)
I wondered at first if somehow the scheduler was broken and
was trying to run the same Proc on multiple CPUs, but
Erik assured me the scheduler had not been touched.

But aha!  If you poke around in the source code you will find
that there is one place where one proc uses another's genbuf:

% g genbuf | grep -v 'up->genbuf'
chan.c:1630: 	/* place final element in genbuf for e.g. exec */
devproc.c:717: 		j = procargs(p, p->genbuf, sizeof p->genbuf);
devproc.c:723: 		memmove(a, &p->genbuf[offset], n);
portdat.h:701: 	char	genbuf[128];	/* buffer used e.g. for last name
element from namec */
%

And look, it's the code for reading #p/1/args, which is
how ps figures out what to print:

/sys/src/9/port/devproc.c:/^procread
	case Qargs:
		qlock(&p->debug);
		j = procargs(p, p->genbuf, sizeof p->genbuf);
		qunlock(&p->debug);
		if(offset >= j)
			return 0;
		if(offset+n > j)
			n = j-offset;
		memmove(a, &p->genbuf[offset], n);
		return n;

So if you ran ps at exactly the right time (while one of the
procs on the machine was opening a #-ed file name and
was between strlen and sprint), you'd win the race.
You might improve your chances if the machine were
close to out of memory and the smalloc had to wait for
more memory.

Oops.  Changing those lines to use up->genbuf instead
of p->genbuf should eliminate the race.

Russ


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [9fans] infrequent panics
  2008-03-13  0:33 ` Russ Cox
@ 2008-03-13 10:44   ` roger peppe
  0 siblings, 0 replies; 5+ messages in thread
From: roger peppe @ 2008-03-13 10:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> There's already something wrong here -- if spec contains bad
> UTF then buf will overflow all the time.

that's a good gotcha! i'd have naively assumed that
sprint(buf, "%s", s) was equivalent to strcpy(buf, s).
i wonder how many other bits of code are potentially broken because
of this.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [9fans] infrequent panics
@ 2008-03-13  5:57 Brian L. Stuart
  0 siblings, 0 replies; 5+ messages in thread
From: Brian L. Stuart @ 2008-03-13  5:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> This is a cool bug; thanks for the report.
> I love a good concurrency bug.

It might just be the result of being on a
plane all day, but this just caught me in
a weirdly funny mood.  I can't tell whether
I should respect you, admire you, fear you,
or pity you for your unnatural affection
for racing critters.

Slightly more seriously, I do have to admit
that finding and fixing a race condition is
one of the most satisfying bug fixes one
can experience.

BLS


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [9fans] infrequent panics
@ 2008-03-13  2:14 erik quanstrom
  0 siblings, 0 replies; 5+ messages in thread
From: erik quanstrom @ 2008-03-13  2:14 UTC (permalink / raw)
  To: 9fans

> So if you ran ps at exactly the right time (while one of the
> procs on the machine was opening a #-ed file name and
> was between strlen and sprint), you'd win the race.
> You might improve your chances if the machine were
> close to out of memory and the smalloc had to wait for
> more memory.
>
> Oops.  Changing those lines to use up->genbuf instead
> of p->genbuf should eliminate the race.

many thanks.  i've been chasing this bug for some time.  often the
backtrace has been much more confusing.

in fact, ps is run fairly often on this machine because i am
sanguine about dns' stability.  if ever there were a canidate for
publish subscribe, it would be dns.

- erik


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-03-13 10:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-12 20:08 [9fans] infrequent panics erik quanstrom
2008-03-13  0:33 ` Russ Cox
2008-03-13 10:44   ` roger peppe
2008-03-13  2:14 erik quanstrom
2008-03-13  5:57 Brian L. Stuart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).