9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] That deadlock, again
@ 2010-11-16  4:21 lucio
  2010-11-16  4:40 ` erik quanstrom
  0 siblings, 1 reply; 39+ messages in thread
From: lucio @ 2010-11-16  4:21 UTC (permalink / raw)
  To: 9fans, lucio

Regarding the "deadlock" report that I occasionally see on my CPU
server console: I won't bore anyone with PC addresses or anything like
that, but I will point out what I believe to be a possible trigger.
The failure always seems to occur within "exportfs", which in this
case is used exclusively to run stats(1) remotely from my workstation.
So my recommendation is that somebody like Erik, who is infinitely
more clued up than I am in kernel arcana, should run one or more stats
sessions into a cpu server (I happen to be running fossil, so maybe
Erik won't see this) and see whether he can also trigger this
behaviour.  I'm hoping that it is not platform specific.

Right now, I'm short of skills as well as a serial console :-(

++L

PS: here is a kmesg from the server:

	Plan 9
	E820: 00000000 0009fc00 memory
	E820: 0009fc00 000a0000 reserved
	E820: 000e0000 00100000 reserved
	E820: 00100000 47740000 memory
	E820: 47740000 47750000 acpi reclaim
	E820: 47750000 47800000 acpi nvs
	126 holes free
	00018000 0009f000 552960
	00468000 0642b000 100413440
	100966400 bytes free
	cpu0: 2599MHz GenuineIntel PentiumIV/Xeon (cpuid: AX 0x0F29 DX 0xBFEBFBFF)
	ELCR: 0E28
	#l0: i82557: 100Mbps port 0xDC00 irq 11: 00111104e0b6
	1143M memory: 100M kernel data, 1043M user, 1668M swap
	root is from (tcp, local)[local!#S/sdC0/fossil]: time...
	venti...2010/1115 17:36:16 venti: conf.../boot/venti: mem 31,972,556 bcmem 63,945,112 icmem 95,917,670...httpd tcp!127.1!8000...init...icache 95,917,670 bytes = 1,498,714 entries; 16 scache
	sync...announce tcp!127.1!17034...serving.
	fossil(#S/sdC0/fossil)...fsys: dialing venti at tcp!127.1!17034
	version...time...

	init: starting /bin/rc

which also supplies:

lock 0xf09d8980 loop key 0xdeaddead pc 0xf01e736a held by pc 0xf01e736a proc 2052
 17: #I0tcpack pc f01ff12a dbgpc        0   Running (Running) ut 530 st 0 bss 0 qpc f014583c nl 0 nd 0 lpc f01e2cc8 pri 13
2052:  exportfs pc f01efc9f dbgpc     94ad    Pwrite (Ready) ut 43 st 209 bss 40000 qpc f0145b62 nl 1 nd 0 lpc f01e2c60 pri 10

and, a bit later:

lock 0xf0057d74 loop key 0xdeaddead pc 0xf01e736a held by pc 0xf01e736a proc 2052
 61:etherread4 pc f01ef8a0 dbgpc        0   Running (Running) ut 2923 st 0 bss 0 qpc f0148c8a nl 0 nd 0 lpc f0100f6e pri 13
2052:  exportfs pc f01e7377 dbgpc     94ad    Pwrite (Ready) ut 55 st 270 bss 40000 qpc f0145b62 nl 1 nd 0 lpc f01e2c60 pri 10

to my surprise.
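For reference, lines in this form come from the kernel's spin-lock
loop detector, which fires when a process has spun too long on a
Lock.  Its shape is roughly the following sketch, loosely after
/sys/src/9/port/taslock.c (details from memory, so treat names and
formats as approximate):

	/* called when a process has spun too long on a Lock;
	 * key 0xdeaddead is the value tas() leaves in a held
	 * lock on x86, i.e. "still taken" */
	static void
	lockloop(Lock *l, ulong pc)
	{
		Proc *p;

		p = l->p;	/* recorded holder, if any */
		print("lock %#p loop key %#lux pc %#lux held by pc %#lux proc %lud\n",
			l, l->key, pc, l->pc, p != nil? p->pid: 0);
		dumpaproc(up);		/* the spinner, e.g. #I0tcpack above */
		if(p != nil)
			dumpaproc(p);	/* the holder, e.g. exportfs above */
	}

Note that in both dumps the spinning pc and the holder's pc are the
same address, which is consistent with two processes contending on the
same lock() call site.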

* Re: [9fans] That deadlock, again
@ 2010-11-18  5:50 Lucio De Re
  2010-11-18  5:53 ` erik quanstrom
  2010-11-18  5:57 ` Lucio De Re
  0 siblings, 2 replies; 39+ messages in thread
From: Lucio De Re @ 2010-11-18  5:50 UTC (permalink / raw)
  To: 9fans

> one could move:
>
> 	up->qpc = getcallerpc(&q);
>
> within qlock() to before the lock(&q->use), so we can see from where
> the qlock that hangs the exportfs call was made, or add another
> magic debug pointer (qpctry) to the proc structure and print it in
> dumpaproc().
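Concretely, the suggestion amounts to one more magic pointer in the
Proc structure, along the lines of this sketch ("qpctry" is the name
proposed above; the surrounding declarations in port/portdat.h are
paraphrased, not quoted):

	struct Proc
	{
		/* ... many existing fields ... */
		ulong	qpc;	/* pc of last blocking qlock(), set on the contended path */
		ulong	qpctry;	/* proposed: caller's pc, recorded before lock(&q->use) */
		/* ... */
	};

plus a matching extra field in dumpaproc()'s print(), so the new
pointer shows up next to qpc in dumps like the ones quoted earlier in
this thread.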


Cinap, I tried your debugging code and got an odd panic at boot time.
Consistently:

	panic: kernel fault: no user process pc=0xf01e739e addr=0x000009e8

Having a look with acid, this seems to be caused by an attempt to set
the debug PC (your "up->qpctry") at a time when "up" has no value yet;
the fault address is presumably just the offset of "qpctry" within the
Proc structure, reached through a nil "up".

Strangely, later in the qlock() code "up" is checked and a panic is
issued if it is zero.  I may be missing something here, but it is
possible to execute this code

/sys/src/9/port/qlock.c:29,37 (more or less)
	lock(&q->use);
	rwstats.qlock++;
	if(!q->locked) {
		q->locked = 1;
		unlock(&q->use);
		return;
	}

which is immediately followed by

	if(up == 0)
		panic("qlock");

If "up" is nil, but it looks like a bit of a one-way operation.

Anyway, I have moved the assignment to "qpctry" to after "up" is
tested.  Let's see what happens.  I'll have to get back to you once
the system is back up.
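In other words, something like this sketch, where "qpctry" is still
the hypothetical field from cinap's suggestion and everything else
paraphrases /sys/src/9/port/qlock.c:

	void
	qlock(QLock *q)
	{
		Proc *p;

		lock(&q->use);
		rwstats.qlock++;
		if(!q->locked) {
			q->locked = 1;
			unlock(&q->use);
			return;
		}
		if(up == 0)
			panic("qlock");
		up->qpctry = getcallerpc(&q);	/* moved: "up" is known non-nil here */
		rwstats.qlockq++;
		/* ... queue this process on q and sched(), as in qlock.c;
		 * p is used by the elided queueing code ... */
	}

The obvious cost is that a hang inside lock(&q->use) itself now leaves
qpctry unset, which is exactly the case the original placement was
meant to expose; the alternative would be to keep that placement but
guard it with if(up != nil).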

++L


end of thread

Thread overview: 39+ messages
2010-11-16  4:21 [9fans] That deadlock, again lucio
2010-11-16  4:40 ` erik quanstrom
2010-11-16  5:03   ` lucio
2010-11-16  5:11     ` cinap_lenrek
2010-11-16  5:18       ` lucio
2010-11-16  5:28         ` cinap_lenrek
2010-11-16  6:47           ` Lucio De Re
2010-11-16 13:53     ` erik quanstrom
2010-11-16 18:03       ` lucio
2010-11-17  4:08         ` Lucio De Re
2010-11-17  4:18           ` erik quanstrom
2010-11-17  4:37             ` Lucio De Re
2010-11-17  4:43               ` erik quanstrom
2010-11-17  5:22             ` cinap_lenrek
2010-11-17  6:45               ` Lucio De Re
2010-11-17  7:03                 ` Lucio De Re
2010-11-17  7:09                   ` erik quanstrom
2010-11-17  5:33             ` cinap_lenrek
2010-11-17  6:48               ` Lucio De Re
2010-11-17  7:03                 ` erik quanstrom
2010-11-17 14:40           ` Russ Cox
2010-11-18  5:50 Lucio De Re
2010-11-18  5:53 ` erik quanstrom
2010-11-18  8:11   ` Lucio De Re
2010-11-18  8:35     ` cinap_lenrek
2010-11-18  9:20     ` cinap_lenrek
2010-11-18 10:48       ` Lucio De Re
2010-11-18 15:10         ` erik quanstrom
2010-11-18 16:46           ` erik quanstrom
2010-11-18 18:01             ` Lucio De Re
2010-11-18 18:29               ` C H Forsyth
2010-11-18 18:23                 ` Lucio De Re
2010-11-18 18:33                 ` Lucio De Re
2010-11-18 18:43               ` erik quanstrom
2010-11-18 18:54                 ` erik quanstrom
2010-11-18 19:01                 ` Lucio De Re
2010-11-18 19:27                   ` Lucio De Re
2010-11-18 18:03           ` Lucio De Re
2010-11-18  5:57 ` Lucio De Re
