From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Sun, 17 Nov 2013 18:40:57 -0500
To: 9fans@9fans.net
Message-ID:
In-Reply-To: <71F713A4-13CE-424C-B148-7F0238DB9E57@corpus-callosum.com>
References: <71F713A4-13CE-424C-B148-7F0238DB9E57@corpus-callosum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Subject: Re: [9fans] arm & httpd
Topicbox-Message-UUID: 8ab1d670-ead8-11e9-9d60-3106f5b1d025

On Sun Nov 17 17:32:22 EST 2013, jas@corpus-callosum.com wrote:
> Has anyone else experienced new builds of the sources arm tree getting hung up with semacquire?
>
> 99: httpd pc cac0 dbgpc cac0 Semacquire (Wakeme) ut 1 st 2 bss 168000 qpc 608157d8 nl 0 nd 0 lpc 608758c4 pri 10
>
>
>
> acid: lstk()
> semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
> lock(l=0x31208)+0x20 /sys/src/libc/port/lock.c:10
> plock()+0x8 /sys/src/libc/port/malloc.c:80
> 	pv=0x31208
> poolalloc(p=0x35a24,n=0x2c)+0xc /sys/src/libc/port/pool.c:1223
> 	v=0xd970
> mallocz(size=0x24,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
> 	v=0x5ffffd39
> getnetconninfo(fd=0xffffffff,dir=0x5ffffeec)+0x78 /sys/src/libc/9sys/getnetconninfo.c:59
> 	path=0x0
> 	nci=0xb
> 	spec=0x0
> 	d=0x0
> 	netname=0x28
> dolisten(address=0xd16dc)+0x134 /sys/src/cmd/ip/httpd/httpd.c:291
> 	spotchk=0x1
> 	dir=0x74656e2f
> 	ctl=0xa
> 	ndir=0x74656e2f
> 	nctl=0xb
> 	swamped=0x0
> 	nci=0x161c40
> 	data=0x313aa
> 	conn=0x73
> 	scheme=0xd16e6
> 	c=0x38898
> 	t=0x5ffffeb4
> 	ok=0xa284
> main(argc=0x0,argv=0x5fffff9c)+0x1c0 /sys/src/cmd/ip/httpd/httpd.c:138
> 	address=0x38846
> 	_argc=0x0
> 	_args=0x0
> _main+0x28 /sys/src/libc/arm/main9.s:19
>
>
> I see this on the second http request, the first completes successfully, and don’t yet know if it’s a dns configuration error or something else.

this is clearly a case of deadlock.
on each allocation the pool library locks the pool lock for the
duration, and releases it before returning. for some reason the
pool lock already appears locked, so you go to the contended case,
which in the standard distribution calls semacquire, and wait forever.

so there are just a few possibilities

1. either the code was always broken, and the old locking scheme
got lucky every time. (i don't think this is likely.)

2. there's a bug in the implementation of lock.

3. there is an architecture-specific bug in locking that's been
introduced.

i haven't been using the semaphore-based locks because they are slow.
this is because wakeup() takes about 100-1000x as long as sleep(0),
which is just sched(), and this is hard to make up without doing
some hard thinking that hasn't been done yet. even better schedulers
don't fully fix this.

but still, were i a betting man, my money would be on door #3.

- erik
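
for readers who want to see the shape of the contended path described above, here is a rough portable sketch of a semaphore-based lock. it is not the plan 9 libc source: it assumes C11 atomics and POSIX semaphores in place of the plan 9 primitives, and the names SemLock, semlock_init, semlock_lock, semlock_unlock and semlock_canlock are made up for illustration. the point is only that a lock which is already held sends the next caller into the semaphore wait, and if nobody ever releases it, that wait never ends.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>

/* hypothetical type: key counts the holder plus any waiters */
typedef struct {
	atomic_int key;
	sem_t sem;	/* contended callers block here */
} SemLock;

static void
semlock_init(SemLock *l)
{
	atomic_init(&l->key, 0);
	sem_init(&l->sem, 0, 0);	/* starts at 0: nothing to hand over yet */
}

static void
semlock_lock(SemLock *l)
{
	/* previous value 0 means the lock was free and is now ours */
	if(atomic_fetch_add(&l->key, 1) == 0)
		return;
	/* contended case: sleep until a holder posts; retry if interrupted */
	while(sem_wait(&l->sem) == -1 && errno == EINTR)
		continue;
}

static void
semlock_unlock(SemLock *l)
{
	/* previous value 1 means nobody was waiting */
	if(atomic_fetch_sub(&l->key, 1) == 1)
		return;
	sem_post(&l->sem);	/* hand the lock to one waiter */
}

/* non-blocking attempt: returns 1 if the lock was taken, 0 if it is held */
static int
semlock_canlock(SemLock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->key, &expected, 1);
}
```

with this sketch, calling semlock_lock() on a lock whose key is already nonzero blocks in sem_wait() until someone calls semlock_unlock(). semlock_canlock() is a way to probe that state without hanging.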