[9fans] arm httpd

9fans - fans of the OS Plan 9 from Bell Labs

* [9fans] arm httpd
@ 2014-11-09 15:34 Jeff Sickel
  2014-11-09 19:42 ` erik quanstrom
  2014-11-09 20:28 ` Skip Tavakkolian
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Sickel @ 2014-11-09 15:34 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Has anyone else seen the arm httpd lock up on them?  I can start it, but then after a few proper responses it just sits:

bootes           95    0:00   0:00     1436K Semacqui httpd

dreamplug% acid 95
/proc/95/text:arm plan 9 executable
/sys/lib/acid/port
/sys/lib/acid/arm
acid: stk()
semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
lock(l=0x31204)+0x20 /sys/src/libc/port/lock.c:10
plock()+0x8 /sys/src/libc/port/malloc.c:80
poolalloc(p=0x360a4,n=0x2c)+0xc /sys/src/libc/port/pool.c:1223
mallocz(size=0x24,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
getnetconninfo(fd=0xffffffff,dir=0x5ffffeb4)+0x78 /sys/src/libc/9sys/getnetconninfo.c:59
dolisten(address=0xd14fc)+0x134 /sys/src/cmd/ip/httpd/httpd.c:291
main(argc=0x0,argv=0x5fffff74)+0x1c0 /sys/src/cmd/ip/httpd/httpd.c:138
_main+0x28 /sys/src/libc/arm/main9.s:19
acid: 





* Re: [9fans] arm httpd
  2014-11-09 15:34 [9fans] arm httpd Jeff Sickel
@ 2014-11-09 19:42 ` erik quanstrom
  2014-11-09 19:51   ` Jeff Sickel
  2014-11-09 20:28 ` Skip Tavakkolian
  1 sibling, 1 reply; 12+ messages in thread
From: erik quanstrom @ 2014-11-09 19:42 UTC (permalink / raw)
  To: 9fans

On Sun Nov  9 10:35:34 EST 2014, jas@corpus-callosum.com wrote:
> Has anyone else seen the arm httpd lock up on them?  I can start it, but then after a few proper responses it just sits:
>
> bootes           95    0:00   0:00     1436K Semacqui httpd

(aside: i notice that throttle doesn't work like you'd expect, since RFMEM is not set,
the stats won't be propagated to the parent.  thus, this is really just a proc-local
calculation, and since each forked proc only handles 1 connection, the hash is
unnecessary.  a local variable would do just fine.)

the aside leads me to believe that there is something wrong with the segment
copy on fork.  since the semaphore in question is in the data segment,
i'm going to guess that you're running the labs kernel, and you're hitting the
page caching issue we've seen before.  does this happen on an atom kernel?

- erik
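
[editor's sketch] The aside about RFMEM can be illustrated with a small POSIX analog. This is not the httpd throttle code: POSIX fork() is used here as a rough stand-in for a Plan 9 fork without RFMEM, and the counter name nconn and the function name are hypothetical. The point it shows is only that a child's update to a variable in the data segment lands in the child's private copy, so the parent never sees it.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a per-connection statistic in the data segment. */
static int nconn = 0;

/*
 * Fork a child that bumps the counter, wait for it to exit, and return
 * the value the parent sees afterwards.  Returns -1 if fork or wait fails.
 */
int
parent_view_after_child_update(void)
{
	pid_t pid;
	int status;

	nconn = 0;
	pid = fork();
	if(pid < 0)
		return -1;
	if(pid == 0){
		nconn++;	/* modifies only the child's private copy */
		_exit(0);
	}
	if(waitpid(pid, &status, 0) < 0)
		return -1;
	return nconn;	/* still 0: the child's increment was never shared */
}
```

Under this assumption, a statistic that must be visible to the parent would need memory that is actually shared between the procs; a proc-local variable is all the unshared version ever provides.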


* Re: [9fans] arm httpd
  2014-11-09 19:42 ` erik quanstrom
@ 2014-11-09 19:51   ` Jeff Sickel
  2014-11-09 20:21     ` erik quanstrom
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Sickel @ 2014-11-09 19:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


> On Nov 9, 2014, at 1:42 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> 
> the aside leads me to believe that there is something wrong with the segment
> copy on fork.  since the semaphore in question is in the data segment,
> i'm going to guess that you're running the labs kernel, and you're hitting the
> page caching issue we've seen before.  does this happen on an atom kernel?

Only happens in the labs ARM kernel.  The labs mips and 386 kernels work fine
in this situation.

-jas





* Re: [9fans] arm httpd
  2014-11-09 19:51   ` Jeff Sickel
@ 2014-11-09 20:21     ` erik quanstrom
  2014-11-10  1:47       ` Jeff Sickel
  0 siblings, 1 reply; 12+ messages in thread
From: erik quanstrom @ 2014-11-09 20:21 UTC (permalink / raw)
  To: 9fans

On Sun Nov  9 14:51:37 EST 2014, jas@corpus-callosum.com wrote:
>
> > On Nov 9, 2014, at 1:42 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> >
> > the aside leads me to believe that there is something wrong with the segment
> > copy on fork.  since the semaphore in question is in the data segment,
> > i'm going to guess that you're running the labs kernel, and you're hitting the
> > page caching issue we've seen before.  does this happen on an atom kernel?
>
> Only happens in the labs ARM kernel.  The labs mips and 386 kernels work fine
> in this situation.

my thinking is that this isn't a defect in the arch-specific bits but rather a timing
bug.  in that case, only manifesting on certain hardware is not diagnostic.

do you have any reason to believe this is not a timing bug?  it does fit the pattern
rarely seen on x86 systems.  (actually (cf. the console appliance) there were ways
to really make x86 systems suffer by forking fast enough on slow hardware.)

- erik


* Re: [9fans] arm httpd
  2014-11-09 15:34 [9fans] arm httpd Jeff Sickel
  2014-11-09 19:42 ` erik quanstrom
@ 2014-11-09 20:28 ` Skip Tavakkolian
  1 sibling, 0 replies; 12+ messages in thread
From: Skip Tavakkolian @ 2014-11-09 20:28 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


i can't get it to fail:

$ boom -n 1000 -c 100 http://rpi.9netics.com
1000 / 1000 Booooooooooooooooooooooooooooooooooooooooooooooooooooooo!
100.00 %

Summary:
  Total: 26.6064 secs.
  Slowest: 4.0497 secs.
  Fastest: 1.1270 secs.
  Average: 2.6087 secs.
  Requests/sec: 37.5474
  Total Data Received: 2679318 bytes.
  Response Size per Request: 2682 bytes.

Status code distribution:
  [200] 999 responses

Response time histogram:
  1.127 [1] |
  1.419 [40] |∎∎∎∎
  1.712 [22] |∎∎
  2.004 [127] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  2.296 [107] |∎∎∎∎∎∎∎∎∎∎∎∎
  2.588 [21] |∎∎
  2.881 [284] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  3.173 [338] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  3.465 [13] |∎
  3.757 [0] |
  4.050 [46] |∎∎∎∎∎

Latency distribution:
  10% in 1.7711 secs.
  25% in 2.1948 secs.
  50% in 2.7615 secs.
  75% in 2.9254 secs.
  90% in 3.0287 secs.
  95% in 3.1996 secs.
  99% in 4.0224 secs.


On Sun, Nov 9, 2014 at 7:34 AM, Jeff Sickel <jas@corpus-callosum.com> wrote:

> Has anyone else seen the arm httpd lock up on them?  I can start it, but
> then after a few proper responses it just sits:
>
> bootes           95    0:00   0:00     1436K Semacqui httpd
>
> dreamplug% acid 95
> /proc/95/text:arm plan 9 executable
> /sys/lib/acid/port
> /sys/lib/acid/arm
> acid: stk()
> semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
> lock(l=0x31204)+0x20 /sys/src/libc/port/lock.c:10
> plock()+0x8 /sys/src/libc/port/malloc.c:80
> poolalloc(p=0x360a4,n=0x2c)+0xc /sys/src/libc/port/pool.c:1223
> mallocz(size=0x24,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
> getnetconninfo(fd=0xffffffff,dir=0x5ffffeb4)+0x78 /sys/src/libc/9sys/getnetconninfo.c:59
> dolisten(address=0xd14fc)+0x134 /sys/src/cmd/ip/httpd/httpd.c:291
> main(argc=0x0,argv=0x5fffff74)+0x1c0 /sys/src/cmd/ip/httpd/httpd.c:138
> _main+0x28 /sys/src/libc/arm/main9.s:19
> acid:
>
>
>


* Re: [9fans] arm httpd
  2014-11-09 20:21     ` erik quanstrom
@ 2014-11-10  1:47       ` Jeff Sickel
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Sickel @ 2014-11-10  1:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


> On Nov 9, 2014, at 2:21 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> 
> On Sun Nov  9 14:51:37 EST 2014, jas@corpus-callosum.com wrote:
>> 
>>> On Nov 9, 2014, at 1:42 PM, erik quanstrom <quanstro@quanstro.net> wrote:
>>> 
>>> the aside leads me to believe that there is something wrong with the segment
>>> copy on fork.  since the semaphore in question is in the data segment,
>>> i'm going to guess that you're running the labs kernel, and you're hitting the
>>> page caching issue we've seen before.  does this happen on an atom kernel?
>> 
>> Only happens in the labs ARM kernel.  The labs mips and 386 kernels work fine
>> in this situation.
> 
> my thinking is that this isn't a defect in the arch-specific bits but rather a timing
> bug.  in that case, only manifesting on certain hardware is not diagnostic.
> 
> do you have any reason to believe this is not a timing bug?  it does fit the pattern
> rarely seen on x86 systems.  (actually (cf. the console appliance) there were ways
> to really make x86 systems suffer by forking fast enough on slow hardware.)

Well, sys/src/libc/9sys/time.c did change about the time I started seeing this bug
on my old dreamplug.

-jas





* Re: [9fans] arm & httpd
  2013-11-18 10:12   ` Richard Miller
@ 2013-11-18 14:15     ` erik quanstrom
  0 siblings, 0 replies; 12+ messages in thread
From: erik quanstrom @ 2013-11-18 14:15 UTC (permalink / raw)
  To: 9fans

On Mon Nov 18 06:46:22 EST 2013, 9fans@hamnavoe.com wrote:
> > this is because wakeup() takes about 100-1000x as long as sleep(0)
>
> [Citation needed]

since rendezvous has to do a bunch of locks and a context switch,
whereas sleep(0) doesn't really have to do anything other than
check anyhigher(), i thought this wouldn't need proof.

here are some numbers. it turns out my lazy guess was too high,
but the thing to remember is the time wasting and sleep(0) tricks
work because rendezvous is quite slow.

(the test results for semaphore locks vs tas locks were posted to
the list, iirc, and tas locks generally come out on top.  semaphores
get worse as the number of processors increases.)

the rendezvous numbers should be /4, so we have

time per million (seconds)
machine       rendezvous    sleep0    ratio
kw            17.71         2.78      6.3
32c intel     1.01          0.14      7.2
4c amd        0.785         0.15      5.2

raw data

# arm kirkwood; 1 processor
kw; time 5.rpingpong; time 5.sleep0
4.86u 65.97s 70.84r 	 5.rpingpong
0.61u 2.17s 2.78r 	 5.sleep0

sooner; aux/cpuid -i
       Intel(R) Xeon(R) CPU E5-2470 0 @ 2.30GHz
sooner; wc -l /dev/sysstat
     32 /dev/sysstat
sooner; time 6.rpingpong; time 6.sleep0
0.56u 1.27s 4.04r 	 6.rpingpong
0.05u 0.08s 0.14r 	 6.sleep0
; aux/cpuid -i
AMD Phenom(tm) II X4 965 Processor
; wc -l /dev/sysstat
      4 /dev/sysstat
; time 6.rpingpong;time 6.sleep0
0.25u 1.29s 3.14r 	 6.rpingpong
0.06u 0.08s 0.15r 	 6.sleep0

- erik
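
[editor's sketch] The summary figures above follow from the raw data by simple arithmetic: the real time of the rpingpong run is divided by 4, and that result is divided by the real time of the sleep0 run. The sketch below only restates that arithmetic; the struct and function names are hypothetical, and the benchmark programs themselves are not shown in the thread.

```c
/* One machine's raw timings, taken from the "real" column of time(1). */
typedef struct Run Run;
struct Run {
	const char *machine;
	double rendezvous_real;	/* seconds for the rpingpong run */
	double sleep0_real;	/* seconds for the sleep0 run */
};

/* "the rendezvous numbers should be /4" */
double
per_million_rendezvous(Run r)
{
	return r.rendezvous_real / 4.0;
}

/* how many times slower rendezvous is than sleep(0) */
double
rendezvous_to_sleep0_ratio(Run r)
{
	return per_million_rendezvous(r) / r.sleep0_real;
}
```

For the kirkwood line, 70.84 / 4 gives 17.71, and 17.71 / 2.78 gives roughly 6.37, which appears as 6.3 in the summary.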


* Re: [9fans] arm & httpd
  2013-11-17 23:40 ` erik quanstrom
@ 2013-11-18 10:12   ` Richard Miller
  2013-11-18 14:15     ` erik quanstrom
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Miller @ 2013-11-18 10:12 UTC (permalink / raw)
  To: 9fans

> this is because wakeup() takes about 100-1000x as long as sleep(0)

[Citation needed]





* Re: [9fans] arm & httpd
  2013-11-17 21:54 [9fans] arm & httpd Jeff Sickel
  2013-11-17 22:04 ` Steve Simon
@ 2013-11-17 23:40 ` erik quanstrom
  2013-11-18 10:12   ` Richard Miller
  1 sibling, 1 reply; 12+ messages in thread
From: erik quanstrom @ 2013-11-17 23:40 UTC (permalink / raw)
  To: 9fans

On Sun Nov 17 17:32:22 EST 2013, jas@corpus-callosum.com wrote:
> Has anyone else experienced new builds of the sources arm tree getting hung up with semacquire?
> 
>  99:     httpd pc     cac0 dbgpc     cac0  Semacquire (Wakeme) ut 1 st 2 bss 168000 qpc 608157d8 nl 0 nd 0 lpc 608758c4 pri 10
> 
> 
> 
> acid: lstk()
> semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
> lock(l=0x31208)+0x20 /sys/src/libc/port/lock.c:10
> plock()+0x8 /sys/src/libc/port/malloc.c:80
> 	pv=0x31208
> poolalloc(p=0x35a24,n=0x2c)+0xc /sys/src/libc/port/pool.c:1223
> 	v=0xd970
> mallocz(size=0x24,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
> 	v=0x5ffffd39
> getnetconninfo(fd=0xffffffff,dir=0x5ffffeec)+0x78 /sys/src/libc/9sys/getnetconninfo.c:59
> 	path=0x0
> 	nci=0xb
> 	spec=0x0
> 	d=0x0
> 	netname=0x28
> dolisten(address=0xd16dc)+0x134 /sys/src/cmd/ip/httpd/httpd.c:291
> 	spotchk=0x1
> 	dir=0x74656e2f
> 	ctl=0xa
> 	ndir=0x74656e2f
> 	nctl=0xb
> 	swamped=0x0
> 	nci=0x161c40
> 	data=0x313aa
> 	conn=0x73
> 	scheme=0xd16e6
> 	c=0x38898
> 	t=0x5ffffeb4
> 	ok=0xa284
> main(argc=0x0,argv=0x5fffff9c)+0x1c0 /sys/src/cmd/ip/httpd/httpd.c:138
> 	address=0x38846
> 	_argc=0x0
> 	_args=0x0
> _main+0x28 /sys/src/libc/arm/main9.s:19
> 
> 
> I see this on the second http request, the first completes successfully, and don’t yet know if it’s a dns configuration error or something else.

this is clearly a case of deadlock.

on each allocation the pool library takes the pool lock, holds it for
the duration, and releases it before returning.  for some reason,
the pool lock already appears locked, so you go to the contended
case, which in the standard distribution calls semacquire, and
wait forever.

so there are just a few possibilities
1.  either the code was always broken, and the old locking scheme
got lucky every time.  (i don't think this is likely.)
2.  there's a bug in the implementation of lock.
3.  there is a bug in locking that's been introduced that's
architecture-specific.

i haven't been using the semaphore-based locks because they are slow.
this is because wakeup() takes about 100-1000x as long as sleep(0)
which is just sched(), and this is hard to make up without doing some
hard thinking that hasn't been done yet.  even better schedulers don't
fully fix this.

but still, were i a betting man, my money would be on door #3.

- erik
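
[editor's sketch] The deadlock described above can be modelled in a few lines. This is not the libc lock.c source: it is a single-threaded model with hypothetical names (lock_model, unlock_model, and an integer standing in for the kernel semaphore), assuming a lock built from an atomic counter plus a semaphore. It shows why a lock that already appears held, with no holder left to release it, sends the next caller down the contended path where it would wait forever.

```c
#include <stdatomic.h>

/* Hypothetical model of a semaphore-backed user-space lock. */
typedef struct Lock Lock;
struct Lock {
	atomic_int key;	/* procs holding or waiting for the lock */
	int sem;	/* modelled semaphore count; a real lock would sleep on it */
};

enum { Fast, Acquired, WouldBlock };

int
lock_model(Lock *l)
{
	/* fast path: key went 0 -> 1, so nobody else held the lock */
	if(atomic_fetch_add(&l->key, 1) == 0)
		return Fast;
	/* contended path: a real implementation would sleep in semacquire here */
	if(l->sem > 0){
		l->sem--;
		return Acquired;
	}
	return WouldBlock;
}

void
unlock_model(Lock *l)
{
	/* key went 1 -> 0: no waiters, nothing to wake */
	if(atomic_fetch_sub(&l->key, 1) == 1)
		return;
	/* otherwise hand the lock to one waiter */
	l->sem++;
}
```

In this model a fresh lock is taken on the fast path every time, while a lock whose key starts at 1 with a semaphore count of 0 reports WouldBlock on the first attempt, matching the stack traces that end in semacquire. Which of the three possibilities put the real lock in that state is not something the model can say.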


* Re: [9fans] arm & httpd
  2013-11-17 22:04 ` Steve Simon
@ 2013-11-17 22:33   ` Jeff Sickel
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Sickel @ 2013-11-17 22:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Not that this should matter, but this host is listening to 4 addresses on the IP stack.





* Re: [9fans] arm & httpd
  2013-11-17 21:54 [9fans] arm & httpd Jeff Sickel
@ 2013-11-17 22:04 ` Steve Simon
  2013-11-17 22:33   ` Jeff Sickel
  2013-11-17 23:40 ` erik quanstrom
  1 sibling, 1 reply; 12+ messages in thread
From: Steve Simon @ 2013-11-17 22:04 UTC (permalink / raw)
  To: 9fans

Don't know if this helps at all but I did an arm build a few weeks ago
and it's all working fine.

-Steve




* [9fans] arm & httpd
@ 2013-11-17 21:54 Jeff Sickel
  2013-11-17 22:04 ` Steve Simon
  2013-11-17 23:40 ` erik quanstrom
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff Sickel @ 2013-11-17 21:54 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Has anyone else experienced new builds of the sources arm tree getting hung up with semacquire?

 99:     httpd pc     cac0 dbgpc     cac0  Semacquire (Wakeme) ut 1 st 2 bss 168000 qpc 608157d8 nl 0 nd 0 lpc 608758c4 pri 10



acid: lstk()
semacquire()+0xc /sys/src/libc/9syscall/semacquire.s:6
lock(l=0x31208)+0x20 /sys/src/libc/port/lock.c:10
plock()+0x8 /sys/src/libc/port/malloc.c:80
	pv=0x31208
poolalloc(p=0x35a24,n=0x2c)+0xc /sys/src/libc/port/pool.c:1223
	v=0xd970
mallocz(size=0x24,clr=0x1)+0x18 /sys/src/libc/port/malloc.c:221
	v=0x5ffffd39
getnetconninfo(fd=0xffffffff,dir=0x5ffffeec)+0x78 /sys/src/libc/9sys/getnetconninfo.c:59
	path=0x0
	nci=0xb
	spec=0x0
	d=0x0
	netname=0x28
dolisten(address=0xd16dc)+0x134 /sys/src/cmd/ip/httpd/httpd.c:291
	spotchk=0x1
	dir=0x74656e2f
	ctl=0xa
	ndir=0x74656e2f
	nctl=0xb
	swamped=0x0
	nci=0x161c40
	data=0x313aa
	conn=0x73
	scheme=0xd16e6
	c=0x38898
	t=0x5ffffeb4
	ok=0xa284
main(argc=0x0,argv=0x5fffff9c)+0x1c0 /sys/src/cmd/ip/httpd/httpd.c:138
	address=0x38846
	_argc=0x0
	_args=0x0
_main+0x28 /sys/src/libc/arm/main9.s:19


I see this on the second http request, the first completes successfully, and don’t yet know if it’s a dns configuration error or something else.

-jas



