[9fans] usb slowness

9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed

* [9fans] usb slowness
@ 2012-03-04  1:56 Tristan
  2012-03-05 15:46 ` erik quanstrom
  2012-03-05 15:56 ` Richard Miller
  0 siblings, 2 replies; 20+ messages in thread
From: Tristan @ 2012-03-04  1:56 UTC (permalink / raw)
  To: 9fans

for a while now i've known that usb/ether runs slow for me. i thought it
was something to do with the specific situation or hardware, and that,
say, usb/disk would be alright. it isn't. and not just on the olpc. now,
my machines are fairly slow as machines go... but 2.5MB/s reads are a far
cry from 22MB/s (with the olpc on linux).

it turns out that _strayintr is taking up nearly all the time.

cpu% kprof /n/9fat/9cpcpuf /dev/kpdata
total: 24620	in kernel text: 23890	outside kernel text:730
RTZERO f0100000 PGSIZE 4Kb
ms	  %	sym
21840	 91.3	_strayintr
290	  1.2	memmove
290	  1.2	iunlock
180	  0.7	sched
160	  0.6	memset
150	  0.6	sleep
...

my understanding is that _strayintr is glue between the interrupt vector
table and trap. and presumably includes the time spent in trap.

the short review of usbehci.c didn't provide any insight, so i guess i'm
asking, is this expected behavior? and what can i do?

enjoy,
tristan

-- 
All original matter is hereby placed immediately under the public domain.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-04  1:56 [9fans] usb slowness Tristan
@ 2012-03-05 15:46 ` erik quanstrom
  2012-03-05 16:05   ` Richard Miller
  2012-03-06 13:19   ` Tristan
  2012-03-05 15:56 ` Richard Miller
  1 sibling, 2 replies; 20+ messages in thread
From: erik quanstrom @ 2012-03-05 15:46 UTC (permalink / raw)
  To: 9p-st, 9fans

> cpu% kprof /n/9fat/9cpcpuf /dev/kpdata
> total: 24620	in kernel text: 23890	outside kernel text:730
> RTZERO f0100000 PGSIZE 4Kb
> ms	  %	sym
> 21840	 91.3	_strayintr
> 290	  1.2	memmove
> 290	  1.2	iunlock
> 180	  0.7	sched
> 160	  0.6	memset
> 150	  0.6	sleep

if you could temporarly run a 9atom kernel,
(http://ftp.quanstro.net/other/9pc*.gz), there
are a few things you can do to look at the
problem,

1.  /dev/irqalloc (http://ftp.quanstro.net/magic/man2html/3/arch)
has two additional fields detailing the irq allocation, and time
spent in each interrupt.  unless things are really hosed, this should
tell you what's at fault.  if things are really quite screwed up, you
may be inducing spurious interrupts.  in that case,

2. /dev/mpirq (http://ftp.quanstro.net/magic/man2html/3/apic)
should detail enough of bios reported interrupts to tell if something
is going sideways in interrupt allocation.

unfortunately, i would think that the most likely reason
for this is that usbether/devusb are not tamping down the
interrupt properly.

i've seen this with usb serial.

- erik

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-04  1:56 [9fans] usb slowness Tristan
  2012-03-05 15:46 ` erik quanstrom
@ 2012-03-05 15:56 ` Richard Miller
  2012-03-06  0:01   ` Tristan
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Miller @ 2012-03-05 15:56 UTC (permalink / raw)
  To: 9fans

> it turns out that _strayintr is taking up nearly all the time.

Or maybe it isn't.  Taking up nearly all the cpu time isn't the
same as taking up nearly all the time.  You can use time(1) while
reading usb disk to see if cpu is really the bottleneck, or if
in fact your machine is spending most of its time waiting for
a usb interrupt.  If the machine is mostly idle, profiling clock
ticks should usually find it sitting at the HLT instruction which
happens to be couple of bytes before _strayintr.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-05 15:46 ` erik quanstrom
@ 2012-03-05 16:05   ` Richard Miller
  2012-03-05 16:14     ` erik quanstrom
  2012-03-06 13:19   ` Tristan
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Miller @ 2012-03-05 16:05 UTC (permalink / raw)
  To: 9fans

> if you could temporarly run a 9atom kernel,

With the normal kernel, you can use acid or db to look at the intrtimes
array to see the spread of cpu times used in each interrupt vector entry.

See http://9fans.net/archive/2003/11/520




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-05 16:05   ` Richard Miller
@ 2012-03-05 16:14     ` erik quanstrom
  2012-03-05 16:20       ` Richard Miller
  0 siblings, 1 reply; 20+ messages in thread
From: erik quanstrom @ 2012-03-05 16:14 UTC (permalink / raw)
  To: 9fans, 9fans

On Mon Mar  5 11:06:43 EST 2012, 9fans@hamnavoe.com wrote:
> > if you could temporarly run a 9atom kernel,
> 
> With the normal kernel, you can use acid or db to look at the intrtimes
> array to see the spread of cpu times used in each interrupt vector entry.
> 
> See http://9fans.net/archive/2003/11/520

i didn't find the original histogram very useful.  it is not scaled
by the speed of the processor nor type of interrupt.  the slots
in the histogram are appropriate for 8259 interrupts on a 2000-era
processor (100µs increments).  almost all the interrupts i see are
< 20µs.  the 82598 on my relativly slow cpu server averages 2.5µs
per interrupt.

it's also a pain to print out without a file in addition to /dev/irqalloc.
i found the absolute count and absolute # of cpu cycles (to avoid
predivision sloppiness) to be a lot more useful for the general case.

a hisogram sure could be useful, but i think i'd want to taylor it to
a specific problem.  and i haven't had one yet.

- erik

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-05 16:14     ` erik quanstrom
@ 2012-03-05 16:20       ` Richard Miller
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Miller @ 2012-03-05 16:20 UTC (permalink / raw)
  To: 9fans

> a hisogram sure could be useful, but i think i'd want to taylor it to
> a specific problem.  and i haven't had one yet.

The problem in hand is to check the hypothesis that usb disk slowness
is due to "nearly all the time" being spent in interrupt handling.
A quick glance at intrtimes should suffice for that.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-05 15:56 ` Richard Miller
@ 2012-03-06  0:01   ` Tristan
  2012-03-06 10:39     ` Richard Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Tristan @ 2012-03-06  0:01 UTC (permalink / raw)
  To: 9fans

> You can use time(1) while reading usb disk to see if cpu is really the
> bottleneck, or if in fact your machine is spending most of its time
> waiting for a usb interrupt.  If the machine is mostly idle, profiling
> clock ticks should usually find it sitting at the HLT instruction which
> happens to be couple of bytes before _strayintr.

time suggests that it is waiting for interrupts: 0.03u 0.75s 25.42r
if i force polling, it's a little faster: 0.01u 0.45s 17.70r
(dd -if /dev/sdU0.0/data -of /dev/null -bs 65536 -count 1024)

holding a second, time wouldn't show interrupt-handling time, correct?

load remains fairly low during both, but system calls and context
switches are high.

time to read up on ehci...

thank you,
tristan

-- 
All original matter is hereby placed immediately under the public domain.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06  0:01   ` Tristan
@ 2012-03-06 10:39     ` Richard Miller
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Miller @ 2012-03-06 10:39 UTC (permalink / raw)
  To: 9fans

> if i force polling, it's a little faster: 0.01u 0.45s 17.70r

That's interesting - it shouldn't make a difference unless there's something
wrong with the controller or the driver.  What did you do to force polling?

> holding a second, time wouldn't show interrupt-handling time, correct?

Yes, I think you're right - system cputime will only be incremented for
a running process.  But you can watch the last column of /dev/sysstat
(or use 'stats -I') to see the percentage of time spent in interrupt handlers.

> load remains fairly low during both, but system calls and context
> switches are high.

How about number of interrupts?  Erik's theory was that you were getting
too many of these.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-05 15:46 ` erik quanstrom
  2012-03-05 16:05   ` Richard Miller
@ 2012-03-06 13:19   ` Tristan
  2012-03-06 15:36     ` Tristan
  1 sibling, 1 reply; 20+ messages in thread
From: Tristan @ 2012-03-06 13:19 UTC (permalink / raw)
  To: 9fans

> if you could temporarly run a 9atom kernel,

cpu% dd -if /dev/sdU0.0/data -of /dev/null -count 1024 -bs 65536

/dev/irqalloc yields
trap	irq	delta count	delta cycles	type	name
35	3	27648           510854000	i8259	usbehci

which doesn't seem extraordinary. and nothing else is huge either.

mpirq doesn't appear to exist in #P

the whole transfer takes about 8% longer on 9atom.

> > if i force polling, it's a little faster: 0.01u 0.45s 17.70r
> That's interesting - it shouldn't make a difference unless there's
> something wrong with the controller or the driver.  What did you do to
> force polling?

added 'ctrl->poll.must = 1' to init in usbehci.c

> Yes, I think you're right - system cputime will only be incremented for
> a running process.  But you can watch the last column of /dev/sysstat
> (or use 'stats -I') to see the percentage of time spent in interrupt
> handlers.

interrupt time is quite low/very low... (olpc / intel)

> How about number of interrupts?  Erik's theory was that you were
> getting too many of these.

interrupt count is very low/moderate... (olpc / intel)

my working theory is that i'm getting all the interrupts, but not soon
enough... the fascinating piece is that the olpc and the pc with intel
ehc take just about the same amount of time.

and a different usb flash device doesn't change it (which was expected as
usb/ether seems to suffer also).

enjoy,
tristan

-- 
All original matter is hereby placed immediately under the public domain.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06 13:19   ` Tristan
@ 2012-03-06 15:36     ` Tristan
  2012-03-06 16:10       ` Richard Miller
  2012-03-06 18:55       ` erik quanstrom
  0 siblings, 2 replies; 20+ messages in thread
From: Tristan @ 2012-03-06 15:36 UTC (permalink / raw)
  To: 9fans

> my working theory is that i'm getting all the interrupts, but not soon
> enough... the fascinating piece is that the olpc and the pc with intel
> ehc take just about the same amount of time.

indeed, if i set the interrupt threshold control to 1 micro-frame (they
default to 8), i get 10-11 sec dd, say, half as slow. and ping time to my
switch is half. and i can get about twice the throughput from the file
server, which is still kinda slow.

(hget yields about 6Mbit over a 100Mbit link)

tristan

-- 
All original matter is hereby placed immediately under the public domain.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06 15:36     ` Tristan
@ 2012-03-06 16:10       ` Richard Miller
  2012-03-06 18:55       ` erik quanstrom
  1 sibling, 0 replies; 20+ messages in thread
From: Richard Miller @ 2012-03-06 16:10 UTC (permalink / raw)
  To: 9fans

> indeed, if i set the interrupt threshold control to 1 micro-frame (they
> default to 8), i get 10-11 sec dd, say, half as slow.

That seems a good change to make.  Particularly for usb mass storage,
which for some reason emulates a scsi transaction with three usb
transfers and therefore three interrupts per r/w operation (send
command, recv/send data, recv status).  To get anything like maximum
throughput, we'd also have to queue all three transfers at once so
that the usb controller could schedule them in adjacent microframes
(or maybe even the same one?).  Sending them separately amd
synchronously from the user-space driver will probably take long
enough to miss one or more microframes.

¡Hola! Nemo, should the interrupt threshold be set to 1 microframe
(125 μs) to get better latency for all devices?  I can't see any good
reason for wanting to slow interrupts deliberately.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06 15:36     ` Tristan
  2012-03-06 16:10       ` Richard Miller
@ 2012-03-06 18:55       ` erik quanstrom
  2012-03-06 22:22         ` Tristan
  1 sibling, 1 reply; 20+ messages in thread
From: erik quanstrom @ 2012-03-06 18:55 UTC (permalink / raw)
  To: 9p-st, 9fans

On Tue Mar  6 10:37:06 EST 2012, 9p-st@imu.li wrote:
> > my working theory is that i'm getting all the interrupts, but not soon
> > enough... the fascinating piece is that the olpc and the pc with intel
> > ehc take just about the same amount of time.
>
> indeed, if i set the interrupt threshold control to 1 micro-frame (they
> default to 8), i get 10-11 sec dd, say, half as slow. and ping time to my
> switch is half. and i can get about twice the throughput from the file
> server, which is still kinda slow.
>
> (hget yields about 6Mbit over a 100Mbit link)

it sounds like you have it all figured out.  but just in case there is a spurious
interrupt causing trouble, you can check with this rather embarassing acid
script.  sorry, i can't get acid to recoginze intrcountx as a vlong.

acid -l acid -k 1 /386/9pccpu
acid: i=0; while i<256*2 do {if(intrcountx[i]>0) then {print(i/2\D, "\t", intrcountx[i]);} i=i+2;}
7	143
14	44355
50	99090099
65	19938
73	37571
145	9814
153	1
161	42
169	7

- erik



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06 18:55       ` erik quanstrom
@ 2012-03-06 22:22         ` Tristan
  2012-03-07  5:50           ` Tristan
  2012-03-07 15:27           ` [9fans] usb slowness Richard Miller
  0 siblings, 2 replies; 20+ messages in thread
From: Tristan @ 2012-03-06 22:22 UTC (permalink / raw)
  To: 9fans

i tried another device a bit ago. openrd (kw). with and without setting
the interrupt delay to 1 microframe. i get just about the same times (a
little higher). does anybody get less than 10 sec for

dd -if /dev/sdU0.0/data -of /dev/null -count 1024 -bs 65536 ?

or even less than 20 sec?

that is, is this a problem with all my available hardware, or all of our
software?

tristan

-- 
All original matter is hereby placed immediately under the public domain.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06 22:22         ` Tristan
@ 2012-03-07  5:50           ` Tristan
  2012-03-07  5:53             ` Tristan
  2012-03-07 15:27           ` [9fans] usb slowness Richard Miller
  1 sibling, 1 reply; 20+ messages in thread
From: Tristan @ 2012-03-07  5:50 UTC (permalink / raw)
  To: 9fans

a little reading later.

the usb mass storage bulk only spec stipulates that only one request may
be active at a time.

wince.

so, bigger buffer? yup, 32k reads double the speed again. 64k reads
aren't quite right, something clips it's tail. now i'm up to ~11 MB/s.

except that it crashes the pc and olpc kernels. kw does fine though (for
now!). changing devmnt seem unlikely to be very safe without checking
on every fileserver in case it now happily assumes 8k (plus headers) is
enough for anything.

anybody messed with devmnt's Msgsize successfully?

enjoy,
tristan

-- 
All original matter is hereby placed immediately under the public domain.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-07  5:50           ` Tristan
@ 2012-03-07  5:53             ` Tristan
  2012-03-07  9:31               ` Charles Forsyth
  0 siblings, 1 reply; 20+ messages in thread
From: Tristan @ 2012-03-07  5:53 UTC (permalink / raw)
  To: 9fans

> anybody messed with devmnt's Msgsize successfully?

make that devmnt's MAXRPC, in usbfs.h it's Msgsize.

enjoy,
tristan

-- 
All original matter is hereby placed immediately under the public domain.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-07  5:53             ` Tristan
@ 2012-03-07  9:31               ` Charles Forsyth
  2012-03-10  3:13                 ` [9fans] donehead changed before ack Tristan
  0 siblings, 1 reply; 20+ messages in thread
From: Charles Forsyth @ 2012-03-07  9:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

fversion allows you to change msize without changing devmnt.c,
supposedly

On 7 March 2012 05:53, Tristan <9p-st@imu.li> wrote:
>> anybody messed with devmnt's Msgsize successfully?
>
> make that devmnt's MAXRPC, in usbfs.h it's Msgsize.
>
> enjoy,
> tristan
>
> --
> All original matter is hereby placed immediately under the public domain.
>



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] usb slowness
  2012-03-06 22:22         ` Tristan
  2012-03-07  5:50           ` Tristan
@ 2012-03-07 15:27           ` Richard Miller
  1 sibling, 0 replies; 20+ messages in thread
From: Richard Miller @ 2012-03-07 15:27 UTC (permalink / raw)
  To: 9fans

> does anybody get less than 10 sec for
>
> dd -if /dev/sdU0.0/data -of /dev/null -count 1024 -bs 65536 ?
>
> or even less than 20 sec?

10.4 sec on a sheevaplug (with 1 microframe interrupt threshold).

By the way, everyone has been very tactful not to point out my
silly mistake in saying:

> ... You can use time(1) while
> reading usb disk to see if cpu is really the bottleneck, or if
> in fact your machine is spending most of its time waiting for
> a usb interrupt.

Of course most of the user+sys cpu time involved in reading from usb
disk will be charged to the usb/disk process (normally a subprocess of
usbd), not to user command doing the reading.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [9fans] donehead changed before ack
  2012-03-07  9:31               ` Charles Forsyth
@ 2012-03-10  3:13                 ` Tristan
  2012-03-10 12:33                   ` erik quanstrom
  0 siblings, 1 reply; 20+ messages in thread
From: Tristan @ 2012-03-10  3:13 UTC (permalink / raw)
  To: 9fans

every so often, using a mouse/tablet with the olpc, i get this message:

ohci: bug: donehead changed before ack

but not such that i can reproduce it on demand.

the archives suggest that erik experienced this.

disabling the ohc interrupts during the interrupt routine (the ohci spec
says to) appears to help, but i havn't been running that for very long.

anyone know anything about this?

thanks,
tristan

-- 
All original matter is hereby placed immediately under the public domain.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] donehead changed before ack
  2012-03-10  3:13                 ` [9fans] donehead changed before ack Tristan
@ 2012-03-10 12:33                   ` erik quanstrom
  2012-03-10 15:14                     ` Tristan
  0 siblings, 1 reply; 20+ messages in thread
From: erik quanstrom @ 2012-03-10 12:33 UTC (permalink / raw)
  To: 9fans

> disabling the ohc interrupts during the interrupt routine (the ohci spec
> says to) appears to help, but i havn't been running that for very long.

so i guess the question is the interrupt ohci is using edge or level.
if level, interrupts are going to be masked by the level and that
wouldn't be your problem.  i suppose it could also be that shared
usb hardware has more than one interrupt enabled?

- erik

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [9fans] donehead changed before ack
  2012-03-10 12:33                   ` erik quanstrom
@ 2012-03-10 15:14                     ` Tristan
  0 siblings, 0 replies; 20+ messages in thread
From: Tristan @ 2012-03-10 15:14 UTC (permalink / raw)
  To: 9fans

> > disabling the ohc interrupts during the interrupt routine (the ohci spec
> > says to) appears to help, but i havn't been running that for very long.
indeed, a hour or so later, it started acting up.

the devices keep working if i jump back to the beginning of the handler
when donehead changes... nose wrinkle.

> so i guess the question is the interrupt ohci is using edge or level.
> if level, interrupts are going to be masked by the level and that
> wouldn't be your problem.
and it doesn't appear to be the problem.

my understanding is that donehead shouldn't change, because the hcd has
no way of knowing that the driver has read anything.

> i suppose it could also be that shared usb hardware has more than one
> interrupt enabled?
what? do you mean multiple devices on the interrupt or different types of
interrupts?

so far (and this is only an hour or so of use) donehead only changes from
nil to a real pointer, which leaves very little time in the routine.
cycles says [300,386].

which of course means that it isn't changing, it's just an interrupt with
without status&Wdh but with donehead set. checking for Wdh is running
perfectly well now.

thank you,
tristan

-- 
All original matter is hereby placed immediately under the public domain.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-03-10 15:14 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-04  1:56 [9fans] usb slowness Tristan
2012-03-05 15:46 ` erik quanstrom
2012-03-05 16:05   ` Richard Miller
2012-03-05 16:14     ` erik quanstrom
2012-03-05 16:20       ` Richard Miller
2012-03-06 13:19   ` Tristan
2012-03-06 15:36     ` Tristan
2012-03-06 16:10       ` Richard Miller
2012-03-06 18:55       ` erik quanstrom
2012-03-06 22:22         ` Tristan
2012-03-07  5:50           ` Tristan
2012-03-07  5:53             ` Tristan
2012-03-07  9:31               ` Charles Forsyth
2012-03-10  3:13                 ` [9fans] donehead changed before ack Tristan
2012-03-10 12:33                   ` erik quanstrom
2012-03-10 15:14                     ` Tristan
2012-03-07 15:27           ` [9fans] usb slowness Richard Miller
2012-03-05 15:56 ` Richard Miller
2012-03-06  0:01   ` Tristan
2012-03-06 10:39     ` Richard Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).