* [9fans] etherigbe.c using _xinc?
@ 2009-12-08 16:25 Venkatesh Srinivas
2009-12-08 16:36 ` erik quanstrom
0 siblings, 1 reply; 9+ messages in thread
From: Venkatesh Srinivas @ 2009-12-08 16:25 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
Hi,
I noticed etherigbe.c (in igberballoc) was recently changed to
increment the refcount on the block it allocates. Any reason it uses
_xinc rather than incref?
-- vs
* Re: [9fans] etherigbe.c using _xinc?
2009-12-08 16:25 [9fans] etherigbe.c using _xinc? Venkatesh Srinivas
@ 2009-12-08 16:36 ` erik quanstrom
2009-12-08 19:35 ` Russ Cox
0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2009-12-08 16:36 UTC (permalink / raw)
To: 9fans
On Tue Dec 8 11:28:30 EST 2009, me@acm.jhu.edu wrote:
> Hi,
>
> I noticed etherigbe.c (in igberballoc) was recently changed to
> increment the refcount on the block it allocates. Any reason it uses
> _xinc rather than incref?
>
> -- vs
because it's not a Ref. unfortunately, if it were
a Ref, it would be much faster. _xinc is deadly
slow even if there is no contention on x86.
i wish the ref counting had at least been isolated to the
case that needs them. blocks in queues typically
have one owner. so the owner of the block assumes
it can modify the block whenever with no locking.
ref counting means this assumption is false.
i'm not sure how you're supposed to wlock a block.
- erik
* Re: [9fans] etherigbe.c using _xinc?
2009-12-08 16:36 ` erik quanstrom
@ 2009-12-08 19:35 ` Russ Cox
2009-12-08 19:52 ` John Floren
2009-12-08 20:00 ` erik quanstrom
0 siblings, 2 replies; 9+ messages in thread
From: Russ Cox @ 2009-12-08 19:35 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> because it's not a Ref. unfortunately, if it were
> a Ref, it would be much faster. _xinc is deadly
> slow even if there is no contention on x86.
do you have numbers to back up this claim?
you are claiming that the locked XCHGL
in tas (pc/l.s) called from lock (port/taslock.c)
called from incref (port/chan.c) is "much faster"
than the locked INCL in _xinc (pc/l.s).
it seems to me that a locked memory bus
is a locked memory bus.
also, when up != nil (a common condition),
lock does a locked INCL and DECL
(_xinc and _xdec) in addition to the tas,
which seems like strictly more work than
a single _xinc.
russ
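For readers following the comparison, here is a rough sketch of the two code paths in portable C11. It is not the Plan 9 kernel source: `SketchRef`, `sketch_incref` and `sketch_xinc` are hypothetical names, and C11 atomics stand in for the `tas` and `LOCK; INCL` instructions discussed above. The sketch also leaves out the extra `_xinc`/`_xdec` that `lock` is said to perform when `up != nil`.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a Ref: a spin lock plus a plain counter. */
typedef struct {
	atomic_flag lock;
	long ref;
} SketchRef;

/* incref-style path: test-and-set to acquire, plain increment, release. */
static long
sketch_incref(SketchRef *r)
{
	long x;

	while(atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
	x = ++r->ref;
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
	return x;
}

/* _xinc-style path: a single atomic read-modify-write, no lock. */
static void
sketch_xinc(atomic_long *p)
{
	atomic_fetch_add_explicit(p, 1, memory_order_seq_cst);
}
```

A `SketchRef` would be initialized as `{ ATOMIC_FLAG_INIT, 0 }`. Both paths issue at least one atomic read-modify-write, which is the point being argued; which one is cheaper on a given CPU is an empirical question.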
* Re: [9fans] etherigbe.c using _xinc?
2009-12-08 19:35 ` Russ Cox
@ 2009-12-08 19:52 ` John Floren
2009-12-08 20:00 ` erik quanstrom
1 sibling, 0 replies; 9+ messages in thread
From: John Floren @ 2009-12-08 19:52 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Tue, Dec 8, 2009 at 2:35 PM, Russ Cox <rsc@swtch.com> wrote:
>> because it's not a Ref. unfortunately, if it were
>> a Ref, it would be much faster. _xinc is deadly
>> slow even if there is no contention on x86.
>
> do you have numbers to back up this claim?
>
I don't have the code or the numbers in front of me, but I recall
seeing quite a bit of speed improvement when I experimentally replaced
incref/decref with direct calls to _xinc/_xdec. I don't remember what
the test was, but I do remember that I got something like 35%
improvement on it. I ran that kernel on my terminal for the rest of
the summer without trouble; while I didn't notice a blazing speed
increase, it didn't slow me down either.
John
--
"Object-oriented design is the roman numerals of computing" -- Rob Pike
* Re: [9fans] etherigbe.c using _xinc?
2009-12-08 19:35 ` Russ Cox
2009-12-08 19:52 ` John Floren
@ 2009-12-08 20:00 ` erik quanstrom
2009-12-08 23:52 ` Russ Cox
1 sibling, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2009-12-08 20:00 UTC (permalink / raw)
To: 9fans
[-- Attachment #1: Type: text/plain, Size: 883 bytes --]
> do you have numbers to back up this claim?
>
> you are claiming that the locked XCHGL
> in tas (pc/l.s) called from lock (port/taslock.c)
> called from incref (port/chan.c) is "much faster"
> than the locked INCL in _xinc (pc/l.s).
> it seems to me that a locked memory bus
> is a locked memory bus.
yes, i do. xinc on most modern intel is a real
loss. and a moderate loss on amd. my atom 330
is an exception.
intel core i7 2.4ghz
	loop		 0 nsec/call
	loopxinc	20 nsec/call
	looplock	11 nsec/call

intel 5000 1.6ghz
	loop		 0 nsec/call
	loopxinc	44 nsec/call
	looplock	25 nsec/call

intel atom 330 1.6ghz (exception!)
	loop		 2 nsec/call
	loopxinc	14 nsec/call
	looplock	22 nsec/call

amd k10 2.0ghz
	loop		 2 nsec/call
	loopxinc	30 nsec/call
	looplock	20 nsec/call

intel p4 xeon 3.0ghz
	loop		 1 nsec/call
	loopxinc	76 nsec/call
	looplock	42 nsec/call

- erik
[-- Attachment #2: xinc.s --]
[-- Type: text/plain, Size: 286 bytes --]
TEXT _xinc(SB), 1, $0		/* void _xinc(long*); */
	MOVL	l+0(FP), AX
	LOCK;	INCL 0(AX)
	RET

TEXT _xdec(SB), 1, $0		/* long _xdec(long*); */
	MOVL	l+0(FP), BX
	XORL	AX, AX
	LOCK;	DECL 0(BX)
	JLT	_xdeclt
	JGT	_xdecgt
	RET
_xdecgt:
	INCL	AX
	RET
_xdeclt:
	DECL	AX
	RET

[-- Attachment #3: timing.c --]
[-- Type: text/plain, Size: 699 bytes --]
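#include <u.h>
#include <libc.h>

/* implemented in xinc.s */
void	_xinc(uint*);
void	_xdec(uint*);

enum {
	N	= 1<<30,	/* iterations per timed loop */
};

/* baseline: empty loop, measures loop overhead only */
void
loop(void)
{
	uint i;

	for(i = 0; i < N; i++)
		;
}

/* each iteration does one _xinc and one _xdec */
void
loopxinc(void)
{
	uint i, x;

	for(i = 0; i < N; i++){
		_xinc(&x);
		_xdec(&x);
	}
}

/* each iteration does one lock and one unlock, uncontended */
void
looplock(void)
{
	uint i;
	static Lock l;

	for(i = 0; i < N; i++){
		lock(&l);
		unlock(&l);
	}
}

/* run f once and print elapsed time divided by N iterations */
void
timing(char *s, void (*f)(void))
{
	uvlong t[2];

	t[0] = nsec();
	f();
	t[1] = nsec();
	fprint(2, "%s\t%llud nsec/call\n", s, (t[1] - t[0])/(uvlong)N);
}

void
main(void)
{
	nsec();		/* first call, made before any timing starts */
	timing("loop", loop);
	timing("loopxinc", loopxinc);
	timing("looplock", looplock);
	exits("");
}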
* Re: [9fans] etherigbe.c using _xinc?
2009-12-08 20:00 ` erik quanstrom
@ 2009-12-08 23:52 ` Russ Cox
0 siblings, 0 replies; 9+ messages in thread
From: Russ Cox @ 2009-12-08 23:52 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
it looks like you are comparing these two functions
	void
	loopxinc(void)
	{
		uint i, x;

		for(i = 0; i < N; i++){
			_xinc(&x);
			_xdec(&x);
		}
	}

	void
	looplock(void)
	{
		uint i;
		static Lock l;

		for(i = 0; i < N; i++){
			lock(&l);
			unlock(&l);
		}
	}

but the former does two operations and the latter
only one. your claim was that _xinc is slower
than incref (== lock(), x++, unlock()). but you are
timing xinc+xdec against incref.
assuming xinc and xdec are approximately the same
cost (so i can just halve the numbers for loopxinc),
that would make the fair comparison produce:
intel core i7 2.4ghz
	loop		 0 nsec/call
	loopxinc	10 nsec/call	// was 20
	looplock	11 nsec/call

intel 5000 1.6ghz
	loop		 0 nsec/call
	loopxinc	22 nsec/call	// was 44
	looplock	25 nsec/call

intel atom 330 1.6ghz (exception!)
	loop		 2 nsec/call
	loopxinc	 7 nsec/call	// was 14
	looplock	22 nsec/call

amd k10 2.0ghz
	loop		 2 nsec/call
	loopxinc	15 nsec/call	// was 30
	looplock	20 nsec/call

intel p4 xeon 3.0ghz
	loop		 1 nsec/call
	loopxinc	38 nsec/call	// was 76
	looplock	42 nsec/call

which looks like a much different story.
russ
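The adjustment above amounts to dividing each loopxinc figure by the two calls it makes per iteration. A small sketch in portable C, using the figures posted earlier in the thread; the `Timing` struct, the `posted` table and `per_op` are made-up names for illustration, and the equal-cost assumption is the one stated in the message:

```c
/* Figures as posted in the thread, in nsec per loop iteration. */
typedef struct {
	const char *cpu;
	long loopxinc;	/* one _xinc plus one _xdec per iteration */
	long looplock;	/* one lock plus one unlock per iteration */
} Timing;

static const Timing posted[] = {
	{ "intel core i7 2.4ghz",  20, 11 },
	{ "intel 5000 1.6ghz",     44, 25 },
	{ "intel atom 330 1.6ghz", 14, 22 },
	{ "amd k10 2.0ghz",        30, 20 },
	{ "intel p4 xeon 3.0ghz",  76, 42 },
};

/* Cost of one call, assuming every call in the iteration costs the same. */
static long
per_op(long nsec_per_iter, long calls_per_iter)
{
	return nsec_per_iter / calls_per_iter;
}
```

For example, `per_op(posted[0].loopxinc, 2)` gives the 10 nsec listed for the core i7, to be compared against `posted[0].looplock`.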
* Re: [9fans] etherigbe.c using _xinc?
[not found] <<dd6fe68a0912081705q6fad8a6cl20ca648397070dae@mail.gmail.com>
@ 2009-12-09 2:04 ` erik quanstrom
0 siblings, 0 replies; 9+ messages in thread
From: erik quanstrom @ 2009-12-09 2:04 UTC (permalink / raw)
To: 9fans
> has the network gotten fast enough that an extra
> bus transaction per block slows it down?
> it seems like gigabit ethernet would be around
> 100k packets per second, so the extra 50ns
> or so per packet would be 5ms per second in
> practice, which is significant but hardly
> seems prohibitive.
i'm working with 10gbe. pcie 2.0 is making 2x10gbe
attractive. multiply by 10 or 20. and if you're doing a
request/response, multiply by 2 again.
- erik
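The back-of-the-envelope numbers in this exchange can be worked through in a few lines of C. This is only a sketch: the function names are hypothetical, the 100k packets per second and 50 ns figures are the estimates quoted above, and the multipliers are applied linearly as suggested in the reply.

```c
/* Extra time spent per second of traffic, in nanoseconds. */
static long long
overhead_ns_per_sec(long long packets_per_sec, long long extra_ns_per_packet)
{
	return packets_per_sec * extra_ns_per_packet;
}

/* Convert nanoseconds to whole milliseconds (truncating). */
static long long
ns_to_ms(long long ns)
{
	return ns / 1000000;
}
```

With the quoted gigabit estimate, 100000 packets at 50 ns each is 5 ms per second. Multiplying the packet rate by 10 or 20 gives 50 ms or 100 ms, and doubling again for request/response gives 100 ms to 200 ms per second, if the 50 ns estimate and the linear scaling both hold.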
* Re: [9fans] etherigbe.c using _xinc?
2009-12-09 0:32 ` erik quanstrom
@ 2009-12-09 1:05 ` Russ Cox
0 siblings, 0 replies; 9+ messages in thread
From: Russ Cox @ 2009-12-09 1:05 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Tue, Dec 8, 2009 at 4:32 PM, erik quanstrom <quanstro@quanstro.net> wrote:
>> but the former does two operations and the latter
>> only one. your claim was that _xinc is slower
>> than incref (== lock(), x++, unlock()). but you are
>> timing xinc+xdec against incref.
>
> sure. i was looking at it as a kernel version of a
> semaphore.
no, your original claim was that incref/decref
was faster than _xinc/_xdec. the numbers
don't support that claim.
> the reference
> counting is a heavy price to pay on every network
> block, when it is only used by ip/gre.c.
has the network gotten fast enough that an extra
bus transaction per block slows it down?
it seems like gigabit ethernet would be around
100k packets per second, so the extra 50ns
or so per packet would be 5ms per second in
practice, which is significant but hardly
seems prohibitive.
> before allocb/freeb
> did 2 lock/unlocks. now it does 2 lock/unlocks
> + 2 xinc/xdec, and is, in the best case 31% slower.
> and in the worst case 90% slower.
i don't know how you get those numbers but
anything even approaching that would mean that
the kernel is spending all its time in igberballoc,
at which point you probably have other things
to fix.
russ
* Re: [9fans] etherigbe.c using _xinc?
[not found] <<dd6fe68a0912081552s30851f04n109e56479bb423cb@mail.gmail.com>
@ 2009-12-09 0:32 ` erik quanstrom
2009-12-09 1:05 ` Russ Cox
0 siblings, 1 reply; 9+ messages in thread
From: erik quanstrom @ 2009-12-09 0:32 UTC (permalink / raw)
To: 9fans
> but the former does two operations and the latter
> only one. your claim was that _xinc is slower
> than incref (== lock(), x++, unlock()). but you are
> timing xinc+xdec against incref.
sure. i was looking at it as a kernel version of a
semaphore.
back to the original problem, before allocb/freeb
did 2 lock/unlocks. now it does 2 lock/unlocks
+ 2 xinc/xdec, and is, in the best case 31% slower.
and in the worst case 90% slower. the reference
counting is a heavy price to pay on every network
block, when it is only used by ip/gre.c.
- erik
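The message does not say how the 31% and 90% figures were derived. One reading that happens to reproduce them from the timings posted earlier in the thread is to take the old cost as two looplock iterations and the added cost as one loopxinc iteration (one `_xinc` plus one `_xdec`). That reading is an assumption, and `slowdown_pct` is a hypothetical helper, sketched in portable C:

```c
/*
 * Percentage slowdown from adding one _xinc/_xdec pair (loopxinc nsec)
 * to a path that previously cost two lock/unlock pairs (2 * looplock nsec).
 * Integer division truncates toward zero.
 */
static long
slowdown_pct(long loopxinc, long looplock)
{
	return 100 * loopxinc / (2 * looplock);
}
```

Under that reading the atom 330 (14 and 22 nsec) comes out at 31% and the p4 xeon (76 and 42 nsec) at 90%, with the other machines in between.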