* Re: [9fans] etherigbe.c using _xinc?
  From: erik quanstrom @ 2009-12-09  0:32 UTC
  To: 9fans

> but the former does two operations and the latter
> only one.  your claim was that _xinc is slower
> than incref (== lock(), x++, unlock()).  but you are
> timing xinc+xdec against incref.

sure.  i was looking at it as a kernel version of a
semaphore.

back to the original problem: before, allocb/freeb did
2 lock/unlocks.  now it does 2 lock/unlocks plus an xinc
and an xdec, and is in the best case 31% slower and in the
worst case 90% slower.  reference counting is a heavy price
to pay on every network block when it is only used by
ip/gre.c.

- erik
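erik doesn't show the arithmetic, but one reading that reproduces his
percentages uses the timings he posts below, taking looplock as one
lock/unlock pair and loopxinc as one xinc+xdec pair:

	atom 330 (best case):	before = 2*22ns = 44ns;  after = 44 + 14 = 58ns;  58/44 - 1 ≈ 31%
	core i7 (worst case):	before = 2*11ns = 22ns;  after = 22 + 20 = 42ns;  42/22 - 1 ≈ 90%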
* Re: [9fans] etherigbe.c using _xinc?
  From: Russ Cox @ 2009-12-09  1:05 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Dec 8, 2009 at 4:32 PM, erik quanstrom <quanstro@quanstro.net> wrote:
>> but the former does two operations and the latter
>> only one.  your claim was that _xinc is slower
>> than incref (== lock(), x++, unlock()).  but you are
>> timing xinc+xdec against incref.
>
> sure.  i was looking at it as a kernel version of a
> semaphore.

no, your original claim was that incref/decref was faster
than _xinc/_xdec.  the numbers don't support that claim.

> reference counting is a heavy price to pay on every
> network block when it is only used by ip/gre.c.

has the network gotten fast enough that an extra bus
transaction per block slows it down?  it seems like gigabit
ethernet would be around 100k packets per second, so the
extra 50ns or so per packet would be 5ms per second in
practice, which is significant but hardly seems prohibitive.

> before, allocb/freeb did 2 lock/unlocks.  now it does
> 2 lock/unlocks plus an xinc and an xdec, and is in the
> best case 31% slower and in the worst case 90% slower.

i don't know how you get those numbers, but anything even
approaching that would mean the kernel is spending all its
time in igberballoc, at which point you probably have other
things to fix.

russ
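the 100k figure checks out for full-size (1500-byte) frames; working
the arithmetic:

	1 Gb/s:  10^9 bits/s / (1538 bytes-on-wire * 8 bits) ≈ 81k frames/s, i.e. roughly 100k
	100,000 packets/s * 50 ns/packet = 5,000,000 ns/s = 5 ms/s, about 0.5% of one core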
* Re: [9fans] etherigbe.c using _xinc?
  From: erik quanstrom @ 2009-12-09  2:04 UTC
  To: 9fans

> has the network gotten fast enough that an extra bus
> transaction per block slows it down?  it seems like gigabit
> ethernet would be around 100k packets per second, so the
> extra 50ns or so per packet would be 5ms per second in
> practice, which is significant but hardly seems prohibitive.

i'm working with 10gbe, and pcie 2.0 is making 2x10gbe
attractive.  so multiply by 10 or 20.  and if you're doing
request/response, multiply by 2 again.

- erik
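scaling the earlier estimate the same way (a rough sketch; exact rates
depend on frame size):

	10 Gb/s at 1500-byte frames:	10^10 / (1538 bytes-on-wire * 8 bits) ≈ 812k frames/s
	extra cost per port:		812,000 * 50 ns ≈ 41 ms/s
	2x10gbe, request/response:	41 ms/s * 2 * 2 ≈ 160 ms/s of one core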
* [9fans] etherigbe.c using _xinc?
  From: Venkatesh Srinivas @ 2009-12-08 16:25 UTC
  To: Fans of the OS Plan 9 from Bell Labs

Hi,

I noticed etherigbe.c (in igberballoc) was recently changed to
increment the refcount on the block it allocates.  Any reason it
uses _xinc rather than incref?

-- vs
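for readers without the source at hand, the pattern in question looks
roughly like this — a sketch, not the driver's exact code; the pool
and lock names are hypothetical, and Block/ilock come from the usual
kernel headers:

	/* sketch of a driver receive-buffer allocator; igberbpool and igberblock are made-up names */
	static Block*
	igberballoc(void)
	{
		Block *bp;

		ilock(&igberblock);
		if((bp = igberbpool) != nil){
			igberbpool = bp->next;
			bp->next = nil;
			_xinc(&bp->ref);	/* the change under discussion: atomic increment, not incref */
		}
		iunlock(&igberblock);
		return bp;
	}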
* Re: [9fans] etherigbe.c using _xinc?
  From: erik quanstrom @ 2009-12-08 16:36 UTC
  To: 9fans

On Tue Dec 8 11:28:30 EST 2009, me@acm.jhu.edu wrote:
> I noticed etherigbe.c (in igberballoc) was recently changed to
> increment the refcount on the block it allocates.  Any reason it
> uses _xinc rather than incref?

because it's not a Ref.  unfortunately, if it were a Ref, it
would be much faster.  _xinc is deadly slow on x86 even when
there is no contention.  i wish the ref counting had at least
been isolated to the case that needs it.

blocks in queues typically have one owner, so the owner of a
block assumes it can modify the block at any time with no
locking.  ref counting makes this assumption false.  i'm not
sure how you're supposed to wlock a block.

- erik
* Re: [9fans] etherigbe.c using _xinc?
  From: Russ Cox @ 2009-12-08 19:35 UTC
  To: Fans of the OS Plan 9 from Bell Labs

> because it's not a Ref.  unfortunately, if it were a Ref, it
> would be much faster.  _xinc is deadly slow on x86 even when
> there is no contention.

do you have numbers to back up this claim?

you are claiming that the locked XCHGL in tas (pc/l.s),
called from lock (port/taslock.c), called from incref
(port/chan.c), is "much faster" than the locked INCL in
_xinc (pc/l.s).  it seems to me that a locked memory bus
is a locked memory bus.

also, when up != nil (a common condition), lock does a
locked INCL and DECL (_xinc and _xdec) in addition to the
tas, which seems like strictly more work than a single _xinc.

russ
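for reference, a simplified sketch of the slower path being compared,
as it appears in the stock kernel sources (the real lock() also spins
and does accounting):

	/* port/chan.c (simplified): incref brackets the increment with a test-and-set lock */
	long
	incref(Ref *r)
	{
		long x;

		lock(r);	/* tas spin lock; per the message above, also touches up->nlocks when up != nil */
		x = ++r->ref;
		unlock(r);
		return x;
	}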
* Re: [9fans] etherigbe.c using _xinc?
  From: John Floren @ 2009-12-08 19:52 UTC
  To: Fans of the OS Plan 9 from Bell Labs

On Tue, Dec 8, 2009 at 2:35 PM, Russ Cox <rsc@swtch.com> wrote:
>> because it's not a Ref.  unfortunately, if it were a Ref, it
>> would be much faster.  _xinc is deadly slow on x86 even when
>> there is no contention.
>
> do you have numbers to back up this claim?

I don't have the code or the numbers in front of me, but I recall
seeing quite a bit of speed improvement when I experimentally
replaced incref/decref with direct calls to _xinc/_xdec.  I don't
remember what the test was, but I do remember getting something
like a 35% improvement on it.  I ran that kernel on my terminal
for the rest of the summer without trouble; while I didn't notice
a blazing speed increase, it didn't slow me down either.

John

--
"Object-oriented design is the roman numerals of computing" -- Rob Pike
* Re: [9fans] etherigbe.c using _xinc?
  From: erik quanstrom @ 2009-12-08 20:00 UTC
  To: 9fans

> do you have numbers to back up this claim?
>
> you are claiming that the locked XCHGL in tas (pc/l.s),
> called from lock (port/taslock.c), called from incref
> (port/chan.c), is "much faster" than the locked INCL in
> _xinc (pc/l.s).  it seems to me that a locked memory bus
> is a locked memory bus.

yes, i do.  xinc is a real loss on most modern intel parts
and a moderate loss on amd; my atom 330 is the exception.

intel core i7 2.4ghz
	loop		0 nsec/call
	loopxinc	20 nsec/call
	looplock	11 nsec/call

intel 5000 1.6ghz
	loop		0 nsec/call
	loopxinc	44 nsec/call
	looplock	25 nsec/call

intel atom 330 1.6ghz (exception!)
	loop		2 nsec/call
	loopxinc	14 nsec/call
	looplock	22 nsec/call

amd k10 2.0ghz
	loop		2 nsec/call
	loopxinc	30 nsec/call
	looplock	20 nsec/call

intel p4 xeon 3.0ghz
	loop		1 nsec/call
	loopxinc	76 nsec/call
	looplock	42 nsec/call

- erik

[-- Attachment #2: xinc.s --]

TEXT _xinc(SB), 1, $0		/* void _xinc(long*); */
	MOVL	l+0(FP), AX
	LOCK; INCL	0(AX)
	RET

TEXT _xdec(SB), 1, $0		/* long _xdec(long*); */
	MOVL	l+0(FP), BX
	XORL	AX, AX
	LOCK; DECL	0(BX)
	JLT	_xdeclt
	JGT	_xdecgt
	RET
_xdecgt:
	INCL	AX
	RET
_xdeclt:
	DECL	AX
	RET

[-- Attachment #3: timing.c --]

#include <u.h>
#include <libc.h>

void	_xinc(uint*);
void	_xdec(uint*);

enum {
	N	= 1<<30,
};

void
loop(void)
{
	uint i;

	for(i = 0; i < N; i++)
		;
}

void
loopxinc(void)
{
	uint i, x;

	for(i = 0; i < N; i++){
		_xinc(&x);
		_xdec(&x);
	}
}

void
looplock(void)
{
	uint i;
	static Lock l;

	for(i = 0; i < N; i++){
		lock(&l);
		unlock(&l);
	}
}

void
timing(char *s, void (*f)(void))
{
	uvlong t[2];

	t[0] = nsec();
	f();
	t[1] = nsec();
	fprint(2, "%s\t%llud nsec/call\n", s, (t[1] - t[0])/(uvlong)N);
}

void
main(void)
{
	nsec();		/* first call is expensive; do it once before timing */
	timing("loop", loop);
	timing("loopxinc", loopxinc);
	timing("looplock", looplock);
	exits("");
}
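to reproduce on a 386 Plan 9 machine, something like the following
should work — a sketch of the usual build steps, not taken from the
original mail:

	8a xinc.s			# assemble the atomic ops -> xinc.8
	8c timing.c			# compile the benchmark -> timing.8
	8l -o timing timing.8 xinc.8	# link
	timing				# writes loop/loopxinc/looplock timings to fd 2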
* Re: [9fans] etherigbe.c using _xinc?
  From: Russ Cox @ 2009-12-08 23:52 UTC
  To: Fans of the OS Plan 9 from Bell Labs

it looks like you are comparing these two functions:

	void
	loopxinc(void)
	{
		uint i, x;

		for(i = 0; i < N; i++){
			_xinc(&x);
			_xdec(&x);
		}
	}

	void
	looplock(void)
	{
		uint i;
		static Lock l;

		for(i = 0; i < N; i++){
			lock(&l);
			unlock(&l);
		}
	}

but the former does two operations per iteration (an increment
and a decrement) and the latter only one (a lock/unlock, i.e.
one incref).  your claim was that _xinc is slower than incref
(== lock(), x++, unlock()), but you are timing xinc+xdec
against incref.

assuming xinc and xdec cost approximately the same (so i can
just halve the numbers for loopxinc), the fair comparison
produces:

intel core i7 2.4ghz
	loop		0 nsec/call
	loopxinc	10 nsec/call	// was 20
	looplock	11 nsec/call

intel 5000 1.6ghz
	loop		0 nsec/call
	loopxinc	22 nsec/call	// was 44
	looplock	25 nsec/call

intel atom 330 1.6ghz (exception!)
	loop		2 nsec/call
	loopxinc	7 nsec/call	// was 14
	looplock	22 nsec/call

amd k10 2.0ghz
	loop		2 nsec/call
	loopxinc	15 nsec/call	// was 30
	looplock	20 nsec/call

intel p4 xeon 3.0ghz
	loop		1 nsec/call
	loopxinc	38 nsec/call	// was 76
	looplock	42 nsec/call

which looks like a much different story.

russ