From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Mon, 21 Jun 2010 10:21:36 -0400
To: 9fans@9fans.net
Message-ID: <8668dded1f0a71f7c699f3f4ee7cf18c@kw.quanstro.net>
In-Reply-To:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Subject: Re: [9fans] A little ado about taslock
Topicbox-Message-UUID: 35ef5740-ead6-11e9-9d60-3106f5b1d025

> In a crude test on a 1.5GHz p4 willamette with a local fossil/venti and
> 256mb of ram, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a
> fully-built source tree, adding the PAUSE reduced times from an average of
> 18.97s to 18.84s (across ten runs).

we tried this at coraid years ago. it's a win, but only on the p4
and netburst-based xeons with old-and-crappy hyperthreading enabled.
it seems to otherwise be a small loss.

i don't see an actual performance problem on the 16-cpu machine.
i see an apparent performance problem. the 4- and 16-processor
machines have a single-threaded speed ratio of ~ 1:1.7, so since
kprof does sampling on the clock interrupt, it seems reasonable that
processors could get in a timing-predictable loop and get sampled at
different places each time. no way rebalance is using 40% of the
cpu, right?

the anomaly in time(1) is not yet explained. but it's clearly not
much of a performance problem: there was only a 10% slowdown between
1 core busy and 16 cores busy. that's likely due to the fact that
plan 9 knows nothing of the numa nature of that board.

richard miller does point out a real problem. idlehands just
returns if conf.nproc>1. this is done so we don't have to wait for
the next clock tick should work become available. this is a power
management problem, not a performance problem. your interesting
locking solution posted previously doesn't help with this. it's not
even a locking problem.

a potential solution to this would be to have a new bit array, e.g.
active.schedwait, which is set when a proc has no work. the mach
could then call halt. a mach could then check for an idle mach to
wake after readying a proc. an apic ipi would be a suitable wakeup
mechanism with round-trip latencies < 500ns.
(www.barrelfish.org/barrelfish_mmcs08.pdf) one assumes that
500ns/2 + wakeup time ≈ wakeup time.

two unfinished thoughts:

1. it sure wouldn't surprise me if this has been done in plan 9
before. i'd be interested to know what ken's sequent kernel did.

2. if today 16 machs are possible (and 128 on an intel xeon mp 7500:
8 sockets * 8 cores * 2 threads = 128), what do we expect in 5 years?
128?

- erik
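
a minimal sketch of the schedwait idea above, assuming a user-space
model built on c11 atomics rather than real plan 9 kernel code. the
names markidle, wakeidle, ipisent and MAXMACH are hypothetical, the
schedwait word only stands in for the proposed active.schedwait, and
the ipi is simulated by bumping a counter instead of touching the apic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { MAXMACH = 32 };	/* assumed limit: one bit per mach */

/* hypothetical stand-in for the proposed active.schedwait bit array */
static _Atomic uint32_t schedwait;

/* counts simulated ipis; a real kernel would program the apic instead */
static int ipisent[MAXMACH];

/* a mach with nothing to run marks itself idle before halting */
void
markidle(int m)
{
	assert(m >= 0 && m < MAXMACH);
	atomic_fetch_or(&schedwait, (uint32_t)1 << m);
	/*
	 * real code would halt here, and would have to close the
	 * window between setting the bit and executing halt.
	 */
}

/*
 * after readying a proc, claim one idle mach and wake it.
 * returns the mach woken, or -1 if no mach is idle.
 */
int
wakeidle(void)
{
	uint32_t w, bit;
	int m;

	w = atomic_load(&schedwait);
	while(w != 0){
		/* find the lowest-numbered idle mach in our snapshot */
		for(m = 0; (w & ((uint32_t)1 << m)) == 0; m++)
			;
		bit = (uint32_t)1 << m;
		/* compare-and-swap so two wakers never claim the same mach */
		if(atomic_compare_exchange_weak(&schedwait, &w, w & ~bit)){
			ipisent[m]++;	/* simulated ipi */
			return m;
		}
		/* the failed exchange reloaded w; try again */
	}
	return -1;
}
```

in this model the mach that readies a proc calls wakeidle() once. a
return of -1 means every mach is already running, so no ipi is needed.
picking the lowest-numbered idle mach is an arbitrary choice; a
numa-aware kernel would presumably prefer a nearby one.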