Hi,
Erik's thread about a 16-processor x86 machine convinced me to try something related to spinlocks.
The current Plan 9 spinlocks are portable code: they call an arch-provided tas() in a loop until the lock is acquired. On i386, Intel recommends PAUSE in the core of a spin-lock loop, so I modified tas() to execute PAUSE (opcode 0xF3 0x90, if you prefer) whenever the lock-acquire attempt failed.
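A minimal sketch of that change, written in standard C11 with GCC/Clang inline assembly rather than Plan 9 C. SpinLock, cpu_pause, tas_pause, spin_lock and spin_unlock are hypothetical names for illustration, not the kernel's. PAUSE is issued only when the exchange shows the lock was already held, so the uncontended path is unchanged:

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not Plan 9's Lock. */
typedef struct {
	atomic_int key;	/* 0 = free, 1 = held */
} SpinLock;

/* Issue the x86 PAUSE hint (encoded 0xF3 0x90, i.e. REP NOP).
 * On other architectures this sketch does nothing. */
static inline void
cpu_pause(void)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("pause" ::: "memory");
#endif
}

/* Test-and-set: atomically store 1 and return the previous value.
 * If the acquire attempt failed (previous value was 1), PAUSE
 * before returning so the caller's retry loop backs off a little. */
static inline int
tas_pause(SpinLock *l)
{
	int old = atomic_exchange_explicit(&l->key, 1, memory_order_acquire);
	if (old != 0)
		cpu_pause();
	return old;
}

/* Portable retry loop: knows nothing about PAUSE itself. */
static void
spin_lock(SpinLock *l)
{
	while (tas_pause(l) != 0)
		;
}

static void
spin_unlock(SpinLock *l)
{
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```

The retry loop stays arch-independent; only the tas-level primitive knows about PAUSE, which is why the change fits without making the lock code itself arch-specific.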
In a crude test on a 1.5GHz P4 (Willamette) with a local fossil/venti and 256MB of RAM, running "time mk 'CONF=pcf' > /dev/null" in /sys/src/9/pc on a fully built source tree, adding the PAUSE reduced times from an average of 18.97s to 18.84s (across ten runs).
I tinkered a bit further. Removing the increments of glare, inglare and lockstat.locks, combined with the PAUSE addition, reduced the average real time to 18.16s, again across ten runs.
If taslock.c were arch-specific, we could almost certainly do better: i386 doesn't need the coherence() call in unlock, and we could safely do test-and-tas (read the lock word first, and call tas() only when it looks free) rather than spinning on raw tas().
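Roughly what an x86-only version might look like, again as a hedged C11 sketch with hypothetical names (XLock, xlock, xcanlock, xunlock). Waiters spin on a plain load with PAUSE and attempt the atomic exchange only when the lock looks free, and unlock is a single release store. This assumes GCC or Clang, where a release store on x86 compiles to an ordinary MOV with no fence instruction:

```c
#include <stdatomic.h>

/* Hypothetical x86-specific lock for illustration. */
typedef struct {
	atomic_int key;	/* 0 = free, 1 = held */
} XLock;

static inline void
cpu_pause(void)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("pause" ::: "memory");
#endif
}

/* Test-and-test-and-set: spin on an ordinary load, which can be
 * satisfied from the local cache, and only issue the atomic
 * exchange (a locked read-modify-write) once the lock looks free. */
static void
xlock(XLock *l)
{
	for (;;) {
		while (atomic_load_explicit(&l->key, memory_order_relaxed) != 0)
			cpu_pause();
		if (atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0)
			return;
	}
}

/* Non-blocking attempt: returns 1 if the lock was acquired. */
static int
xcanlock(XLock *l)
{
	if (atomic_load_explicit(&l->key, memory_order_relaxed) != 0)
		return 0;
	return atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0;
}

/* On x86 a release store needs no separate fence, so nothing
 * corresponding to the portable unlock's coherence() call is
 * emitted here. */
static void
xunlock(XLock *l)
{
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```

The read-before-exchange keeps waiting CPUs from repeatedly pulling the cache line in exclusive mode, which is the usual argument for test-and-tas on a machine with many processors.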
None of these are a huge deal; just thought they might be interesting.
Take care,
-- vs