Hi,

Erik's thread about a 16-processor x86 machine convinced me to try something related to spinlocks.

Plan 9's current spinlocks are portable code that calls an arch-provided tas() in a loop. On i386, Intel recommends a PAUSE in the body of a spin-wait loop; I modified tas() to execute PAUSE (0xF3 0x90 if you prefer) whenever the lock-acquire attempt fails.
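
For anyone who wants to see the shape of the change without reading l.s, here is a rough sketch in C11 atomics. It is not the actual kernel code: tas_pause, cpu_pause, spinlock, spinunlock and the PLock struct are names made up for illustration, and the inline asm assumes a GCC- or clang-style compiler.

```c
#include <stdatomic.h>

/* hypothetical stand-in for the kernel's Lock; not the real struct */
typedef struct PLock PLock;
struct PLock {
	atomic_flag key;
};

/* PAUSE on x86 (assembles to 0xF3 0x90); a no-op elsewhere */
void
cpu_pause(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
	__asm__ __volatile__("pause");
#endif
}

/*
 * Like tas(): returns 0 if we took the lock, non-zero if it
 * was already held.  PAUSE only on the failure path.
 */
int
tas_pause(PLock *l)
{
	if(atomic_flag_test_and_set_explicit(&l->key, memory_order_acquire)){
		cpu_pause();
		return 1;
	}
	return 0;
}

/* the portable loop stays as it was; only tas changed */
void
spinlock(PLock *l)
{
	while(tas_pause(l) != 0)
		;
}

void
spinunlock(PLock *l)
{
	atomic_flag_clear_explicit(&l->key, memory_order_release);
}
```

The point of putting the PAUSE inside the tas wrapper is that the portable loop in taslock.c does not have to change at all.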

In a crude test on a 1.5GHz p4 willamette with a local fossil/venti and 256mb of ram, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a fully-built source tree, adding the PAUSE reduced times from an average of 18.97s to 18.84s (across ten runs).

I tinkered a bit further. Removing the increments of glare, inglare and lockstat.locks, coupled with the PAUSE addition, reduced the average real time to 18.16s, again across 10 runs. 

If taslock.c were arch-specific, we could almost certainly do better: i386 doesn't need the coherence() call in unlock, and we could safely test-and-tas (spin on a plain read, and only tas() once the lock looks free) rather than issue a raw tas() every time.
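
A minimal sketch of what test-and-tas could look like, again in C11 atomics rather than real kernel code. TLock, ttas_lock, ttas_canlock, ttas_unlock and ttas_pause are hypothetical names. The unlock is a plain release store, which on x86 compiles to an ordinary MOV with no fence; that is the sense in which a coherence() call would be unnecessary there.

```c
#include <stdatomic.h>

/* hypothetical lock type for illustration only */
typedef struct TLock TLock;
struct TLock {
	atomic_int key;	/* 0 = free, 1 = held */
};

void
ttas_pause(void)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
	__asm__ __volatile__("pause");
#endif
}

void
ttas_lock(TLock *l)
{
	for(;;){
		/* test: spin on a plain read while the lock is held */
		while(atomic_load_explicit(&l->key, memory_order_relaxed) != 0)
			ttas_pause();
		/* tas: only now attempt the atomic exchange */
		if(atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0)
			return;
	}
}

/* returns 1 if the lock was taken, 0 otherwise, like canlock() */
int
ttas_canlock(TLock *l)
{
	if(atomic_load_explicit(&l->key, memory_order_relaxed) != 0)
		return 0;
	return atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0;
}

void
ttas_unlock(TLock *l)
{
	/* release store; no separate barrier call */
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```

The read-only spin keeps the cache line shared among the waiters, so they stop fighting over it with locked writes while the holder is still working.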

There are other places to look at too, wrt applying arch-specific bits; see: http://code.google.com/p/inferno-npe/source/detail?r=b83540e1e77e62a19cbd21d2eb54d43d338716a5 for what XADD can do for incref/decref. Similarly, pc/l.s:_xdec could be much shorter, again using XADD.
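
To give a feel for the XADD idea without quoting the linked change, here is a sketch using C11 atomic fetch-and-add. On x86, compilers typically turn this into a LOCK XADD when the result is used, but that is an assumption about the compiler, not something the code guarantees. XRef, xincref and xdecref are made-up names, not the kernel's Ref, incref and decref.

```c
#include <stdatomic.h>

/* hypothetical reference count; the kernel's Ref also carries a Lock */
typedef struct XRef XRef;
struct XRef {
	atomic_long ref;
};

/* returns the new count, with no lock/unlock pair around the update */
long
xincref(XRef *r)
{
	/* fetch_add hands back the old value, so add 1 for the new one */
	return atomic_fetch_add(&r->ref, 1) + 1;
}

/* returns the new count; 0 means the last reference is gone */
long
xdecref(XRef *r)
{
	return atomic_fetch_sub(&r->ref, 1) - 1;
}
```

The count is updated and read back in a single atomic instruction, so the spinlock that would otherwise guard it is not needed.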

None of these are a huge deal; just thought they might be interesting.

Take care,
-- vs