From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
From: Venkatesh Srinivas
Date: Mon, 21 Jun 2010 03:25:32 -0400
Message-ID:
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary=0016e6d7e6b7f02eae048985362f
Subject: [9fans] A little ado about taslock
Topicbox-Message-UUID: 35e9d22a-ead6-11e9-9d60-3106f5b1d025

--0016e6d7e6b7f02eae048985362f
Content-Type: text/plain; charset=UTF-8

Hi,

Erik's thread about a 16-processor x86 machine convinced me to try something related to spinlocks.

The current Plan 9 spinlocks are portable code that call an arch-provided tas() in a loop. On i386, Intel recommends a PAUSE instruction in the core of a spin-lock loop, so I modified tas to execute PAUSE (0xF3 0x90, if you prefer) whenever the lock-acquire attempt fails.

I ran a crude test on a 1.5GHz P4 Willamette with a local fossil/venti and 256MB of RAM. The test was "time mk 'CONF=pcf' > /dev/null" in /sys/src/9/pc on a fully built source tree. Adding the PAUSE reduced the average time from 18.97s to 18.84s (about 0.7%) across ten runs.

I tinkered a bit further. I also removed the increments of glare, inglare and lockstat.locks. Combined with the PAUSE addition, that reduced the average real time to 18.16s (about 4.3% below the original), again across ten runs.

If taslock.c were arch-specific, we could almost certainly do better. i386 doesn't need the coherence() call in unlock, and we could safely test-and-tas rather than use a raw tas().

There are also other places to look at with regard to arch-specific bits. See http://code.google.com/p/inferno-npe/source/detail?r=b83540e1e77e62a19cbd21d2eb54d43d338716a5 for what XADD can do for incref/decref. Similarly, pc/l.s:_xdec could be much shorter, again using XADD.

None of these are a huge deal; I just thought they might be interesting.

Take care,
-- vs
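Below is a minimal sketch of the spin loop described above. It uses C11 <stdatomic.h> in place of Plan 9's assembly tas(), so it illustrates the idea and is not taslock.c itself. It combines two of the changes discussed:

- executing PAUSE on each failed iteration;
- test-and-tas, which spins on a plain load and issues tas() only once the lock looks free.

The names Lock, tas, lock, canlock and unlock mirror Plan 9's, but their bodies here are assumptions. cpu_pause is a hypothetical helper, and its inline asm assumes GCC or Clang.

```c
#include <stdatomic.h>

/* Hypothetical spin-wait hint: PAUSE (0xF3 0x90) on x86, a no-op elsewhere. */
static inline void
cpu_pause(void)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("pause" ::: "memory");
#endif
}

typedef struct {
	atomic_int key;	/* 0 = free, nonzero = held */
} Lock;

/* tas: atomically store 1 and return the previous value (an XCHG on x86). */
static int
tas(atomic_int *key)
{
	return atomic_exchange_explicit(key, 1, memory_order_acquire);
}

/*
 * Test-and-test-and-set: spin on a plain load, which is served from the
 * local cache, and only issue the locked tas() once the lock looks free.
 * PAUSE runs on every iteration that fails to take the lock.
 */
static void
lock(Lock *l)
{
	for (;;) {
		if (atomic_load_explicit(&l->key, memory_order_relaxed) == 0
		&& tas(&l->key) == 0)
			return;
		cpu_pause();
	}
}

/* Single non-blocking attempt; returns 1 if the lock was acquired. */
static int
canlock(Lock *l)
{
	return tas(&l->key) == 0;
}

/*
 * Release store: on x86 this compiles to a plain MOV with no fence,
 * which is the C11 counterpart of dropping coherence() from unlock.
 */
static void
unlock(Lock *l)
{
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```

Spinning on the plain load lets a waiting processor read its own cached copy of the lock word. Otherwise it would issue a locked exchange on every pass, and that traffic grows with the processor count.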
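The XADD point can be sketched the same way, assuming C11 atomics. When their result is used, atomic_fetch_add and atomic_fetch_sub typically compile to a single LOCK XADD on x86 with GCC or Clang. That instruction returns the old value, so the new count needs no second memory access. Ref, incref, decref and xdec are hypothetical stand-ins. In particular, xdec returning the new value is an assumption, not necessarily the contract of pc/l.s:_xdec.

```c
#include <stdatomic.h>

typedef struct {
	atomic_long ref;	/* reference count */
} Ref;

/* Hypothetical _xdec-style helper: atomic decrement, returning the new value. */
static long
xdec(atomic_long *p)
{
	/* fetch_sub returns the old value; subtract 1 to get the new one. */
	return atomic_fetch_sub(p, 1) - 1;
}

/* Increment and return the new count: one LOCK XADD on x86. */
static long
incref(Ref *r)
{
	return atomic_fetch_add(&r->ref, 1) + 1;
}

/* Decrement and return the new count, reusing xdec. */
static long
decref(Ref *r)
{
	return xdec(&r->ref);
}
```

Compared with a counter guarded by a separate lock, each operation here is one atomic instruction with no lock word to acquire and release.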

--0016e6d7e6b7f02eae048985362f--