From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
From: Venkatesh Srinivas <me@acm.jhu.edu>
Date: Mon, 21 Jun 2010 03:25:32 -0400
Message-ID: <AANLkTinzxR0X40SrF6iaA_qdYzNV4wOfOI6Mr7CgtTfY@mail.gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Content-Type: multipart/alternative; boundary=0016e6d7e6b7f02eae048985362f
Subject: [9fans] A little ado about taslock
Topicbox-Message-UUID: 35e9d22a-ead6-11e9-9d60-3106f5b1d025

--0016e6d7e6b7f02eae048985362f
Content-Type: text/plain; charset=UTF-8

Hi,

Erik's thread about a 16-processor x86 machine convinced me to try something
related to spinlocks.

The current 9 spinlocks are portable code, calling an arch-provided tas() in
a loop to do their thing. On i386, Intel recommends 'PAUSE' in the core of a
spin-lock loop; I modified tas to PAUSE (0xF3 0x90 if you prefer) if the
lock-acquire attempt failed.
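
A minimal user-space sketch of that change, not the kernel's actual tas(): it assumes C11 atomics and a GCC/clang-style __builtin_ia32_pause(), and the names SpinLock, cpu_pause, tas_pause, spin_lock and spin_unlock are hypothetical. The only point is where the PAUSE lands: on the failed-attempt path.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's Lock; not the Plan 9 struct. */
typedef struct SpinLock {
	atomic_flag key;
} SpinLock;

/* Tell the CPU we are in a spin-wait loop; a no-op off x86. */
static inline void
cpu_pause(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	__builtin_ia32_pause();	/* emits PAUSE (0xF3 0x90) */
#endif
}

/* One attempt: returns 0 if we took the lock, nonzero if it was already held. */
static inline int
tas_pause(SpinLock *l)
{
	if(atomic_flag_test_and_set_explicit(&l->key, memory_order_acquire)){
		cpu_pause();	/* only on failure, as described above */
		return 1;
	}
	return 0;
}

/* Portable-style loop: keep calling the arch-provided primitive. */
static inline void
spin_lock(SpinLock *l)
{
	while(tas_pause(l))
		;
}

static inline void
spin_unlock(SpinLock *l)
{
	atomic_flag_clear_explicit(&l->key, memory_order_release);
}
```

The caller's loop is unchanged; only the primitive grows the hint.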

In a crude test on a 1.5GHz P4 Willamette with a local fossil/venti and
256MB of RAM, 'time mk 'CONF=pcf' > /dev/null' in /sys/src/9/pc, on a
fully-built source tree, adding the PAUSE reduced the average real time
from 18.97s to 18.84s (across 10 runs).

I tinkered a bit further. Removing the increments of glare, inglare and
lockstat.locks, coupled with the PAUSE addition, reduced the average real
time to 18.16s, again across 10 runs.

If taslock.c were arch-specific, we could almost certainly do better: i386
doesn't need the coherence() call in unlock, and we could safely test-and-tas
rather than use a raw tas().
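
To make the test-and-tas idea concrete, here is a rough sketch using C11 atomics. TTLock, tt_canlock, tt_lock and tt_unlock are hypothetical names, not anything in taslock.c. The lock word is read first and the atomic exchange is attempted only when the lock looks free; unlock is a release store, which x86 compilers typically emit as a plain MOV with no fence.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration only. */
typedef struct TTLock {
	atomic_int key;	/* 0 = free, 1 = held */
} TTLock;

/* Returns 1 if the lock was acquired, 0 otherwise. */
static inline int
tt_canlock(TTLock *l)
{
	/* Test: a plain read, so waiters do not issue locked writes while the lock is held. */
	if(atomic_load_explicit(&l->key, memory_order_relaxed) != 0)
		return 0;
	/* Tas: the lock looked free, so try the atomic exchange. */
	return atomic_exchange_explicit(&l->key, 1, memory_order_acquire) == 0;
}

static inline void
tt_lock(TTLock *l)
{
	while(!tt_canlock(l))
		;	/* a PAUSE hint would go here on x86 */
}

static inline void
tt_unlock(TTLock *l)
{
	/* Release store only; no separate full barrier is issued. */
	atomic_store_explicit(&l->key, 0, memory_order_release);
}
```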

There are other places to look at too, wrt application of
arch-specific bits; see
http://code.google.com/p/inferno-npe/source/detail?r=b83540e1e77e62a19cbd21d2eb54d43d338716a5
for what XADD can do for incref/decref. Similarly, pc/l.s:_xdec could be
much shorter, again using XADD.
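
As a rough illustration of the XADD point (not the code in the linked change, and not the kernel's Ref), a lock-free reference count can be sketched with C11 atomics. On x86, GCC and clang typically compile a fetch-add whose result is used to LOCK XADD. The names Ref, ref_inc and ref_dec are hypothetical, and returning the new count is an assumption about the interface.

```c
#include <stdatomic.h>

/* Hypothetical reference count; no spinlock around the counter. */
typedef struct Ref {
	atomic_long ref;
} Ref;

/* Returns the new count. fetch_add yields the old value, hence the +1. */
static inline long
ref_inc(Ref *r)
{
	return atomic_fetch_add_explicit(&r->ref, 1, memory_order_relaxed) + 1;
}

/* Returns the new count; 0 means the caller dropped the last reference. */
static inline long
ref_dec(Ref *r)
{
	/* acq_rel so earlier writes to the object are visible to whoever frees it */
	return atomic_fetch_sub_explicit(&r->ref, 1, memory_order_acq_rel) - 1;
}
```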

None of these are a huge deal; just thought they might be interesting.

Take care,
-- vs

--0016e6d7e6b7f02eae048985362f--