From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7744 Path: news.gmane.org!not-for-mail From: Jens Gustedt Newsgroups: gmane.linux.lib.musl.general Subject: Re: New optimized normal-type mutex? Date: Fri, 22 May 2015 09:30:48 +0200 Message-ID: <1432279848.11462.45.camel@inria.fr> References: <20150521234402.GA25373@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-S7R4CmbhFrKQKi4RpDNO" X-Trace: ger.gmane.org 1432279866 24954 80.91.229.3 (22 May 2015 07:31:06 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Fri, 22 May 2015 07:31:06 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-7756-gllmg-musl=m.gmane.org@lists.openwall.com Fri May 22 09:31:01 2015 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1YvhQ5-0001nj-Iu for gllmg-musl@m.gmane.org; Fri, 22 May 2015 09:31:01 +0200 Original-Received: (qmail 32234 invoked by uid 550); 22 May 2015 07:30:59 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 32216 invoked from network); 22 May 2015 07:30:59 -0000 X-IronPort-AV: E=Sophos;i="5.13,474,1427752800"; d="scan'";a="152788752" In-Reply-To: <20150521234402.GA25373@brightrain.aerifal.cx> Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwCAYAAABXAvmHAAAABHNCSVQICAgIfAhkiAAAEb9JREFUaN7VmXuQ3lV5xz/n/O7v7333fffdW3azSTY3Qi6wGy4aEc0qWlGhWcAq7WiDrWNpO61paS29EpzWaevYpvfL2JK0VRBbBxQtg0U2BRGKhIAI4ZKwSTbZ+3u//G7nnP7BtGM7UyVAnfb56/xznuf7nXnOnOf5fuH/eYjXI8mDn/5k8ch90xNj+eKka8xkI1MTGmY6Rk/XpD5w091fPvnf7xz8sQ/sLTQ6k54SY2mW1mQY3DVX8Kdv/PvPnfyBE7hn797bxBPP3NBv28RAqgyzJqNgJIs5q7bhHW+94Yd+5/fuBrjj5l/cs/LUs/vNiVMTo5aNJQX9lks3iaivHqlt/9D7xs778Y/Uf6AE/uWa6x7wT52ZTDyLKEkoOHmW4oSaSDFxQmq7tHPuDJ5TUkv10iAGgSLvediZotf16HS6KG1RuXDT/g99/vO3vtLa8vUgELuCVAja0mVJKRaylGbgYnwP27bImYRcozkmKrWS53vYjosfhNSUIjXQUoq2K0ksRdZqTZ5Lbfs/Dl/66E/uSebmb8hF8URab48JS+IPDVJtto9uuHj8wKW//6lD/1OSznxtLEPRv2kdPH2MSBpkqlDGkLoOIgGkIWcEiTYsk1JSDpGUFAAcBzfRCKXxV+ql7/XWnn7uxcm3vOGS2rrR1UcLV+6pC4B//tjPjGdHjhwttFOEcMmMAgFeLkezWsOTkmigfyYZGTzYN7HtwFt u+rU6wJnprxQf/JvP7beffH5fBIwGIZqEutJUlCKSNoFjYbQiihOkFmih0FJgSUGIpGxstE6RwiLQmsyzcXZesO8N1773YOHKPfUv/uxP726ePjsVNVoT9WZjMq02kRpEKay9+UPXjQmAL1991UuDZ+fGQulSVQrpuqg4pWkkupSjVGshtGZJpzQ9DysfQqZIkgzR7 jAc+qAFrm2xTIRJJKnr0okTfCnQKLRlo9MUicGWL7dcnCnytoOTKTxpERpBv8moomm7AZFjEddb9EiHdhJhPJdYanQKUZrivmH7Pnlm+itFuVgdi7WkpSSJdFlqRwRugLAkQakXJQ06cAhcB0tnRI0mstOllKYEvk8qJJGj8IygID16hwZBGIzIsIXCRREq6DGSsu9RkBYBBldKbCFwbMnA2mH8wKIrbQrGY1WkKbQ6DFkWOQEicMks8JTEQRJI0LOLJbvxwosTMlMk0kYbjTCGci4gzrr0Sod0YQFHWiSuRTHMkbRaCCxMZlh2FLGQLEjBmHFIhUBrEI0OlgYnyNN0LFKliYxGeT69rouqtOjP5+lLEtxugosgOjNHj7ZRGmwjaDoGX9skKEQQIFptkAIjIWckCQIrE9ija8aOznk5wiwh9lw8aeMlMamUeEKis4wOFiaFRtpEYxBJSoQA4ZM4ksgSPG4E9Siibjt02xUaUQsnzFFfSZlPM4TRlPp7yQWCs/PzlMs53HrMRLGPt/f2UFYRbkOzbGm6JkPgYjAgbVo6I/Ntup0unuPgWAK0RaLVy//AfVe864lgcWkicz1yloNvoJMmpFLTsiCJNIFt0VERHdcnGyxPW5598BmlJo6tVPc9v1RB2CHdqMWOYpkLCwHnhyHVTHNcpTxUrVCp1rAtyYZ1w0i3SLFk89A3/g2UZH3O4ubL30RxsU5ldhkpDFJIbGFRE4rYSLQQRHGEJQQeFm7gYHaev98CuO6NF62q1hqTKktJlKarNW0JDZUgHA9dys/Y q4emnQ3Df3nxj7z7+t2f+pO/Ysv5U0dm525uKYty3wCetPjRdYNcjaaQabq1Oq1MkVOKkc1reGFhiWIuj7C6JGlKyZNsNor3n7eFStLlkcoKz61fvX/79u3T9cWliTBWvrQEHa0QQmIZgdIK27FQmaKVxIRbN+77z5/4sT/+1PjJEycmosWVmXJvP3YpZM3aNYyuWz dTuHLPf5lPrnr3VXtPnDp10HNDlIqROEwO9fG+VosXspQ7lyto3+fE8ln2rlpDfniQP332WRCCd75tB43jZ7jUDRlspTiugxAeR+I2960s0hBy3503fmT69FcfONian5uQliDIF+m2YyKlUUIRxRF968eOfuC+r+4851Hill//+PiRx79ztNrs0mm0KZaLZEnGWwsB725WOFRv8HAmSDodSo7D3oF+bMfirxbOYkKHQDhc5lusznxaaO6fO8O2fIE9vf1M0+Go0jM73nT55EcvGa8duedr+3S1si8v7VLnzDxRpkjCHGJ4aHr1utVT7/mzP6/b5wJ++v57izf9wi8dVNrHcT3yxQK27RJ3Y6zM0LI1b+0ZoFldIvFsxvsH6LMELQOpkST1GBVq3NIAvvC54ztPcTwRLOs2b+sf4u2RS5JFY1++5yt3XbXnvZPX3/n5W4FbH/nLA+ueuOeBG/zefG3ynW87+N3D3jkRuP3226fiVEwkOiV0bNIsJep28QOfqOAjl2DEElw1WCZKFGUtkK6h7fj4DpRKfURJjeGwSE/UJcJga4d6p0lHCoo64w1BkfsqtYnf/LVPHgSuAdh1476TwMsD3t9/9jUMc9pMSiGxJRgV0+22Wa7XqTfqHH7pON8u9/NE6BB5PnZPD42BkMe14IG4gVXIsVSvkijFU2fPIrOUraPrsD3Jlt4SnTQlsi2qtqLSarG0VJnaObHztu8HyTqX9vn6Aw/cMTw0TJpkeH6AhaIncCn3FkmFzWNzc3y7U+ObjTqPVJs8Vq1h1oScabdoV xWxyQg9l57BATZZDpuCgH5PMrVxM3G3g513WfNDW3liYRGsPN1ua2LHtq0zM6dOPfmqCDx15227/+ILd58EaNa717/00smp3bsvo7fcS3VpGTcAKV2q1QZJ3MVIC88JsFJN2c+xbrDM+aHPlqE+ji0vY7kWvaFhaaXJzs3nYaoNNq/tJ1/tsiwtnpEOtz/+HHNLMXG UkmYp7XarVKlWD53TQtO89+7iN//stoPWqdkpPx/Wur3F6WccZ+KFdnuskXNZaCVkiWJ++Qy+59EbuHi6y1t2XkL27Ak2CJssS6lHMYPCYDLDP/dJvj1fQ0qb+WoMqk3BzrFlYAidc5hZXAQnh9DQ6TbRQhF3U7Is4eZf+YWJj9308Se/5z7w3fH8s8fGnNOnp1ZJie52S+VOd2oIeJObo2Vs5i2JWZWn1e/TrjQZK/WSiyPCE2eImh1S12PJaEJtYbsCN81oL7eROPQVodVI6ZiA1MvxQhyhOxGWlcOkKUHg4aQOaaoQ0hDFXf7pi3dPAk+e00r56HuurubnzpZsY5NKiYMhwUagcY1CCINB0pACIQXLoaBHWdTbXbTwWHEhHwQE9SbVgRJ/Mz/Hi4sr7H3fOIuVJvc+OEsYFnG8AKETGs0WljBYtk2j3nkZmRDoqMPqdSNHHztyZOc5rZTpQHlfIh2MpdDSgAZbZ/hGgzRIaSEF9AiNLyWylRFbNh0bfKEZsCxkK2JxZJi/Pv0CJyornLd5DQ8++hxHv7NIEFg0W00ajRqNRh3XtciUot5okBlFZgzSssm04fjxExN/+kcHxs+JwOWHDh06u3bkQNV1a7ZSIDWehEQoUgEaQGtMlpHlLXp6XPLdjFyqwBMsAvfR5D4Z0fFKbNqwgWajgnQHWF6K8V0H35EIrQFodWKSzGAJQc63sS2JylKkZaO1ZnTNqplXpUo07727+NV/vPuAv3D2hrWpRtYaZJbEwmBlINf00144S35kgMZsxEpgc0zEfL3Spu 1LklihlU2u6NBtNZDaohXHIDKadYWRLsZkZFhgNCaLcB2XzECWalAZhR6v9vzx472v+BF/dxSu3FPf9cY3z0TNLgO+YFNfHzozJHFE4CvW1CPGBvtYabQ5YQuON+aI7YCqUTSX2hgTUeop025IstSiv1wiUQu4fkiz2SBTHaSQ9AQOaZqBk6MTdzFGYAnB0FAvWjD9 fVWJ7xUjI+XJ54/NMNeVLM3XMEZjWRYGzf2dWUp5kJZNmlmE+RDdaXHpznFOzc6xOL8AKIqlArVaTKdTQSvD4sIifb29KOOgshaOJal0UzSgjcQYyNKU4zMzXHTR+F2vSRdKOm3CnIvnO4QFn77ePL7vErVaqLhN4OUYcGHL+j7iVoeBoX5mT5+mtjzHeRtLDA2UWFqYx3c8Ot2MocE875y8iMCHLElIogytDVJKjFIYLUiiCKVihkeGD379Xx869JoICGHj2C4qU2ASFqsLNFptvMBj9ZoSlzgen7hiFz919QQDoWB+9gwqU4SFXk6eXGHN6kH6e21Wls7SX3Zot5t889HvUKlFL8ssJkOgAIXv2+gswqiYjRvHjv7qr+zb94qEre8Vnu/juRlKCTJlgQhBxdRaTVb1lNh7yRb41tOUR99G72Afza6h0e7gezZWOMzRY7NIZVHoyVOtxli2hSUEaarJlwLqOqOTaoTMYVmagXKOQnHk6O/+7u9MTl5xZf01D3NrR0dK7W58pe/naHS7SB1z7fg6fm58I2/OuRSencH2HIIffjNzLRhfv5ZGo85itU6aSRr1Jq6laEcJjcgwWA7YMOoS5B2WKx38oEAUJURRhko1nm9P/8EffvrK7wf+FbfQjT/zUwdzoTvTjVv4UlK0XHZs3UTZd1mbaqyiCwKc02d4v51ycW2Zy9Zvw+AQ5DxW9RcoFwRDZZ9VA2UAdu26iD3vupQLto5SrXSwpIPjKsbWr+aDH/zg/lcC/pzU6Y98eO/4iReem67WOyU/6KFdX2S kELLRU3y0d5Cg1aEZBqSxQJdtprXL5546guXm6QtCwmKOTrvC2NoSq8tw8fYSDx9t8tATVSqNOo16lzdech7Ndrz/a4e/9fqr05+57dCTW7edtz/fE2JIySyHlxqKr8+3OFXooykND8ocf1vwuD1WHO1G5AKPUi5EYVGpLeA4OcqhZtVASKtjkLZFpjM6nZjNG0dnN p63fepcwL8qf+BdV+x+YGmxNimRtNOUJI3Z1APX7NjBPxx5ntjLQRLTTBKGSz4jwz3UajWGhgap1uqcPFXh0ovX88zTz1Mo9tJNQdiF6d273zh1y29/un6ueM7ZH7hw24apfGDPJN02ri0phXlUaZi/e3qGSpziOgKjI8qhS73eQKuUIBAsLS3i+x5DQ0W6zRUyY1hpiNrAwND+PXve+6rAv2qH5rdv+dXxr933tbtWlqpj/cOjkAkWaosUHUW5UGBmYZF1AwM0WjUKvX2g25TLJZ56dh7Pztg5vmFm8vIL9udHLrtr6trrXxXw12wxTd9/b/ETt95y15lTC5Pbtu2gUVkgaTeItSJKDI4Fo6t66cZtbNuiUu8QlvpnVq0auuGzd3758P8Jl3L6/nuLf3Tg05OWEJOFfH4iiuLJOMlYXmnSrDURaIxJEFJzwYXb9x+640u38v8hPvazP717+5atZtvmrWbN6vVm9dBqc83VP7znf6OW/Xok+a2b941XG9HY7KmTE81me/Khww+P5XMFkszQiFIKPQEQTwF3/58zui+98IK9KA4W8gVOn10kF4Tk8z7SdWl2I6JWA0fE9A+VqdXq0x/+iQ/f8PM3/cbJ14vAa7JZr3rXFR+bn188aFk2veUCxXIPQeiDkdQbbXKuZP3qIju3rcVKm7z9su2TTxy+a+Yzn3z/La8XAevVXrzu2j3jBnlXrVonVSmtbpdms40AwlyeTGeEtk2r00HYLhdsXc/OrR4XbVaU/Mbkb930nqme0rrowUefffIHTuC6a/fsPfbM C3d0WolfKhTp6clRyhcg04xtGCHt1Dh/rExRamYWGyxXa7xn1yYuvkAwP7fClg0Zhx+qr/rM7d+Yesc73nHD5OW7Zh478tRzPxACH/iRa245duzkAdsJfMuy8aXm/FUFlpeWWOm2WZidI9WapaUlFIaTlQ4jfR6/vO9D2NE32LTeYnZOUmnA6PAohx9+qnTy1NnrP/ Cj15cu23XZI4/826Px/xqB37z547sPH374oOvnKTsOb+1zuCoImZARM8ZiLlKkXcWOcoFRJ+Do/CwFO4e2DEe+9Ti+ZXA8ny/cU+X4bMKRZxZIVEiqXI6/+NKuUzPHb7z40ovnX3zx+Ctuq38HyuqWG7Tu+A0AAAAASUVORK5CYII= X-Mailer: Evolution 3.12.9-1+b1 Xref: news.gmane.org gmane.linux.lib.musl.general:7744 Archived-At: --=-S7R4CmbhFrKQKi4RpDNO Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hello, Am Donnerstag, den 21.05.2015, 19:44 -0400 schrieb Rich Felker: > I realized (thanks to conversations with wm4) that it's possible to > optimize normal-type mutexes a lot to eliminate spurious futex wakes > and reduce the number of atomic ops/barriers under contention. >=20 > The concept is to store the full mutex state in a single int: >=20 > bits 0-29: waiter count > bit 30: waiter flag > bit 31: lock bit >=20 > Then unlock is: >=20 > old =3D a_fetch_and(p, 0x3fffffff); // new op > if (old & 0x40000000) wake(p,1); >=20 > Trylock is: >=20 > old =3D *p; > if (old < 0) { > a_barrier(); > old =3D *p; > } > if (old >=3D 0 && a_cas(p, old, old|INT_MIN)) return 0; >=20 > Lock is: >=20 > while (spins--) { > if (!trylock()) return 0; > a_spin(); > } > while ((old =3D *p) < 0) { > new =3D old+1 | 0x40000000; > if (a_cas(p, old, new)=3D=3Dold) break; > } > for(;;) { > while ((old =3D *p) < 0) { > futex_wait(p, old); > } > new =3D old-1; > if (new) new |=3D 0x40000000; // set waiters flag if any other waiters > new |=3D INT_MIN; > if (a_cas(p, old, new)=3D=3Dold) return 0; > } >=20 > The waiter flag marks that unlocking needs to send a wake. It is set > whenever a new waiter arrives, and when a waiter that consumed a wake > observes that there are more waiters left to be woken. The flag is > notably clear when the same thread which just released the mutex > happens to obtain it again before an existing waiter does (when > hammering the mutex); this is where we should see huge savings, since > the subsequent unlock(s) will not produce additional futex wakes which > (1) incur syscall overhead, and (2) cause additional waiters which > cannot yet get the mutex to wake up, try to lock it, and go back to > waiting again. Further: >=20 > The code path for unlock has exactly one atomic. >=20 > The code path for trylock has at least one barrier and at most (only > on success) one atomic, but may have a spurious barrier if a stale > value was initially read and the operation subsequently succeeds. >=20 > The code path for lock on contention (trylock failed and thus produced > no atomic changes, only barriers/spins), contains one atomic cas to > set the waiter state (versus 2 in the current musl implementation), > whatever the kernel does in futex, and one more atomic cas to finally > take the lock (versus 2 in current musl). >=20 > Are these changes worthwhile and worth having additional custom code > for the normal-type mutex? I'm not sure. We should probably do an > out-of-tree test implementation to measure differences then consider > integrating in musl if it helps. I think the gain in maintainability and readability would be very interesting, so I would even be in favor to apply it if it doesn't bring performance gain. I would at least expect it not to have performance loss. Even though this might be a medium sized patch, I think it is worth a try. (But perhaps you could go for it one arch at a time, to have things go more smoothly?) But I personally expect it to be a win, in particular for mtx_t, where it probably covers 99% of the use cases. > If it does look helpful to musl, it may make sense to use this code as > the implementation for __lock/__unlock (implementation-internal locks) > and thereby shrink them all from int[2] to int[1], and then have the > pthread mutex code tail-call to these functions for the normal-mutex > case. Actually from a code migration POV I would merely start from this end. Implement internal locks that use that strategy, and use them internally. Then, in a second phase start using this lock in data structures that are exposed to the user. There will be problems with dynamic linking between executables and libc that differ on these strategies, so we'd have to be careful. > Even if it's not a performance help for musl, the above design may be > very nice as a third-party mutex library since it handles > self-synchronized destruction, minimizes spurious futex wakes, and > fits the whole lock in a single int. "third-party" library is already all the internal stuff where we use lock features that are not exposed to the user. Jens --=20 :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: --=-S7R4CmbhFrKQKi4RpDNO Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlVe2ygACgkQD9PoadrVN+LR3ACeLwg8bGkDPQFhN2pkXvPDl3Un WO4AoJf7/K+/7d3K4a5bHaa07gJnzceo =ZPJw -----END PGP SIGNATURE----- --=-S7R4CmbhFrKQKi4RpDNO--