From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8235 Path: news.gmane.org!not-for-mail From: Joakim Sindholt Newsgroups: gmane.linux.lib.musl.general Subject: Re: New optimized normal-type mutex? Date: Wed, 29 Jul 2015 14:09:27 +0200 Message-ID: <8c49d81e.dNq.dMV.21.hNiSfA@mailjet.com> References: <20150521234402.GA25373@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed X-Trace: ger.gmane.org 1438187389 27537 80.91.229.3 (29 Jul 2015 16:29:49 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Wed, 29 Jul 2015 16:29:49 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-8248-gllmg-musl=m.gmane.org@lists.openwall.com Wed Jul 29 18:29:48 2015 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1ZKUEm-0006cL-Dw for gllmg-musl@m.gmane.org; Wed, 29 Jul 2015 18:29:48 +0200 Original-Received: (qmail 30247 invoked by uid 550); 29 Jul 2015 16:29:46 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 1862 invoked from network); 29 Jul 2015 12:10:24 -0000 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/simple; q=dns/txt; d=zhasha.com; i=opensource@zhasha.com; s=mailjet; h=domainkey-signature:message-id:mime-version:content-type:from:to:subject:date:in-reply-to:references:list-unsubscribe:x-csa-complaints; bh=M79csxvnFNHGj67s6l6YdhwnIas=; b=PDahPrvxUopX9XWfK4bA13Y5sSzHkGZqlenYce35JnICpVI5yd0ULd0Jy259jLxEOiyp2H9HOdIVaQzd2U+xxt0c8/JhYGKRxGIccBSYtgHQacYPHWf3yUH3d8T2mUUcjciB17xzdPtTOglnUaMynFfi3PYZlj4DRFSKLZIzWwY= Domainkey-Signature: a=rsa-sha1; c=simple; q=dns; d=zhasha.com; s=mailjet; h=message-id:mime-version:content-type:from:to:subject:date:in-reply-to:references:list-unsubscribe:x-csa-complaints; b=SBz27C6aAg9BIFmfWFvbOpISzoxgaGtGQ+DdR7vxY319okJkW8z2O4LlD04aqscxotbPvMXj/veGlusisONksmQV0hbSVNecWpWfbQw/99loHkLkybeWBsq0RlCauw/pG/Czf2mmIrmmzexS71vYooIoMzCFgfIuVEwfctar5C0= In-Reply-To: <20150521234402.GA25373@brightrain.aerifal.cx> X-CSA-Complaints: whitelist-complaints@eco.de Xref: news.gmane.org gmane.linux.lib.musl.general:8235 Archived-At: On Fri, May 22, 2015 at 1:44 AM, Rich Felker wrote: > I realized (thanks to conversations with wm4) that it's possible to > optimize normal-type mutexes a lot to eliminate spurious futex wakes > and reduce the number of atomic ops/barriers under contention. > > The concept is to store the full mutex state in a single int: > > bits 0-29: waiter count > bit 30: waiter flag > bit 31: lock bit > > Then unlock is: > > old = a_fetch_and(p, 0x3fffffff); // new op > if (old & 0x40000000) wake(p,1); > > Trylock is: > > old = *p; > if (old < 0) { > a_barrier(); > old = *p; > } > if (old >= 0 && a_cas(p, old, old|INT_MIN)) return 0; > > Lock is: > > while (spins--) { > if (!trylock()) return 0; > a_spin(); > } > while ((old = *p) < 0) { > new = old+1 | 0x40000000; > if (a_cas(p, old, new)==old) break; > } > for(;;) { > while ((old = *p) < 0) { > futex_wait(p, old); > } > new = old-1; > if (new) new |= 0x40000000; // set waiters flag if any other waiters > new |= INT_MIN; > if (a_cas(p, old, new)==old) return 0; > } I implemented this lock in my own library and I even did some preliminary race tests. I couldn't find fault with it. However amonakov did and made me aware of it on IRC. Thread A takes the lock Thread B tries to take the lock but ends up in futex_wait Thread A unlocks the lock Thread A futex_wakes Thread B Thread A steals the lock Thread B goes back into futex_wait Thread A unlocks the lock Thread B hangs Because: The first unlock clears the 0x40000000 flag and it does not get set again at any point. So he went on and suggested that a cas-less lock was possible with a_fetch_add however I can't make it work and I don't think he can either. His idea however is sound: the one who flips the sign bit takes the lock. Based on that I've cobbled together a different lock that will probably perform worse than this approach but none-the-less be correct as far as I can tell. The difference is that we consider the lock owner a waiter as well, thus requiring a cas loop in the unlock function to remove itself, so to speak, from the waiter count. a_fetch_and also turns into a cas loop so I consider this fairly minor. This makes the wait loop a little simpler while still maintaining a waiter count and still only using one int. void lock(volatile int *p) { int val, spins = 200; while (spins--) { if (trylock(p)) { return; } spin(); } while (1) { if ((val = a_load(p)) < 0) { if (a_cas(p, val, val - 1) == val) { break; } } else { if (a_cas(p, val, -val - 1) == val) { return; } } } while (1) { while ((val = a_load(p)) < 0) { futex_wait(p, val); } if (a_cas(p, val, -val) == val) { return; } } } int trylock(volatile int *p) { int val = a_load(p); if (val < 0) { a_barrier(); val = a_load(p); } return (val >= 0 && a_cas(p, val, -val - 1) == val); } void unlock(volatile int *p) { int new, val = a_load(p); while ((new = a_cas(p, val, -(val + 1))) != val) { val = new; } if (val + 1 != 0) { futex_wake(p, 1); } } -- Joakim