From: Alexander Monakov
Newsgroups: gmane.linux.lib.musl.general
To: musl@lists.openwall.com
Subject: Re: Resuming work on new semaphore
Date: Sun, 5 Apr 2015 17:07:11 +0300 (MSK)

So, your solution results in a simpler execution path for successful trywait, but needs to undo semaphore adjustment to return EAGAIN (which I believe to be relatively common).
Also, if a concurrent poster sees a transient negative val[0], it will proceed to the slow sem_post path (val[1] increment and futex wake), so under high activity on the semaphore this solution may slow down posters.

My solution keeps the successful trywait path as-is and adds a test on val[1] in the EAGAIN case, with a conditional branch that is normally not executed. The semaphore's contents are not changed when returning EAGAIN. I was about to say that this leads to preferable cache traffic on a highly contended semaphore (all calls to trywait returning EAGAIN simultaneously can work on shared rather than exclusive cache lines), but unfortunately we need a read memory barrier on the semaphore, and the solution to that was to perform a CAS on val[0] unconditionally, which would cause each caller to make the cache line with the semaphore exclusive anyway, IIUC.

Regarding the problem raised on IRC: with 1 waiter, post-getvalue-trywait can result in 0-0 return values rather than 0-EAGAIN, while previously it could result in 1-EAGAIN rather than 1-0. The earlier explanation was that the waiter did not "become a waiter" so that post had to wake it, but suspended execution and resumed only after the semaphore value became positive; now the surprising behavior needs a different explanation. I think in the absence of a mechanism to detect whether current waiters are still alive, that's the way it has to be. Either you pretend there are no waiters at any time, like today, or you count dead waiters same as live waiters and report 0 semaphore value to the caller, but then what do you do when the caller invokes sem_wait and all other waiters are dead? Suspend the caller indefinitely, or let it proceed by consuming a pending post and thus revealing an inconsistency?

Alexander
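To make the comparison of the two trywait shapes concrete, here is a rough sketch in C11 atomics. It is not the musl code and not either proposal verbatim: the struct sem_sketch type, the function names, and the rare_hits counter are made up for illustration, and I am assuming val[0] holds the semaphore value (negative while there are waiters or a trywait in flight) and val[1] counts pending wakes. What the val[1] branch actually does is not spelled out above, so the sketch only records that it was taken. Futex calls, the waiter side, and the unconditional CAS used as a barrier are all left out, so this shows the fast-path shapes only and is not a correct concurrent semaphore.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical layout: val[0] = value, val[1] = pending wakes. */
struct sem_sketch {
    atomic_int val[2];
    int rare_hits; /* illustration only: counts entries into the val[1] branch */
};

static void sem_sketch_init(struct sem_sketch *s, int value)
{
    assert(value >= 0);
    atomic_init(&s->val[0], value);
    atomic_init(&s->val[1], 0);
    s->rare_hits = 0;
}

/* Shape 1: decrement first, undo if there was nothing to take. */
static int trywait_undo(struct sem_sketch *s)
{
    if (atomic_fetch_sub(&s->val[0], 1) > 0)
        return 0;                      /* simple success path */
    atomic_fetch_add(&s->val[0], 1);   /* undo the adjustment */
    errno = EAGAIN;
    return -1;
}

/* Shape 2: only modify val[0] on success; test val[1] on the EAGAIN path. */
static int trywait_check(struct sem_sketch *s)
{
    int v = atomic_load(&s->val[0]);
    while (v > 0) {
        /* On failure v is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&s->val[0], &v, v - 1))
            return 0;
    }
    if (atomic_load(&s->val[1]) > 0)
        s->rare_hits++;                /* placeholder for the rarely taken branch */
    errno = EAGAIN;
    return -1;
}

/* Poster: returns 1 if it had to take the slow path, 0 otherwise. */
static int post_sketch(struct sem_sketch *s)
{
    if (atomic_fetch_add(&s->val[0], 1) < 0) {
        atomic_fetch_add(&s->val[1], 1);
        /* a futex wake on val[1] would go here */
        return 1;
    }
    return 0;
}
```

If a poster runs between the decrement and the undo of trywait_undo, it observes a negative val[0] and post_sketch takes the slow path even though nobody is sleeping. trywait_check never writes val[0] when it fails, so that particular transient cannot arise from it.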