From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7328
Path: news.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Resuming work on new semaphore
Date: Thu, 2 Apr 2015 19:14:57 -0400
Message-ID: <20150402231457.GC6817@brightrain.aerifal.cx>
References: <20150402013006.GA1108@brightrain.aerifal.cx>
 <20150402152642.GW6817@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1428016517 12201 80.91.229.3 (2 Apr 2015 23:15:17 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 2 Apr 2015 23:15:17 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-7341-gllmg-musl=m.gmane.org@lists.openwall.com Fri Apr 03 01:15:17 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
 by plane.gmane.org with smtp (Exim 4.69)
 (envelope-from )
 id 1YdoKO-0001Zv-NR
 for gllmg-musl@m.gmane.org; Fri, 03 Apr 2015 01:15:12 +0200
Original-Received: (qmail 20424 invoked by uid 550); 2 Apr 2015 23:15:10 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 20391 invoked from network); 2 Apr 2015 23:15:09 -0000
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:7328
Archived-At:

On Fri, Apr 03, 2015 at 12:39:10AM +0300, Alexander Monakov wrote:
> On Thu, 2 Apr 2015, Rich Felker wrote:
> > > Interesting. To examine the issue under a different light, consider that from
> > > the perspective of semaphore implementation, waiters that were killed,
> > > stopped, or pre-empted forever in the middle of sem_wait are
> > > indistinguishable.
> >
> > Yes, I noticed this too.
> > In that sense, theoretically there should be
> > no harm (aside from eventual overflow of pending wake counter) from
> > having asynchronously-killed waiters, assuming the implementation is
> > bug-free in the absence of async killing of waiters.
>
> Did you mean "presence"? I'm having trouble understanding your phrase,
> especially after "assuming ..."; can you elaborate or rephrase?

I meant that, assuming there aren't already any bugs, by your reasoning
adding async killing of waiters cannot add bugs (except the overflow),
since a killed waiter is equivalent to a situation that already arises
without async killing.

> That waiters can die breaks an assumption that operations on val[0] and val[1]
> do not under/overflow due to their range exceeding the number of
> simultaneously live tasks.

Right. I'm ignoring that one. The current implementation likewise has
that issue for the waiter count (but it could avoid it by saturating
the waiter count at INT_MAX I suppose, or by throwing away the waiter
count and just using a potential-waiters flag).

> > > Thus, subsequent sem_wait succeeds by effectively stealing
> > > a post, and to make things consistent you can teach sem_trywait to steal posts
> > > too (i.e. try atomic-decrement-if-positive val[1] just before returning
> > > EAGAIN, return 0 if that succeeds).
> >
> > Hmm, perhaps that is valid. I'll have to think about it again. I was
> > thinking of having sem_trywait unconditionally down the value (val[0])
> > then imitate the exit path of sem_timedwait, but that's not valid
> > because another waiter could race and prevent sem_trywait from ever
> > being able to exit. But if it only does the down as a dec-if-positive
> > then it seems like it can safely dec-if-positive the wake count before
> > reporting failure.
>
> I think my proposition above needs at least the following correction: when
> trywait succeeds in stealing a post by dec-if-positive(val[1]), it should also
> decrement val[0] before returning.

Yes, that seems right.

> Are you sure your proposition is invalid? I don't think so. How is trywait
> different from a timedwait with a timeout that immediately expires? That is
> basically what your scheme should do.

Indeed, I think you're right. Conceptually trywait and timedwait with
zero timeout should be identical modulo error value and cancellation.

Rich
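
For readers following the thread, the sem_trywait variant discussed
above (dec-if-positive on val[0], then dec-if-positive on val[1] as a
"steal", followed by a decrement of val[0]) might be sketched roughly
as below. This is only an illustration of the proposition as worded in
this message, not musl's code: the struct sketch_sem type, the helper
dec_if_positive and the function sketch_sem_trywait are hypothetical
names, C11 atomics stand in for whatever primitives the real
implementation uses, and the exact meaning of val[0] and val[1] in the
proposed design is assumed from the wording here (value and pending
wake count).

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical layout, following the thread's naming (an assumption):
 * val[0] is the semaphore value, val[1] is the count of pending wakes. */
struct sketch_sem {
    atomic_int val[2];
};

/* Atomically decrement *p only if it is positive; return 1 on success. */
static int dec_if_positive(atomic_int *p)
{
    int old = atomic_load(p);
    while (old > 0) {
        /* On failure, old is refreshed with the current value of *p. */
        if (atomic_compare_exchange_weak(p, &old, old - 1))
            return 1;
    }
    return 0;
}

/* Returns 0 on success, or -1 with errno set to EAGAIN. */
static int sketch_sem_trywait(struct sketch_sem *s)
{
    /* Normal path: take a unit of the value if one is available. */
    if (dec_if_positive(&s->val[0]))
        return 0;

    /* Before reporting failure, try to steal a pending wake,
     * for example one left behind by a waiter that was killed. */
    if (dec_if_positive(&s->val[1])) {
        /* Per the correction in the thread, also decrement val[0]. */
        atomic_fetch_sub(&s->val[0], 1);
        return 0;
    }

    errno = EAGAIN;
    return -1;
}
```

Neither decrement-if-positive step can block or leave the counters in
a state that needs to be undone, which is the property the thread
relies on when comparing trywait to a timedwait whose timeout expires
immediately.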