From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7328
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Resuming work on new semaphore
Date: Thu, 2 Apr 2015 19:14:57 -0400
Message-ID: <20150402231457.GC6817@brightrain.aerifal.cx>
References: <20150402013006.GA1108@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1504021036070.31632@monopod.intra.ispras.ru>
 <20150402152642.GW6817@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1504030021400.8195@monopod.intra.ispras.ru>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1428016517 12201 80.91.229.3 (2 Apr 2015 23:15:17 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 2 Apr 2015 23:15:17 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-7341-gllmg-musl=m.gmane.org@lists.openwall.com Fri Apr 03 01:15:17 2015
Return-path: <musl-return-7341-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-7341-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1YdoKO-0001Zv-NR
	for gllmg-musl@m.gmane.org; Fri, 03 Apr 2015 01:15:12 +0200
Original-Received: (qmail 20424 invoked by uid 550); 2 Apr 2015 23:15:10 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 20391 invoked from network); 2 Apr 2015 23:15:09 -0000
Content-Disposition: inline
In-Reply-To: <alpine.LNX.2.11.1504030021400.8195@monopod.intra.ispras.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:7328
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/7328>

On Fri, Apr 03, 2015 at 12:39:10AM +0300, Alexander Monakov wrote:
> On Thu, 2 Apr 2015, Rich Felker wrote:
> > > Interesting.  To examine the issue under a different light, consider that from
> > > the perspective of semaphore implementation, waiters that were killed,
> > > stopped, or pre-empted forever in the middle of sem_wait are
> > > indistinguishable.
> > 
> > Yes, I noticed this too. In that sense, theoretically there should be
> > no harm (aside from eventual overflow of pending wake counter) from
> > having asynchronously-killed waiters, assuming the implementation is
> > bug-free in the absence of async killing of waiters.
> 
> Did you mean "presence"?  I'm having trouble understanding your phrase,
> especially after "assuming ..."; can you elaborate or rephrase?

I meant: assuming there aren't already any bugs, then by your
reasoning adding async killing of waiters cannot add bugs (except the
overflow), since a killed waiter is equivalent to a situation that
already arises without async killing (a waiter stopped or preempted
forever in the middle of sem_wait).

> That waiters can die breaks an assumption that operations on val[0] and val[1]
> do not under/overflow due to their range exceeding the number of
> simultaneously live tasks.

Right. I'm ignoring that one. The current implementation likewise has
that issue for the waiter count (but it could avoid it by saturating
the waiter count at INT_MAX I suppose, or by throwing away the waiter
count and just using a potential-waiters flag).
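
A rough sketch of the saturating variant, using C11 atomics. This is not
what the current implementation does; the function names are hypothetical,
and leaving a saturated count stuck at INT_MAX (since the true count is no
longer known) is an assumption of this sketch:

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical helper: bump the waiter count, but never past INT_MAX. */
static void waiters_inc_saturating(atomic_int *waiters)
{
    int v = atomic_load(waiters);
    while (v < INT_MAX) {
        /* On failure, v is reloaded with the current value and rechecked. */
        if (atomic_compare_exchange_weak(waiters, &v, v + 1))
            return;
    }
    /* Already saturated: leave the count at INT_MAX. */
}

/* Hypothetical helper: drop the waiter count unless it is zero or has
 * saturated. A saturated count stays put (assumption of this sketch). */
static void waiters_dec_saturating(atomic_int *waiters)
{
    int v = atomic_load(waiters);
    while (v > 0 && v < INT_MAX) {
        if (atomic_compare_exchange_weak(waiters, &v, v - 1))
            return;
    }
}
```

The cost is that a saturated count makes every later post assume there may
be waiters, which is also what the potential-waiters flag would amount to.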

> > > Thus, subsequent sem_wait succeeds by effectively stealing
> > > a post, and to make things consistent you can teach sem_trywait to steal posts
> > > too (i.e. try atomic-decrement-if-positive val[1] just before returning
> > > EAGAIN, return 0 if that succeeds).
> > 
> > Hmm, perhaps that is valid. I'll have to think about it again. I was
> > thinking of having sem_trywait unconditionally down the value (val[0])
> > then imitate the exit path of sem_timedwait, but that's not valid
> > because another waiter could race and prevent sem_trywait from ever
> > being able to exit. But if it only does the down as a dec-if-positive
> > then it seems like it can safely dec-if-positive the wake count before
> > reporting failure.
> 
> I think my proposition above needs at least the following correction: when
> trywait succeeds in stealing a post by dec-if-positive(val[1]), it should also
> decrement val[0] before returning.

Yes, that seems right.
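
To make the corrected proposal concrete, here is a minimal sketch with C11
atomics. It assumes val[0] is the semaphore value and val[1] is the
pending wake count, as discussed in this thread; the struct and function
names are hypothetical and none of this is committed code:

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical layout: val[0] = semaphore value, val[1] = wake count. */
struct sketch_sem {
    atomic_int val[2];
};

/* Atomically decrement *p only if it is positive; return 1 on success. */
static int dec_if_positive(atomic_int *p)
{
    int v = atomic_load(p);
    while (v > 0) {
        /* On failure, v is reloaded with the current value and rechecked. */
        if (atomic_compare_exchange_weak(p, &v, v - 1))
            return 1;
    }
    return 0;
}

static int sketch_trywait(struct sketch_sem *s)
{
    /* Normal path: take the semaphore if its value is positive. */
    if (dec_if_positive(&s->val[0]))
        return 0;

    /* Just before failing, try to steal a pending wake (a post that
     * was aimed at some waiter). */
    if (dec_if_positive(&s->val[1])) {
        /* The correction: a successful steal also decrements val[0]. */
        atomic_fetch_sub(&s->val[0], 1);
        return 0;
    }

    errno = EAGAIN;
    return -1;
}
```

Note that the first step is a dec-if-positive rather than an unconditional
down, so a failed trywait never leaves val[0] modified.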

> Are you sure your proposition is invalid?  I don't think so.  How is trywait
> different from a timedwait with a timeout that immediately expires?  That is
> basically what your scheme should do.

Indeed, I think you're right. Conceptually trywait and timedwait with
zero timeout should be identical modulo error value and cancellation.
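
At the public API level the equivalence can be sketched like this. The
wrapper name is hypothetical, and it only papers over the error value:
sem_timedwait is still a cancellation point while sem_trywait is not, so
this is an illustration, not a proposed implementation:

```c
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical wrapper: trywait expressed as a timedwait whose absolute
 * timeout (the epoch) has already expired. */
static int trywait_via_timedwait(sem_t *sem)
{
    struct timespec expired = { .tv_sec = 0, .tv_nsec = 0 };
    int r = sem_timedwait(sem, &expired);

    /* Map the timedwait failure code onto the one trywait reports. */
    if (r != 0 && errno == ETIMEDOUT)
        errno = EAGAIN;
    return r;
}
```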

Rich