From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7346
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Resuming work on new semaphore
Date: Sun, 5 Apr 2015 16:23:14 -0400
Message-ID: <20150405202314.GG6817@brightrain.aerifal.cx>
References: <20150402013006.GA1108@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1504021036070.31632@monopod.intra.ispras.ru>
 <20150402152642.GW6817@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1504030021400.8195@monopod.intra.ispras.ru>
 <20150402231457.GC6817@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1504051613090.8195@monopod.intra.ispras.ru>
 <alpine.LNX.2.11.1504051712170.8195@monopod.intra.ispras.ru>
 <20150405190214.GF6817@brightrain.aerifal.cx>
 <alpine.LNX.2.11.1504052217310.8195@monopod.intra.ispras.ru>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1428265410 9701 80.91.229.3 (5 Apr 2015 20:23:30 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 5 Apr 2015 20:23:30 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-7359-gllmg-musl=m.gmane.org@lists.openwall.com Sun Apr 05 22:23:30 2015
Return-path: <musl-return-7359-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-7359-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1Yer4q-0000Xc-So
	for gllmg-musl@m.gmane.org; Sun, 05 Apr 2015 22:23:28 +0200
Original-Received: (qmail 28187 invoked by uid 550); 5 Apr 2015 20:23:27 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 28169 invoked from network); 5 Apr 2015 20:23:26 -0000
Content-Disposition: inline
In-Reply-To: <alpine.LNX.2.11.1504052217310.8195@monopod.intra.ispras.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:7346
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/7346>

On Sun, Apr 05, 2015 at 11:03:34PM +0300, Alexander Monakov wrote:
> On Sun, 5 Apr 2015, Rich Felker wrote:
> > 1. Thread A enters sem_wait.
> > 2. Thread B observes thread A in sem_wait via failed sem_trywait.
> 
> Hm, I don't see how that can be achieved.  As a result I'm afraid I didn't
> fully understand your example.

Indeed, I was wrong about that, so I agree the whole scenario may fall
apart. Only sem_getvalue could show this, and only if it returns -1
rather than 0. So returning negative values from sem_getvalue seems
like a very bad idea -- it puts difficult- or impossible-to-satisfy
additional constraints on the implementation.

> > > Well we can make sem_getvalue return val[0]+val[1] instead... ;)
> > 
> > That just makes the new implementation look like the old one, no? :-)
> 
> Can't be bad if it behaves the same but works a bit faster.
> Apropos, like I've said on IRC, looks like there's "semaphore uncertainty
> principle": that formal semaphore value is between val[0] and (val[0] +/-
> val[1]) (clamped to 0 as needed).  It seems you can either do your hack and
> pretend that there are never any waiters, or try to faithfully count waiters
> in sem_getvalue, but then also reveal that sometimes the implementation works
> by stealing a post.  I believe you could argue that the latter is explicitly
> disallowed by the spec.

Yes, I think I agree.

> By the way, I think there's an interesting interplay with cancellation.
> Consider the following.  Thread B does "return sem_wait(sem);". Thread A does:
> 
>   pthread_cancel(thread_B);
>   sem_post(sem);
>   sem_getvalue(sem);
> 
> If it observes semaphore value as 1 it follows that thread B has not become a
> waiter yet, and since it must have cancellation already pending, it may not
> consume the post.  And yet if thread B is already futex-waiting in sem_wait,
> consuming the post takes priority over acting on cancellation.  So if then
> thread A does
> 
>   pthread_join(thread_B);
>   sem_getvalue(sem);
> 
> and gets value of 0, it sees a contradiction.  And return value from
> pthread_join will indicate that thread_B exited normally rather than was
> cancelled.

So the contradiction you claim exists is that cancellation happened
before the post, and thus thread B can't act on the post when it
didn't act on cancellation? I don't think that follows from the rules
of cancellation. The relevant text is:

    "Whenever a thread has cancelability enabled and a cancellation
    request has been made with that thread as the target, and the
    thread then calls any function that is a cancellation point (such
    as pthread_testcancel() or read()), the cancellation request shall
    be acted upon before the function returns."

So if cancellation was pending _before_ the call to sem_wait, then
sem_wait has to honor it. But there is no requirement that entry to
the sem_wait function be "atomic" with becoming a waiter on the
semaphore, and of course this is impossible to satisfy or even
specify. So it's totally legal to have the sequence:

1. Thread B enters sem_wait.
2. Thread B observes that cancellation was not already pending.
3. Thread A sends cancellation request.
4. Thread A sends post.
5. Thread B receives both, and chooses to act on the post per this
    text:

    "It is unspecified whether the cancellation request is acted upon
    or whether the cancellation request remains pending and the thread
    resumes normal execution if:

    - The thread is suspended at a cancellation point and the event for
    which it is waiting occurs

    - A specified timeout expired

    before the cancellation request is acted upon."

Here, the event for which it was waiting (the post) clearly occurs.
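
A rough sketch of the sequence above, in case anyone wants to see which
outcome a given implementation picks. The names thread_b, struct outcome
and run_scenario are made up for illustration, and the 10 ms delay is only
an assumption to make it likely (not certain) that thread B is already
blocked in sem_wait when the cancellation request is sent. Either outcome
is permitted by the text quoted above; the only thing the sketch relies on
is that thread B has no cancellation point after sem_wait returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical helper type: records what thread A observes after the join. */
struct outcome {
    int cancelled;   /* 1 if pthread_join reported PTHREAD_CANCELED */
    int final_value; /* sem_getvalue result after the join */
};

static void *thread_b(void *arg)
{
    /* Stands in for "return sem_wait(sem);". The result is returned via a
     * pointer so that a -1 return cannot be confused with PTHREAD_CANCELED,
     * which is (void *)-1 on common implementations. */
    static int b_result;
    b_result = sem_wait((sem_t *)arg);
    return &b_result;
}

/* Plays the role of thread A. Returns 0 on success, -1 on setup failure. */
static int run_scenario(struct outcome *out)
{
    sem_t sem;
    pthread_t b;
    void *res = NULL;
    struct timespec delay = { 0, 10 * 1000 * 1000 }; /* 10 ms, an assumption */
    int rc;

    if (sem_init(&sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&b, NULL, thread_b, &sem) != 0) {
        sem_destroy(&sem);
        return -1;
    }

    nanosleep(&delay, NULL);  /* give B a chance to block; not a guarantee */
    pthread_cancel(b);        /* step 3: cancellation request */
    sem_post(&sem);           /* step 4: post */

    rc = pthread_join(b, &res);
    assert(rc == 0);

    out->cancelled = (res == PTHREAD_CANCELED);
    sem_getvalue(&sem, &out->final_value);
    sem_destroy(&sem);
    return 0;
}
```

If B acted on the post, the join reports a normal exit and the final value
is 0; if B acted on cancellation, the join reports PTHREAD_CANCELED and the
post is still there, so the final value is 1. Calling run_scenario from
main and printing both fields shows which one happened on a given run.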

> And on the contrary, if you make acting on cancellation/timeout take priority,
> you can observe semaphore value increasing when waiters leave the wait on
> error path without consuming the post.

Yes, obviously that is not possible.

Rich