mailing list of musl libc
* My current understanding of cond var access restrictions
@ 2014-08-13 21:23 Rich Felker
  2014-08-13 23:20 ` Jens Gustedt
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-08-13 21:23 UTC (permalink / raw)
  To: musl

Since this has all gotten really confusing, I've written up a summary
of my current understanding of access- and destruction-related issues
with cond vars. Jens, I'd appreciate it if you could comment on
whether the below text seems to match your understanding. Hopefully
this will help us work out a design that solves the current flaws (of
which I suspect there are many).


1. When can a cv be destroyed?

After a return from pthread_cond_broadcast when the program has
ensured that there will be no new waiters or signalers.

After a return from pthread_cond_signal when there is only one waiter
and the program has ensured that there will be no new waiters or
signalers.

After a return from pthread_cond_[timed]wait when there are no other
waiters or sufficiently many wakes (possibly via a broadcast) have
been performed to unblock all waiters, and the program has ensured
that there will be no new waiters or signalers. In the case where the
program is relying on the wake being the result of a signal in order
to ensure that no further signals happen, it is MANDATORY that the
signal be performed while holding the mutex (or that some other form
of synchronization with the signaling thread be used). Otherwise there
is no way for the woken thread to establish that its wake was a result
of the signal operation and not a spurious wake just before the signal
operation (in which case the subsequent signal operation would be an
illegal use of the cv after destruction).
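
To make the mandatory pattern concrete, here is a minimal sketch
(illustrative only; error checking omitted) in which the waiter may
legally destroy the cv, because observing the flag under the mutex
proves the final signal call has already returned:

#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int done;

void *waiter(void *arg)
{
    pthread_mutex_lock(&m);
    while (!done) pthread_cond_wait(&cv, &m);
    pthread_mutex_unlock(&m);
    /* done==1 was set, and the signal performed, while the signaler
     * held m; we could only observe it after the signaler unlocked m,
     * so pthread_cond_signal has returned and no thread can still be
     * inside a call that uses cv. */
    pthread_cond_destroy(&cv);
    return 0;
}

void *signaler(void *arg)
{
    pthread_mutex_lock(&m);
    done = 1;
    pthread_cond_signal(&cv); /* performed while holding the mutex */
    pthread_mutex_unlock(&m);
    return 0;
}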


2. When can a shared cv be unmapped?

This is unclear pending the resolution of Austin Group issue #864:

http://austingroupbugs.net/view.php?id=864

For the time being I think we should operate with the assumption that
it's valid to unmap (without destroying) at any time that destroying
would be valid.


3. When can the mutex associated with a cv be destroyed?

After a return from pthread_cond_[timed]wait when there are no other
waiters, and the program has ensured that there will be no new waiters
or other attempts to use the mutex.


4. When can signal and broadcast safely use the mutex?

Not at all, unless it can block waiters from exiting the wait. Any
waiter could spontaneously exit the wait as a result of cancellation,
timeout, or a cv signal from another thread, and by the above, it may
be entitled to destroy the mutex.


5. When can [timed]wait safely access the cv?

Only before unlocking the mutex, unless the implementation
synchronizes with possible signaling threads, or with destruction (and
possibly unmapping). Otherwise, per the above, it's possible that a
signaling thread destroys the cv.

(The reason the mutex unlocking is significant is that becoming a
waiter is specified to be atomic with unlocking the mutex; before the
mutex is unlocked, the thread is formally "about to become a waiter"
rather than "already a waiter" and thus not subject to signals but
instead is a "future waiter" that would make it illegal to destroy
after signaling.)

Since it seems impossible to implement a cv without accessing the cv
object after the mutex unlock (even in the minimal implementation,
it's necessary to have at least one read access for the futex wait), I
think this means an internal synchronization mechanism is necessary.
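
As a toy illustration of that unavoidable access (invented names, not
musl's code), note that the futex syscall itself has to read the cv's
memory after the unlock:

#include <pthread.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

struct toy_cv { volatile int seq; };

void toy_cv_wait(struct toy_cv *cv, pthread_mutex_t *m)
{
    int old = cv->seq;       /* read while "about to become a waiter" */
    pthread_mutex_unlock(m); /* formally becomes a waiter here */
    /* From this point on, a signaler may, per section 1, legally have
     * destroyed the cv, yet the kernel must still read cv->seq inside
     * the futex wait. This is the read access at issue. */
    syscall(SYS_futex, &cv->seq, FUTEX_WAIT, old, 0);
    pthread_mutex_lock(m);
}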


6. When can signal/broadcast safely access the cv?

Since it's impossible for a woken waiter to determine whether it woke
spuriously or as a result of signal/broadcast, and it's impossible to
know whether the signal/broadcast was actually called yet without
further synchronization after signal/broadcast returns (either by
unlocking the mutex, if the signal was performed with the mutex held,
or via some other mechanism), signal/broadcast has unlimited access to
the cv object for the entire duration of the call.





* Re: My current understanding of cond var access restrictions
  2014-08-13 21:23 My current understanding of cond var access restrictions Rich Felker
@ 2014-08-13 23:20 ` Jens Gustedt
  2014-08-14  2:19   ` Rich Felker
  2014-08-14  6:10   ` Rich Felker
  0 siblings, 2 replies; 14+ messages in thread
From: Jens Gustedt @ 2014-08-13 23:20 UTC (permalink / raw)
  To: musl

Hi,

Am Mittwoch, den 13.08.2014, 17:23 -0400 schrieb Rich Felker:
> Since this has all gotten really confusing, I've written up a summary
> of my current understanding of access- and destruction-related issues
> with cond vars. Jens, I'd appreciate it if you could comment on
> whether the below text seems to match your understanding.

I think so.

But keep in mind that everything you say relies strongly on
implementing each of the control structures as one monolithic
object. As soon as you split conditions into two parts (one the
user-space interface, one an internal synchronization structure that
is allocated separately), part of the trouble disappears.

I know that you don't like that "split" approach because of the
disadvantages of dynamic allocation. But I think that the standard(s)
clearly have such an implementation in mind when they speak of init
functions allocating the necessary resources for a control structure.

> Hopefully
> this will help us work out a design that solves the current flaws (of
> which I suspect there are many).

> 1. When can a cv be destroyed?
> 
> After a return from pthread_cond_broadcast when the program has
> ensured that there will be no new waiters or signalers.
> 
> After a return from pthread_cond_signal when there is only one waiter
> and the program has ensured that there will be no new waiters or
> signalers.
> 
> After a return from pthread_cond_[timed]wait when there are no other
> waiters or sufficiently many wakes (possibly via a broadcast) have
> been performed to unblock all waiters, and the program has ensured
> that there will be no new waiters or signalers. In the case where the
> program is relying on the wake being the result of a signal in order
> to ensure that no further signals happen, it is MANDATORY that the
> signal be performed while holding the mutex (or that some other form
> of synchronization with the signaling thread be used). Otherwise there
> is no way for the woken thread to establish that its wake was a result
> of the signal operation and not a spurious wake just before the signal
> operation (in which case the subsequent signal operation would be an
> illegal use of the cv after destruction).

I think this can be summarized as: the thread that performs the
destruction must be able to establish a happens-before relation
between all pthread_cond_[timed]wait operations and the final wake
operation. In current POSIX this relation can only be established by
mutexes or similar synchronization mechanisms. Once C11 is integrated
into POSIX, atomic operations will be added to that list.

> 2. When can a shared cv be unmapped?
> 
> This is unclear pending the resolution of Austin Group issue #864:
> 
> http://austingroupbugs.net/view.php?id=864
> 
> For the time being I think we should operate with the assumption that
> it's valid to unmap (without destroying) at any time that destroying
> would be valid.

I haven't thought enough about the shared case yet, since it doesn't
arise for C threads. But my first inclination is to agree with what
you wrote in #864.

> 3. When can the mutex associated with a cv be destroyed?
> 
> After a return from pthread_cond_[timed]wait when there are no other
> waiters, and the program has ensured that there will be no new waiters
> or other attempts to use the mutex.

ok

> 4. When can signal and broadcast safely use the mutex?
> 
> Not at all, unless it can block waiters from exiting the wait. Any
> waiter could spontaneously exit the wait as a result of cancellation,
> timeout, or a cv signal from another thread, and by the above, it may
> be entitled to destroy the mutex.

Are you suggesting that all waiters, when coming back, should first
acquire an internal lock on the cv?

> 5. When can [timed]wait safely access the cv?
> 
> Only before unlocking the mutex, unless the implementation
> synchronizes with possible signaling threads, or with destruction (and
> possibly unmapping). Otherwise, per the above, it's possible that a
> signaling thread destroys the cv.

so again this suggests an internal lock on the cv that would be used
to synchronize between waiters and wakers?

> (The reason the mutex unlocking is significant is that becoming a
> waiter is specified to be atomic with unlocking the mutex; before the
> mutex is unlocked, the thread is formally "about to become a waiter"
> rather than "already a waiter" and thus not subject to signals but
> instead is a "future waiter" that would make it illegal to destroy
> after signaling.)

an important distinction, I agree

> Since it seems impossible to implement a cv without accessing the cv
> object after the mutex unlock (even in the minimal implementation,

This supposes that the mutex unlock operation and the wait operation
are done with two different system calls.

> it's necessary to have at least one read access for the futex wait), I
> think this means an internal synchronization mechanism is necessary.

If by internal synchronization you mean that the transition from the
unlock to the wait only has to appear to be atomic (the ``as-if''
rule), yes.

> 6. When can signal/broadcast safely access the cv?
> 
> Since it's impossible for a woken waiter to determine whether it woke
> spuriously or as a result of signal/broadcast, and it's impossible to
> know whether the signal/broadcast was actually called yet without
> further synchronization after signal/broadcast returns (either by
> unlocking the mutex, if the signal was performed with the mutex held,
> or via some other mechanism), signal/broadcast has unlimited access to
> the cv object for the entire duration of the call.

yes

Jens

-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




* Re: My current understanding of cond var access restrictions
  2014-08-13 23:20 ` Jens Gustedt
@ 2014-08-14  2:19   ` Rich Felker
  2014-08-14  7:41     ` Jens Gustedt
  2014-08-14  6:10   ` Rich Felker
  1 sibling, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-08-14  2:19 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 01:20:25AM +0200, Jens Gustedt wrote:
> Hi,
> 
> Am Mittwoch, den 13.08.2014, 17:23 -0400 schrieb Rich Felker:
> > Since this has all gotten really confusing, I've written up a summary
> > of my current understanding of access- and destruction-related issues
> > with cond vars. Jens, I'd appreciate it if you could comment on
> > whether the below text seems to match your understanding.
> 
> I think so.
> 
> But keep in mind that everything you say relies strongly on
> implementing each of the control structures as one monolithic
> object. As soon as you split conditions into two parts (one the
> user-space interface, one an internal synchronization structure that
> is allocated separately), part of the trouble disappears.

The basic idea of your design is reference-counting the object.
There's no need for dynamic allocation to make this work. Simply
incrementing/decrementing a reference counter in the main object, and
waiting for it to reach zero in the destroy function, achieves the
same thing -- but it requires either spinlocks or a FUTEX_WAKE_OP for
the final decrement-to-zero if there is a destroyer waiting.

Of course this doesn't work for the unmapping issue with
process-shared objects, but allocation doesn't work there either.
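
In sketch form, the idea is something like this (my shorthand, not
actual musl code; the spin variant of the destroy side):

#include <sched.h>

struct refcv { volatile int refs; /* plus seq/waiter fields */ };

static void cv_ref(struct refcv *c)
{
    __sync_fetch_and_add(&c->refs, 1);
}

static void cv_unref(struct refcv *c)
{
    /* the decrement is the last access the releasing thread makes */
    __sync_fetch_and_sub(&c->refs, 1);
}

void cv_destroy_wait(struct refcv *c)
{
    /* spin until all references are dropped; a sleeping variant
     * needs FUTEX_WAKE_OP so the decrement-to-zero and the wake
     * happen without a second userspace access to what may already
     * be a freed object */
    while (c->refs) sched_yield();
}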

> I know that you don't like that "split" approach because of the
> disadvantages of dynamic allocation. But I think that the standard(s)
> clearly have such an implementation in mind when they speak of init
> functions allocating the necessary resources for a control structure.

C, yes. POSIX, yes, but only for non-process-shared objects unless by
allocation you mean to include kernelspace allocation of
synchronization objects via a kernel API specifically designed for
this (which Linux lacks).

> > 1. When can a cv be destroyed?
> > [...]
> 
> I think this can be summarized as: the thread that performs the
> destruction must be able to establish a happens-before relation
> between all pthread_cond_[timed]wait operations and the final wake
> operation. In current POSIX this relation can only be established by
> mutexes or similar synchronization mechanisms. Once C11 is integrated
> into POSIX, atomic operations will be added to that list.

Yes.

> > 4. When can signal and broadcast safely use the mutex?
> > 
> > Not at all, unless it can block waiters from exiting the wait. Any
> > waiter could spontaneously exit the wait as a result of cancellation,
> > timeout, or a cv signal from another thread, and by the above, it may
> > be entitled to destroy the mutex.
> 
> Are you suggesting that all waiters, when coming back, should first
> acquire an internal lock on the cv?

I'm not sure. To some extent, rather, it's making me doubt that
requeue is practical, but I'm still hopeful.

> > 5. When can [timed]wait safely access the cv?
> > 
> > Only before unlocking the mutex, unless the implementation
> > synchronizes with possible signaling threads, or with destruction (and
> > possibly unmapping). Otherwise, per the above, it's possible that a
> > signaling thread destroys the cv.
> 
> so again this suggests an internal lock on the cv that would be used
> to synchronize between waiters and wakers?

Yes, I think so.

> > Since it seems impossible to implement a cv without accessing the cv
> > object after the mutex unlock (even in the minimal implementation,
> 
> This supposes that the mutex unlock operation and the wait operation
> are done with two different system calls.

Yes. Linux has no futex operation which both writes and waits except
possibly priority-inheritance locking, which is definitely not an
appropriate primitive here.

> > it's necessary to have at least one read access for the futex wait), I
> > think this means an internal synchronization mechanism is necessary.
> 
> If by internal synchronization you mean that the transition from the
> unlock to the wait only has to appear to be atomic (the ``as-if''
> rule), yes.

I just meant that it seems necessary to do something to preserve the
right to access the cv object by excluding other operations which
could lead to it being destroyed. Possibly this is just a simplified
version of what I have now (blocking destroy), but I don't see how
that can solve the unmapping problem for process-shared cv's (if it
needs to be supported). Perhaps a better solution would be to have
signal/broadcast wait for any waiters that weren't woken/requeued
(since woken/requeued ones are potentially past the point of needing
to access the cv object) to reach the point just before relocking the
mutex. This sounds similar to what you were proposing.

I'll elaborate more in a second follow-up email.

Rich



* Re: My current understanding of cond var access restrictions
  2014-08-13 23:20 ` Jens Gustedt
  2014-08-14  2:19   ` Rich Felker
@ 2014-08-14  6:10   ` Rich Felker
  2014-08-14  8:00     ` Jens Gustedt
  1 sibling, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-08-14  6:10 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 01:20:25AM +0200, Jens Gustedt wrote:
> > 4. When can signal and broadcast safely use the mutex?
> > 
> > Not at all, unless it can block waiters from exiting the wait. Any
> > waiter could spontaneously exit the wait as a result of cancellation,
> > timeout, or a cv signal from another thread, and by the above, it may
> > be entitled to destroy the mutex.
> 
> Are you suggesting that all waiters, when coming back, should first
> acquire an internal lock on the cv?

I think I have an informal proof sketch that this is necessary unless
we abandon requeue:

If we want to be able to use the mutex in broadcast, which is needed
for requeue, then broadcast needs a lock that can block at least one
waiter from returning, and needs to confirm that at least one waiter
remains after the lock is obtained (otherwise it's easy -- there's no
work to do), so that the mutex is valid.

(Note: Broadcast can immediately release this lock if it determines
that the calling thread holds the mutex, since in this case, the mutex
will be sufficient to prevent any waiter from returning. But in
general it needs to hold the lock until requeue is performed.)

In order for this lock to block waiters from returning, any waiter
that woke possibly not under the control of broadcast/signal (i.e.
futex wait not returning 0) has to obtain the lock. (For safety
against application use of futexes that generates spurious wakes, it
might be best to just ignore the return value and always attempt to
get the lock.) This probably means it has to access the cv object
(unless it uses an object at another location whose address was
obtained before waiting), which in turn means that we have to track
references so that destroy can wait for all references to be released
before returning.
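
In sketch form, the waiter-side epilogue this argument forces would
look something like the following (hypothetical names and layout, not
the current musl code):

#include <pthread.h>
#include <sched.h>

struct lockcv {
    volatile int lock;    /* internal lock, also taken by broadcast */
    volatile int waiters;
};

static void cv_lock(struct lockcv *c)
{
    while (__sync_lock_test_and_set(&c->lock, 1)) sched_yield();
}

static void cv_unlock(struct lockcv *c)
{
    __sync_lock_release(&c->lock);
}

/* runs after the futex wait returns, regardless of why it returned */
static void waiter_epilogue(struct lockcv *c, pthread_mutex_t *m)
{
    cv_lock(c);   /* blocks for as long as broadcast holds the lock */
    c->waiters--; /* accounting done under the internal lock */
    cv_unlock(c);
    pthread_mutex_lock(m); /* only now relock the caller's mutex */
}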

So I think we're stuck with something like the current implementation,
or abandoning requeue and just doing private cond vars the same as
process-shared ones. This is actually somewhat reassuring -- it means
I wasn't completely insane when I came up with the current
implementation a couple years back. Or at least, if I was, the
insane line of reasoning is at least reproducible. :-)

With that in mind, I'd like to look for ways we can fix the bogus
waiter accounting for the mutex that seems to be the source of the bug
you found. One "obvious" (but maybe bad/wrong?) solution would be to
put the count on the mutex at the time of waiting (rather than moving
it there as part of broadcast), so that decrementing the mutex waiter
count is always the right thing to do in unwait. Of course this
possibly results in lots of spurious futex wakes to the mutex (every
time it's unlocked while there are waiters on the cv, which could be a
lot). It would be nice if we had a separate field in the mutex (rather
than in the cv, as it is now) to store these on, and only move them to
the active waiters count at broadcast time, but I don't see any way to
get additional space in the mutex structure for this -- it's full.
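
In sketch form, the "obvious" variant would charge the waiter to the
mutex up front (field names invented; musl's real ones differ):

struct acct_cv    { volatile int waiters; /* ... */ };
struct acct_mutex { volatile int waiters; /* ... */ };

void acct_wait(struct acct_cv *c, struct acct_mutex *m)
{
    __sync_fetch_and_add(&c->waiters, 1);
    __sync_fetch_and_add(&m->waiters, 1); /* counted on the mutex now */
    /* ... unlock m, futex wait on the cv ... */
    __sync_fetch_and_sub(&c->waiters, 1);
    __sync_fetch_and_sub(&m->waiters, 1);
    /* decrementing the mutex count in unwait is now always right,
     * even on timeout or cancellation, at the cost that every unlock
     * of m sees a nonzero waiter count and issues a futex wake while
     * cv waiters exist */
    /* ... relock m ... */
}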

> > 5. When can [timed]wait safely access the cv?
> > 
> > Only before unlocking the mutex, unless the implementation
> > synchronizes with possible signaling threads, or with destruction (and
> > possibly unmapping). Otherwise, per the above, it's possible that a
> > signaling thread destroys the cv.
> 
> so again this suggests an internal lock on the cv that would be used
> to synchronize between waiters and wakers?

This argument applies even to process-shared cv's, and for them, no
allocation is possible, and I don't see a really good way to solve the
unmapping issue -- I think broadcast/signal would have to block
unmapping, and the last waiter to wake up would have to unblock it.
Maybe that's the right solution?

Rich



* Re: My current understanding of cond var access restrictions
  2014-08-14  2:19   ` Rich Felker
@ 2014-08-14  7:41     ` Jens Gustedt
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Gustedt @ 2014-08-14  7:41 UTC (permalink / raw)
  To: musl

Am Mittwoch, den 13.08.2014, 22:19 -0400 schrieb Rich Felker:
> On Thu, Aug 14, 2014 at 01:20:25AM +0200, Jens Gustedt wrote:
> > But keep in mind that everything you say relies strongly on
> > implementing each of the control structures as one monolithic
> > object. As soon as you split conditions into two parts (one the
> > user-space interface, one an internal synchronization structure
> > that is allocated separately), part of the trouble disappears.
> 
> The basic idea of your design is reference-counting the object.
> There's no need for dynamic allocation to make this work.

I am not sure. At least in my current design for C threads, the
auxiliary object is unlinked as soon as there is a potential move-over
to a different mutex, that is, as soon as the number of waiters
provably drops to 0. Once the aux object is unlinked, any new waiter
allocates a new one. This way the processing clearly proceeds in
batches: all threads involved in one batch use the same mutex, and the
batches don't step on each other's feet.

> Simply
> incrementing/decrementing a reference counter in the main object, and
> waiting for it to reach zero in the destroy function, achieves the
> same thing -- but it requires either spinlocks or a FUTEX_WAKE_OP for
> the final decrement-to-zero if there is a destroyer waiting.

Yes, if you don't like spinlocks you'd have to use a futex wait-wake
scheme.

But in the "aux" model, the futex wake can act on an object in the
aux. Only those waiters and wakers from the same batch will ever touch
that aux. So there is no problem to change that to a mutex-like lock
scheme.

> Of course this doesn't work for the unmapping issue with
> process-shared objects, but allocation doesn't work there either.

yes

> > This supposes that the mutex unlock operation and the wait
> > operation are done with two different system calls.
> 
> Yes. Linux has no futex operation which both writes and waits except
> possibly priority-inheritance locking, which is definitely not an
> appropriate primitive here.

yes, too bad


Jens



* Re: My current understanding of cond var access restrictions
  2014-08-14  6:10   ` Rich Felker
@ 2014-08-14  8:00     ` Jens Gustedt
  2014-08-14 14:41       ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Gustedt @ 2014-08-14  8:00 UTC (permalink / raw)
  To: musl

Am Donnerstag, den 14.08.2014, 02:10 -0400 schrieb Rich Felker:
> I think I have an informal proof sketch that this is necessary unless
> we abandon requeue:

> ...

> With that in mind, I'd like to look for ways we can fix the bogus
> waiter accounting for the mutex that seems to be the source of the bug
> you found. One "obvious" (but maybe bad/wrong?) solution would be to
> put the count on the mutex at the time of waiting (rather than moving
> it there as part of broadcast), so that decrementing the mutex waiter
> count is always the right thing to do in unwait.

sounds like a good idea, at least for correctness

> Of course this
> possibly results in lots of spurious futex wakes to the mutex (every
> time it's unlocked while there are waiters on the cv, which could be a
> lot).

If we were more careful not to spread wakes where we shouldn't, there
would perhaps not be "a lot" of such wakeups.

> It would be nice if we had a separate field in the mutex (rather
> than in the cv, as it is now) to store these on, and only move them to
> the active waiters count at broadcast time, but I don't see any way to
> get additional space in the mutex structure for this -- it's full.

I thought of such designs, too, but one major problem with them
(besides the space) is that a mutex can be used by several cvs at a
time.

> > > 5. When can [timed]wait safely access the cv?
> > > 
> > > Only before unlocking the mutex, unless the implementation
> > > synchronizes with possible signaling threads, or with destruction (and
> > > possibly unmapping). Otherwise, per the above, it's possible that a
> > > signaling thread destroys the cv.
> > 
> > so again this suggests an internal lock on the cv that would be used
> > to synchronize between waiters and wakers?
> 
> This argument applies even to process-shared cv's, and for them, no
> allocation is possible,

At least difficult, for sure.

This would need support for allocating some object in the kernel and
sharing that object between processes :(

> and I don't see a really good way to solve the
> unmapping issue -- I think broadcast/signal would have to block
> unmapping, and the last waiter to wake up would have to unblock it.
> Maybe that's the right solution?

Probably. As I said, I don't yet have the right feeling for the
mapping issues.

Jens



* Re: My current understanding of cond var access restrictions
  2014-08-14  8:00     ` Jens Gustedt
@ 2014-08-14 14:41       ` Rich Felker
  2014-08-14 15:36         ` Rich Felker
  2014-08-14 16:27         ` Jens Gustedt
  0 siblings, 2 replies; 14+ messages in thread
From: Rich Felker @ 2014-08-14 14:41 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 14.08.2014, 02:10 -0400 schrieb Rich Felker:
> > I think I have an informal proof sketch that this is necessary unless
> > we abandon requeue:
> 
> > ...
> 
> > With that in mind, I'd like to look for ways we can fix the bogus
> > waiter accounting for the mutex that seems to be the source of the bug
> > you found. One "obvious" (but maybe bad/wrong?) solution would be to
> > put the count on the mutex at the time of waiting (rather than moving
> > it there as part of broadcast), so that decrementing the mutex waiter
> > count is always the right thing to do in unwait.
> 
> sounds like a good idea, at least for correctness
> 
> > Of course this
> > possibly results in lots of spurious futex wakes to the mutex (every
> > time it's unlocked while there are waiters on the cv, which could be a
> > lot).
> 
> If we were more careful not to spread wakes where we shouldn't,
> there would perhaps not be "a lot" of such wakeups.

Well this is different from the wake-after-release that you dislike.
It's a wake on a necessarily-valid object that just doesn't have any
actual waiters right now because its potential-waiters are still
waiting on the cv.

However I think it may be costly (one syscall per unlock) in
applications where mutex is used to protect state that's frequently
modified but where the predicate associated with the cv only rarely
changes (and thus signaling is rare and cv waiters wait around a long
time). In what's arguably the common case (a reasonable number of
waiters as opposed to thousands of waiters on a 4-core box) just
waking all waiters on broadcast would be a lot less expensive.

Thus I'm skeptical of trying an approach like this when it would be
easier, and likely less costly on the common usage cases, just to
remove requeue and always use broadcast wakes. I modified your test
case for the bug to use a process-shared cv (using broadcast wake),
and as expected, the test runs with no failure.

> > It would be nice if we had a separate field in the mutex (rather
> > than in the cv, as it is now) to store these on, and only move them to
> > the active waiters count at broadcast time, but I don't see any way to
> > get additional space in the mutex structure for this -- it's full.
> 
> I thought of such designs, too, but one major problem with them
> (besides the space) is that a mutex can be used by several cvs at a
> time.

Yes. It would improve the situation versus the above, but not
eliminate it, since in the case where a mutex is used with multiple
cv's, a broadcast on one of the cv's would move the entire wait count
to the active wait count.

> > > > 5. When can [timed]wait safely access the cv?
> > > > 
> > > > Only before unlocking the mutex, unless the implementation
> > > > synchronizes with possible signaling threads, or with destruction (and
> > > > possibly unmapping). Otherwise, per the above, it's possible that a
> > > > signaling thread destroys the cv.
> > > 
> > > so again this suggests an internal lock on the cv that would be used
> > > to synchronize between waiters and wakers?
> > 
> > This argument applies even to process-shared cv's, and for them, no
> > allocation is possible,
> 
> At least difficult, for sure.
> 
> This would need support for allocating some object in the kernel and
> sharing that object between processes :(

And as I've mentioned before, this is presently not possible due to
security considerations: there's no way to make an object for which
the set of processes which can access it exactly matches the set which
can access the shared memory the cv lies in. This could be done with a
new futex command which returned the object using the memory address
as a key, but it would be heavy and ugly and not compatible with any
existing systems (so not appropriate for a mandatory feature).

Rich



* Re: My current understanding of cond var access restrictions
  2014-08-14 14:41       ` Rich Felker
@ 2014-08-14 15:36         ` Rich Felker
  2014-08-14 16:27         ` Jens Gustedt
  1 sibling, 0 replies; 14+ messages in thread
From: Rich Felker @ 2014-08-14 15:36 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 10:41:10AM -0400, Rich Felker wrote:
> On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> > Am Donnerstag, den 14.08.2014, 02:10 -0400 schrieb Rich Felker:
> > > I think I have an informal proof sketch that this is necessary unless
> > > we abandon requeue:
> > 
> > > ...
> > 
> > > With that in mind, I'd like to look for ways we can fix the bogus
> > > waiter accounting for the mutex that seems to be the source of the bug
> > > you found. One "obvious" (but maybe bad/wrong?) solution would be to
> > > put the count on the mutex at the time of waiting (rather than moving
> > > it there as part of broadcast), so that decrementing the mutex waiter
> > > count is always the right thing to do in unwait.
> > 
> > sounds like a good idea, at least for correctness
> > 
> > > Of course this
> > > possibly results in lots of spurious futex wakes to the mutex (every
> > > time it's unlocked while there are waiters on the cv, which could be a
> > > lot).
> > 
> > If we were more careful not to spread wakes where we shouldn't,
> > there would perhaps not be "a lot" of such wakeups.
> 
> Well this is different from the wake-after-release that you dislike.
> It's a wake on a necessarily-valid object that just doesn't have any
> actual waiters right now because its potential-waiters are still
> waiting on the cv.
> 
> However I think it may be costly (one syscall per unlock) in
> applications where mutex is used to protect state that's frequently
> modified but where the predicate associated with the cv only rarely
> changes (and thus signaling is rare and cv waiters wait around a long
> time). In what's arguably the common case (a reasonable number of
> waiters as opposed to thousands of waiters on a 4-core box) just
> waking all waiters on broadcast would be a lot less expensive.
> 
> Thus I'm skeptical of trying an approach like this when it would be
> easier, and likely less costly on the common usage cases, just to
> remove requeue and always use broadcast wakes. I modified your test
> case for the bug to use a process-shared cv (using broadcast wake),
> and as expected, the test runs with no failure.

A really ugly hack that might solve the problem: adaptively switching
to a less efficient mode the first time a different mutex is used. It
could either switch to pre-moving wait counts to the mutex, or revert
to broadcast wakes.
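
In sketch form (hypothetical fields, just to show the shape of the
hack):

#include <pthread.h>

struct adapt_cv {
    pthread_mutex_t *volatile mtx; /* first mutex seen, or 0 */
    volatile int bcast_only;       /* sticky degraded mode */
};

static void note_mutex(struct adapt_cv *c, pthread_mutex_t *m)
{
    pthread_mutex_t *prev =
        __sync_val_compare_and_swap(&c->mtx, (pthread_mutex_t *)0, m);
    if (prev && prev != m)
        c->bcast_only = 1; /* from now on broadcast, never requeue */
}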

Rich



* Re: My current understanding of cond var access restrictions
  2014-08-14 14:41       ` Rich Felker
  2014-08-14 15:36         ` Rich Felker
@ 2014-08-14 16:27         ` Jens Gustedt
  2014-08-14 16:58           ` Rich Felker
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Gustedt @ 2014-08-14 16:27 UTC (permalink / raw)
  To: musl

Am Donnerstag, den 14.08.2014, 10:41 -0400 schrieb Rich Felker:
> Thus I'm skeptical of trying an approach like this when it would be
> easier, and likely less costly on the common usage cases, just to
> remove requeue and always use broadcast wakes. I modified your test
> case for the bug to use a process-shared cv (using broadcast wake),
> and as expected, the test runs with no failure.

You shouldn't draw much of a conclusion from the fact that it works
in that case. This still interacts heavily with the waiter count, and
thus a signaling thread that comes after such a broadcast might wake
up a thread that it shouldn't.

(But I didn't do a full analysis of that situation.)

Jens



* Re: My current understanding of cond var access restrictions
  2014-08-14 16:27         ` Jens Gustedt
@ 2014-08-14 16:58           ` Rich Felker
  2014-08-14 18:12             ` Jens Gustedt
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-08-14 16:58 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 14.08.2014, 10:41 -0400 schrieb Rich Felker:
> > Thus I'm skeptical of trying an approach like this when it would be
> > easier, and likely less costly on the common usage cases, just to
> > remove requeue and always use broadcast wakes. I modified your test
> > case for the bug to use a process-shared cv (using broadcast wake),
> > and as expected, the test runs with no failure.
> 
> You shouldn't draw much of a conclusion from the fact that it works
> in that case. This still interacts heavily with the waiter count, and
> thus a signaling thread that comes after such a broadcast might wake
> up a thread that it shouldn't.
> 
> (But I didn't do a full analysis of that situation.)

In the process-shared case, broadcast just increments the sequence
number and wakes all futex waiters. It's very simple.
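
In sketch form (simplified, not the verbatim source), with seq being
the cv's 32-bit sequence word:

#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void psh_broadcast(volatile int *seq)
{
    __sync_fetch_and_add(seq, 1);                    /* bump sequence */
    syscall(SYS_futex, seq, FUTEX_WAKE, INT_MAX, 0); /* wake everyone */
}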

Formally, there's no such thing as waking up a thread you shouldn't,
since spurious wakes are always allowed. The current implementation
has a lot of potential for spurious wakes but they don't happen except
in certain situations:

- If a futex wait gets interrupted by a signal, the wait will always
  terminate after the signal handler returns if any intervening
  signals or broadcasts happened (except in the case of a full
  wraparound of the sequence number, i.e. exactly 2^32 cv signals
  while stuck in a signal handler, which I don't know how to fix, but
  it would be easy to write a test for this) even if the signal was
  already received by another waiter.

- If the sequence number gets incremented by a signal before the
  initial futex wait, the waiter will return immediately; this can
  happen to multiple waiters even for just one signal.

Really sequence numbers are the wrong tool here, but they were
introduced because the previous approach (having each waiter write its
own tid, and futex wait comparing that tid) led to pathologically bad
performance under heavy waiter arrival where waiters were constantly
returning because another waiter was almost always able to write its
tid before the first one could block on the futex. I'd like to have a
better solution, but I can't think of any.

Rich



* Re: My current understanding of cond var access restrictions
  2014-08-14 16:58           ` Rich Felker
@ 2014-08-14 18:12             ` Jens Gustedt
  2014-08-14 18:23               ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Gustedt @ 2014-08-14 18:12 UTC (permalink / raw)
  To: musl

Am Donnerstag, den 14.08.2014, 12:58 -0400 schrieb Rich Felker:
> On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> > Am Donnerstag, den 14.08.2014, 10:41 -0400 schrieb Rich Felker:
> > > Thus I'm skeptical of trying an approach like this when it would be
> > > easier, and likely less costly on the common usage cases, just to
> > > remove requeue and always use broadcast wakes. I modified your test
> > > case for the bug to use a process-shared cv (using broadcast wake),
> > > and as expected, the test runs with no failure.
> > 
> > You shouldn't draw much of a conclusion from the fact that it works
> > in that case. This still interacts heavily with the waiter count,
> > and thus a signaling thread that comes after such a broadcast might
> > wake up a thread that it shouldn't.
> > 
> > (But I didn't do a full analysis of that situation.)
> 
> In the process-shared case, broadcast just increments the sequence
> number and wakes all futex waiters. It's very simple.
> 
> Formally, there's no such thing as waking up a thread you shouldn't,
> since spurious wakes are always allowed.

I meant an internal wake, not one that returns to user space, one
that might wake up a thread that started waiting after the
corresponding signaling thread entered its call. But sequence numbers
here are probably sufficient to ensure that such a thread already has
the right to wake up.

I am just getting sort of paranoid on that stuff :)

> The current implementation
> has a lot of potential for spurious wakes but they don't happen except
> in certain situations:
> 
> - If a futex wait gets interrupted by a signal, the wait will always
>   terminate after the signal handler returns if any intervening
>   signals or broadcasts happened (except in the case of a full
>   wraparound of the sequence number, i.e. exactly 2^32 cv signals
>   while stuck in a signal handler, which I don't know how to fix, but
>   it would be easy to write a test for this) even if the signal was
>   already received by another waiter.
> 
> - If the sequence number gets incremented by a signal before the
>   initial futex wait, the waiter will return immediately; this can
>   happen to multiple waiters even for just one signal.
> 
> Really sequence numbers are the wrong tool here, but they were
> introduced because the previous approach (having each waiter write its
> own tid, and futex wait comparing that tid) lead to pathologically bad
> performance under heavy waiter arrival where waiters were constantly
> returning because another waiter was almost always able to write its
> tid before the first one could block on the futex. I'd like to have a
> better solution, but I can't think of any.

I don't think they are too bad, actually. They help to distinguish
two phases for a waiting thread. In the first, it has released the
mutex and no signal or broadcast has been issued; a thread should
never attempt to relock the mutex and/or return to user space during
that phase.

Then comes the second phase, after such a signal or broadcast, where
any wakeup could be legitimate and in the worst case is just
spurious.

Jens



* Re: My current understanding of cond var access restrictions
  2014-08-14 18:12             ` Jens Gustedt
@ 2014-08-14 18:23               ` Rich Felker
  2014-08-14 20:47                 ` Jens Gustedt
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2014-08-14 18:23 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 08:12:46PM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 14.08.2014, 12:58 -0400 schrieb Rich Felker:
> > On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> > > Am Donnerstag, den 14.08.2014, 10:41 -0400 schrieb Rich Felker:
> > > > Thus I'm skeptical of trying an approach like this when it would be
> > > > easier, and likely less costly on the common usage cases, just to
> > > > remove requeue and always use broadcast wakes. I modified your test
> > > > case for the bug to use a process-shared cv (using broadcast wake),
> > > > and as expected, the test runs with no failure.
> > > 
> > > You shouldn't draw much of a conclusion from the fact that it
> > > works in that case. This still interacts heavily with the waiter
> > > count, and thus a signaling thread that comes after such a
> > > broadcast might wake up a thread that it shouldn't.
> > > 
> > > (But I didn't do a full analysis of that situation.)
> > 
> > In the process-shared case, broadcast just increments the sequence
> > number and wakes all futex waiters. It's very simple.
> > 
> > Formally, there's no such thing as waking up a thread you shouldn't,
> > since spurious wakes are always allowed.
> 
> I meant an internal wake, not one that returns to user space, one
> that might wake up a thread that started waiting after the
> corresponding signaling thread entered its call. But sequence numbers
> here are probably sufficient to ensure that such a thread already has
> the right to wake up.
> 
> I am just getting sort of paranoid on that stuff :)

Indeed, wrongly waking a waiter that arrives _after_ the signal is a
potentially serious bug; glibc actually has this bug and it's remained
unfixed for years as far as I know.

However, without holding the mutex while signaling, or perhaps other
equivalent synchronization, there is no "happens-before" relationship
between new waiters and the signal, so it can't matter. And if the
mutex is held, or other synchronization happens, a new waiter cannot
arrive during the signal operation.

I'm pretty confident musl is fully correct here, since I've both
analyzed the issue before and run the glibc test case that reproduces
the error against musl, with no failure on musl.

> > The current implementation
> > has a lot of potential for spurious wakes but they don't happen except
> > in certain situations:
> > 
> > - If a futex wait gets interrupted by a signal, the wait will always
> >   terminate after the signal handler returns if any intervening
> >   signals or broadcasts happened (except in the case of a full
> >   wraparound of the sequence number, i.e. exactly 2^32 cv signals
> >   while stuck in a signal handler, which I don't know how to fix, but
> >   it would be easy to write a test for this) even if the signal was
> >   already received by another waiter.
> > 
> > - If the sequence number gets incremented by a signal before the
> >   initial futex wait, the waiter will return immediately; this can
> >   happen to multiple waiters even for just one signal.
> > 
> > Really sequence numbers are the wrong tool here, but they were
> > introduced because the previous approach (having each waiter write its
> > own tid, and futex wait comparing that tid) led to pathologically bad
> > performance under heavy waiter arrival where waiters were constantly
> > returning because another waiter was almost always able to write its
> > tid before the first one could block on the futex. I'd like to have a
> > better solution, but I can't think of any.
> 
> I don't think they are too bad, actually. They help to distinguish
> two phases for a waiting thread. In the first, it has released the
> mutex and no signal or broadcast has been issued; a thread should
> never attempt to relock the mutex and/or return to user space during
> that phase.
> 
> Then comes the second phase, after such a signal or broadcast, where
> any wakeup could be legitimate and in the worst case is just spurious.

Yes. Really the only reason I dislike sequence numbers is the
theoretical possibility of wrapping after 2^32 signals. This would
require extreme scheduling delays to realize without a signal handler
intentionally preventing the waiter from making forward progress, so
it's unlikely to impact anything, but it still seems wrong.

Rich



* Re: My current understanding of cond var access restrictions
  2014-08-14 18:23               ` Rich Felker
@ 2014-08-14 20:47                 ` Jens Gustedt
  2014-08-14 22:22                   ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Gustedt @ 2014-08-14 20:47 UTC (permalink / raw)
  To: musl

Am Donnerstag, den 14.08.2014, 14:23 -0400 schrieb Rich Felker:
> On Thu, Aug 14, 2014 at 08:12:46PM +0200, Jens Gustedt wrote:
> > I don't think they are too bad, actually. They help to distinguish
> > two phases for a waiting thread. In the first, it has released the
> > mutex and no signal or broadcast has been issued; a thread should
> > never attempt to relock the mutex and/or return to user space
> > during that phase.
> > 
> > Then comes the second phase, after such a signal or broadcast, where
> > any wakeup could be legitimate and in the worst case is just spurious.
> 
> Yes. Really the only reason I dislike sequence numbers is the
> theoretical possibility of wrapping after 2^32 signals. This would
> require extreme scheduling delays to realize without a signal handler
> intentionally preventing the waiter from making forward progress, so
> it's unlikely to impact anything, but it still seems wrong.

Wrapping alone doesn't matter; waiters don't have to do just an
equality test on the sequence counter. So with the traditional int
(and supposing that there is no UB because of the overflow), even
negative values may occur with no harm. The thing that would do harm
is a waiter that is woken up *exactly* 2^32 sequence points later and
concludes that the world hasn't changed. Really hard to generate a
test program for that bug :)

But why not give it a 64-bit integer then? It seems to me that
__u.__i[9] is unused, so with a shift of the whole crowd _c_seq could
get two slots.

Ah, and for the record, regarding the discussion on mutexes we had
before: pthread_mutex_t also seems to have an empty slot, namely
__u.__i[3], no? Or was it intended that _c_count be in that, and using
__u.__i[5] for it is a typo?

Jens


* Re: My current understanding of cond var access restrictions
  2014-08-14 20:47                 ` Jens Gustedt
@ 2014-08-14 22:22                   ` Rich Felker
  0 siblings, 0 replies; 14+ messages in thread
From: Rich Felker @ 2014-08-14 22:22 UTC (permalink / raw)
  To: musl

On Thu, Aug 14, 2014 at 10:47:24PM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 14.08.2014, 14:23 -0400 schrieb Rich Felker:
> > On Thu, Aug 14, 2014 at 08:12:46PM +0200, Jens Gustedt wrote:
> > > I don't think they are too bad, actually. They help to distinguish
> > > two phases for a waiting thread. In the first, it has released the
> > > mutex and no signal or broadcast has been issued; a thread should
> > > never attempt to relock the mutex and/or return to user space
> > > during that phase.
> > > 
> > > Then comes the second phase, after such a signal or broadcast,
> > > where any wakeup could be legitimate and in the worst case is just
> > > spurious.
> > 
> > Yes. Really the only reason I dislike sequence numbers is the
> > theoretical possibility of wrapping after 2^32 signals. This would
> > require extreme scheduling delays to realize without a signal handler
> > intentionally preventing the waiter from making forward progress, so
> > it's unlikely to impact anything, but it still seems wrong.
> 
> Wrapping alone doesn't matter; waiters don't have to do just an
> equality test on the sequence counter.

They do, in the futex wait syscall. As soon as the mutex is unlocked,
they are formally a waiter, and must be releasable by signal or
broadcast. If a signal happens before the futex wait syscall is made,
or while the futex wait syscall is pending restart due to a signal
handler running, the waiter depends on the sequence number having
changed in order not to block in futex wait.
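
In sketch form (illustrative, not the actual musl source):

#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static void seq_futex_wait(volatile int *seq, int old)
{
    /* FUTEX_WAIT atomically re-checks *seq == old in the kernel and
     * blocks only while that still holds; it fails with EAGAIN if the
     * value already changed. After EINTR (a signal handler ran) we
     * retry with the same old value, so not re-blocking depends
     * entirely on the sequence number having changed -- hence the
     * wraparound hazard after exactly 2^32 intervening signals. */
    while (syscall(SYS_futex, seq, FUTEX_WAIT, old, 0) == -1
           && errno == EINTR);
}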

> So with the traditional int
> (and supposing that there is no UB because of the overflow), even
> negative values may occur with no harm. The thing that would do harm
> is a waiter that is woken up *exactly* 2^32 sequence points later
> and concludes that the world hasn't changed. Really hard to generate
> a test program for that bug :)

Actually I don't think it's hard at all. Just make a signal handler
that blocks reading from a pipe, and issue 2^32 cv signals before
writing to the pipe.

> But why not give it a 64-bit integer then? It seems to me that
> __u.__i[9] is unused, so with a shift of the whole crowd _c_seq could
> get two slots.

Because futex only uses 32-bit integers.

> Ah, and for the record, regarding the discussion on mutexes we had
> before: pthread_mutex_t also seems to have an empty slot, namely
> __u.__i[3], no? Or was it intended that _c_count be in that, and
> using __u.__i[5] for it is a typo?

I'd have to look -- maybe there is an extra empty slot. I laid the
slots out so that the same indices would work on 32- and 64-bit archs,
but it might be possible to get more slots by making different 32- and
64-bit orderings.

Rich




Thread overview: 14+ messages
2014-08-13 21:23 My current understanding of cond var access restrictions Rich Felker
2014-08-13 23:20 ` Jens Gustedt
2014-08-14  2:19   ` Rich Felker
2014-08-14  7:41     ` Jens Gustedt
2014-08-14  6:10   ` Rich Felker
2014-08-14  8:00     ` Jens Gustedt
2014-08-14 14:41       ` Rich Felker
2014-08-14 15:36         ` Rich Felker
2014-08-14 16:27         ` Jens Gustedt
2014-08-14 16:58           ` Rich Felker
2014-08-14 18:12             ` Jens Gustedt
2014-08-14 18:23               ` Rich Felker
2014-08-14 20:47                 ` Jens Gustedt
2014-08-14 22:22                   ` Rich Felker
