Re: My current understanding of cond var access restrictions

From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: My current understanding of cond var access restrictions
Date: Thu, 14 Aug 2014 02:10:09 -0400
Message-ID: <20140814061009.GA6599@brightrain.aerifal.cx>
In-Reply-To: <1407972025.4951.73.camel@eris.loria.fr>

On Thu, Aug 14, 2014 at 01:20:25AM +0200, Jens Gustedt wrote:
> > 4. When can signal and broadcast safely use the mutex?
> > 
> > Not at all, unless it can block waiters from exiting the wait. Any
> > waiter could spontaneously exit the wait as a result of cancellation,
> > timeout, or a cv signal from another thread, and by the above, it may
> > be entitled to destroy the mutex.
> 
> Are you suggesting that all waiters when coming back should first
> regain an internal lock on the cv?

I think I have an informal proof sketch that this is necessary unless
we abandon requeue:

If we want to be able to use the mutex in broadcast, which is needed
for requeue, then broadcast needs a lock that can block at least one
waiter from returning, and needs to confirm that at least one waiter
remains after the lock is obtained (otherwise it's easy -- there's no
work to do), so that the mutex is valid.

(Note: Broadcast can immediately release this lock if it determines
that the calling thread holds the mutex, since in this case, the mutex
will be sufficient to prevent any waiter from returning. But in
general it needs to hold the lock until requeue is performed.)

In order for this lock to block waiters from returning, any waiter
that woke possibly not under the control of broadcast/signal (i.e.
futex wait not returning 0) has to obtain the lock. (For safety
against application use of futexes that generates spurious wakes, it
might be best to just ignore the return value and always attempt to
get the lock.) This probably means it has to access the cv object
(unless it uses an object at another location whose address was
obtained before waiting), which in turn means that we have to track
references so that destroy can wait for all references to be released
before returning.
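
The protocol argued for above can be modeled in a few lines. What follows is a minimal sketch, not musl's code: struct toy_cv, its fields, and every toy_* function are hypothetical names invented for illustration, and the spinlock stands in for whatever futex-based lock a real implementation would use. It shows the three pieces the argument needs: a lock that broadcast holds while it touches the mutex, a waiter that always takes that lock before leaving the wait, and a reference count that destroy can wait on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy cv; this is NOT the layout of musl's pthread_cond_t. */
struct toy_cv {
    atomic_flag lock;   /* internal lock shared by wakers and returning waiters */
    atomic_int waiters; /* threads currently inside the wait */
    atomic_int refs;    /* threads that may still access this object */
};

static void toy_cv_init(struct toy_cv *cv)
{
    atomic_flag_clear(&cv->lock);
    atomic_init(&cv->waiters, 0);
    atomic_init(&cv->refs, 0);
}

static void toy_lock(struct toy_cv *cv)
{
    /* Spin for simplicity; a real implementation would sleep on a futex. */
    while (atomic_flag_test_and_set_explicit(&cv->lock, memory_order_acquire))
        ;
}

static void toy_unlock(struct toy_cv *cv)
{
    atomic_flag_clear_explicit(&cv->lock, memory_order_release);
}

/* Called by a waiter while it still holds the mutex, before sleeping. */
static void toy_wait_enter(struct toy_cv *cv)
{
    atomic_fetch_add(&cv->refs, 1);
    atomic_fetch_add(&cv->waiters, 1);
}

/* Called on every return from the sleep, whatever the wake reason was. */
static void toy_wait_exit(struct toy_cv *cv)
{
    toy_lock(cv); /* blocks while a broadcast is in progress */
    int prev = atomic_fetch_sub(&cv->waiters, 1);
    assert(prev > 0);
    toy_unlock(cv);
    atomic_fetch_sub(&cv->refs, 1); /* last access to the cv object */
}

/* Returns true with the lock held if at least one waiter is pinned inside
 * the wait. Until toy_broadcast_end, that waiter cannot return, so it
 * cannot destroy the mutex, and the caller may touch it (e.g. requeue). */
static bool toy_broadcast_begin(struct toy_cv *cv)
{
    toy_lock(cv);
    if (atomic_load(&cv->waiters) > 0)
        return true;
    toy_unlock(cv); /* no waiters: nothing to do, mutex must not be used */
    return false;
}

static void toy_broadcast_end(struct toy_cv *cv)
{
    toy_unlock(cv);
}

/* destroy would have to wait until this becomes true. */
static bool toy_can_destroy(struct toy_cv *cv)
{
    return atomic_load(&cv->refs) == 0;
}
```

In this model a broadcast with no waiters never touches the mutex, and a cv with a waiter that has not yet run toy_wait_exit is never reported as safe to destroy.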

So I think we're stuck with something like the current implementation,
or abandoning requeue and just doing private cond vars the same as
process-shared ones. This is actually somewhat reassuring -- it means
I wasn't completely insane when I came up with the current
implementation a couple of years back. Or, if I was, the insane line
of reasoning is at least reproducible. :-)

With that in mind, I'd like to look for ways we can fix the bogus
waiter accounting for the mutex that seems to be the source of the bug
you found. One "obvious" (but maybe bad/wrong?) solution would be to
put the count on the mutex at the time of waiting (rather than moving
it there as part of broadcast), so that decrementing the mutex waiter
count is always the right thing to do in unwait. Of course this
possibly results in lots of spurious futex wakes to the mutex (every
time it's unlocked while there are waiters on the cv, which could be a
lot). It would be nice if we had a separate field in the mutex (rather
than in the cv, as it is now) to store these on, and only move them to
the active waiters count at broadcast time, but I don't see any way to
get additional space in the mutex structure for this -- it's full.
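
The tradeoff in the "obvious" fix can be made concrete with a toy model. This is a sketch under stated assumptions only: struct toy_mutex, its two fields, and the toy_* functions are hypothetical and bear no relation to musl's actual mutex layout; wakes_issued merely counts the futex wakes an unlock would have to issue. Every cv waiter is charged to the mutex at wait time, so unwait can always decrement, but each unlock then has to assume someone may be sleeping on the mutex.

```c
#include <assert.h>

/* Hypothetical toy mutex; NOT musl's pthread_mutex_t. */
struct toy_mutex {
    int waiters;      /* waiter count, including threads still on the cv */
    int wakes_issued; /* futex wakes that unlock would have to perform */
};

/* cond_wait charges the waiter to the mutex up front. */
static void toy_cond_wait_begin(struct toy_mutex *m)
{
    m->waiters++;
}

/* unwait can now decrement unconditionally: the count was always here. */
static void toy_unwait(struct toy_mutex *m)
{
    assert(m->waiters > 0);
    m->waiters--;
}

/* unlock cannot tell cv waiters from real mutex waiters, so it wakes
 * whenever the count is nonzero, even if nobody sleeps on the mutex. */
static void toy_mutex_unlock(struct toy_mutex *m)
{
    if (m->waiters > 0)
        m->wakes_issued++;
}
```

With three threads parked on the cv, every unlock in this model issues a wake that finds nobody to wake, which is the cost described above.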

> > 5. When can [timed]wait safely access the cv?
> > 
> > Only before unlocking the mutex, unless the implementation
> > synchronizes with possible signaling threads, or with destruction (and
> > possibly unmapping). Otherwise, per the above, it's possible that a
> > signaling thread destroys the cv.
> 
> so again this suggests an internal lock on the cv that would be used
> to synchronize between waiters and wakers?

This argument applies even to process-shared cv's, and for them, no
allocation is possible, and I don't see a really good way to solve the
unmapping issue -- I think broadcast/signal would have to block
unmapping, and the last waiter to wake up would have to unblock it.
Maybe that's the right solution?
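
One way to picture that idea is a process-wide pin count. The following is only a sketch of the bookkeeping, assuming a hypothetical counter toy_vm_pins and hypothetical toy_* helpers; it does not describe any existing musl interface, and it leaves out how unmapping would actually sleep until the count drops. Broadcast/signal pins once per waiter it wakes, each woken waiter unpins when it has finished touching the cv, and unmapping is allowed only when no pins remain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical process-wide count of waiters that were woken but may
 * still access a process-shared cv in memory that could be unmapped. */
static atomic_int toy_vm_pins;

/* broadcast/signal: block unmapping on behalf of the waiters it wakes. */
static void toy_broadcast_pin(int nwaiters)
{
    atomic_fetch_add(&toy_vm_pins, nwaiters);
}

/* Each waiter calls this after its last access to the cv; the last one
 * to do so is the one that unblocks unmapping. */
static void toy_waiter_woke(void)
{
    int prev = atomic_fetch_sub(&toy_vm_pins, 1);
    assert(prev > 0);
}

/* An unmapping call would have to wait until this is true. */
static bool toy_unmap_allowed(void)
{
    return atomic_load(&toy_vm_pins) == 0;
}
```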

Rich
