mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Alexey Izbyshev <>
Subject: Re: [musl] sem_post() can miss waiters
Date: Wed, 14 Dec 2022 13:29:54 +0300	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 2022-12-14 04:48, Rich Felker wrote:
> On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
>> On 2022-11-22 03:09, Rich Felker wrote:
>> >If I understand correctly, the cost of the first option is generally
>> >an extra "last" broadcast wake that shouldn't be needed. Is that
>> >right?
>> >
>> >For example, if sem starts with count 0 and thread A calls wait, then
>> >thread B calls post twice, both posts make a syscall even though only
>> >the first one should have.
>> >
>> Yes, exactly.
>> >What if you instead perform the broadcast wake when the observed
>> >waiters count is <=1 rather than ==0? This should have no cost in the
>> >common case, but in the race case, I think it forces any hiding
>> >(just-arrived) extra waiters to wake, fail, and re-publish their
>> >waiting status to the waiters bit.
>> >
>> Indeed, I think this solves the overhead issue quite nicely, thanks.
>> So sem_post wake up logic would basically boil down to this:
>> * WAITERS_BIT is set and waiters > 1: don't reset WAITERS_BIT since
>> it's likely that some waiters will remain (and it's always fine to
>> err on the side of preserving the WAITERS_BIT); wake a single
>> waiter.
>> * WAITERS_BIT is set and waiters <= 1: reset WAITERS_BIT and wake
>> all waiters. In non-racy cases only a single waiter will be woken.
>> * WAITERS_BIT is unset: don't wake anybody. Even if there are some
>> waiters, another sem_post (that reset the WAITERS_BIT) is
>> responsible for waking them.
>> In code:
>> int val, new, waiters, priv = sem->__val[2];
>> do {
>>     val = sem->__val[0];
>>     waiters = sem->__val[1];
>>     if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
>>         errno = EOVERFLOW;
>>         return -1;
>>     }
>>     new = val + 1;
>>     if (waiters <= 1)
>>         new &= ~0x80000000;
>> } while (a_cas(sem->__val, val, new) != val);
>> if (val<0) __wake(sem->__val, waiters <= 1 ? -1 : 1, priv);
> Yes, this looks good to me. How is the attached patch?
The patch looks good to me.

>> >I think there's another really stupid solution too that I would even
>> >consider, partly motivated by the fact that, with long-lived
>> >process-shared semaphores, the waiters count can become stale if
>> >waiters are killed. (Note: semaphores aren't required to be robust
>> >against this, but it's ugly that they're not.) This solution is just
>> >to keep the current code, but drop the waiters count entirely, and use
>> >broadcast wakes for all wakes. Then, any still-live waiters are
>> >*always* responsible for re-asserting their waiting status when all
>> >but one fail to acquire the semaphore after a wake. Of course this is
>> >a thundering herd, which is arguably something we should not want, but
>> >we accept it for correctness in several other places like condvars...
>> >
>> I'm not sure that the kind of "partial robustness" that could be
>> achieved by always waking all waiters is worth punishing normal
>> cases. Also, I don't think waiters count being stale is an issue per
>> se because it can be wrong only in one way (greater than the real
>> count of waiters), assuming you don't mean overflow (but overflow
>> could be handled by simply pinning it at INT_MAX permanently). But
>> many other issues are possible if we allow killing processes at
>> arbitrary points, including sem_post not sending a wake up
>> notification after updating the semaphore value, or sem_wait
>> consuming a notification but not updating the value. Granted, some
>> of these issues may "self-heal" on a future sem_wait/sem_post (in
>> the sense that the semaphore will leave a forbidden state), but they
>> might still interfere with the program shutdown logic (e.g. if the
>> program expects to claim the semaphore N times at the end without
>> being aware that some consumers are dead), and, of course, with any
>> logic outside of semaphore manipulation. So it seems to me that
>> either the program already watches for death of processes it cares
>> about, or it's not even clear that allowing further progress (due to
>> a broadcast) is always more desirable than blocking.
> Indeed, this seems like it's just a bad direction to go for the
> reasons you described. One thing I'd like to note here though, but not
> act on at this time:
> When we were designing the semaphore long ago, there was a vague
> proposal to make the post operation responsible for removing waiter
> counts, rather than the waiter. It was thrown out because it didn't
> seem to work. But if we ever did have a reason to want to do this, I
> think it's possible now, since it's essentially just a "countdown to
> broadcast wake" with no constraint that it be an accurate count, and
> the new waiters bit is what really controls whether wakes happen.
Indeed, now the waiters count is just a hint used for resetting the 
waiters bit.
However, its increments/decrements must still be balanced for it to 
remain useful, and I don't see an easy way to achieve that if sem_post 
is responsible for decrementing it.

One further note: after this bug report I'd been pointed to a semaphore 
redesign proposal from 2015 (starts in after some off-list 
discussion, ends in April). In that design, the waiters count is stored 
in the semaphore word, and sem_post is indeed responsible for adjusting 
it. However, another counter ("wake count") is still needed to make the 
semaphore work.


  reply	other threads:[~2022-12-14 10:30 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-21  5:14 Alexey Izbyshev
2022-11-22  0:09 ` Rich Felker
2022-11-22  5:41   ` A. Wilcox
2022-11-22  5:41   ` Alexey Izbyshev
2022-12-14  1:48     ` Rich Felker
2022-12-14 10:29       ` Alexey Izbyshev [this message]
2023-02-22 18:12         ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).