Date: Wed, 14 Dec 2022 13:29:54 +0300
From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
Subject: Re: [musl] sem_post() can miss waiters

On 2022-12-14 04:48, Rich Felker wrote:
> On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
>> On 2022-11-22 03:09, Rich Felker wrote:
>> >If I understand correctly, the cost of the first option is
>> >generally an extra "last" broadcast wake that shouldn't be needed.
>> >Is that right?
>> >
>> >For example, if sem starts with count 0 and thread A calls wait,
>> >then thread B calls post twice, both posts make a syscall even
>> >though only the first one should have.
>> >
>> Yes, exactly.
>>
>> >What if you instead perform the broadcast wake when the observed
>> >waiters count is <=1 rather than ==0? This should have no cost in
>> >the common case, but in the race case, I think it forces any hiding
>> >(just-arrived) extra waiters to wake, fail, and re-publish their
>> >waiting status to the waiters bit.
>> >
>> Indeed, I think this solves the overhead issue quite nicely, thanks.
>> So the sem_post wake-up logic would basically boil down to this:
>>
>> * WAITERS_BIT is set and waiters > 1: don't reset the WAITERS_BIT,
>>   since it's likely that some waiters will remain (and it's always
>>   fine to err on the side of preserving the WAITERS_BIT); wake a
>>   single waiter.
>>
>> * WAITERS_BIT is set and waiters <= 1: reset the WAITERS_BIT and
>>   wake all waiters. In non-racy cases only a single waiter will be
>>   woken.
>>
>> * WAITERS_BIT is unset: don't wake anybody. Even if there are some
>>   waiters, another sem_post (the one that reset the WAITERS_BIT) is
>>   responsible for waking them.
>>
>> In code:
>>
>> int val, new, waiters, priv = sem->__val[2];
>> do {
>>     val = sem->__val[0];
>>     waiters = sem->__val[1];
>>     if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
>>         errno = EOVERFLOW;
>>         return -1;
>>     }
>>     new = val + 1;
>>     if (waiters <= 1)
>>         new &= ~0x80000000;
>> } while (a_cas(sem->__val, val, new) != val);
>> if (val < 0) __wake(sem->__val, waiters <= 1 ? -1 : 1, priv);
>
> Yes, this looks good to me. How is the attached patch?
>

The patch looks good to me.

>> >I think there's another really stupid solution too that I would
>> >even consider, partly motivated by the fact that, with long-lived
>> >process-shared semaphores, the waiters count can become stale if
>> >waiters are killed. (Note: semaphores aren't required to be robust
>> >against this, but it's ugly that they're not.) This solution is
>> >just to keep the current code, but drop the waiters count entirely,
>> >and use broadcast wakes for all wakes. Then, any still-live waiters
>> >are *always* responsible for re-asserting their waiting status when
>> >all but one fail to acquire the semaphore after a wake. Of course
>> >this is a thundering herd, which is arguably something we should
>> >not want, but we accept it for correctness in several other places
>> >like condvars...
>> >
>> I'm not sure that the kind of "partial robustness" that could be
>> achieved by always waking all waiters is worth punishing normal
>> cases. Also, I don't think the waiters count being stale is an issue
>> per se, because it can be wrong only in one way (greater than the
>> real count of waiters), assuming you don't mean overflow (and
>> overflow could be handled by simply pinning the count at INT_MAX
>> permanently). But many other issues are possible if we allow killing
>> processes at arbitrary points, including sem_post not sending a wake
>> notification after updating the semaphore value, or sem_wait
>> consuming a notification but not updating the value. Granted, some
>> of these issues may "self-heal" on a future sem_wait/sem_post (in
>> the sense that the semaphore will leave a forbidden state), but they
>> might still interfere with the program's shutdown logic (e.g. if the
>> program expects to claim the semaphore N times at the end without
>> being aware that some consumers are dead), and, of course, with any
>> logic outside of semaphore manipulation. So it seems to me that
>> either the program already watches for the death of processes it
>> cares about, or it's not even clear that allowing further progress
>> (via a broadcast) is always more desirable than blocking.
>
> Indeed, this seems like it's just a bad direction to go for the
> reasons you described. One thing I'd like to note here though, but
> not act on at this time:
>
> When we were designing the semaphore long ago, there was a vague
> proposal to make the post operation responsible for removing waiter
> counts, rather than the waiter. It was thrown out because it didn't
> seem to work. But if we ever did have a reason to want to do this, I
> think it's possible now, since it's essentially just a "countdown to
> broadcast wake" with no constraint that it be an accurate count, and
> the new waiters bit is what really controls whether wakes happen.
>

Indeed, the waiters count is now just a hint used for resetting the
waiters bit. However, its increments and decrements must still be
balanced for it to remain useful, and I don't see an easy way to
achieve that if sem_post is responsible for decrementing it.
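For illustration, here is a rough standalone sketch of the waiter side
showing where that balance comes from. This is not musl's actual
sem_timedwait: futex_wait, try_dec, sem_wait_sketch, VALUE_MASK, and
struct sem_sketch are made-up names, and the val[] layout is just the
one assumed by the sem_post snippet above.

/* Hypothetical waiter-side sketch; names and layout are illustrative. */
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define VALUE_MASK  0x7fffffff  /* low 31 bits: semaphore value */
#define WAITERS_BIT INT_MIN     /* high bit: "a wake may be needed" */

struct sem_sketch { atomic_int val[3]; }; /* [0]=value+bit, [1]=waiters, [2]=priv */

static void futex_wait(atomic_int *addr, int expect)
{
    /* Sleep until *addr changes from 'expect'; spurious returns are fine,
       the caller retries. */
    syscall(SYS_futex, addr, FUTEX_WAIT, expect, NULL, NULL, 0);
}

static int try_dec(struct sem_sketch *s)
{
    int v = atomic_load(&s->val[0]);
    while (v & VALUE_MASK)  /* value is nonzero: try to take one unit */
        if (atomic_compare_exchange_weak(&s->val[0], &v, v - 1))
            return 0;
    return -1;
}

int sem_wait_sketch(struct sem_sketch *s)
{
    while (try_dec(s)) {
        /* Publish our presence before sleeping: bump the waiters count
           and (re-)assert the waiters bit, so a concurrent sem_post
           knows a wake is needed. */
        atomic_fetch_add(&s->val[1], 1);
        int v = atomic_load(&s->val[0]);
        while (!(v & WAITERS_BIT)
               && !atomic_compare_exchange_weak(&s->val[0], &v, v | WAITERS_BIT))
            ;
        if ((atomic_load(&s->val[0]) & VALUE_MASK) == 0)
            futex_wait(&s->val[0], WAITERS_BIT);
        /* The matching decrement: only the waiter knows when its own
           wait has really ended, which is why delegating this step to
           sem_post would unbalance the count. */
        atomic_fetch_sub(&s->val[1], 1);
    }
    return 0;
}

The key point is the last step of the loop: the waiter is the only
party that knows when its own wait has ended (successfully, spuriously,
or after losing a race to another waiter), so it is the only party that
can reliably pair a decrement with the increment it made.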
One further note: after this bug report, I was pointed to a semaphore
redesign proposal from 2015 (it starts at
https://www.openwall.com/lists/musl/2015/02/27/1, after some off-list
discussion, and ends in April). In that design, the waiters count is
stored in the semaphore word itself, and sem_post is indeed
responsible for adjusting it. However, another counter (a "wake
count") is still needed to make the semaphore work.

Thanks,
Alexey