Date: Tue, 13 Dec 2022 21:26:18 -0500
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20221214022618.GB15716@brightrain.aerifal.cx>
In-Reply-To: <20221109104613.48062-1-izbyshev@ispras.ru>
Subject: Re: [musl] [PATCH] mq_notify: fix close/recv race on failure path

On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
> On failure, mq_notify closes the socket immediately after sending a
> cancellation request to the worker thread, which is about to call, or
> may already have called, recv on that socket. Even setting aside the
> kernel's behavior when the only descriptor to an object in use by a
> system call is closed, if the socket descriptor is closed before the
> kernel looks at it, another thread could open a descriptor with the
> same value in the meantime, and recv would then act on the wrong
> object.
>
> Fix the race by moving the pthread_cancel call before the barrier
> wait, guaranteeing that the cancellation flag is set before the
> worker thread enters recv.
> ---
> Other ways to fix this:
>
> * Remove the racing close call from mq_notify and surround recv
>   with pthread_cleanup_push/pop.
>
> * Make the worker thread joinable initially, join it before closing
>   the socket on the failure path, and detach it on the happy path.
>   This would also require disabling cancellation around join/detach
>   to ensure that mq_notify itself is not cancelled in an
>   inappropriate state.

I'd put this aside for a while because of the pthread barrier
involvement, which I kinda didn't want to deal with. The fix you have
sounds like it works, but I think I'd rather pursue one of the other
approaches, probably the joinable-thread one.

At present, the implementation of barriers seems to be buggy (I need
to dig back up the post about that), and they're also a really
expensive synchronization tool that goes in both directions when we
really only need one direction: notifying the caller that we're done
consuming the args. I'd rather switch to a semaphore, which is the
lightest and most idiomatic (at least by present-day musl idioms) way
to do this.

Using a joinable thread also lets us ensure we don't leave behind
threads that are waiting to be scheduled just to exit on the failure
return. Depending on scheduling attributes, that could plausibly be
bad.

(Also, arguably we should start the thread with unchanged scheduling
attributes and only switch to the caller-provided ones after the
SYS_mq_notify syscall has succeeded -- bleh. But that's a separate
topic.)

Rich
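
P.S. For concreteness, here's a rough sketch of the semaphore +
joinable-thread shape I have in mind. Untested and illustrative only:
the netlink socket setup, sev2 construction, and caller-provided
thread attributes are elided (the sketch takes the already-prepared
socket s and sev2 as parameters), and notify_sketch/worker are just
placeholder names, not a worked-out patch against mq_notify.c.

#define _GNU_SOURCE
#include <mqueue.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <errno.h>

struct args {
	sem_t sem;   /* posted once the worker has consumed *args */
	int sock;
	const struct sigevent *sev;
};

static void *worker(void *p)
{
	struct args *args = p;
	int s = args->sock;
	void (*func)(union sigval) = args->sev->sigev_notify_function;
	union sigval val = args->sev->sigev_value;
	char buf[32];
	ssize_t n;

	/* One-direction handoff: unlike the barrier, posting can't
	 * block the worker on the caller, and nothing expensive is
	 * left to destroy on either side. */
	sem_post(&args->sem);

	/* recv is a cancellation point; on the failure path the worker
	 * exits here, and the caller closes s only after joining it. */
	n = recv(s, buf, sizeof buf, MSG_NOSIGNAL|MSG_WAITALL);
	close(s);
	if (n == sizeof buf && buf[sizeof buf - 1] == 1)
		func(val);
	return 0;
}

static int notify_sketch(mqd_t mqd, const struct sigevent *sev,
	int s, const struct sigevent *sev2)
{
	struct args args = { .sock = s, .sev = sev };
	pthread_t td;
	int cs, r = 0;

	sem_init(&args.sem, 0, 0);

	/* Created joinable (default attributes for brevity), not
	 * detached as in the current code. */
	if (pthread_create(&td, 0, worker, &args)) {
		close(s);
		errno = EAGAIN;
		return -1;
	}

	/* sem_wait and pthread_join are cancellation points; don't let
	 * mq_notify itself get cancelled while it owns a joinable
	 * thread. */
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);

	/* Wait until the worker has copied everything out of args. */
	while (sem_wait(&args.sem));
	sem_destroy(&args.sem);

	if (syscall(SYS_mq_notify, mqd, sev2) < 0) {
		/* Failure path: cancel, then join *before* close, so
		 * the descriptor can't still be in use by recv, nor
		 * reused by another thread, when it's closed. */
		pthread_cancel(td);
		pthread_join(td, 0);
		close(s);
		r = -1;
	} else {
		/* Happy path: worker owns the socket from here on. */
		pthread_detach(td);
	}

	pthread_setcancelstate(cs, 0);
	return r;
}

The join on the failure path is what actually closes the race: by the
time close runs, the worker is gone, so its fd can neither be mid-recv
nor aliased by a newly opened descriptor with the same value.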