Date: Tue, 13 Dec 2022 21:26:18 -0500
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20221214022618.GB15716@brightrain.aerifal.cx>
In-Reply-To: <20221109104613.48062-1-izbyshev@ispras.ru>
Subject: Re: [musl] [PATCH] mq_notify: fix close/recv race on failure path

On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
> On failure, mq_notify closes the socket immediately after sending a
> cancellation request to the worker thread, which is about to call, or
> may already have called, recv on that socket. Even setting aside the
> kernel's behavior when the only descriptor to an object in use by a
> system call is closed, if the socket descriptor is closed before the
> kernel looks at it, another thread could open a descriptor with the
> same value in the meantime, and recv would then act on the wrong
> object.
>
> Fix the race by moving the pthread_cancel call before the barrier
> wait, guaranteeing that the cancellation flag is set before the
> worker thread enters recv.
> ---
> Other ways to fix this:
>
> * Remove the racing close call from mq_notify and surround recv
>   with pthread_cleanup_push/pop.
>
> * Make the worker thread joinable initially, join it before closing
>   the socket on the failure path, and detach it on the happy path.
>   This would also require disabling cancellation around join/detach
>   to ensure that mq_notify itself is not cancelled in an
>   inappropriate state.

I'd put this aside for a while because of the pthread barrier
involvement, which I kinda didn't want to deal with. The fix you have
sounds like it works, but I think I'd rather pursue one of the other
approaches, probably the joinable-thread one.

At present, the implementation of barriers seems to be buggy (I need
to dig back up the post about that), and they're also a really
expensive synchronization tool that goes in both directions when we
really only need one direction: notifying the caller that we're done
consuming the args. I'd rather switch to a semaphore, which is the
lightest and most idiomatic (at least by present-day musl idioms) way
to do this.

Using a joinable thread also lets us ensure we don't leave behind
threads that are waiting to be scheduled just to exit on the failure
return. Depending on scheduling attributes, that could plausibly be
bad.

(Also, arguably we should start the thread with unchanged scheduling
attributes and only switch to the caller-provided ones after the
SYS_mq_notify syscall has succeeded -- bleh. But that's a separate
topic.)

Rich
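
P.S. For concreteness, here's a rough sketch of the semaphore +
joinable-thread shape I have in mind. Untested and illustrative only:
the netlink socket setup, sev2 construction, and caller-provided
thread attributes are elided (the sketch takes the already-prepared
socket s and sev2 as parameters), and notify_sketch/worker are just
placeholder names, not a worked-out patch against mq_notify.c.

#define _GNU_SOURCE
#include <mqueue.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <errno.h>

struct args {
	sem_t sem;   /* posted once the worker has consumed *args */
	int sock;
	const struct sigevent *sev;
};

static void *worker(void *p)
{
	struct args *args = p;
	int s = args->sock;
	void (*func)(union sigval) = args->sev->sigev_notify_function;
	union sigval val = args->sev->sigev_value;
	char buf[32];
	ssize_t n;

	/* One-direction handoff: unlike the barrier, posting can't
	 * block the worker on the caller, and nothing expensive is
	 * left to destroy on either side. */
	sem_post(&args->sem);

	/* recv is a cancellation point; on the failure path the worker
	 * exits here, and the caller closes s only after joining it. */
	n = recv(s, buf, sizeof buf, MSG_NOSIGNAL|MSG_WAITALL);
	close(s);
	if (n == sizeof buf && buf[sizeof buf - 1] == 1)
		func(val);
	return 0;
}

static int notify_sketch(mqd_t mqd, const struct sigevent *sev,
	int s, const struct sigevent *sev2)
{
	struct args args = { .sock = s, .sev = sev };
	pthread_t td;
	int cs, r = 0;

	sem_init(&args.sem, 0, 0);

	/* Created joinable (default attributes for brevity), not
	 * detached as in the current code. */
	if (pthread_create(&td, 0, worker, &args)) {
		close(s);
		errno = EAGAIN;
		return -1;
	}

	/* sem_wait and pthread_join are cancellation points; don't let
	 * mq_notify itself get cancelled while it owns a joinable
	 * thread. */
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);

	/* Wait until the worker has copied everything out of args. */
	while (sem_wait(&args.sem));
	sem_destroy(&args.sem);

	if (syscall(SYS_mq_notify, mqd, sev2) < 0) {
		/* Failure path: cancel, then join *before* close, so
		 * the descriptor can't still be in use by recv, nor
		 * reused by another thread, when it's closed. */
		pthread_cancel(td);
		pthread_join(td, 0);
		close(s);
		r = -1;
	} else {
		/* Happy path: worker owns the socket from here on. */
		pthread_detach(td);
	}

	pthread_setcancelstate(cs, 0);
	return r;
}

The join on the failure path is what actually closes the race: by the
time close runs, the worker is gone, so its fd can neither be mid-recv
nor aliased by a newly opened descriptor with the same value.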