Date: Sat, 11 Feb 2023 09:52:47 -0500
From: Rich Felker
To: Alexey Izbyshev
Cc: musl@lists.openwall.com
Message-ID: <20230211145246.GH4163@brightrain.aerifal.cx>
References: <20221109104613.48062-1-izbyshev@ispras.ru>
 <20221214022618.GB15716@brightrain.aerifal.cx>
 <1a0289c15879bef6d538c0066f58545c@ispras.ru>
 <20230210162957.GB4163@brightrain.aerifal.cx>
 <63c0897d647936c946268f5a967a5e4d@ispras.ru>
In-Reply-To: <63c0897d647936c946268f5a967a5e4d@ispras.ru>
Subject: Re: [musl] [PATCH] mq_notify: fix close/recv race on failure path

On Sat, Feb 11, 2023 at 05:45:14PM +0300, Alexey Izbyshev wrote:
> On 2023-02-10 19:29, Rich Felker wrote:
> >On Wed, Dec 14, 2022 at 09:49:26AM +0300, Alexey Izbyshev wrote:
> >>On 2022-12-14 05:26, Rich Felker wrote:
> >>>On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
> >>>>In case of failure mq_notify closes the socket immediately after
> >>>>sending a cancellation request to the worker thread that is going to
> >>>>call or have already called recv on that socket. Even if we don't
> >>>>consider the kernel behavior when the only descriptor to an object
> >>>>that is being used in a system call is closed, if the socket descriptor
> >>>>is closed before the kernel looks at it, another thread could open a
> >>>>descriptor with the same value in the meantime, resulting in recv
> >>>>acting on a wrong object.
> >>>>
> >>>>Fix the race by moving pthread_cancel call before the barrier wait to
> >>>>guarantee that the cancellation flag is set before the worker thread
> >>>>enters recv.
> >>>>---
> >>>>Other ways to fix this:
> >>>>
> >>>>* Remove the racing close call from mq_notify and surround recv
> >>>>  with pthread_cleanup_push/pop.
> >>>>
> >>>>* Make the worker thread joinable initially, join it before closing
> >>>>  the socket on the failure path, and detach it on the happy path.
> >>>>  This would also require disabling cancellation around join/detach
> >>>>  to ensure that mq_notify itself is not cancelled in an inappropriate
> >>>>  state.
> >>>
> >>>I'd put this aside for a while because of the pthread barrier
> >>>involvement I kinda didn't want to deal with. The fix you have sounds
> >>>like it works, but I think I'd rather pursue one of the other
> >>>approaches, probably the joinable thread one.
> >>>
> >>>At present, the implementation of barriers seems to be buggy (I need
> >>>to dig back up the post about that), and they're also a really
> >>>expensive synchronization tool that goes both directions where we
> >>>really only need one direction (notifying the caller we're done
> >>>consuming the args).
> >>>I'd rather switch to a semaphore, which is the
> >>>lightest and most idiomatic (at least per present-day musl idioms) way
> >>>to do this.
> >>>
> >>This sounds good to me. The same approach can also be used in
> >>timer_create (assuming it's acceptable to add dependency on
> >>pthread_cancel to that code).
> >>
> >>>Using a joinable thread also lets us ensure we don't leave around
> >>>threads that are waiting to be scheduled just to exit on failure
> >>>return. Depending on scheduling attributes, this probably could be
> >>>bad.
> >>>
> >>I also prefer this approach, though mostly for aesthetic reasons (I
> >>haven't thought about the scheduling behavior). I didn't use it only
> >>because I felt it's a "logically larger" change than simply moving
> >>the pthread_barrier_wait call. And I wasn't aware that barriers are
> >>buggy in musl.
> >
> >Finally following up on this. How do the attached commits look?
> >
> The first and third patches add calls to sem_wait, pthread_join, and
> pthread_detach, which are cancellation points in musl, so
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nice catch -- this is actually a bug. pthread_detach is not permitted
to be a cancellation point.

Rich
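
For readers following the thread, here is a minimal sketch of the
one-direction handoff Rich describes: the worker posts a semaphore once it
has copied the caller's arguments, instead of both sides meeting at a
pthread_barrier_wait. The struct and function names are hypothetical and
this is not musl's actual source, only an illustration of the idea.

#include <pthread.h>
#include <semaphore.h>

/* Hypothetical argument block living on the caller's stack. */
struct note_args {
	sem_t args_consumed;
	int sock;
};

static void *worker(void *p)
{
	struct note_args *a = p;
	int s = a->sock;              /* copy everything needed out of *a */
	sem_post(&a->args_consumed);  /* one-way signal: caller may reuse *a */
	/* ...from here on, use only the private copy s,
	 *    e.g. block in recv(s, ...)... */
	(void)s;
	return 0;
}

/* Caller side (real code would retry sem_wait on EINTR and deal with
 * cancellation; see the failure-path sketch below). */
static int start_worker(pthread_t *td, struct note_args *a)
{
	sem_init(&a->args_consumed, 0, 0);
	if (pthread_create(td, 0, worker, a)) return -1;
	sem_wait(&a->args_consumed);  /* worker no longer reads *a past here */
	return 0;
}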
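
And a sketch of the failure-path ordering discussed above, with the worker
created joinable. Cancellation is disabled in the caller because sem_wait
and pthread_join are cancellation points (and, per the bug acknowledged
above, musl's pthread_detach at the time wrongly acted as one too). The
function name and the "failed" flag are illustrative, not musl's code.

#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* "failed" stands for the registration step (e.g. the mq_notify syscall)
 * having failed after the worker thread was already created. */
static int finish_setup(pthread_t td, sem_t *args_consumed, int sock, int failed)
{
	int cs;
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
	while (sem_wait(args_consumed) && errno == EINTR);
	if (failed) {
		pthread_cancel(td);   /* request cancellation before touching the fd */
		pthread_join(td, 0);  /* worker can no longer be blocked in recv on sock */
		close(sock);          /* safe: the fd value cannot be reused under recv */
	} else {
		pthread_detach(td);   /* happy path: worker keeps running on its own */
	}
	pthread_setcancelstate(cs, 0);
	return failed ? -1 : 0;
}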