From: Alexey Izbyshev
To: musl@lists.openwall.com
Date: Sat, 11 Feb 2023 18:13:59 +0300
Subject: Re: [musl] [PATCH] mq_notify: fix close/recv race on failure path
Message-ID: <5ca5f57982db1867b11ec9eecefc4df2@ispras.ru>
In-Reply-To: <20230211145246.GH4163@brightrain.aerifal.cx>

On 2023-02-11 17:52,
Rich Felker wrote:
> On Sat, Feb 11, 2023 at 05:45:14PM +0300, Alexey Izbyshev wrote:
>> On 2023-02-10 19:29, Rich Felker wrote:
>> >On Wed, Dec 14, 2022 at 09:49:26AM +0300, Alexey Izbyshev wrote:
>> >>On 2022-12-14 05:26, Rich Felker wrote:
>> >>>On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
>> >>>>In case of failure mq_notify closes the socket immediately after
>> >>>>sending a cancellation request to the worker thread that is going to
>> >>>>call or have already called recv on that socket. Even if we don't
>> >>>>consider the kernel behavior when the only descriptor to an object that
>> >>>>is being used in a system call is closed, if the socket descriptor is
>> >>>>closed before the kernel looks at it, another thread could open a
>> >>>>descriptor with the same value in the meantime, resulting in recv
>> >>>>acting on a wrong object.
>> >>>>
>> >>>>Fix the race by moving pthread_cancel call before the barrier wait to
>> >>>>guarantee that the cancellation flag is set before the worker thread
>> >>>>enters recv.
>> >>>>---
>> >>>>Other ways to fix this:
>> >>>>
>> >>>>* Remove the racing close call from mq_notify and surround recv
>> >>>>  with pthread_cleanup_push/pop.
>> >>>>
>> >>>>* Make the worker thread joinable initially, join it before closing
>> >>>>  the socket on the failure path, and detach it on the happy path.
>> >>>>  This would also require disabling cancellation around join/detach
>> >>>>  to ensure that mq_notify itself is not cancelled in an inappropriate
>> >>>>  state.
>> >>>
>> >>>I'd put this aside for a while because of the pthread barrier
>> >>>involvement I kinda didn't want to deal with. The fix you have sounds
>> >>>like it works, but I think I'd rather pursue one of the other
>> >>>approaches, probably the joinable thread one.
>> >>>
>> >>>At present, the implementation of barriers seems to be buggy (I need
>> >>>to dig back up the post about that), and they're also a really
>> >>>expensive synchronization tool that goes both directions where we
>> >>>really only need one direction (notifying the caller we're done
>> >>>consuming the args). I'd rather switch to a semaphore, which is the
>> >>>lightest and most idiomatic (at least per present-day musl idioms) way
>> >>>to do this.
>> >>>
>> >>This sounds good to me. The same approach can also be used in
>> >>timer_create (assuming it's acceptable to add dependency on
>> >>pthread_cancel to that code).
>> >>
>> >>>Using a joinable thread also lets us ensure we don't leave around
>> >>>threads that are waiting to be scheduled just to exit on failure
>> >>>return. Depending on scheduling attributes, this probably could be
>> >>>bad.
>> >>>
>> >>I also prefer this approach, though mostly for aesthetic reasons (I
>> >>haven't thought about the scheduling behavior). I didn't use it only
>> >>because I felt it's a "logically larger" change than simply moving
>> >>the pthread_barrier_wait call. And I wasn't aware that barriers are
>> >>buggy in musl.
>> >
>> >Finally following up on this. How do the attached commits look?
>> >
>> The first and third patches add calls to sem_wait, pthread_join, and
>> pthread_detach, which are cancellation points in musl, so
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Nice catch -- this is actually a bug. pthread_detach is not permitted
> to be a cancellation point.
>
Indeed. I'd actually tried to check that before I sent the email, but
got confused by the following sentence from "man 7 pthreads"[1]:

"An implementation may also mark other functions not specified in the
standard as cancellation points."

My mistake is that I read this as "if a function is not specified to be
a cancellation point in the standard, an implementation may still mark
it as a cancellation point".
But apparently it means that "if a function is not mentioned in the
standard at all, an implementation may still mark it as a cancellation
point".

To anyone wondering, the actual text from POSIX is[2]:

"In addition, a cancellation point may occur when a thread is executing
any function that this standard does not require to be thread-safe but
the implementation documents as being thread-safe. If a thread is
cancelled while executing a non-thread-safe function, the behavior is
undefined. An implementation shall not introduce cancellation points
into any other functions specified in this volume of POSIX.1-2017."

(And pthread_detach is required to be thread-safe).

[1] https://man7.org/linux/man-pages/man7/pthreads.7.html
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_05_02
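
For anyone following the thread without the patches at hand, here is a
minimal sketch of the joinable-thread approach quoted above: the worker
posts a semaphore once it has consumed its arguments, and on the failure
path the caller cancels and joins the worker before closing the socket,
so the descriptor cannot be reused while recv may still look at it. This
is not the actual musl change. The names start_args, worker and
demo_notify are hypothetical, a socketpair stands in for the netlink
socket, and the failure is simulated by a flag.

```c
#include <assert.h>     /* for callers that check the result with assert */
#include <pthread.h>
#include <semaphore.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical argument block living on the caller's stack. */
struct start_args {
    sem_t sem;   /* posted by the worker once it has copied the args */
    int sock;    /* descriptor the worker will block on in recv */
};

static void *worker(void *p)
{
    struct start_args *args = p;
    int s = args->sock;      /* copy everything needed before posting */
    char buf[32];

    sem_post(&args->sem);    /* args must not be touched after this */

    /* recv is a cancellation point, so a cancel that is already
     * pending, or that arrives while blocked, is acted on here. */
    while (recv(s, buf, sizeof buf, MSG_WAITALL) > 0)
        ;
    close(s);                /* normal exit: peer closed its end */
    return 0;
}

/* Returns 0 on success, -1 on error. A nonzero simulate_failure takes
 * the failure path (cancel, join, then close). */
int demo_notify(int simulate_failure)
{
    struct start_args args;
    pthread_t td;
    void *res = 0;
    int sv[2], cs, ret = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv)) return -1;
    args.sock = sv[0];
    if (sem_init(&args.sem, 0, 0)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* sem_wait and pthread_join are cancellation points; keep the
     * caller from being cancelled while the worker is half set up. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);

    if (pthread_create(&td, 0, worker, &args)) {
        close(sv[0]);
        ret = -1;
    } else {
        while (sem_wait(&args.sem))
            ;                           /* retry if interrupted */
        if (simulate_failure) {
            pthread_cancel(td);
            pthread_join(td, &res);     /* worker is gone after this */
            close(sv[0]);               /* nobody can be in recv now */
            if (res != PTHREAD_CANCELED) ret = -1;
        } else {
            pthread_detach(td);         /* happy path: let it run */
        }
    }

    sem_destroy(&args.sem);
    close(sv[1]);   /* demo only: gives a detached worker EOF so it exits */
    pthread_setcancelstate(cs, &cs);
    return ret;
}
```

Calling demo_notify(1) exercises the failure path and demo_notify(0) the
happy path; both are expected to return 0. Build with -pthread.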