On Wed, Dec 14, 2022 at 09:49:26AM +0300, Alexey Izbyshev wrote: > On 2022-12-14 05:26, Rich Felker wrote: > >On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote: > >>In case of failure mq_notify closes the socket immediately after > >>sending a cancellation request to the worker thread that is going to > >>call or have already called recv on that socket. Even if we don't > >>consider the kernel behavior when the only descriptor to an > >>object that > >>is being used in a system call is closed, if the socket descriptor is > >>closed before the kernel looks at it, another thread could open a > >>descriptor with the same value in the meantime, resulting in recv > >>acting on a wrong object. > >> > >>Fix the race by moving pthread_cancel call before the barrier wait to > >>guarantee that the cancellation flag is set before the worker thread > >>enters recv. > >>--- > >>Other ways to fix this: > >> > >>* Remove the racing close call from mq_notify and surround recv > >> with pthread_cleanup_push/pop. > >> > >>* Make the worker thread joinable initially, join it before closing > >> the socket on the failure path, and detach it on the happy path. > >> This would also require disabling cancellation around join/detach > >> to ensure that mq_notify itself is not cancelled in an inappropriate > >> state. > > > >I'd put this aside for a while because of the pthread barrier > >involvement I kinda didn't want to deal with. The fix you have sounds > >like it works, but I think I'd rather pursue one of the other > >approaches, probably the joinable thread one. > > > >At present, the implementation of barriers seems to be buggy (I need > >to dig back up the post about that), and they're also a really > >expensive synchronization tool that goes both directions where we > >really only need one direction (notifying the caller we're done > >consuming the args). I'd rather switch to a semaphore, which is the > >lightest and most idiomatic (at least per present-day musl idioms) way > >to do this. > > > This sounds good to me. The same approach can also be used in > timer_create (assuming it's acceptable to add dependency on > pthread_cancel to that code). > > >Using a joinable thread also lets us ensure we don't leave around > >threads that are waiting to be scheduled just to exit on failure > >return. Depending on scheduling attributes, this probably could be > >bad. > > > I also prefer this approach, though mostly for aesthetic reasons (I > haven't thought about the scheduling behavior). I didn't use it only > because I felt it's a "logically larger" change than simply moving > the pthread_barrier_wait call. And I wasn't aware that barriers are > buggy in musl. Finally following up on this. How do the attached commits look? Rich