On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
> > On 2022-10-06 09:37, Alexey Izbyshev wrote:
> > >Hi,
> > >
> > >I noticed that fork() doesn't take key_lock that is used to protect
> > >the global table of thread-specific keys. I couldn't find mentions of
> > >this lock in the MT fork discussion in the mailing list archive. Was
> > >this lock overlooked?
> > >
> > >Also, I looked at how __aio_atfork() handles a similar case with
> > >maplock, and it seems wrong. It takes the read lock and then simply
> > >unlocks it both in the parent and in the child. But if there were
> > >other holders of the read lock at the time of fork(), the lock won't
> > >end up in the unlocked state in the child. It should probably be
> > >completely nulled-out in the child instead.
> > >
> > Looking at aio further, I don't understand how it's supposed to work
> > with MT fork at all. __aio_atfork() is called in _Fork() when the
> > allocator locks are already held. Meanwhile another thread could be
> > stuck in __aio_get_queue() holding maplock in exclusive mode while
> > trying to allocate, resulting in deadlock.
>
> Indeed, this is messy and I don't think it makes sense to be doing
> this at all. The child is just going to throw away the state so the
> parent shouldn't need to synchronize at all, but if we walk the
> multi-level map[] table in the child after async fork, it's possible
> that the contents seen are inconsistent, even that the pointers are
> only half-written or something.
>
> I see a few possible solutions:
>
> 1. Just set map = 0 in the child and leak the memory. This is not
>    going to matter unless you're doing multiple generations of fork
>    with aio anyway.
>
> 2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
>    the child, and if it succeeds, we know the map is consistent so we
>    can just zero it out the same as now. Still "leaks" but only on
>    contention to expand the map.
>
> 3. Getting a little smarter still: move the __aio_atfork for the
>    parent side from _Fork to fork, outside of the critical section
>    where malloc lock is held. Then proceed as in (2). Now, the
>    tryrdlock is guaranteed to succeed in the child. Leak is only
>    possible when _Fork is used (in which case the child context is an
>    async signal one, and thus calling any aio_* that would allocate
>    map[] again is UB -- note that in this case, the only reason we
>    have to do anything at all in the child is to prevent close from
>    interacting with aio).
>
> After writing them out, 3 seems like the right choice.

Proposed patch attached.