On 2022-10-08 00:18, Rich Felker wrote:
> On Fri, Oct 07, 2022 at 01:53:14PM +0300, Alexey Izbyshev wrote:
>> On 2022-10-07 04:26, Rich Felker wrote:
>> >There's at least one other related bug here: when __aio_get_queue has
>> >to take the write lock, it does so without blocking signals, so close
>> >called from a signal handler that interrupts will self-deadlock.
>> >
>> This is worse. Even if maplock didn't exist, __aio_get_queue still
>> takes the queue lock, so close from a signal handler would still
>> deadlock. Maybe this could be fixed by simply blocking signals
>> earlier in submit? aio_cancel already calls __aio_get_queue with
>> signals blocked.
>
> The queue lock only affects a close concurrent with aio on the same
> fd. This is at least a programming error and possibly UB, so it's not
> that big a concern.
>
Thanks. Indeed, since aio_cancel isn't required to be async-signal-safe,
we're interested only in the case when it's called from close to cancel
all requests for a particular fd.

>> >This is fixable by blocking signals for the write-lock critical
>> >section, but there's still a potentially long window where we're
>> >blocking forward progress of other threads.
>> >
>> To make this more concrete, are you worrying about the libc-internal
>> allocator delaying other threads attempting to take maplock (even in
>> shared mode)? If so, this seems like a problem that is independent
>> from close signal-safety.
>
> Yes. The "but there's still..." part is a separate, relatively minor,
> issue that's just a matter of delaying forward progress of unrelated
> close operations, which could be nasty in a realtime process. It only
> happens boundedly-many times (at most once for each fd number ever
> used) so it's not that big a deal asymptotically, only worst-case.
>
>> >I think the right thing to do here is add another lock for updating
>> >the map that we can take while we still hold the rdlock. After taking
>> >this lock, we can re-inspect if any work is needed. If so, do all the
>> >work *still holding only the rdlock*, but not writing the results into
>> >the existing map. After all the allocation is done, release the
>> >rdlock, take the wrlock (with signals blocked), install the new
>> >memory, then release all the locks. This makes it so the wrlock is
>> >only held momentarily, under very controlled conditions, so we don't
>> >have to worry about messing up AS-safety of close.
>> >
>> Sorry, I can't understand what exactly you mean here, in particular,
>> what the old lock is supposed to protect if its holders are not
>> mutually excluded with updates of the map guarded by the new lock.
>
> Updates to the map are guarded by the new lock, but exclusion of reads
> concurrent with installing the update still needs to be guarded by the
> rwlock unless the updates are made atomically.
>
Thanks. I think I understand the purpose of the new lock now, but I
still can't understand the locking sequence you proposed (if I read it
correctly): "take rdlock -> take new_lock -> drop rdlock -> block
signals -> take wrlock -> ...". This can't work due to lock ordering
issues, and it's also unclear why we would need to re-inspect anything
after taking the new lock if we're still holding the rdlock and the
latter already protects us from "installing the updates" (as you say in
the follow-up).

I've implemented my understanding of how the newly proposed lock should
work in the attached patch.
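To keep the discussion concrete, here is a rough standalone sketch of
the sequence the patch is aiming for. It uses a simplified one-level
map instead of the real four-level map in aio.c, and the names
(get_queue, updatelock, MAPSIZE, struct queue) are made up for
illustration, so this is not the patch itself:

#include <pthread.h>
#include <signal.h>
#include <stdlib.h>

/* One-level "map" instead of the real four-level one in aio.c, just
 * to show the lock ordering; all names here are illustrative. */
#define MAPSIZE 1024

struct queue {
	pthread_mutex_t lock;
	int fd;
};

static struct queue *map[MAPSIZE];
static pthread_rwlock_t maplock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t updatelock = PTHREAD_MUTEX_INITIALIZER; /* the new lock */

static struct queue *get_queue(int fd)
{
	struct queue *q, *newq = 0;
	sigset_t allmask, origmask;

	if (fd < 0 || fd >= MAPSIZE) return 0;

	/* Fast path: the queue already exists. As in the current code,
	 * return with only the queue lock held. */
	pthread_rwlock_rdlock(&maplock);
	if ((q = map[fd])) {
		pthread_mutex_lock(&q->lock);
		pthread_rwlock_unlock(&maplock);
		return q;
	}
	pthread_rwlock_unlock(&maplock);

	/* Serialize updaters. updatelock is ordered strictly before
	 * maplock, so it is never taken while maplock is held. */
	pthread_mutex_lock(&updatelock);

	/* Re-inspect: another updater may have installed the queue
	 * before we got updatelock. */
	pthread_rwlock_rdlock(&maplock);
	q = map[fd];
	pthread_rwlock_unlock(&maplock);

	if (!q) {
		/* Allocate with no rwlock held; holding updatelock
		 * guarantees nobody else is allocating for this slot,
		 * so nothing ever has to be thrown away. */
		newq = calloc(1, sizeof *newq);
		if (!newq) {
			pthread_mutex_unlock(&updatelock);
			return 0;
		}
		newq->fd = fd;
		pthread_mutex_init(&newq->lock, 0);
	}

	/* Install under a brief wrlock with signals blocked, so close()
	 * called from a signal handler cannot interrupt the critical
	 * section and self-deadlock. (Not optimized: even when
	 * re-inspection found the queue, we still block signals and
	 * take the write lock, as in the attached patch.) */
	sigfillset(&allmask);
	pthread_sigmask(SIG_BLOCK, &allmask, &origmask);
	pthread_rwlock_wrlock(&maplock);
	if (newq) map[fd] = q = newq;
	pthread_mutex_lock(&q->lock);
	pthread_rwlock_unlock(&maplock);
	pthread_sigmask(SIG_SETMASK, &origmask, 0);
	pthread_mutex_unlock(&updatelock);
	return q;
}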
The patch doesn't optimize the case when on re-inspection we discover
an already existing queue (it will still block signals and take the
write lock), but this can be easily added if desired. Also, it doesn't
handle the new lock in aio_atfork.

>> I understand the general idea of doing allocations outside the
>> critical section though. I imagine it in the following way:
>>
>> 1. Take maplock in shared mode.
>> 2. Find the first level/entry in the map that we need to allocate.
>> If no allocation is needed, goto 7.
>> 3. Release maplock.
>> 4. Allocate all needed map levels and a queue.
>> 5. Take maplock in exclusive mode.
>> 6. Modify the map as needed, remembering chunks that we need to free
>> because another thread beat us.
>> 7. Take queue lock.
>> 8. Release maplock.
>> 9. Free all unneeded chunks. // This could be delayed further if
>> doing it under the queue lock is undesirable.
>
> The point of the proposed new lock is to avoid having 9 at all. By
> evaluating what you need to allocate under a lock, you can ensure
> nobody else races to allocate it before you.
>
>> Waiting at step 7 under the exclusive lock can occur only if at step
>> 6 we discovered that we don't need to modify the map at all (i.e.
>> we're going to use an already existing queue). It doesn't seem to be
>> a problem because nothing seems to take the queue lock for prolonged
>> periods (except possibly step 9), but even if it is, we could
>> "downgrade" the exclusive lock to the shared one by dropping it and
>> retrying from step 1 in a loop.
>
> AFAICT there's no need for a loop. The map only grows, never shrinks.
> If it's been seen expanded to cover the fd you want once, it will
> cover it forever.
>
This is true for the map itself, but not for queues. As soon as we drop
the write lock after seeing an existing queue, it might disappear before
we take the read lock again. In your scheme with an additional lock,
this problem is avoided.

Thanks,
Alexey
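P.S. For comparison, the scheme from the quoted steps above, i.e.
without the extra lock, would look roughly like the following, reusing
the simplified one-level map and types from the sketch earlier in this
mail (again just an illustration with made-up names, not real aio.c
code). The difference is step 9: without the new lock, the loser of the
race has to throw its allocation away.

static struct queue *get_queue_racy_alloc(int fd)
{
	struct queue *q, *newq;

	if (fd < 0 || fd >= MAPSIZE) return 0;

	/* 1-2: inspect under the shared lock */
	pthread_rwlock_rdlock(&maplock);
	if ((q = map[fd])) {
		pthread_mutex_lock(&q->lock);    /* 7 */
		pthread_rwlock_unlock(&maplock); /* 8 */
		return q;
	}

	/* 3-4: drop the lock and allocate with no lock held */
	pthread_rwlock_unlock(&maplock);
	newq = calloc(1, sizeof *newq);
	if (!newq) return 0;
	newq->fd = fd;
	pthread_mutex_init(&newq->lock, 0);

	/* 5-6: take the exclusive lock and install, remembering the
	 * chunk we have to discard if another thread beat us */
	pthread_rwlock_wrlock(&maplock);
	if (!(q = map[fd])) {
		map[fd] = q = newq;
		newq = 0;
	}
	pthread_mutex_lock(&q->lock);    /* 7 */
	pthread_rwlock_unlock(&maplock); /* 8 */

	/* 9: free the unneeded chunk; this is the step the new lock
	 * avoids entirely */
	if (newq) {
		pthread_mutex_destroy(&newq->lock);
		free(newq);
	}
	return q;
}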