From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_MSPIKE_H2 autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 29414 invoked from network); 8 Oct 2022 16:07:35 -0000 Received: from second.openwall.net (193.110.157.125) by inbox.vuxu.org with ESMTPUTF8; 8 Oct 2022 16:07:35 -0000 Received: (qmail 3845 invoked by uid 550); 8 Oct 2022 16:07:30 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 3813 invoked from network); 8 Oct 2022 16:07:30 -0000 DKIM-Filter: OpenDKIM Filter v2.11.0 mail.ispras.ru 6AFE540737A7 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ispras.ru; s=default; t=1665245237; bh=GzC7QaTWradHzhDwhM7L09k7AhXfUP/Y2H8jBMUQJc4=; h=Date:From:To:Subject:Reply-To:In-Reply-To:References:From; b=UnvMQBlvv/ELI1no4IJ2wo90KLEadVvtnHAv1M4iX2N4sWxl4fTJZZllisycKDFPV zZzlSfrmZFfl2uGE2Gs9pBQ4FZe9jeRgCHE8kQxgEsiTsIXr1jXWvPwqMKotbDjdvC ydBos/C48+Q2x//x+xdKTOOxUJiJ/YwXRYfhrf9s= MIME-Version: 1.0 Date: Sat, 08 Oct 2022 19:07:17 +0300 From: Alexey Izbyshev To: musl@lists.openwall.com Mail-Followup-To: musl@lists.openwall.com In-Reply-To: <20221007211816.GA29905@brightrain.aerifal.cx> References: <033092a85a52d9f8e8d73a235e2381ae@ispras.ru> <6cf9cc5562e17610392c0f9c2cc68316@ispras.ru> <20221006192042.GW29905@brightrain.aerifal.cx> <20221006195053.GX29905@brightrain.aerifal.cx> <20221007012654.GZ29905@brightrain.aerifal.cx> <20221007211816.GA29905@brightrain.aerifal.cx> User-Agent: Roundcube Webmail/1.4.4 Message-ID: <0b71eb940d7d798517ba476ef86f4a42@ispras.ru> X-Sender: izbyshev@ispras.ru Content-Type: multipart/mixed; boundary="=_59060feec2fed2b6048bff0c0f85912b" Subject: Re: [musl] Re: MT fork and key_lock in pthread_key_create.c --=_59060feec2fed2b6048bff0c0f85912b Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; format=flowed On 2022-10-08 00:18, Rich Felker wrote: > On Fri, Oct 07, 2022 at 01:53:14PM +0300, Alexey Izbyshev wrote: >> On 2022-10-07 04:26, Rich Felker wrote: >> >There's at least one other related bug here: when __aio_get_queue has >> >to take the write lock, it does so without blocking signals, so close >> >called from a signal handler that interrupts will self-deadlock. >> > >> This is worse. Even if maplock didn't exist, __aio_get_queue still >> takes the queue lock, so close from a signal handler would still >> deadlock. Maybe this could be fixed by simply blocking signals >> earlier in submit? aio_cancel already calls __aio_get_queue with >> signals blocked. > > The queue lock only affects a close concurrent with aio on the same > fd. This is at least a programming error and possibly UB, so it's not > that big a concern. > Thanks. Indeed, since aio_cancel isn't required to be async-signal-safe, we're interested only in the case when it's called from close to cancel all requests for a particular fd. >> >This is fixable by blocking signals for the write-lock critical >> >section, but there's still a potentially long window where we're >> >blocking forward progress of other threads. >> > >> To make this more concrete, are you worrying about the libc-internal >> allocator delaying other threads attempting to take maplock (even in >> shared mode)? If so, this seems like a problem that is independent >> from close signal-safety. > > Yes. The "but there's still..." part is a separate, relatively minor, > issue that's just a matter of delaying forward progress of unrelated > close operations, which could be nasty in a realtime process. It only > happens boundedly-many times (at most once for each fd number ever > used) so it's not that big a deal asymptotically, only worst-case. > >> >I think the right thing to do here is add another lock for updating >> >the map that we can take while we still hold the rdlock. After taking >> >this lock, we can re-inspect if any work is needed. If so, do all the >> >work *still holding only the rdlock*, but not writing the results into >> >the existing map. After all the allocation is done, release the >> >rdlock, take the wrlock (with signals blocked), install the new >> >memory, then release all the locks. This makes it so the wrlock is >> >only hold momentarily, under very controlled conditions, so we don't >> >have to worry about messing up AS-safety of close. >> > >> Sorry, I can't understand what exactly you mean here, in particular, >> what the old lock is supposed to protect if its holders are not >> mutually-excluded with updates of the map guarded by the new lock. > > Updates to the map are guarded by the new lock, but exclusion of reads > concurrent with installing the update still need to be guarded by the > rwlock unless the updates are made atomically. > Thanks. I think I understand the purpose of the new lock now, but I still can't understand the locking sequence that you proposed (if I read it correctly): "take rdlock -> take new_lock -> drop rdlock -> block signals -> take wrlock -> ...". This can't work due to lock ordering issues, and it's also unclear why we would need to re-inspect anything after taking the new lock if we're still holding the rdlock and the latter protects us from "installing the updates" (as you say in the follow-up). I've implemented my understanding of how the newly proposed lock should work in the attached patch. The patch doesn't optimize the case when on re-inspection we discover an already existing queue (it will still block signals and take the write lock), but this can be easily added if desired. Also it doesn't handle the new lock in aio_atfork. >> I understand the general idea of doing allocations outside the >> critical section though. I imagine it in the following way: >> >> 1. Take maplock in shared mode. >> 2. Find the first level/entry in the map that we need to allocate. >> If no allocation is needed, goto 7. >> 3. Release maplock. >> 4. Allocate all needed map levels and a queue. >> 5. Take maplock in exclusive mode. >> 6. Modify the map as needed, remembering chunks that we need to free >> because another thread beat us. >> 7. Take queue lock. >> 8. Release maplock. >> 9. Free all unneeded chunks. // This could be delayed further if >> doing it under the queue lock is undesirable. > > The point of the proposed new lock is to avoid having 9 at all. By > evaluating what you need to allocate under a lock, you can ensure > nobody else races to allocate it before you. > >> Waiting at step 7 under the exclusive lock can occur only if at step >> 6 we discovered that we don't need modify the map at all (i.e. we're >> going to use an already existing queue). It doesn't seem to be a >> problem because nothing seems to take the queue lock for prolonged >> periods (except possibly step 9), but even if it is, we could >> "downgrade" the exclusive lock to the shared one by dropping it and >> retrying from step 1 in a loop. > > AFAICT there's no need for a loop. The map only grows never shrinks. > If it's been seen expanded to cover the fd you want once, it will > cover it forever. > This is true for the map itself, but not for queues. As soon as we drop the write lock after seeing an existing queue, it might disappear before we take the read lock again. In your scheme with an additional lock this problem is avoided. Thanks, Alexey --=_59060feec2fed2b6048bff0c0f85912b Content-Transfer-Encoding: base64 Content-Type: text/x-diff; name=aio-map-update-lock.diff Content-Disposition: attachment; filename=aio-map-update-lock.diff; size=3853 ZGlmZiAtLWdpdCBhL3NyYy9haW8vYWlvLmMgYi9zcmMvYWlvL2Fpby5jCmluZGV4IGExYTNlNzkx Li5hMDk3ZGU5NCAxMDA2NDQKLS0tIGEvc3JjL2Fpby9haW8uYworKysgYi9zcmMvYWlvL2Fpby5j CkBAIC04LDYgKzgsNyBAQAogI2luY2x1ZGUgPHN5cy9hdXh2Lmg+CiAjaW5jbHVkZSAic3lzY2Fs bC5oIgogI2luY2x1ZGUgImF0b21pYy5oIgorI2luY2x1ZGUgImxvY2suaCIKICNpbmNsdWRlICJw dGhyZWFkX2ltcGwuaCIKICNpbmNsdWRlICJhaW9faW1wbC5oIgogCkBAIC03MSw2ICs3Miw3IEBA IHN0cnVjdCBhaW9fYXJncyB7CiAJc2VtX3Qgc2VtOwogfTsKIAorc3RhdGljIHZvbGF0aWxlIGlu dCBtYXBfdXBkYXRlX2xvY2tbMV07CiBzdGF0aWMgcHRocmVhZF9yd2xvY2tfdCBtYXBsb2NrID0g UFRIUkVBRF9SV0xPQ0tfSU5JVElBTElaRVI7CiBzdGF0aWMgc3RydWN0IGFpb19xdWV1ZSAqKioq Km1hcDsKIHN0YXRpYyB2b2xhdGlsZSBpbnQgYWlvX2ZkX2NudDsKQEAgLTg2LDQwICs4OCw2MiBA QCBzdGF0aWMgc3RydWN0IGFpb19xdWV1ZSAqX19haW9fZ2V0X3F1ZXVlKGludCBmZCwgaW50IG5l ZWQpCiAJCWVycm5vID0gRUJBREY7CiAJCXJldHVybiAwOwogCX0KKwlzaWdzZXRfdCBhbGxtYXNr LCBvcmlnbWFzazsKIAlpbnQgYT1mZD4+MjQ7CiAJdW5zaWduZWQgY2hhciBiPWZkPj4xNiwgYz1m ZD4+OCwgZD1mZDsKLQlzdHJ1Y3QgYWlvX3F1ZXVlICpxID0gMDsKKwlzdHJ1Y3QgYWlvX3F1ZXVl ICoqKioqbSA9IDAsICoqKiptYSA9IDAsICoqKm1iID0gMCwgKiptYyA9IDAsICpxID0gMDsKKwlp bnQgbiA9IDA7CisKIAlwdGhyZWFkX3J3bG9ja19yZGxvY2soJm1hcGxvY2spOwotCWlmICgoIW1h cCB8fCAhbWFwW2FdIHx8ICFtYXBbYV1bYl0gfHwgIW1hcFthXVtiXVtjXSB8fCAhKHE9bWFwW2Fd W2JdW2NdW2RdKSkgJiYgbmVlZCkgewotCQlwdGhyZWFkX3J3bG9ja191bmxvY2soJm1hcGxvY2sp OwotCQlpZiAoZmNudGwoZmQsIEZfR0VURkQpIDwgMCkgcmV0dXJuIDA7Ci0JCXB0aHJlYWRfcnds b2NrX3dybG9jaygmbWFwbG9jayk7CisJaWYgKG1hcCAmJiBtYXBbYV0gJiYgbWFwW2FdW2JdICYm IG1hcFthXVtiXVtjXSAmJiAocT1tYXBbYV1bYl1bY11bZF0pKQorCQlwdGhyZWFkX211dGV4X2xv Y2soJnEtPmxvY2spOworCXB0aHJlYWRfcndsb2NrX3VubG9jaygmbWFwbG9jayk7CisKKwlpZiAo cSB8fCAhbmVlZCkKKwkJcmV0dXJuIHE7CisJaWYgKGZjbnRsKGZkLCBGX0dFVEZEKSA8IDApCisJ CXJldHVybiAwOworCisJTE9DSyhtYXBfdXBkYXRlX2xvY2spOworCWlmICghbWFwIHx8ICEobisr LCBtYXBbYV0pIHx8ICEobisrLCBtYXBbYV1bYl0pIHx8ICEobisrLCBtYXBbYV1bYl1bY10pIHx8 ICEobisrLCBxPW1hcFthXVtiXVtjXVtkXSkpIHsKIAkJaWYgKCFpb190aHJlYWRfc3RhY2tfc2l6 ZSkgewogCQkJdW5zaWduZWQgbG9uZyB2YWwgPSBfX2dldGF1eHZhbChBVF9NSU5TSUdTVEtTWik7 CiAJCQlpb190aHJlYWRfc3RhY2tfc2l6ZSA9IE1BWChNSU5TSUdTVEtTWisyMDQ4LCB2YWwrNTEy KTsKIAkJfQotCQlpZiAoIW1hcCkgbWFwID0gY2FsbG9jKHNpemVvZiAqbWFwLCAoLTFVLzIrMSk+ PjI0KTsKLQkJaWYgKCFtYXApIGdvdG8gb3V0OwotCQlpZiAoIW1hcFthXSkgbWFwW2FdID0gY2Fs bG9jKHNpemVvZiAqKm1hcCwgMjU2KTsKLQkJaWYgKCFtYXBbYV0pIGdvdG8gb3V0OwotCQlpZiAo IW1hcFthXVtiXSkgbWFwW2FdW2JdID0gY2FsbG9jKHNpemVvZiAqKiptYXAsIDI1Nik7Ci0JCWlm ICghbWFwW2FdW2JdKSBnb3RvIG91dDsKLQkJaWYgKCFtYXBbYV1bYl1bY10pIG1hcFthXVtiXVtj XSA9IGNhbGxvYyhzaXplb2YgKioqKm1hcCwgMjU2KTsKLQkJaWYgKCFtYXBbYV1bYl1bY10pIGdv dG8gb3V0OwotCQlpZiAoIShxID0gbWFwW2FdW2JdW2NdW2RdKSkgewotCQkJbWFwW2FdW2JdW2Nd W2RdID0gcSA9IGNhbGxvYyhzaXplb2YgKioqKiptYXAsIDEpOwotCQkJaWYgKHEpIHsKLQkJCQlx LT5mZCA9IGZkOwotCQkJCXB0aHJlYWRfbXV0ZXhfaW5pdCgmcS0+bG9jaywgMCk7Ci0JCQkJcHRo cmVhZF9jb25kX2luaXQoJnEtPmNvbmQsIDApOwotCQkJCWFfaW5jKCZhaW9fZmRfY250KTsKLQkJ CX0KKwkJc3dpdGNoIChuKSB7CisJCQljYXNlIDA6IGlmICghKG0gPSBjYWxsb2Moc2l6ZW9mICpt LCAoLTFVLzIrMSk+PjI0KSkpIGdvdG8gZmFpbDsKKwkJCWNhc2UgMTogaWYgKCEobWEgPSBjYWxs b2Moc2l6ZW9mICptYSwgMjU2KSkpIGdvdG8gZmFpbDsKKwkJCWNhc2UgMjogaWYgKCEobWIgPSBj YWxsb2Moc2l6ZW9mICptYiwgMjU2KSkpIGdvdG8gZmFpbDsKKwkJCWNhc2UgMzogaWYgKCEobWMg PSBjYWxsb2Moc2l6ZW9mICptYywgMjU2KSkpIGdvdG8gZmFpbDsKKwkJCWNhc2UgNDogaWYgKCEo cSA9IGNhbGxvYyhzaXplb2YgKnEsIDEpKSkgZ290byBmYWlsOwogCQl9CisJCXEtPmZkID0gZmQ7 CisJCXB0aHJlYWRfbXV0ZXhfaW5pdCgmcS0+bG9jaywgMCk7CisJCXB0aHJlYWRfY29uZF9pbml0 KCZxLT5jb25kLCAwKTsKKwkJYV9pbmMoJmFpb19mZF9jbnQpOwogCX0KLQlpZiAocSkgcHRocmVh ZF9tdXRleF9sb2NrKCZxLT5sb2NrKTsKLW91dDoKKwlzaWdmaWxsc2V0KCZhbGxtYXNrKTsKKwlw dGhyZWFkX3NpZ21hc2soU0lHX0JMT0NLLCAmYWxsbWFzaywgJm9yaWdtYXNrKTsKKwlwdGhyZWFk X3J3bG9ja193cmxvY2soJm1hcGxvY2spOworCXN3aXRjaCAobikgeworCQljYXNlIDA6IG1hcCA9 IG07CisJCWNhc2UgMTogbWFwW2FdID0gbWE7CisJCWNhc2UgMjogbWFwW2FdW2JdID0gbWI7CisJ CWNhc2UgMzogbWFwW2FdW2JdW2NdID0gbWM7CisJCWNhc2UgNDogbWFwW2FdW2JdW2NdW2RdID0g cTsKKwl9CisJcHRocmVhZF9tdXRleF9sb2NrKCZxLT5sb2NrKTsKIAlwdGhyZWFkX3J3bG9ja191 bmxvY2soJm1hcGxvY2spOworCXB0aHJlYWRfc2lnbWFzayhTSUdfU0VUTUFTSywgJm9yaWdtYXNr LCAwKTsKKwlVTkxPQ0sobWFwX3VwZGF0ZV9sb2NrKTsKIAlyZXR1cm4gcTsKK2ZhaWw6CisJVU5M T0NLKG1hcF91cGRhdGVfbG9jayk7CisJZnJlZShtYyk7CisJZnJlZShtYik7CisJZnJlZShtYSk7 CisJZnJlZShtKTsKKwlyZXR1cm4gMDsKIH0KIAogc3RhdGljIHZvaWQgX19haW9fdW5yZWZfcXVl dWUoc3RydWN0IGFpb19xdWV1ZSAqcSkKQEAgLTEzNCw2ICsxNTgsNyBAQCBzdGF0aWMgdm9pZCBf X2Fpb191bnJlZl9xdWV1ZShzdHJ1Y3QgYWlvX3F1ZXVlICpxKQogCSAqIG1heSBhcnJpdmUgc2lu Y2Ugd2UgY2Fubm90IGZyZWUgdGhlIHF1ZXVlIG9iamVjdCB3aXRob3V0IGZpcnN0CiAJICogdGFr aW5nIHRoZSBtYXBsb2NrLCB3aGljaCByZXF1aXJlcyByZWxlYXNpbmcgdGhlIHF1ZXVlIGxvY2su ICovCiAJcHRocmVhZF9tdXRleF91bmxvY2soJnEtPmxvY2spOworCUxPQ0sobWFwX3VwZGF0ZV9s b2NrKTsKIAlwdGhyZWFkX3J3bG9ja193cmxvY2soJm1hcGxvY2spOwogCXB0aHJlYWRfbXV0ZXhf bG9jaygmcS0+bG9jayk7CiAJaWYgKHEtPnJlZiA9PSAxKSB7CkBAIC0xNDQsMTEgKzE2OSwxMyBA QCBzdGF0aWMgdm9pZCBfX2Fpb191bnJlZl9xdWV1ZShzdHJ1Y3QgYWlvX3F1ZXVlICpxKQogCQlh X2RlYygmYWlvX2ZkX2NudCk7CiAJCXB0aHJlYWRfcndsb2NrX3VubG9jaygmbWFwbG9jayk7CiAJ CXB0aHJlYWRfbXV0ZXhfdW5sb2NrKCZxLT5sb2NrKTsKKwkJVU5MT0NLKG1hcF91cGRhdGVfbG9j ayk7CiAJCWZyZWUocSk7CiAJfSBlbHNlIHsKIAkJcS0+cmVmLS07CiAJCXB0aHJlYWRfcndsb2Nr X3VubG9jaygmbWFwbG9jayk7CiAJCXB0aHJlYWRfbXV0ZXhfdW5sb2NrKCZxLT5sb2NrKTsKKwkJ VU5MT0NLKG1hcF91cGRhdGVfbG9jayk7CiAJfQogfQogCg== --=_59060feec2fed2b6048bff0c0f85912b--