From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
Subject: Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
Date: Sat, 08 Oct 2022 19:07:17 +0300
Message-ID: <0b71eb940d7d798517ba476ef86f4a42@ispras.ru>
In-Reply-To: <20221007211816.GA29905@brightrain.aerifal.cx>

On 2022-10-08 00:18, Rich Felker wrote:
> On Fri, Oct 07, 2022 at 01:53:14PM +0300, Alexey Izbyshev wrote:
>> On 2022-10-07 04:26, Rich Felker wrote:
>> >There's at least one other related bug here: when __aio_get_queue has
>> >to take the write lock, it does so without blocking signals, so close
>> >called from a signal handler that interrupts will self-deadlock.
>> >
>> This is worse. Even if maplock didn't exist, __aio_get_queue still
>> takes the queue lock, so close from a signal handler would still
>> deadlock. Maybe this could be fixed by simply blocking signals
>> earlier in submit? aio_cancel already calls __aio_get_queue with
>> signals blocked.
> 
> The queue lock only affects a close concurrent with aio on the same
> fd. This is at least a programming error and possibly UB, so it's not
> that big a concern.
> 
Thanks. Indeed, since aio_cancel isn't required to be async-signal-safe, 
we're interested only in the case when it's called from close to cancel 
all requests for a particular fd.
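
For reference, the shape of that case in user code (a hypothetical
demo; the file name and handler are made up, and the deadlock is
timing-dependent: it only triggers if the signal lands while submit
holds the queue lock):

#include <aio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int fd;

static void on_sig(int sig)
{
	(void)sig;
	/* close() is AS-safe per POSIX, but in musl it also cancels any
	 * outstanding aio for fd, which takes the per-fd queue lock; if
	 * this handler interrupted submit while it held that same lock,
	 * close() blocks forever */
	close(fd);
}

int main(void)
{
	static char buf[512];
	struct sigaction sa = { .sa_handler = on_sig };
	struct aiocb cb = { 0 };

	sigaction(SIGUSR1, &sa, 0);
	fd = open("/tmp/aio-deadlock-demo", O_RDWR|O_CREAT|O_TRUNC, 0600);
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = sizeof buf;
	/* the window is inside aio_write() itself: a SIGUSR1 delivered
	 * here (e.g. from another thread or a timer) makes the handler's
	 * close() contend on the lock the interrupted submit holds */
	aio_write(&cb);
	return 0;
}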

>> >This is fixable by blocking signals for the write-lock critical
>> >section, but there's still a potentially long window where we're
>> >blocking forward progress of other threads.
>> >
>> To make this more concrete, are you worrying about the libc-internal
>> allocator delaying other threads attempting to take maplock (even in
>> shared mode)? If so, this seems like a problem that is independent
>> from close signal-safety.
> 
> Yes. The "but there's still..." part is a separate, relatively minor,
> issue that's just a matter of delaying forward progress of unrelated
> close operations, which could be nasty in a realtime process. It only
> happens boundedly-many times (at most once for each fd number ever
> used) so it's not that big a deal asymptotically, only worst-case.
> 
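
(For concreteness, the momentary, AS-safe write-side critical section
amounts to the following; this mirrors what the attached patch does,
with all allocation hoisted out of it:)

	sigset_t allmask, origmask;
	sigfillset(&allmask);
	pthread_sigmask(SIG_BLOCK, &allmask, &origmask);
	pthread_rwlock_wrlock(&maplock);
	/* O(1): only install pre-allocated pointers; no calloc here */
	pthread_rwlock_unlock(&maplock);
	pthread_sigmask(SIG_SETMASK, &origmask, 0);
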
>> >I think the right thing to do here is add another lock for updating
>> >the map that we can take while we still hold the rdlock. After taking
>> >this lock, we can re-inspect if any work is needed. If so, do all the
>> >work *still holding only the rdlock*, but not writing the results into
>> >the existing map. After all the allocation is done, release the
>> >rdlock, take the wrlock (with signals blocked), install the new
>> >memory, then release all the locks. This makes it so the wrlock is
>> >only held momentarily, under very controlled conditions, so we don't
>> >have to worry about messing up AS-safety of close.
>> >
>> Sorry, I can't understand what exactly you mean here, in particular,
>> what the old lock is supposed to protect if its holders are not
>> mutually-excluded with updates of the map guarded by the new lock.
> 
> Updates to the map are guarded by the new lock, but exclusion of reads
> concurrent with installing the update still needs to be guarded by the
> rwlock unless the updates are made atomically.
> 
Thanks. I think I understand the purpose of the new lock now, but I 
still can't understand the locking sequence that you proposed (if I read 
it correctly): "take rdlock -> take new_lock -> drop rdlock -> block 
signals -> take wrlock -> ...". This can't work due to lock ordering 
(see the interleaving below), and it's also unclear why we would need to 
re-inspect anything after taking the new lock if we're still holding the 
rdlock and the latter protects us from "installing the updates" (as you 
say in the follow-up).
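
Concretely, with that ordering:

  T1: rdlock -> new_lock -> drop rdlock -> waits for wrlock
      (the wrlock can't be granted until all readers drain)
  T2: rdlock -> waits for new_lock (held by T1)

T1 can't take the wrlock until T2 drops the rdlock, and T2 can't take 
new_lock until T1 drops it, so neither makes progress.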

I've implemented my understanding of how the newly proposed lock should 
work in the attached patch (outlined below). The patch doesn't optimize 
the case where re-inspection discovers an already existing queue (it 
will still block signals and take the write lock), but this can easily 
be added if desired. It also doesn't handle the new lock in aio_atfork.
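
In outline, the slow path in the patch is:

1. rdlock; if the queue already exists, take its lock, drop the 
rdlock, and return it (fast path).
2. Otherwise drop the rdlock and validate the fd with fcntl(F_GETFD).
3. LOCK(map_update_lock); re-inspect the map (safe without the rwlock: 
every map writer holds map_update_lock, and the map only grows) and 
allocate any missing levels.
4. Block all signals, take the wrlock, install the pre-allocated 
pointers (O(1), no allocation), take the queue lock, then unwind: drop 
the wrlock, restore the signal mask, UNLOCK(map_update_lock).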

>> I understand the general idea of doing allocations outside the
>> critical section though. I imagine it in the following way:
>> 
>> 1. Take maplock in shared mode.
>> 2. Find the first level/entry in the map that we need to allocate.
>> If no allocation is needed, goto 7.
>> 3. Release maplock.
>> 4. Allocate all needed map levels and a queue.
>> 5. Take maplock in exclusive mode.
>> 6. Modify the map as needed, remembering chunks that we need to free
>> because another thread beat us.
>> 7. Take queue lock.
>> 8. Release maplock.
>> 9. Free all unneeded chunks. // This could be delayed further if
>> doing it under the queue lock is undesirable.
> 
> The point of the proposed new lock is to avoid having 9 at all. By
> evaluating what you need to allocate under a lock, you can ensure
> nobody else races to allocate it before you.
> 
>> Waiting at step 7 under the exclusive lock can occur only if at step
>> 6 we discovered that we don't need to modify the map at all (i.e. we're
>> going to use an already existing queue). It doesn't seem to be a
>> problem because nothing seems to take the queue lock for prolonged
>> periods (except possibly step 9), but even if it is, we could
>> "downgrade" the exclusive lock to the shared one by dropping it and
>> retrying from step 1 in a loop.
> 
> AFAICT there's no need for a loop. The map only grows, never shrinks.
> If it's been seen expanded to cover the fd you want once, it will
> cover it forever.
> 
This is true for the map itself, but not for queues. As soon as we drop 
the write lock after seeing an existing queue, it might disappear before 
we take the read lock again (see the interleaving below). In your scheme 
with an additional lock, this problem is avoided.
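
Concretely, without the new lock:

  T1: wrlock; sees q = map[a][b][c][d] != 0; drops the wrlock to retry 
      with a shared lock
  T2: close(fd): __aio_unref_queue takes the wrlock, drops the last 
      reference, clears the map entry and frees q
  T1: rdlock; map[a][b][c][d] is 0 again, so it's back to allocating

hence the retry loop.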

Thanks,
Alexey

[-- Attachment #2: aio-map-update-lock.diff --]
[-- Type: text/x-diff; name=aio-map-update-lock.diff, Size: 3853 bytes --]

diff --git a/src/aio/aio.c b/src/aio/aio.c
index a1a3e791..a097de94 100644
--- a/src/aio/aio.c
+++ b/src/aio/aio.c
@@ -8,6 +8,7 @@
 #include <sys/auxv.h>
 #include "syscall.h"
 #include "atomic.h"
+#include "lock.h"
 #include "pthread_impl.h"
 #include "aio_impl.h"
 
@@ -71,6 +72,7 @@ struct aio_args {
 	sem_t sem;
 };
 
+static volatile int map_update_lock[1];
 static pthread_rwlock_t maplock = PTHREAD_RWLOCK_INITIALIZER;
 static struct aio_queue *****map;
 static volatile int aio_fd_cnt;
@@ -86,40 +88,62 @@ static struct aio_queue *__aio_get_queue(int fd, int need)
 		errno = EBADF;
 		return 0;
 	}
+	sigset_t allmask, origmask;
 	int a=fd>>24;
 	unsigned char b=fd>>16, c=fd>>8, d=fd;
-	struct aio_queue *q = 0;
+	struct aio_queue *****m = 0, ****ma = 0, ***mb = 0, **mc = 0, *q = 0;
+	int n = 0;
+
 	pthread_rwlock_rdlock(&maplock);
-	if ((!map || !map[a] || !map[a][b] || !map[a][b][c] || !(q=map[a][b][c][d])) && need) {
-		pthread_rwlock_unlock(&maplock);
-		if (fcntl(fd, F_GETFD) < 0) return 0;
-		pthread_rwlock_wrlock(&maplock);
+	if (map && map[a] && map[a][b] && map[a][b][c] && (q=map[a][b][c][d]))
+		pthread_mutex_lock(&q->lock);
+	pthread_rwlock_unlock(&maplock);
+
+	if (q || !need)
+		return q;
+	if (fcntl(fd, F_GETFD) < 0)
+		return 0;
+
+	LOCK(map_update_lock);
+	if (!map || !(n++, map[a]) || !(n++, map[a][b]) || !(n++, map[a][b][c]) || !(n++, q=map[a][b][c][d])) {
 		if (!io_thread_stack_size) {
 			unsigned long val = __getauxval(AT_MINSIGSTKSZ);
 			io_thread_stack_size = MAX(MINSIGSTKSZ+2048, val+512);
 		}
-		if (!map) map = calloc(sizeof *map, (-1U/2+1)>>24);
-		if (!map) goto out;
-		if (!map[a]) map[a] = calloc(sizeof **map, 256);
-		if (!map[a]) goto out;
-		if (!map[a][b]) map[a][b] = calloc(sizeof ***map, 256);
-		if (!map[a][b]) goto out;
-		if (!map[a][b][c]) map[a][b][c] = calloc(sizeof ****map, 256);
-		if (!map[a][b][c]) goto out;
-		if (!(q = map[a][b][c][d])) {
-			map[a][b][c][d] = q = calloc(sizeof *****map, 1);
-			if (q) {
-				q->fd = fd;
-				pthread_mutex_init(&q->lock, 0);
-				pthread_cond_init(&q->cond, 0);
-				a_inc(&aio_fd_cnt);
-			}
+		switch (n) {
+			case 0: if (!(m = calloc(sizeof *m, (-1U/2+1)>>24))) goto fail;
+			case 1: if (!(ma = calloc(sizeof *ma, 256))) goto fail;
+			case 2: if (!(mb = calloc(sizeof *mb, 256))) goto fail;
+			case 3: if (!(mc = calloc(sizeof *mc, 256))) goto fail;
+			case 4: if (!(q = calloc(sizeof *q, 1))) goto fail;
 		}
+		q->fd = fd;
+		pthread_mutex_init(&q->lock, 0);
+		pthread_cond_init(&q->cond, 0);
+		a_inc(&aio_fd_cnt);
 	}
-	if (q) pthread_mutex_lock(&q->lock);
-out:
+	sigfillset(&allmask);
+	pthread_sigmask(SIG_BLOCK, &allmask, &origmask);
+	pthread_rwlock_wrlock(&maplock);
+	switch (n) {
+		case 0: map = m;
+		case 1: map[a] = ma;
+		case 2: map[a][b] = mb;
+		case 3: map[a][b][c] = mc;
+		case 4: map[a][b][c][d] = q;
+	}
+	pthread_mutex_lock(&q->lock);
 	pthread_rwlock_unlock(&maplock);
+	pthread_sigmask(SIG_SETMASK, &origmask, 0);
+	UNLOCK(map_update_lock);
 	return q;
+fail:
+	UNLOCK(map_update_lock);
+	free(mc);
+	free(mb);
+	free(ma);
+	free(m);
+	return 0;
 }
 
 static void __aio_unref_queue(struct aio_queue *q)
@@ -134,6 +158,7 @@ static void __aio_unref_queue(struct aio_queue *q)
 	 * may arrive since we cannot free the queue object without first
 	 * taking the maplock, which requires releasing the queue lock. */
 	pthread_mutex_unlock(&q->lock);
+	LOCK(map_update_lock);
 	pthread_rwlock_wrlock(&maplock);
 	pthread_mutex_lock(&q->lock);
 	if (q->ref == 1) {
@@ -144,11 +169,13 @@ static void __aio_unref_queue(struct aio_queue *q)
 		a_dec(&aio_fd_cnt);
 		pthread_rwlock_unlock(&maplock);
 		pthread_mutex_unlock(&q->lock);
+		UNLOCK(map_update_lock);
 		free(q);
 	} else {
 		q->ref--;
 		pthread_rwlock_unlock(&maplock);
 		pthread_mutex_unlock(&q->lock);
+		UNLOCK(map_update_lock);
 	}
 }
 

