mailing list of musl libc
* [musl] MT fork and key_lock in pthread_key_create.c
@ 2022-10-06  6:37 Alexey Izbyshev
  2022-10-06  7:02 ` [musl] " Alexey Izbyshev
  2022-10-06 18:21 ` [musl] " Rich Felker
  0 siblings, 2 replies; 15+ messages in thread
From: Alexey Izbyshev @ 2022-10-06  6:37 UTC (permalink / raw)
  To: musl

Hi,

I noticed that fork() doesn't take key_lock that is used to protect the 
global table of thread-specific keys. I couldn't find mentions of this 
lock in the MT fork discussion in the mailing list archive. Was this 
lock overlooked?

Also, I looked at how __aio_atfork() handles a similar case with 
maplock, and it seems wrong. It takes the read lock and then simply 
unlocks it both in the parent and in the child. But if there were other 
holders of the read lock at the time of fork(), the lock won't end up in 
the unlocked state in the child. It should probably be completely 
nulled-out in the child instead.
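
A minimal standalone illustration of the problem (plain POSIX, not musl
internals; the sleep() is a crude way to sequence the threads):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

static void *reader(void *arg)
{
        pthread_rwlock_rdlock(&lock); /* a second read hold, never released */
        for (;;) pause();
        return 0;
}

int main(void)
{
        pthread_t t;
        pthread_create(&t, 0, reader, 0);
        sleep(1);
        pthread_rwlock_rdlock(&lock); /* what __aio_atfork(-1) takes pre-fork */
        if (!fork()) {
                pthread_rwlock_unlock(&lock); /* the child drops its one hold... */
                /* ...but the reader's hold was inherited, so the lock
                 * is still not free here: */
                if (pthread_rwlock_trywrlock(&lock))
                        puts("child: lock still held by a thread that doesn't exist here");
                _exit(0);
        }
        wait(0);
        return 0;
}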

Thanks,
Alexey


* [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06  6:37 [musl] MT fork and key_lock in pthread_key_create.c Alexey Izbyshev
@ 2022-10-06  7:02 ` Alexey Izbyshev
  2022-10-06 19:20   ` Rich Felker
  2022-10-06 18:21 ` [musl] " Rich Felker
  1 sibling, 1 reply; 15+ messages in thread
From: Alexey Izbyshev @ 2022-10-06  7:02 UTC (permalink / raw)
  To: musl

On 2022-10-06 09:37, Alexey Izbyshev wrote:
> Hi,
> 
> I noticed that fork() doesn't take key_lock that is used to protect
> the global table of thread-specific keys. I couldn't find mentions of
> this lock in the MT fork discussion in the mailing list archive. Was
> this lock overlooked?
> 
> Also, I looked at how __aio_atfork() handles a similar case with
> maplock, and it seems wrong. It takes the read lock and then simply
> unlocks it both in the parent and in the child. But if there were
> other holders of the read lock at the time of fork(), the lock won't
> end up in the unlocked state in the child. It should probably be
> completely nulled-out in the child instead.
> 
Looking at aio further, I don't understand how it's supposed to work 
with MT fork at all. __aio_atfork() is called in _Fork() when the 
allocator locks are already held. Meanwhile another thread could be 
stuck in __aio_get_queue() holding maplock in exclusive mode while 
trying to allocate, resulting in deadlock.
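
A simplified model of that lock-order inversion (stand-in locks only,
not the real allocator; names are mine):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the malloc lock */
static pthread_rwlock_t maplock = PTHREAD_RWLOCK_INITIALIZER;

static void *expander(void *arg)
{
        /* models __aio_get_queue() expanding map[]: allocate under wrlock */
        pthread_rwlock_wrlock(&maplock);
        pthread_mutex_lock(&alloc_lock); /* blocks: the forking thread holds it */
        pthread_mutex_unlock(&alloc_lock);
        pthread_rwlock_unlock(&maplock);
        return 0;
}

int main(void)
{
        pthread_t t;
        pthread_mutex_lock(&alloc_lock); /* models fork() holding the allocator locks */
        pthread_create(&t, 0, expander, 0);
        sleep(1); /* crude: let expander win the wrlock first */
        /* models _Fork() calling __aio_atfork(-1); a blocking rdlock
         * here would complete the cycle and hang forever: */
        if (pthread_rwlock_tryrdlock(&maplock))
                puts("would deadlock: maplock held by a thread stuck on the allocator lock");
        pthread_mutex_unlock(&alloc_lock);
        pthread_join(t, 0);
        return 0;
}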

Alexey


* Re: [musl] MT fork and key_lock in pthread_key_create.c
  2022-10-06  6:37 [musl] MT fork and key_lock in pthread_key_create.c Alexey Izbyshev
  2022-10-06  7:02 ` [musl] " Alexey Izbyshev
@ 2022-10-06 18:21 ` Rich Felker
  2022-10-08  1:36   ` Rich Felker
  1 sibling, 1 reply; 15+ messages in thread
From: Rich Felker @ 2022-10-06 18:21 UTC (permalink / raw)
  To: musl

On Thu, Oct 06, 2022 at 09:37:50AM +0300, Alexey Izbyshev wrote:
> Hi,
> 
> I noticed that fork() doesn't take key_lock that is used to protect
> the global table of thread-specific keys. I couldn't find mentions
> of this lock in the MT fork discussion in the mailing list archive.
> Was this lock overlooked?

I think what happened was that we made the main list of locks to
review and take care of via grep for LOCK, and then manually added
known instances of locks using other locking primitives. This one must
have been missed.

Having special-case lock types like this is kinda a pain, but as long
as there aren't too many I guess it's not a big deal.

> Also, I looked at how __aio_atfork() handles a similar case with
> maplock, and it seems wrong. It takes the read lock and then simply
> unlocks it both in the parent and in the child. But if there were
> other holders of the read lock at the time of fork(), the lock won't
> end up in the unlocked state in the child. It should probably be
> completely nulled-out in the child instead.

Conceptually, perhaps it should be taking the write-lock instead?
But null-out is probably okay too, and less costly.
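
A minimal sketch of the null-out variant, assuming an atfork-style
child handler and a simplified one-level map:

#include <pthread.h>

static pthread_rwlock_t maplock = PTHREAD_RWLOCK_INITIALIZER;
static void *map; /* the real map is multi-level */

/* child side: reset rather than unlock, since the inherited lock may
 * still carry read holds of threads that no longer exist */
static void atfork_child(void)
{
        map = 0; /* leak the old, possibly inconsistent table */
        pthread_rwlock_init(&maplock, 0);
}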

Rich


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06  7:02 ` [musl] " Alexey Izbyshev
@ 2022-10-06 19:20   ` Rich Felker
  2022-10-06 19:50     ` Rich Felker
  2022-10-06 20:04     ` Jeffrey Walton
  0 siblings, 2 replies; 15+ messages in thread
From: Rich Felker @ 2022-10-06 19:20 UTC (permalink / raw)
  To: musl

On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
> On 2022-10-06 09:37, Alexey Izbyshev wrote:
> >Hi,
> >
> >I noticed that fork() doesn't take key_lock that is used to protect
> >the global table of thread-specific keys. I couldn't find mentions of
> >this lock in the MT fork discussion in the mailing list archive. Was
> >this lock overlooked?
> >
> >Also, I looked at how __aio_atfork() handles a similar case with
> >maplock, and it seems wrong. It takes the read lock and then simply
> >unlocks it both in the parent and in the child. But if there were
> >other holders of the read lock at the time of fork(), the lock won't
> >end up in the unlocked state in the child. It should probably be
> >completely nulled-out in the child instead.
> >
> Looking at aio further, I don't understand how it's supposed to work
> with MT fork at all. __aio_atfork() is called in _Fork() when the
> allocator locks are already held. Meanwhile another thread could be
> stuck in __aio_get_queue() holding maplock in exclusive mode while
> trying to allocate, resulting in deadlock.

Indeed, this is messy and I don't think it makes sense to be doing
this at all. The child is just going to throw away the state so the
parent shouldn't need to synchronize at all, but if we walk the
multi-level map[] table in the child after async fork, it's possible
that the contents seen are inconsistent, even that the pointers are
only half-written or something.

I see a few possible solutions:

1. Just set map = 0 in the child and leak the memory. This is not
   going to matter unless you're doing multiple generations of fork
   with aio anyway.

2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
   the child, and if it succeeds, we know the map is consistent so we
   can just zero it out the same as now. Still "leaks" but only on
   contention to expand the map.

3. Getting a little smarter still: move the __aio_atfork for the
   parent side from _Fork to fork, outside of the critical section
   where malloc lock is held. Then proceed as in (2). Now, the
   tryrdlock is guaranteed to succeed in the child. Leak is only
   possible when _Fork is used (in which case the child context is an
   async signal one, and thus calling any aio_* that would allocate
   map[] again is UB -- note that in this case, the only reason we
   have to do anything at all in the child is to prevent close from
   interacting with aio).

After writing them out, 3 seems like the right choice.

Rich


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06 19:20   ` Rich Felker
@ 2022-10-06 19:50     ` Rich Felker
  2022-10-07  1:26       ` Rich Felker
  2022-10-07  8:18       ` Alexey Izbyshev
  2022-10-06 20:04     ` Jeffrey Walton
  1 sibling, 2 replies; 15+ messages in thread
From: Rich Felker @ 2022-10-06 19:50 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2692 bytes --]

On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
> > On 2022-10-06 09:37, Alexey Izbyshev wrote:
> > > [...]
> > Looking at aio further, I don't understand how it's supposed to work
> > with MT fork at all. __aio_atfork() is called in _Fork() when the
> > allocator locks are already held. Meanwhile another thread could be
> > stuck in __aio_get_queue() holding maplock in exclusive mode while
> > trying to allocate, resulting in deadlock.
> 
> Indeed, this is messy and I don't think it makes sense to be doing
> this at all. The child is just going to throw away the state so the
> parent shouldn't need to synchronize at all, but if we walk the
> multi-level map[] table in the child after async fork, it's possible
> that the contents seen are inconsistent, even that the pointers are
> only half-written or something.
> 
> I see a few possible solutions:
> 
> 1. Just set map = 0 in the child and leak the memory. This is not
>    going to matter unless you're doing multiple generations of fork
>    with aio anyway.
> 
> 2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
>    the child, and if it succeeds, we know the map is consistent so we
>    can just zero it out the same as now. Still "leaks" but only on
>    contention to expand the map.
> 
> 3. Getting a little smarter still: move the __aio_atfork for the
>    parent side from _Fork to fork, outside of the critical section
>    where malloc lock is held. Then proceed as in (2). Now, the
>    tryrdlock is guaranteed to succeed in the child. Leak is only
>    possible when _Fork is used (in which case the child context is an
>    async signal one, and thus calling any aio_* that would allocate
>    map[] again is UB -- note that in this case, the only reason we
>    have to do anything at all in the child is to prevent close from
>    interacting with aio).
> 
> After writing them out, 3 seems like the right choice.

Proposed patch attached.

[-- Attachment #2: aio_atfork.diff --]
[-- Type: text/plain, Size: 2550 bytes --]

diff --git a/src/aio/aio.c b/src/aio/aio.c
index fa24f6b6..4c3379e1 100644
--- a/src/aio/aio.c
+++ b/src/aio/aio.c
@@ -401,11 +401,25 @@ void __aio_atfork(int who)
 	if (who<0) {
 		pthread_rwlock_rdlock(&maplock);
 		return;
+	} else if (!who) {
+		pthread_rwlock_unlock(&maplock);
+		return;
 	}
-	if (who>0 && map) for (int a=0; a<(-1U/2+1)>>24; a++)
+	if (pthread_rwlock_tryrdlock(&maplock)) {
+		/* Obtaining lock may fail if _Fork was called nor via
+		 * fork. In this case, no further aio is possible from
+		 * child and we can just null out map so __aio_close
+		 * does not attempt to do anything. */
+		map = 0;
+		return;
+	}
+	if (map) for (int a=0; a<(-1U/2+1)>>24; a++)
 		if (map[a]) for (int b=0; b<256; b++)
 			if (map[a][b]) for (int c=0; c<256; c++)
 				if (map[a][b][c]) for (int d=0; d<256; d++)
 					map[a][b][c][d] = 0;
-	pthread_rwlock_unlock(&maplock);
+	/* Re-initialize the rwlock rather than unlocking since there
+	 * may have been more than one reference on it in the parent.
+	 * We are not a lock holder anyway; the thread in the parent was. */
+	pthread_rwlock_init(&maplock, 0);
 }
diff --git a/src/process/_Fork.c b/src/process/_Fork.c
index da063868..fb0fdc2c 100644
--- a/src/process/_Fork.c
+++ b/src/process/_Fork.c
@@ -14,7 +14,6 @@ pid_t _Fork(void)
 	pid_t ret;
 	sigset_t set;
 	__block_all_sigs(&set);
-	__aio_atfork(-1);
 	LOCK(__abort_lock);
 #ifdef SYS_fork
 	ret = __syscall(SYS_fork);
@@ -32,7 +31,7 @@ pid_t _Fork(void)
 		if (libc.need_locks) libc.need_locks = -1;
 	}
 	UNLOCK(__abort_lock);
-	__aio_atfork(!ret);
+	if (!ret) __aio_atfork(1);
 	__restore_sigs(&set);
 	return __syscall_ret(ret);
 }
diff --git a/src/process/fork.c b/src/process/fork.c
index ff71845c..80e804b1 100644
--- a/src/process/fork.c
+++ b/src/process/fork.c
@@ -36,6 +36,7 @@ static volatile int *const *const atfork_locks[] = {
 static void dummy(int x) { }
 weak_alias(dummy, __fork_handler);
 weak_alias(dummy, __malloc_atfork);
+weak_alias(dummy, __aio_atfork);
 weak_alias(dummy, __ldso_atfork);
 
 static void dummy_0(void) { }
@@ -50,6 +51,7 @@ pid_t fork(void)
 	int need_locks = libc.need_locks > 0;
 	if (need_locks) {
 		__ldso_atfork(-1);
+		__aio_atfork(-1);
 		__inhibit_ptc();
 		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
 			if (*atfork_locks[i]) LOCK(*atfork_locks[i]);
@@ -75,6 +77,7 @@ pid_t fork(void)
 				if (ret) UNLOCK(*atfork_locks[i]);
 				else **atfork_locks[i] = 0;
 		__release_ptc();
+		if (ret) __aio_atfork(0);
 		__ldso_atfork(!ret);
 	}
 	__restore_sigs(&set);


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06 19:20   ` Rich Felker
  2022-10-06 19:50     ` Rich Felker
@ 2022-10-06 20:04     ` Jeffrey Walton
  2022-10-06 20:09       ` Rich Felker
  1 sibling, 1 reply; 15+ messages in thread
From: Jeffrey Walton @ 2022-10-06 20:04 UTC (permalink / raw)
  To: musl

On Thu, Oct 6, 2022 at 3:21 PM Rich Felker <dalias@libc.org> wrote:
> On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
> > On 2022-10-06 09:37, Alexey Izbyshev wrote:
> > >...
> [...]
>
> I see a few possible solutions:
>
> 1. Just set map = 0 in the child and leak the memory. This is not
>    going to matter unless you're doing multiple generations of fork
>    with aio anyway.

This may make security testing and evaluation trickier, like when
using -fanalyze=memory.

I think it is better to work with the tools nowadays.

Jeff


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06 20:04     ` Jeffrey Walton
@ 2022-10-06 20:09       ` Rich Felker
  0 siblings, 0 replies; 15+ messages in thread
From: Rich Felker @ 2022-10-06 20:09 UTC (permalink / raw)
  To: Jeffrey Walton; +Cc: musl

On Thu, Oct 06, 2022 at 04:04:48PM -0400, Jeffrey Walton wrote:
> On Thu, Oct 6, 2022 at 3:21 PM Rich Felker <dalias@libc.org> wrote:
> > On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
> > > On 2022-10-06 09:37, Alexey Izbyshev wrote:
> > > >...
> > [...]
> >
> > I see a few possible solutions:
> >
> > 1. Just set map = 0 in the child and leak the memory. This is not
> >    going to matter unless you're doing multiple generations of fork
> >    with aio anyway.
> 
> This may make security testing and evaluation trickier, like when
> using -fanalyze=memory.
> 
> I think it is better to work with the tools nowadays.

These allocations are immutable/permanent-once-allocated anyway in the
parent. The only difference would be potentially getting an additional
round of them in the child after forking. "Leak detection" tooling
already needs to be aware that there will be some one-time permanent
allocations that will/can never be freed.

Rich


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06 19:50     ` Rich Felker
@ 2022-10-07  1:26       ` Rich Felker
  2022-10-07 10:53         ` Alexey Izbyshev
  2022-10-07  8:18       ` Alexey Izbyshev
  1 sibling, 1 reply; 15+ messages in thread
From: Rich Felker @ 2022-10-07  1:26 UTC (permalink / raw)
  To: musl

On Thu, Oct 06, 2022 at 03:50:54PM -0400, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
> > On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
> > > On 2022-10-06 09:37, Alexey Izbyshev wrote:
> > > > [...]
> > > Looking at aio further, I don't understand how it's supposed to work
> > > with MT fork at all. __aio_atfork() is called in _Fork() when the
> > > allocator locks are already held. Meanwhile another thread could be
> > > stuck in __aio_get_queue() holding maplock in exclusive mode while
> > > trying to allocate, resulting in deadlock.
> > 
> > Indeed, this is messy and I don't think it makes sense to be doing
> > this at all. The child is just going to throw away the state so the
> > parent shouldn't need to synchronize at all, but if we walk the
> > multi-level map[] table in the child after async fork, it's possible
> > that the contents seen are inconsistent, even that the pointers are
> > only half-written or something.
> > 
> > I see a few possible solutions:
> > 
> > 1. Just set map = 0 in the child and leak the memory. This is not
> >    going to matter unless you're doing multiple generations of fork
> >    with aio anyway.
> > 
> > 2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
> >    the child, and if it succeeds, we know the map is consistent so we
> >    can just zero it out the same as now. Still "leaks" but only on
> >    contention to expand the map.
> > 
> > 3. Getting a little smarter still: move the __aio_atfork for the
> >    parent side from _Fork to fork, outside of the critical section
> >    where malloc lock is held. Then proceed as in (2). Now, the
> >    tryrdlock is guaranteed to succeed in the child. Leak is only
> >    possible when _Fork is used (in which case the child context is an
> >    async signal one, and thus calling any aio_* that would allocate
> >    map[] again is UB -- note that in this case, the only reason we
> >    have to do anything at all in the child is to prevent close from
> >    interacting with aio).
> > 
> > After writing them out, 3 seems like the right choice.
> 
> Proposed patch attached.

> [... patch elided; quoted in full in the previous message ...]

There's at least one other related bug here: when __aio_get_queue has
to take the write lock, it does so without blocking signals, so close
called from a signal handler that interrupts it will self-deadlock.

This is fixable by blocking signals for the write-lock critical
section, but there's still a potentially long window where we're
blocking forward progress of other threads.

I think the right thing to do here is add another lock for updating
the map that we can take while we still hold the rdlock. After taking
this lock, we can re-inspect if any work is needed. If so, do all the
work *still holding only the rdlock*, but not writing the results into
the existing map. After all the allocation is done, release the
rdlock, take the wrlock (with signals blocked), install the new
memory, then release all the locks. This makes it so the wrlock is
only held momentarily, under very controlled conditions, so we don't
have to worry about messing up AS-safety of close.

As an alternative, the rwlock could be removed entirely in favor of
making the whole map[] atomic. This has the amusing effect of making
what's conceptually one of the worst C types ever to appear in real
code:

static struct aio_queue *volatile *volatile *volatile *volatile *volatile map;

although it would actually have to be declared void *volatile map;
with lots of type conversions to navigate through it, in order to
respect type rules for a_cas_p. Then all that would be needed is one
normal lock governing modifications.
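
For concreteness, a hedged two-level sketch of that shape, using C11
atomics in place of musl's internal a_cas_p and a plain mutex for the
modification side (function names are mine):

#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static pthread_mutex_t mod_lock = PTHREAD_MUTEX_INITIALIZER; /* governs all writers */
static _Atomic(void *) map; /* really: pointer to an array of atomic pointers */

/* lock-free read path; deeper levels elided */
static void *get_slot(unsigned fd)
{
        _Atomic(void *) *l1 = atomic_load(&map);
        return l1 ? atomic_load(&l1[fd & 255]) : 0;
}

/* writers serialize on mod_lock; readers never block */
static _Atomic(void *) *expand(void)
{
        pthread_mutex_lock(&mod_lock);
        _Atomic(void *) *l1 = atomic_load(&map);
        if (!l1) {
                l1 = calloc(256, sizeof *l1); /* zeroed = all-null slots */
                if (l1) atomic_store(&map, l1);
        }
        pthread_mutex_unlock(&mod_lock);
        return l1;
}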

Given that this code *tries* to mostly use high-level synchronization
primitives, my leaning would be not to do that.

FWIW, note that all of this was a problem irrespective of MT-fork. So
thankfully it was not a hellish unforseen consequence of adopting
MT-fork, just an existing bug.

Rich


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-06 19:50     ` Rich Felker
  2022-10-07  1:26       ` Rich Felker
@ 2022-10-07  8:18       ` Alexey Izbyshev
  1 sibling, 0 replies; 15+ messages in thread
From: Alexey Izbyshev @ 2022-10-07  8:18 UTC (permalink / raw)
  To: musl

On 2022-10-06 22:50, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
>> On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
>> > On 2022-10-06 09:37, Alexey Izbyshev wrote:
>> > > [...]
>> > Looking at aio further, I don't understand how it's supposed to work
>> > with MT fork at all. __aio_atfork() is called in _Fork() when the
>> > allocator locks are already held. Meanwhile another thread could be
>> > stuck in __aio_get_queue() holding maplock in exclusive mode while
>> > trying to allocate, resulting in deadlock.
>> 
>> Indeed, this is messy and I don't think it makes sense to be doing
>> this at all. The child is just going to throw away the state so the
>> parent shouldn't need to synchronize at all, but if we walk the
>> multi-level map[] table in the child after async fork, it's possible
>> that the contents seen are inconsistent, even that the pointers are
>> only half-written or something.
>> 
Doesn't musl assume that pointer-sized memory accesses are atomic?

>> I see a few possible solutions:
>> 
>> 1. Just set map = 0 in the child and leak the memory. This is not
>>    going to matter unless you're doing multiple generations of fork
>>    with aio anyway.
>> 
>> 2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
>>    the child, and if it succeeds, we know the map is consistent so we
>>    can just zero it out the same as now. Still "leaks" but only on
>>    contention to expand the map.
>> 
>> 3. Getting a little smarter still: move the __aio_atfork for the
>>    parent side from _Fork to fork, outside of the critical section
>>    where malloc lock is held. Then proceed as in (2). Now, the
>>    tryrdlock is guaranteed to succeed in the child. Leak is only
>>    possible when _Fork is used (in which case the child context is an
>>    async signal one, and thus calling any aio_* that would allocate
>>    map[] again is UB -- note that in this case, the only reason we
>>    have to do anything at all in the child is to prevent close from
>>    interacting with aio).
>> 
>> After writing them out, 3 seems like the right choice.
> 
I agree.

> Proposed patch attached.

> diff --git a/src/aio/aio.c b/src/aio/aio.c
> index fa24f6b6..4c3379e1 100644
> --- a/src/aio/aio.c
> +++ b/src/aio/aio.c
> @@ -401,11 +401,25 @@ void __aio_atfork(int who)
>  	if (who<0) {
>  		pthread_rwlock_rdlock(&maplock);
>  		return;
> +	} else if (!who) {
> +		pthread_rwlock_unlock(&maplock);
> +		return;
>  	}

It probably makes sense to reset "aio_fd_cnt" here as well, though it 
matters only in the case where so many nested fork() children each use 
aio that the counter eventually overflows (breaking __aio_close).
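
Concretely (a hedged sketch; aio_fd_cnt is the counter __aio_close
consults):

        /* in the child path of __aio_atfork(), alongside clearing map: */
        aio_fd_cnt = 0; /* the parent's outstanding requests don't exist here */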

> -	if (who>0 && map) for (int a=0; a<(-1U/2+1)>>24; a++)
> +	if (pthread_rwlock_tryrdlock(&maplock)) {
> +		/* Obtaining lock may fail if _Fork was called nor via

s/nor/not/

> +		 * fork. In this case, no further aio is possible from
> +		 * child and we can just null out map so __aio_close
> +		 * does not attempt to do anything. */
> +		map = 0;
> +		return;
> +	}
> +	if (map) for (int a=0; a<(-1U/2+1)>>24; a++)
>  		if (map[a]) for (int b=0; b<256; b++)
>  			if (map[a][b]) for (int c=0; c<256; c++)
>  				if (map[a][b][c]) for (int d=0; d<256; d++)
>  					map[a][b][c][d] = 0;
> -	pthread_rwlock_unlock(&maplock);
> +	/* Re-initialize the rwlock rather than unlocking since there
> +	 * may have been more than one reference on it in the parent.
> +	 * We are not a lock holder anyway; the thread in the parent was. */
> +	pthread_rwlock_init(&maplock, 0);
>  }
> diff --git a/src/process/_Fork.c b/src/process/_Fork.c
> index da063868..fb0fdc2c 100644
> --- a/src/process/_Fork.c
> +++ b/src/process/_Fork.c
> @@ -14,7 +14,6 @@ pid_t _Fork(void)
>  	pid_t ret;
>  	sigset_t set;
>  	__block_all_sigs(&set);
> -	__aio_atfork(-1);
>  	LOCK(__abort_lock);
>  #ifdef SYS_fork
>  	ret = __syscall(SYS_fork);
> @@ -32,7 +31,7 @@ pid_t _Fork(void)
>  		if (libc.need_locks) libc.need_locks = -1;
>  	}
>  	UNLOCK(__abort_lock);
> -	__aio_atfork(!ret);
> +	if (!ret) __aio_atfork(1);
>  	__restore_sigs(&set);
>  	return __syscall_ret(ret);
>  }
> diff --git a/src/process/fork.c b/src/process/fork.c
> index ff71845c..80e804b1 100644
> --- a/src/process/fork.c
> +++ b/src/process/fork.c
> @@ -36,6 +36,7 @@ static volatile int *const *const atfork_locks[] = {
>  static void dummy(int x) { }
>  weak_alias(dummy, __fork_handler);
>  weak_alias(dummy, __malloc_atfork);
> +weak_alias(dummy, __aio_atfork);
>  weak_alias(dummy, __ldso_atfork);
> 
>  static void dummy_0(void) { }
> @@ -50,6 +51,7 @@ pid_t fork(void)
>  	int need_locks = libc.need_locks > 0;
>  	if (need_locks) {
>  		__ldso_atfork(-1);
> +		__aio_atfork(-1);
>  		__inhibit_ptc();
>  		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
>  			if (*atfork_locks[i]) LOCK(*atfork_locks[i]);
> @@ -75,6 +77,7 @@ pid_t fork(void)
>  				if (ret) UNLOCK(*atfork_locks[i]);
>  				else **atfork_locks[i] = 0;
>  		__release_ptc();
> +		if (ret) __aio_atfork(0);
>  		__ldso_atfork(!ret);
>  	}
>  	__restore_sigs(&set);

Looks good to me otherwise.

Thanks,
Alexey


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-07  1:26       ` Rich Felker
@ 2022-10-07 10:53         ` Alexey Izbyshev
  2022-10-07 21:18           ` Rich Felker
  0 siblings, 1 reply; 15+ messages in thread
From: Alexey Izbyshev @ 2022-10-07 10:53 UTC (permalink / raw)
  To: musl

On 2022-10-07 04:26, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 03:50:54PM -0400, Rich Felker wrote:
>> On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
>> > On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
>> > > On 2022-10-06 09:37, Alexey Izbyshev wrote:
>> > > > [...]
>> > > Looking at aio further, I don't understand how it's supposed to work
>> > > with MT fork at all. __aio_atfork() is called in _Fork() when the
>> > > allocator locks are already held. Meanwhile another thread could be
>> > > stuck in __aio_get_queue() holding maplock in exclusive mode while
>> > > trying to allocate, resulting in deadlock.
>> >
>> > [... analysis and options (1)-(3) elided; quoted in full above ...]
>> 
>> Proposed patch attached.
> 
>> [... patch elided; quoted in full earlier in the thread ...]
> 
> There's at least one other related bug here: when __aio_get_queue has
> to take the write lock, it does so without blocking signals, so close
> called from a signal handler that interrupts it will self-deadlock.
> 
This is worse. Even if maplock didn't exist, __aio_get_queue still takes 
the queue lock, so close from a signal handler would still deadlock. 
Maybe this could be fixed by simply blocking signals earlier in submit? 
aio_cancel already calls __aio_get_queue with signals blocked.

> This is fixable by blocking signals for the write-lock critical
> section, but there's still a potentially long window where we're
> blocking forward progress of other threads.
> 
To make this more concrete, are you worrying about the libc-internal 
allocator delaying other threads attempting to take maplock (even in 
shared mode)? If so, this seems like a problem that is independent from 
close signal-safety.

> I think the right thing to do here is add another lock for updating
> the map that we can take while we still hold the rdlock. After taking
> this lock, we can re-inspect if any work is needed. If so, do all the
> work *still holding only the rdlock*, but not writing the results into
> the existing map. After all the allocation is done, release the
> rdlock, take the wrlock (with signals blocked), install the new
> memory, then release all the locks. This makes it so the wrlock is
> only held momentarily, under very controlled conditions, so we don't
> have to worry about messing up AS-safety of close.
> 
Sorry, I can't understand what exactly you mean here, in particular, 
what the old lock is supposed to protect if its holders are not 
mutually-excluded with updates of the map guarded by the new lock.

I understand the general idea of doing allocations outside the critical 
section though. I imagine it in the following way:

1. Take maplock in shared mode.
2. Find the first level/entry in the map that we need to allocate. If no 
allocation is needed, goto 7.
3. Release maplock.
4. Allocate all needed map levels and a queue.
5. Take maplock in exclusive mode.
6. Modify the map as needed, remembering chunks that we need to free 
because another thread beat us.
7. Take queue lock.
8. Release maplock.
9. Free all unneeded chunks. // This could be delayed further if doing 
it under the queue lock is undesirable.
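
A hedged sketch of these steps with a single-level map (the real map[] 
has four levels; error handling and fd-range checks elided):

#include <pthread.h>
#include <stdlib.h>

struct queue { pthread_mutex_t lock; /* ... */ };

static pthread_rwlock_t maplock = PTHREAD_RWLOCK_INITIALIZER;
static struct queue **map;

static struct queue *get_queue(int fd)
{
        struct queue **newmap = 0;
        struct queue *newq = 0;

        pthread_rwlock_rdlock(&maplock);                /* 1 */
        if (!map || !map[fd]) {                         /* 2 */
                pthread_rwlock_unlock(&maplock);        /* 3 */
                newmap = calloc(1024, sizeof *newmap);  /* 4 */
                newq = calloc(1, sizeof *newq);
                pthread_mutex_init(&newq->lock, 0);
                pthread_rwlock_wrlock(&maplock);        /* 5 */
                if (!map) { map = newmap; newmap = 0; } /* 6 */
                if (!map[fd]) { map[fd] = newq; newq = 0; }
        }
        struct queue *q = map[fd];
        pthread_mutex_lock(&q->lock);                   /* 7 */
        pthread_rwlock_unlock(&maplock);                /* 8 */
        free(newmap);                                   /* 9: chunks another thread beat us to */
        free(newq);
        return q; /* returned with q->lock held */
}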

Waiting at step 7 under the exclusive lock can occur only if at step 6 
we discovered that we don't need to modify the map at all (i.e. we're going 
to use an already existing queue). It doesn't seem to be a problem 
because nothing seems to take the queue lock for prolonged periods 
(except possibly step 9), but even if it is, we could "downgrade" the 
exclusive lock to the shared one by dropping it and retrying from step 1 
in a loop.

> As an alternative, the rwlock could be removed entirely in favor of
> making the whole map[] atomic. This has the amusing effect of making
> what's conceptually one of the worst C types ever to appear in real
> code:
> 
> static struct aio_queue *volatile *volatile *volatile *volatile 
> *volatile map;
> 
> although it would actually have to be declared void *volatile map;
> with lots of type conversions to navigate through it, in order to
> respect type rules for a_cas_p. Then all that would be needed is one
> normal lock governing modifications.
> 
> Given that this code *tries* to mostly use high-level synchronization
> primitives, my leaning would be not to do that.
> 
I'm not sure how such atomic map is supposed to work with queue 
reference counting. We must ensure that "remove the queue from the map 
if we own the last reference" step is atomic, and currently it relies on 
holding the maplock in exclusive mode.

Thanks,
Alexey


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-07 10:53         ` Alexey Izbyshev
@ 2022-10-07 21:18           ` Rich Felker
  2022-10-08 16:07             ` Alexey Izbyshev
  0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2022-10-07 21:18 UTC (permalink / raw)
  To: musl

On Fri, Oct 07, 2022 at 01:53:14PM +0300, Alexey Izbyshev wrote:
> On 2022-10-07 04:26, Rich Felker wrote:
> >[...]
> >
> >There's at least one other related bug here: when __aio_get_queue has
> >to take the write lock, it does so without blocking signals, so close
> >called from a signal handler that interrupts it will self-deadlock.
> >
> This is worse. Even if maplock didn't exist, __aio_get_queue still
> takes the queue lock, so close from a signal handler would still
> deadlock. Maybe this could be fixed by simply blocking signals
> earlier in submit? aio_cancel already calls __aio_get_queue with
> signals blocked.

The queue lock only affects a close concurrent with aio on the same
fd. This is at least a programming error and possibly UB, so it's not
that big a concern.

> >This is fixable by blocking signals for the write-lock critical
> >section, but there's still a potentially long window where we're
> >blocking forward progress of other threads.
> >
> To make this more concrete, are you worrying about the libc-internal
> allocator delaying other threads attempting to take maplock (even in
> shared mode)? If so, this seems like a problem that is independent
> from close signal-safety.

Yes. The "but there's still..." part is a separate, relatively minor,
issue that's just a matter of delaying forward progress of unrelated
close operations, which could be nasty in a realtime process. It only
happens boundedly-many times (at most once for each fd number ever
used) so it's not that big a deal asymptotically, only worst-case.

> >I think the right thing to do here is add another lock for updating
> >the map that we can take while we still hold the rdlock. After taking
> >this lock, we can re-inspect if any work is needed. If so, do all the
> >work *still holding only the rdlock*, but not writing the results into
> >the existing map. After all the allocation is done, release the
> >rdlock, take the wrlock (with signals blocked), install the new
> >memory, then release all the locks. This makes it so the wrlock is
> >only held momentarily, under very controlled conditions, so we don't
> >have to worry about messing up AS-safety of close.
> >
> Sorry, I can't understand what exactly you mean here, in particular,
> what the old lock is supposed to protect if its holders are not
> mutually-excluded with updates of the map guarded by the new lock.

Updates to the map are guarded by the new lock, but exclusion of reads
concurrent with installing the update still needs to be guarded by the
rwlock unless the updates are made atomically.

> I understand the general idea of doing allocations outside the
> critical section though. I imagine it in the following way:
> 
> 1. Take maplock in shared mode.
> 2. Find the first level/entry in the map that we need to allocate.
> If no allocation is needed, goto 7.
> 3. Release maplock.
> 4. Allocate all needed map levels and a queue.
> 5. Take maplock in exclusive mode.
> 6. Modify the map as needed, remembering chunks that we need to free
> because another thread beat us.
> 7. Take queue lock.
> 8. Release maplock.
> 9. Free all unneeded chunks. // This could be delayed further if
> doing it under the queue lock is undesirable.

The point of the proposed new lock is to avoid having step 9 at all. By
evaluating what you need to allocate under a lock, you can ensure
nobody else races to allocate it before you.

> Waiting at step 7 under the exclusive lock can occur only if at step
> 6 we discovered that we don't need to modify the map at all (i.e. we're
> going to use an already existing queue). It doesn't seem to be a
> problem because nothing seems to take the queue lock for prolonged
> periods (except possibly step 9), but even if it is, we could
> "downgrade" the exclusive lock to the shared one by dropping it and
> retrying from step 1 in a loop.

AFAICT there's no need for a loop. The map only grows, never shrinks.
If it's been seen expanded to cover the fd you want once, it will
cover it forever.

> >As an alternative, the rwlock could be removed entirely in favor of
> >making the whole map[] atomic. This has the amusing effect of making
> >what's conceptually one of the worst C types ever to appear in real
> >code:
> >
> >static struct aio_queue *volatile *volatile *volatile *volatile
> >*volatile map;
> >
> >although it would actually have to be declared void *volatile map;
> >with lots of type conversions to navigate through it, in order to
> >respect type rules for a_cas_p. Then all that would be needed is one
> >normal lock governing modifications.
> >
> >Given that this code *tries* to mostly use high-level synchronization
> >primitives, my leaning would be not to do that.
> >
> I'm not sure how such atomic map is supposed to work with queue
> reference counting. We must ensure that "remove the queue from the
> map if we own the last reference" step is atomic, and currently it
> relies on holding the maplock in exclusive mode.

Indeed, I forgot there's the difficult aspect of synchronizing the end
of the queue objects' lifetimes. There are ways it could be done, like
storing lock status in the leaf pointers in map[], but it gets
progressively messier and I really don't think I want to go there...

Rich


* Re: [musl] MT fork and key_lock in pthread_key_create.c
  2022-10-06 18:21 ` [musl] " Rich Felker
@ 2022-10-08  1:36   ` Rich Felker
  2022-10-08 17:03     ` Alexey Izbyshev
  0 siblings, 1 reply; 15+ messages in thread
From: Rich Felker @ 2022-10-08  1:36 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

On Thu, Oct 06, 2022 at 02:21:51PM -0400, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 09:37:50AM +0300, Alexey Izbyshev wrote:
> > Hi,
> > 
> > I noticed that fork() doesn't take key_lock that is used to protect
> > the global table of thread-specific keys. I couldn't find mentions
> > of this lock in the MT fork discussion in the mailing list archive.
> > Was this lock overlooked?
> 
> I think what happened was that we made the main list of locks to
> review and take care of via grep for LOCK, and then manually added
> known instances of locks using other locking primitives. This one must
> have been missed.
> 
> Having special-case lock types like this is kinda a pain, but as long
> as there aren't too many I guess it's not a big deal.

Proposed patch attached.

[-- Attachment #2: pthread_key_fork.diff --]
[-- Type: text/plain, Size: 2004 bytes --]

diff --git a/src/internal/fork_impl.h b/src/internal/fork_impl.h
index ae3a79e5..354e733b 100644
--- a/src/internal/fork_impl.h
+++ b/src/internal/fork_impl.h
@@ -16,3 +16,4 @@ extern hidden volatile int *const __vmlock_lockptr;
 
 hidden void __malloc_atfork(int);
 hidden void __ldso_atfork(int);
+hidden void __pthread_key_atfork(int);
diff --git a/src/process/fork.c b/src/process/fork.c
index 80e804b1..56f19313 100644
--- a/src/process/fork.c
+++ b/src/process/fork.c
@@ -37,6 +37,7 @@ static void dummy(int x) { }
 weak_alias(dummy, __fork_handler);
 weak_alias(dummy, __malloc_atfork);
 weak_alias(dummy, __aio_atfork);
+weak_alias(dummy, __pthread_key_atfork);
 weak_alias(dummy, __ldso_atfork);
 
 static void dummy_0(void) { }
@@ -51,6 +52,7 @@ pid_t fork(void)
 	int need_locks = libc.need_locks > 0;
 	if (need_locks) {
 		__ldso_atfork(-1);
+		__pthread_key_atfork(-1);
 		__aio_atfork(-1);
 		__inhibit_ptc();
 		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
@@ -78,6 +80,7 @@ pid_t fork(void)
 				else **atfork_locks[i] = 0;
 		__release_ptc();
 		if (ret) __aio_atfork(0);
+		__pthread_key_atfork(!ret);
 		__ldso_atfork(!ret);
 	}
 	__restore_sigs(&set);
diff --git a/src/thread/pthread_key_create.c b/src/thread/pthread_key_create.c
index d1120941..39770c7a 100644
--- a/src/thread/pthread_key_create.c
+++ b/src/thread/pthread_key_create.c
@@ -1,4 +1,5 @@
 #include "pthread_impl.h"
+#include "fork_impl.h"
 
 volatile size_t __pthread_tsd_size = sizeof(void *) * PTHREAD_KEYS_MAX;
 void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
@@ -20,6 +21,13 @@ static void dummy_0(void)
 weak_alias(dummy_0, __tl_lock);
 weak_alias(dummy_0, __tl_unlock);
 
+void __pthread_key_atfork(int who)
+{
+	if (who<0) __pthread_rwlock_rdlock(&key_lock);
+	else if (!who) __pthread_rwlock_unlock(&key_lock);
+	else key_lock = (pthread_rwlock_t)PTHREAD_RWLOCK_INITIALIZER;
+}
+
 int __pthread_key_create(pthread_key_t *k, void (*dtor)(void *))
 {
 	pthread_t self = __pthread_self();


* Re: [musl] Re: MT fork and key_lock in pthread_key_create.c
  2022-10-07 21:18           ` Rich Felker
@ 2022-10-08 16:07             ` Alexey Izbyshev
  0 siblings, 0 replies; 15+ messages in thread
From: Alexey Izbyshev @ 2022-10-08 16:07 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 5428 bytes --]

On 2022-10-08 00:18, Rich Felker wrote:
> On Fri, Oct 07, 2022 at 01:53:14PM +0300, Alexey Izbyshev wrote:
>> On 2022-10-07 04:26, Rich Felker wrote:
>> >There's at least one other related bug here: when __aio_get_queue has
>> >to take the write lock, it does so without blocking signals, so close
>> >called from a signal handler that interrupts will self-deadlock.
>> >
>> This is worse. Even if maplock didn't exist, __aio_get_queue still
>> takes the queue lock, so close from a signal handler would still
>> deadlock. Maybe this could be fixed by simply blocking signals
>> earlier in submit? aio_cancel already calls __aio_get_queue with
>> signals blocked.
> 
> The queue lock only affects a close concurrent with aio on the same
> fd. This is at least a programming error and possibly UB, so it's not
> that big a concern.
> 
Thanks. Indeed, since aio_cancel isn't required to be async-signal-safe, 
we're interested only in the case when it's called from close to cancel 
all requests for a particular fd.
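
To make that concrete, the assumed wiring looks roughly like this (a
sketch, not verbatim musl source; __aio_close is the internal hook,
declared here only so the sketch is self-contained):

	#include <sys/syscall.h>
	#include <unistd.h>

	int __aio_close(int fd); /* assumed: calls aio_cancel(fd, 0) if aio is in use */

	/* close() must remain async-signal-safe, and it reaches
	 * aio_cancel through this path, so that path must be AS-safe. */
	int my_close(int fd)
	{
		fd = __aio_close(fd); /* no-op if aio was never used */
		return syscall(SYS_close, fd);
	}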

>> >This is fixable by blocking signals for the write-lock critical
>> >section, but there's still a potentially long window where we're
>> >blocking forward progress of other threads.
>> >
>> To make this more concrete, are you worrying about the libc-internal
>> allocator delaying other threads attempting to take maplock (even in
>> shared mode)? If so, this seems like a problem that is independent
>> of close signal-safety.
> 
> Yes. The "but there's still..." part is a separate, relatively minor
> issue that's just a matter of delaying forward progress of unrelated
> close operations, which could be nasty in a realtime process. It only
> happens boundedly many times (at most once for each fd number ever
> used), so it's not that big a deal asymptotically, only worst-case.
> 
>> >I think the right thing to do here is add another lock for updating
>> >the map that we can take while we still hold the rdlock. After taking
>> >this lock, we can re-inspect if any work is needed. If so, do all the
>> >work *still holding only the rdlock*, but not writing the results into
>> >the existing map. After all the allocation is done, release the
>> >rdlock, take the wrlock (with signals blocked), install the new
>> >memory, then release all the locks. This makes it so the wrlock is
>> >only held momentarily, under very controlled conditions, so we don't
>> >have to worry about messing up AS-safety of close.
>> >
>> Sorry, I can't understand what exactly you mean here, in particular,
>> what the old lock is supposed to protect if its holders are not
>> mutually-excluded with updates of the map guarded by the new lock.
> 
> Updates to the map are guarded by the new lock, but exclusion of reads
> concurrent with installing the update still needs to be guarded by the
> rwlock unless the updates are made atomically.
> 
Thanks. I think I understand the purpose of the new lock now, but I 
still can't understand the locking sequence that you proposed (if I 
read it correctly): "take rdlock -> take new_lock -> drop rdlock -> 
block signals -> take wrlock -> ...". This can't work due to lock 
ordering: a thread holding the rdlock while waiting for new_lock would 
deadlock with a thread that already holds new_lock and is waiting for 
the wrlock, since the wrlock can't be granted while the rdlock is held. 
It's also unclear why we would need to re-inspect anything after taking 
the new lock if we're still holding the rdlock and the latter protects 
us from "installing the updates" (as you say in the follow-up).

I've implemented my understanding of how the newly proposed lock should 
work in the attached patch. The patch doesn't optimize the case where 
re-inspection discovers an already existing queue (it will still block 
signals and take the write lock), but this can easily be added if 
desired. It also doesn't handle the new lock in __aio_atfork. A toy 
model of the locking discipline follows; the real patch is attached 
below.
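
A toy model of that discipline, with invented names and a flat table
standing in for the multi-level map:

	#include <pthread.h>
	#include <signal.h>
	#include <stdlib.h>

	static pthread_rwlock_t tbl_lock = PTHREAD_RWLOCK_INITIALIZER;
	static pthread_mutex_t upd_lock = PTHREAD_MUTEX_INITIALIZER;
	static void *table[256];

	void *get_slot(int i) /* i in [0,256) */
	{
		void *p;
		pthread_rwlock_rdlock(&tbl_lock); /* fast path for readers */
		p = table[i];
		pthread_rwlock_unlock(&tbl_lock);
		if (p) return p;

		pthread_mutex_lock(&upd_lock); /* serializes all updates */
		if (!(p = table[i])) { /* re-inspect under upd_lock */
			void *fresh = calloc(1, 64); /* allocate with no rwlock held */
			if (!fresh) { pthread_mutex_unlock(&upd_lock); return 0; }
			sigset_t all, old;
			sigfillset(&all);
			pthread_sigmask(SIG_BLOCK, &all, &old);
			pthread_rwlock_wrlock(&tbl_lock); /* momentary, AS-safe window */
			table[i] = p = fresh;
			pthread_rwlock_unlock(&tbl_lock);
			pthread_sigmask(SIG_SETMASK, &old, 0);
		}
		pthread_mutex_unlock(&upd_lock);
		return p;
	}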

>> I understand the general idea of doing allocations outside the
>> critical section though. I imagine it in the following way:
>> 
>> 1. Take maplock in shared mode.
>> 2. Find the first level/entry in the map that we need to allocate.
>> If no allocation is needed, goto 7.
>> 3. Release maplock.
>> 4. Allocate all needed map levels and a queue.
>> 5. Take maplock in exclusive mode.
>> 6. Modify the map as needed, remembering chunks that we need to free
>> because another thread beat us.
>> 7. Take queue lock.
>> 8. Release maplock.
>> 9. Free all unneeded chunks. // This could be delayed further if
>> doing it under the queue lock is undesirable.
> 
> The point of the proposed new lock is to avoid having 9 at all. By
> evaluating what you need to allocate under a lock, you can ensure
> nobody else races to allocate it before you.
> 
>> Waiting at step 7 under the exclusive lock can occur only if at step
>> 6 we discovered that we don't need modify the map at all (i.e. we're
>> going to use an already existing queue). It doesn't seem to be a
>> problem because nothing seems to take the queue lock for prolonged
>> periods (except possibly step 9), but even if it is, we could
>> "downgrade" the exclusive lock to the shared one by dropping it and
>> retrying from step 1 in a loop.
> 
> AFAICT there's no need for a loop. The map only grows, never shrinks.
> If it's been seen expanded to cover the fd you want once, it will
> cover it forever.
> 
This is true for the map itself, but not for queues: __aio_unref_queue 
removes a queue from the map and frees it once the last reference is 
dropped. So as soon as we drop the write lock after seeing an existing 
queue, it might disappear before we take the read lock again. In your 
scheme with an additional lock this problem is avoided.

Thanks,
Alexey

[-- Attachment #2: aio-map-update-lock.diff --]
[-- Type: text/x-diff; name=aio-map-update-lock.diff, Size: 3853 bytes --]

diff --git a/src/aio/aio.c b/src/aio/aio.c
index a1a3e791..a097de94 100644
--- a/src/aio/aio.c
+++ b/src/aio/aio.c
@@ -8,6 +8,7 @@
 #include <sys/auxv.h>
 #include "syscall.h"
 #include "atomic.h"
+#include "lock.h"
 #include "pthread_impl.h"
 #include "aio_impl.h"
 
@@ -71,6 +72,7 @@ struct aio_args {
 	sem_t sem;
 };
 
+static volatile int map_update_lock[1];
 static pthread_rwlock_t maplock = PTHREAD_RWLOCK_INITIALIZER;
 static struct aio_queue *****map;
 static volatile int aio_fd_cnt;
@@ -86,40 +88,62 @@ static struct aio_queue *__aio_get_queue(int fd, int need)
 		errno = EBADF;
 		return 0;
 	}
+	sigset_t allmask, origmask;
 	int a=fd>>24;
 	unsigned char b=fd>>16, c=fd>>8, d=fd;
-	struct aio_queue *q = 0;
+	struct aio_queue *****m = 0, ****ma = 0, ***mb = 0, **mc = 0, *q = 0;
+	int n = 0;
+
 	pthread_rwlock_rdlock(&maplock);
-	if ((!map || !map[a] || !map[a][b] || !map[a][b][c] || !(q=map[a][b][c][d])) && need) {
-		pthread_rwlock_unlock(&maplock);
-		if (fcntl(fd, F_GETFD) < 0) return 0;
-		pthread_rwlock_wrlock(&maplock);
+	if (map && map[a] && map[a][b] && map[a][b][c] && (q=map[a][b][c][d]))
+		pthread_mutex_lock(&q->lock);
+	pthread_rwlock_unlock(&maplock);
+
+	if (q || !need)
+		return q;
+	if (fcntl(fd, F_GETFD) < 0)
+		return 0;
+
+	LOCK(map_update_lock);
+	if (!map || !(n++, map[a]) || !(n++, map[a][b]) || !(n++, map[a][b][c]) || !(n++, q=map[a][b][c][d])) {
 		if (!io_thread_stack_size) {
 			unsigned long val = __getauxval(AT_MINSIGSTKSZ);
 			io_thread_stack_size = MAX(MINSIGSTKSZ+2048, val+512);
 		}
-		if (!map) map = calloc(sizeof *map, (-1U/2+1)>>24);
-		if (!map) goto out;
-		if (!map[a]) map[a] = calloc(sizeof **map, 256);
-		if (!map[a]) goto out;
-		if (!map[a][b]) map[a][b] = calloc(sizeof ***map, 256);
-		if (!map[a][b]) goto out;
-		if (!map[a][b][c]) map[a][b][c] = calloc(sizeof ****map, 256);
-		if (!map[a][b][c]) goto out;
-		if (!(q = map[a][b][c][d])) {
-			map[a][b][c][d] = q = calloc(sizeof *****map, 1);
-			if (q) {
-				q->fd = fd;
-				pthread_mutex_init(&q->lock, 0);
-				pthread_cond_init(&q->cond, 0);
-				a_inc(&aio_fd_cnt);
-			}
+		switch (n) {
+			case 0: if (!(m = calloc(sizeof *m, (-1U/2+1)>>24))) goto fail;
+			case 1: if (!(ma = calloc(sizeof *ma, 256))) goto fail;
+			case 2: if (!(mb = calloc(sizeof *mb, 256))) goto fail;
+			case 3: if (!(mc = calloc(sizeof *mc, 256))) goto fail;
+			case 4: if (!(q = calloc(sizeof *q, 1))) goto fail;
 		}
+		q->fd = fd;
+		pthread_mutex_init(&q->lock, 0);
+		pthread_cond_init(&q->cond, 0);
+		a_inc(&aio_fd_cnt);
 	}
-	if (q) pthread_mutex_lock(&q->lock);
-out:
+	sigfillset(&allmask);
+	pthread_sigmask(SIG_BLOCK, &allmask, &origmask);
+	pthread_rwlock_wrlock(&maplock);
+	switch (n) {
+		case 0: map = m;
+		case 1: map[a] = ma;
+		case 2: map[a][b] = mb;
+		case 3: map[a][b][c] = mc;
+		case 4: map[a][b][c][d] = q;
+	}
+	pthread_mutex_lock(&q->lock);
 	pthread_rwlock_unlock(&maplock);
+	pthread_sigmask(SIG_SETMASK, &origmask, 0);
+	UNLOCK(map_update_lock);
 	return q;
+fail:
+	UNLOCK(map_update_lock);
+	free(mc);
+	free(mb);
+	free(ma);
+	free(m);
+	return 0;
 }
 
 static void __aio_unref_queue(struct aio_queue *q)
@@ -134,6 +158,7 @@ static void __aio_unref_queue(struct aio_queue *q)
 	 * may arrive since we cannot free the queue object without first
 	 * taking the maplock, which requires releasing the queue lock. */
 	pthread_mutex_unlock(&q->lock);
+	LOCK(map_update_lock);
 	pthread_rwlock_wrlock(&maplock);
 	pthread_mutex_lock(&q->lock);
 	if (q->ref == 1) {
@@ -144,11 +169,13 @@ static void __aio_unref_queue(struct aio_queue *q)
 		a_dec(&aio_fd_cnt);
 		pthread_rwlock_unlock(&maplock);
 		pthread_mutex_unlock(&q->lock);
+		UNLOCK(map_update_lock);
 		free(q);
 	} else {
 		q->ref--;
 		pthread_rwlock_unlock(&maplock);
 		pthread_mutex_unlock(&q->lock);
+		UNLOCK(map_update_lock);
 	}
 }
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] MT fork and key_lock in pthread_key_create.c
  2022-10-08  1:36   ` Rich Felker
@ 2022-10-08 17:03     ` Alexey Izbyshev
  2022-10-11 17:50       ` Rich Felker
  0 siblings, 1 reply; 15+ messages in thread
From: Alexey Izbyshev @ 2022-10-08 17:03 UTC (permalink / raw)
  To: musl

On 2022-10-08 04:36, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 02:21:51PM -0400, Rich Felker wrote:
>> On Thu, Oct 06, 2022 at 09:37:50AM +0300, Alexey Izbyshev wrote:
>> > Hi,
>> >
>> > I noticed that fork() doesn't take key_lock that is used to protect
>> > the global table of thread-specific keys. I couldn't find mentions
>> > of this lock in the MT fork discussion in the mailing list archive.
>> > Was this lock overlooked?
>> 
>> I think what happened was that we made the main list of locks to
>> review and take care of via grep for LOCK, and then manually added
>> known instances of locks using other locking primitives. This one must
>> have been missed.
>> 
>> Having special-case lock types like this is kinda a pain, but as long
>> as there aren't too many I guess it's not a big deal.
> 
> Proposed patch attached.

diff --git a/src/internal/fork_impl.h b/src/internal/fork_impl.h
index ae3a79e5..354e733b 100644
--- a/src/internal/fork_impl.h
+++ b/src/internal/fork_impl.h
@@ -16,3 +16,4 @@ extern hidden volatile int *const __vmlock_lockptr;

  hidden void __malloc_atfork(int);
  hidden void __ldso_atfork(int);
+hidden void __pthread_key_atfork(int);
diff --git a/src/process/fork.c b/src/process/fork.c
index 80e804b1..56f19313 100644
--- a/src/process/fork.c
+++ b/src/process/fork.c
@@ -37,6 +37,7 @@ static void dummy(int x) { }
  weak_alias(dummy, __fork_handler);
  weak_alias(dummy, __malloc_atfork);
  weak_alias(dummy, __aio_atfork);
+weak_alias(dummy, __pthread_key_atfork);
  weak_alias(dummy, __ldso_atfork);

  static void dummy_0(void) { }
@@ -51,6 +52,7 @@ pid_t fork(void)
  	int need_locks = libc.need_locks > 0;
  	if (need_locks) {
  		__ldso_atfork(-1);
+		__pthread_key_atfork(-1);
  		__aio_atfork(-1);
  		__inhibit_ptc();
  		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
@@ -78,6 +80,7 @@ pid_t fork(void)
  				else **atfork_locks[i] = 0;
  		__release_ptc();
  		if (ret) __aio_atfork(0);
+		__pthread_key_atfork(!ret);
  		__ldso_atfork(!ret);
  	}
  	__restore_sigs(&set);
diff --git a/src/thread/pthread_key_create.c b/src/thread/pthread_key_create.c
index d1120941..39770c7a 100644
--- a/src/thread/pthread_key_create.c
+++ b/src/thread/pthread_key_create.c
@@ -1,4 +1,5 @@
  #include "pthread_impl.h"
+#include "fork_impl.h"

  volatile size_t __pthread_tsd_size = sizeof(void *) * PTHREAD_KEYS_MAX;
  void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
@@ -20,6 +21,13 @@ static void dummy_0(void)
  weak_alias(dummy_0, __tl_lock);
  weak_alias(dummy_0, __tl_unlock);

+void __pthread_key_atfork(int who)
+{
+	if (who<0) __pthread_rwlock_rdlock(&key_lock);
+	else if (!who) __pthread_rwlock_unlock(&key_lock);
+	else key_lock = (pthread_rwlock_t)PTHREAD_RWLOCK_INITIALIZER;

Are you using PTHREAD_RWLOCK_INITIALIZER to avoid a dependency on 
pthread_rwlock_init (which is used in the aio_atfork patch[1])?

[1] https://www.openwall.com/lists/musl/2022/10/06/5

+}
+
  int __pthread_key_create(pthread_key_t *k, void (*dtor)(void *))
  {
  	pthread_t self = __pthread_self();

Looks good to me otherwise.

Thanks,
Alexey


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [musl] MT fork and key_lock in pthread_key_create.c
  2022-10-08 17:03     ` Alexey Izbyshev
@ 2022-10-11 17:50       ` Rich Felker
  0 siblings, 0 replies; 15+ messages in thread
From: Rich Felker @ 2022-10-11 17:50 UTC (permalink / raw)
  To: musl

On Sat, Oct 08, 2022 at 08:03:24PM +0300, Alexey Izbyshev wrote:
> On 2022-10-08 04:36, Rich Felker wrote:
> >On Thu, Oct 06, 2022 at 02:21:51PM -0400, Rich Felker wrote:
> >>On Thu, Oct 06, 2022 at 09:37:50AM +0300, Alexey Izbyshev wrote:
> >>> Hi,
> >>>
> >>> I noticed that fork() doesn't take key_lock that is used to protect
> >>> the global table of thread-specific keys. I couldn't find mentions
> >>> of this lock in the MT fork discussion in the mailing list archive.
> >>> Was this lock overlooked?
> >>
> >>I think what happened was that we made the main list of locks to
> >>review and take care of via grep for LOCK, and then manually added
> >>known instances of locks using other locking primitives. This one must
> >>have been missed.
> >>
> >>Having special-case lock types like this is kinda a pain, but as long
> >>as there aren't too many I guess it's not a big deal.
> >
> >Proposed patch attached.
> 
> diff --git a/src/internal/fork_impl.h b/src/internal/fork_impl.h
> index ae3a79e5..354e733b 100644
> --- a/src/internal/fork_impl.h
> +++ b/src/internal/fork_impl.h
> @@ -16,3 +16,4 @@ extern hidden volatile int *const __vmlock_lockptr;
> 
>  hidden void __malloc_atfork(int);
>  hidden void __ldso_atfork(int);
> +hidden void __pthread_key_atfork(int);
> diff --git a/src/process/fork.c b/src/process/fork.c
> index 80e804b1..56f19313 100644
> --- a/src/process/fork.c
> +++ b/src/process/fork.c
> @@ -37,6 +37,7 @@ static void dummy(int x) { }
>  weak_alias(dummy, __fork_handler);
>  weak_alias(dummy, __malloc_atfork);
>  weak_alias(dummy, __aio_atfork);
> +weak_alias(dummy, __pthread_key_atfork);
>  weak_alias(dummy, __ldso_atfork);
> 
>  static void dummy_0(void) { }
> @@ -51,6 +52,7 @@ pid_t fork(void)
>  	int need_locks = libc.need_locks > 0;
>  	if (need_locks) {
>  		__ldso_atfork(-1);
> +		__pthread_key_atfork(-1);
>  		__aio_atfork(-1);
>  		__inhibit_ptc();
>  		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
> @@ -78,6 +80,7 @@ pid_t fork(void)
>  				else **atfork_locks[i] = 0;
>  		__release_ptc();
>  		if (ret) __aio_atfork(0);
> +		__pthread_key_atfork(!ret);
>  		__ldso_atfork(!ret);
>  	}
>  	__restore_sigs(&set);
> diff --git a/src/thread/pthread_key_create.c b/src/thread/pthread_key_create.c
> index d1120941..39770c7a 100644
> --- a/src/thread/pthread_key_create.c
> +++ b/src/thread/pthread_key_create.c
> @@ -1,4 +1,5 @@
>  #include "pthread_impl.h"
> +#include "fork_impl.h"
> 
>  volatile size_t __pthread_tsd_size = sizeof(void *) * PTHREAD_KEYS_MAX;
>  void *__pthread_tsd_main[PTHREAD_KEYS_MAX] = { 0 };
> @@ -20,6 +21,13 @@ static void dummy_0(void)
>  weak_alias(dummy_0, __tl_lock);
>  weak_alias(dummy_0, __tl_unlock);
> 
> +void __pthread_key_atfork(int who)
> +{
> +	if (who<0) __pthread_rwlock_rdlock(&key_lock);
> +	else if (!who) __pthread_rwlock_unlock(&key_lock);
> +	else key_lock = (pthread_rwlock_t)PTHREAD_RWLOCK_INITIALIZER;
> 
> Are you using PTHREAD_RWLOCK_INITIALIZER to avoid dependency on
> pthread_rwlock_init (which is used in the aio_atfork patch[1])?

Yes. See commit 639bcf251e549f634da9a3e7ef8528eb2ec12505. We could add
an additional namespace-safe version of pthread_rwlock_init, but it
didn't seem worthwhile when we can just do it inline like this. This
would not be acceptable portable POSIX code in general, but within
musl, I think it's perfectly reasonable to assume that there's no
magic going on in the initializers and that the assignment works as
intended. It's not like calling pthread_rwlock_init here would be
portable itself, anyway; an implementation could, for example, keep
all live rwlocks on some list, and clobbering one this way would
corrupt the list.
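
A standalone illustration of the child-side reset under discussion
(toy code; within musl this is fine because the initializer is a plain
value, but it is not portable POSIX in general, for exactly the reason
given above):

	#include <pthread.h>
	#include <unistd.h>

	static pthread_rwlock_t lk = PTHREAD_RWLOCK_INITIALIZER;

	int main(void)
	{
		pthread_rwlock_rdlock(&lk);
		pid_t pid = fork();
		if (pid == 0) lk = (pthread_rwlock_t)PTHREAD_RWLOCK_INITIALIZER;
		else pthread_rwlock_unlock(&lk); /* parent (or fork failure) */
		return 0;
	}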

> +}
> +
>  int __pthread_key_create(pthread_key_t *k, void (*dtor)(void *))
>  {
>  	pthread_t self = __pthread_self();
> 
> Looks good to me otherwise.

Thanks for looking it over.

Rich

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-10-11 17:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-06  6:37 [musl] MT fork and key_lock in pthread_key_create.c Alexey Izbyshev
2022-10-06  7:02 ` [musl] " Alexey Izbyshev
2022-10-06 19:20   ` Rich Felker
2022-10-06 19:50     ` Rich Felker
2022-10-07  1:26       ` Rich Felker
2022-10-07 10:53         ` Alexey Izbyshev
2022-10-07 21:18           ` Rich Felker
2022-10-08 16:07             ` Alexey Izbyshev
2022-10-07  8:18       ` Alexey Izbyshev
2022-10-06 20:04     ` Jeffrey Walton
2022-10-06 20:09       ` Rich Felker
2022-10-06 18:21 ` [musl] " Rich Felker
2022-10-08  1:36   ` Rich Felker
2022-10-08 17:03     ` Alexey Izbyshev
2022-10-11 17:50       ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
