From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.1 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
	DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_MSPIKE_H2 autolearn=ham
	autolearn_force=no version=3.4.4
Received: (qmail 15524 invoked from network); 7 Oct 2022 08:18:53 -0000
Received: from second.openwall.net (193.110.157.125)
  by inbox.vuxu.org with ESMTPUTF8; 7 Oct 2022 08:18:53 -0000
Received: (qmail 14193 invoked by uid 550); 7 Oct 2022 08:18:50 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 14148 invoked from network); 7 Oct 2022 08:18:49 -0000
DKIM-Filter: OpenDKIM Filter v2.11.0 mail.ispras.ru 7F7FE40786AE
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ispras.ru;
	s=default; t=1665130716;
	bh=7Y7GIDBKZT5KDdF9S8c9T5lzn5wQydgxdBxiG3KEOz4=;
	h=Date:From:To:Subject:Reply-To:In-Reply-To:References:From;
	b=nlNSa0G8jwAybWUK3P8iR4jEagBtLHqXQMXaly108sa8/0UpxdniXwKSmjfo2+MR0
	 uamJVV7xt0yoOtfSVawIxqdLWZ6zmps8+dlDIA91gGM6b2S0O7DFhop+iZUfgoAH3W
	 ETNNPoNOvMM84d9pKGTPEoY8skBCuLrWA59CTa8o=
MIME-Version: 1.0
Date: Fri, 07 Oct 2022 11:18:36 +0300
From: Alexey Izbyshev <izbyshev@ispras.ru>
To: musl@lists.openwall.com
Mail-Followup-To: musl@lists.openwall.com
In-Reply-To: <20221006195053.GX29905@brightrain.aerifal.cx>
References: <033092a85a52d9f8e8d73a235e2381ae@ispras.ru>
 <6cf9cc5562e17610392c0f9c2cc68316@ispras.ru>
 <20221006192042.GW29905@brightrain.aerifal.cx>
 <20221006195053.GX29905@brightrain.aerifal.cx>
User-Agent: Roundcube Webmail/1.4.4
Message-ID: <70a52b448eef097b1dc1dd6f78f1921e@ispras.ru>
X-Sender: izbyshev@ispras.ru
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [musl] Re: MT fork and key_lock in pthread_key_create.c

On 2022-10-06 22:50, Rich Felker wrote:
> On Thu, Oct 06, 2022 at 03:20:42PM -0400, Rich Felker wrote:
>> On Thu, Oct 06, 2022 at 10:02:11AM +0300, Alexey Izbyshev wrote:
>> > On 2022-10-06 09:37, Alexey Izbyshev wrote:
>> > >Hi,
>> > >
>> > >I noticed that fork() doesn't take key_lock that is used to protect
>> > >the global table of thread-specific keys. I couldn't find mentions of
>> > >this lock in the MT fork discussion in the mailing list archive. Was
>> > >this lock overlooked?
>> > >
>> > >Also, I looked at how __aio_atfork() handles a similar case with
>> > >maplock, and it seems wrong. It takes the read lock and then simply
>> > >unlocks it both in the parent and in the child. But if there were
>> > >other holders of the read lock at the time of fork(), the lock won't
>> > >end up in the unlocked state in the child. It should probably be
>> > >completely nulled-out in the child instead.
>> > >
>> > Looking at aio further, I don't understand how it's supposed to work
>> > with MT fork at all. __aio_atfork() is called in _Fork() when the
>> > allocator locks are already held. Meanwhile another thread could be
>> > stuck in __aio_get_queue() holding maplock in exclusive mode while
>> > trying to allocate, resulting in deadlock.
>> 
>> Indeed, this is messy and I don't think it makes sense to be doing
>> this at all. The child is just going to throw away the state so the
>> parent shouldn't need to synchronize at all, but if we walk the
>> multi-level map[] table in the child after async fork, it's possible
>> that the contents seen are inconsistent, even that the pointers are
>> only half-written or something.
>> 
Doesn't musl assume that pointer-sized memory accesses are atomic?

>> I see a few possible solutions:
>> 
>> 1. Just set map = 0 in the child and leak the memory. This is not
>>    going to matter unless you're doing multiple generations of fork
>>    with aio anyway.
>> 
>> 2. The same, but be a little bit smarter. pthread_rwlock_tryrdlock in
>>    the child, and if it succeeds, we know the map is consistent so we
>>    can just zero it out the same as now. Still "leaks" but only on
>>    contention to expand the map.
>> 
>> 3. Getting a little smarter still: move the __aio_atfork for the
>>    parent side from _Fork to fork, outside of the critical section
>>    where malloc lock is held. Then proceed as in (2). Now, the
>>    tryrdlock is guaranteed to succeed in the child. Leak is only
>>    possible when _Fork is used (in which case the child context is an
>>    async signal one, and thus calling any aio_* that would allocate
>>    map[] again is UB -- note that in this case, the only reason we
>>    have to do anything at all in the child is to prevent close from
>>    interacting with aio).
>> 
>> After writing them out, 3 seems like the right choice.
> 
I agree.

> Proposed patch attached.

> diff --git a/src/aio/aio.c b/src/aio/aio.c
> index fa24f6b6..4c3379e1 100644
> --- a/src/aio/aio.c
> +++ b/src/aio/aio.c
> @@ -401,11 +401,25 @@ void __aio_atfork(int who)
>  	if (who<0) {
>  		pthread_rwlock_rdlock(&maplock);
>  		return;
> +	} else if (!who) {
> +		pthread_rwlock_unlock(&maplock);
> +		return;
>  	}

It probably makes sense to reset "aio_fd_cnt" here, though it matters 
only in a case when there are so many nested fork() children each using 
aio that it eventually overflows (breaking aio_close).

> -	if (who>0 && map) for (int a=0; a<(-1U/2+1)>>24; a++)
> +	if (pthread_rwlock_tryrdlock(&maplock)) {
> +		/* Obtaining lock may fail if _Fork was called nor via

s/nor/not/

> +		 * fork. In this case, no further aio is possible from
> +		 * child and we can just null out map so __aio_close
> +		 * does not attempt to do anything. */
> +		map = 0;
> +		return;
> +	}
> +	if (map) for (int a=0; a<(-1U/2+1)>>24; a++)
>  		if (map[a]) for (int b=0; b<256; b++)
>  			if (map[a][b]) for (int c=0; c<256; c++)
>  				if (map[a][b][c]) for (int d=0; d<256; d++)
>  					map[a][b][c][d] = 0;
> -	pthread_rwlock_unlock(&maplock);
> +	/* Re-initialize the rwlock rather than unlocking since there
> +	 * may have been more than one reference on it in the parent.
> +	 * We are not a lock holder anyway; the thread in the parent was. */
> +	pthread_rwlock_init(&maplock, 0);
>  }
> diff --git a/src/process/_Fork.c b/src/process/_Fork.c
> index da063868..fb0fdc2c 100644
> --- a/src/process/_Fork.c
> +++ b/src/process/_Fork.c
> @@ -14,7 +14,6 @@ pid_t _Fork(void)
>  	pid_t ret;
>  	sigset_t set;
>  	__block_all_sigs(&set);
> -	__aio_atfork(-1);
>  	LOCK(__abort_lock);
>  #ifdef SYS_fork
>  	ret = __syscall(SYS_fork);
> @@ -32,7 +31,7 @@ pid_t _Fork(void)
>  		if (libc.need_locks) libc.need_locks = -1;
>  	}
>  	UNLOCK(__abort_lock);
> -	__aio_atfork(!ret);
> +	if (!ret) __aio_atfork(1);
>  	__restore_sigs(&set);
>  	return __syscall_ret(ret);
>  }
> diff --git a/src/process/fork.c b/src/process/fork.c
> index ff71845c..80e804b1 100644
> --- a/src/process/fork.c
> +++ b/src/process/fork.c
> @@ -36,6 +36,7 @@ static volatile int *const *const atfork_locks[] = {
>  static void dummy(int x) { }
>  weak_alias(dummy, __fork_handler);
>  weak_alias(dummy, __malloc_atfork);
> +weak_alias(dummy, __aio_atfork);
>  weak_alias(dummy, __ldso_atfork);
> 
>  static void dummy_0(void) { }
> @@ -50,6 +51,7 @@ pid_t fork(void)
>  	int need_locks = libc.need_locks > 0;
>  	if (need_locks) {
>  		__ldso_atfork(-1);
> +		__aio_atfork(-1);
>  		__inhibit_ptc();
>  		for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
>  			if (*atfork_locks[i]) LOCK(*atfork_locks[i]);
> @@ -75,6 +77,7 @@ pid_t fork(void)
>  				if (ret) UNLOCK(*atfork_locks[i]);
>  				else **atfork_locks[i] = 0;
>  		__release_ptc();
> +		if (ret) __aio_atfork(0);
>  		__ldso_atfork(!ret);
>  	}
>  	__restore_sigs(&set);

Looks good to me otherwise.

Thanks,
Alexey