From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Date: Tue, 9 Jan 2024 20:55:50 -0500
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Message-ID: <20240110015550.GP4163@brightrain.aerifal.cx>
References: <ec138086-c5b9-4ca2-9da5-bef8b14de27d@dustri.org>
 <20240109190726.GO4163@brightrain.aerifal.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20240109190726.GO4163@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [musl] Protect pthreads' mutexes against use-after-destroy

On Tue, Jan 09, 2024 at 02:07:26PM -0500, Rich Felker wrote:
> On Tue, Jan 09, 2024 at 03:37:17PM +0100, jvoisin wrote:
> > Ohai,
> > 
> > as discussed on irc, Android's bionic has a check to prevent
> > use-after-destroy on pthread mutexes
> > (https://github.com/LineageOS/android_bionic/blob/e0aac7df6f58138dae903b5d456c947a3f8092ea/libc/bionic/pthread_mutex.cpp#L803),
> > and musl doesn't.
> > 
> > While odds are that this is a super-duper common bug, it would still be
> > nice to have this kind of protection, since it's cheap, and would
> > prevent/make it easy to diagnose weird states.
> > 
> > Is this something that should/could be implemented?
> > 
> > o/
> 
> I think you meant that the odds are it's not common. There's already
> enough complexity in the code paths for supporting all the different
> mutex types that my leaning would be, if we do any hardening for
> use-after-destroy, that it should probably just take the form of
> putting the object in a state that will naturally deadlock or error
> rather than adding extra checks to every path where it's used.
> 
> If OTOH we do want it to actually trap in all cases where it's used
> after destroy, the simplest way to achieve that is probably to set it
> up as a non-robust non-PI recursive or errorchecking mutex with
> invalid prev/next pointers and owner of 0x3fffffff. Then the only
> place that would actually have to have an explicit trap is trylock in
> the code path:
> 
>         if (own == 0x3fffffff) return ENOTRECOVERABLE;
> 
> where it could trap if type isn't robust. The unlock code path would
> trap on accessing invalid prev/next pointers.

Unfortunately, while researching this I discovered a problem we need
to deal with: at some point Linux quietly changed the futex ABI so
that bit 29 is no longer reserved but is potentially a tid bit.
This was documented in Linux commit
9c40365a65d62d7c06a95fb331b3442cb02d2fd9 but apparently happened at
the source level long before that. So we cannot assume 0x3fffffff is
not a valid tid, and therefore cannot assume 0x7fffffff is not equal
to ownerdead|valid_tid.
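
To make the collision concrete, here is a minimal sketch of the bit
layout. It assumes the constant values found in <linux/futex.h>
(FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK), redefined locally
under made-up SKETCH_ names; the helper functions are hypothetical and
are not musl code.

```c
#include <stdint.h>

/* Values mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from
 * <linux/futex.h>; redefined here (assumption) so the sketch builds
 * without kernel headers. */
#define SKETCH_WAITERS    0x80000000u
#define SKETCH_OWNER_DIED 0x40000000u
#define SKETCH_TID_MASK   0x3fffffffu

/* Hypothetical helper: the lock word left behind when the owner of a
 * robust mutex with the given tid dies. */
static uint32_t sketch_ownerdead_word(uint32_t tid)
{
	return SKETCH_OWNER_DIED | (tid & SKETCH_TID_MASK);
}

/* Hypothetical helper: nonzero if the word, ignoring the waiters bit,
 * equals the 0x7fffffff value discussed above. */
static int sketch_collides_with_sentinel(uint32_t word)
{
	return (word & ~SKETCH_WAITERS) == 0x7fffffffu;
}
```

With bit 29 usable as a tid bit, sketch_ownerdead_word(0x3fffffff)
yields exactly 0x7fffffff, so the two cases cannot be told apart from
the lock word alone.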

This probably means we need to find a way to encode "not recoverable"
as 0x40000000, since 0 is now the _only_ value of the low 30 bits that
cannot be a valid tid.
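
One possible reading of that encoding, as a rough sketch only: treat
"owner-died bit set, tid field zero" as the not-recoverable state. The
SKETCH_ names and the helper are hypothetical, and this says nothing
about how the rest of the lock/unlock logic would have to change.

```c
#include <stdint.h>

/* Same values as FUTEX_OWNER_DIED and FUTEX_TID_MASK in <linux/futex.h>
 * (assumption: redefined locally for a standalone sketch). */
#define SKETCH_OWNER_DIED     0x40000000u
#define SKETCH_TID_MASK       0x3fffffffu
/* Hypothetical encoding: owner-died bit with an all-zero tid field. */
#define SKETCH_NOTRECOVERABLE SKETCH_OWNER_DIED

/* Hypothetical helper: nonzero if the lock word is in the
 * not-recoverable state; the waiters bit (bit 31) is ignored. */
static int sketch_is_notrecoverable(uint32_t word)
{
	return (word & SKETCH_OWNER_DIED) && !(word & SKETCH_TID_MASK);
}
```

Because a tid of 0 is never valid, this state cannot be confused with
ownerdead|valid_tid, whatever the pid/tid limit is.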

I'll look at this more over the next day or two. It's probably fixable
but requires fiddling with delicate logic.

Note that the only in-the-wild breakage possible is on systems where
the pid/tid limit has been set extremely high; there, attempts to lock
a recursive or errorchecking mutex owned by a thread with tid
0x3fffffff could malfunction.

Rich