From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 20911 invoked from network); 20 May 2020 16:05:24 -0000
Received: from mother.openwall.net (195.42.179.200)
  by inbox.vuxu.org with ESMTPUTF8; 20 May 2020 16:05:24 -0000
Received: (qmail 3978 invoked by uid 550); 20 May 2020 16:05:22 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Reply-To: musl@lists.openwall.com
Received: (qmail 3954 invoked from network); 20 May 2020 16:05:21 -0000
Date: Wed, 20 May 2020 12:05:07 -0400
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Message-ID: <20200520160506.GL1079@brightrain.aerifal.cx>
References: <c6502d12-8092-3572-2827-1f7884402b8d@yandex-team.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <c6502d12-8092-3572-2827-1f7884402b8d@yandex-team.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [musl] pthread shouldn't ignore errors from syscall futex()

On Wed, May 20, 2020 at 03:31:46PM +0300, Konstantin Khlebnikov wrote:
> Userspace implementations of mutexes (including glibc) in some cases
> retries operation without checking error code from syscall futex.
> 
> Example which loops inside second call rather than hung (or die) peacefully:
> 
> #include <stdlib.h>
> #include <pthread.h>
> 
> int main(int argc, char **argv)
> {
> 	char buf[sizeof(pthread_mutex_t) + 1];
> 	pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
> 
> 	pthread_mutex_init(mutex, NULL);
> 	pthread_mutex_lock(mutex);
> 	pthread_mutex_lock(mutex);
> }
> 
> Thread in lkml:
> https://lore.kernel.org/lkml/158955700764.647498.18025770126733698386.stgit@buzz/T/
> 
> Related bug in glibc:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25997

In general, this behavior is intentional. If running on a system where
futex is broken (incomplete implementation of the Linux syscall API,
Linux built with flags that break futex, which is possible on some
archs, etc.), or if the kernel cannot perform the wait because of an
OOM condition in the kernel (Linux is *not* written to be resilient
against OOM and it shows), the behavior degrades to spinlocks rather
than crashing. Aborting the application because of OOM conditions in
the kernel is simply not acceptable.
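
To make "degrades to spinlocks" concrete, here is a minimal sketch of
a futex-based lock whose wait result is deliberately ignored. It is
not musl's actual mutex code; the sketch_lock type and both function
names are hypothetical, and it assumes Linux with C11 atomics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct { atomic_int state; } sketch_lock;

static void sketch_lock_acquire(sketch_lock *l)
{
	int expected = 0;
	while (!atomic_compare_exchange_weak(&l->state, &expected, 1)) {
		/* Ask the kernel to sleep while state is still 1. The
		 * return value is ignored on purpose: if the wait fails
		 * for any reason (ENOSYS, ENOMEM, EINVAL, ...), control
		 * falls through to the retry, i.e. the loop spins. */
		syscall(SYS_futex, &l->state, FUTEX_WAIT_PRIVATE, 1,
		        (void *)0, (void *)0, 0);
		expected = 0;
	}
}

static void sketch_lock_release(sketch_lock *l)
{
	atomic_store(&l->state, 0);
	/* Wake at most one waiter; a failure here is ignored as well. */
	syscall(SYS_futex, &l->state, FUTEX_WAKE_PRIVATE, 1,
	        (void *)0, (void *)0, 0);
}
```

With a working futex the waiter sleeps in the kernel; with a failing
one the same loop still makes progress once the lock is released, it
just burns CPU while waiting.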

It would be possible to try to distinguish the causes of futex failure
and handle the unaligned case specially, but this would put more code
in hot paths, impacting size and possibly performance in valid
programs for the sake of catching a non-security bug in invalid ones.
This does not seem like a useful tradeoff.
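
For reference, the failure causes can be told apart by errno when the
syscall is made directly. The sketch below assumes Linux; the helper
name futex_wait_errno is hypothetical and not something musl provides.
Per futex(2), FUTEX_WAIT on an address that is not four-byte-aligned
fails with EINVAL, while a value mismatch fails with EAGAIN.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue FUTEX_WAIT on addr, expecting the word
 * there to equal `expected`. Returns 0 if the wait was woken, else
 * the errno value reported by the kernel. */
static int futex_wait_errno(void *addr, int expected)
{
	if (syscall(SYS_futex, addr, FUTEX_WAIT, expected,
	            (void *)0, (void *)0, 0) == 0)
		return 0;
	return errno;
}
```

Calling it on buf + 1 of a four-byte-aligned buffer yields EINVAL
without blocking. Note that EINVAL also covers other invalid
arguments, so a lock path using this to detect misalignment would
need an extra branch after every failed wait, which is the hot-path
cost described above. Do not call the helper with a matching value
and no waker; it would block.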

Assuming the buggy program actually calls pthread_mutex_init rather
than just using an uninitialized/zero-initialized mutex object at a
misaligned address, pthread_mutex_init (and likewise the other pthread
object init functions) could possibly trap on the error (with no
syscall, just checking whether the address is misaligned mod
_Alignof() the object type) to catch it. I'm not sure this is
worthwhile though since, while it is UB, it doesn't seem to be UB with
any security impact.
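
A rough sketch of what such an init-time check could look like, done
here as a wrapper in application code rather than inside libc. Both
function names are hypothetical, and abort() stands in for whatever
trap mechanism an implementation would actually use.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if p is suitably aligned for a
 * pthread_mutex_t. No syscall involved, just address arithmetic. */
static int mutex_ptr_is_aligned(const void *p)
{
	return (uintptr_t)p % _Alignof(pthread_mutex_t) == 0;
}

/* Hypothetical checked wrapper: abort on a misaligned pointer
 * instead of initializing an object that can never work. */
static int checked_mutex_init(pthread_mutex_t *m,
                              const pthread_mutexattr_t *a)
{
	if (!mutex_ptr_is_aligned(m))
		abort();
	return pthread_mutex_init(m, a);
}
```

With the example from the report, checked_mutex_init would abort as
soon as it is handed a pointer like buf + 1 that fails the alignment
check, instead of looping in the second pthread_mutex_lock. The check
costs nothing in the lock/unlock paths.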

Rich