From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konstantin Khlebnikov
To: Rich Felker, musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Subject: Re: [musl] pthread shouldn't ignore errors from syscall futex()
Date: Wed, 20 May 2020 20:38:35 +0300
References: <20200520160506.GL1079@brightrain.aerifal.cx>
In-Reply-To: <20200520160506.GL1079@brightrain.aerifal.cx>

On 20/05/2020 19.05, Rich Felker wrote:
> On Wed, May 20, 2020 at 03:31:46PM +0300, Konstantin Khlebnikov wrote:
>> Userspace implementations of mutexes
>> (including glibc) in some cases
>> retry the operation without checking the error code from the futex syscall.
>>
>> Example that loops inside the second call rather than hanging (or dying) peacefully:
>>
>> #include <pthread.h>
>>
>> int main(int argc, char **argv)
>> {
>>         char buf[sizeof(pthread_mutex_t) + 1];
>>         pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>>
>>         pthread_mutex_init(mutex, NULL);
>>         pthread_mutex_lock(mutex);
>>         pthread_mutex_lock(mutex);
>> }
>>
>> Thread in lkml:
>> https://lore.kernel.org/lkml/158955700764.647498.18025770126733698386.stgit@buzz/T/
>>
>> Related bug in glibc:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=25997
>
> In general, this behavior is intentional. If running on a system where
> futex is broken (incomplete implementation of Linux syscall API,
> Linux built with flags that break futex which is possible on some
> archs, etc.), or if the kernel cannot perform the wait because of an
> OOM condition in the kernel (Linux is *not* written to be resilient
> against OOM and it shows), the behavior degrades to spinlocks rather
> than crashing. Aborting the application because of OOM conditions in
> the kernel is simply not acceptable.

Yes, an OOM condition in a cgroup before Linux 4.19 could definitely
cause almost any syscall to return EFAULT.
This is worth documenting in the futex man page.

But EINVAL from futex() has always meant that the arguments were wrong.

Ignoring unknown errors feels wrong anyway. It just hides bugs
and encourages these incomplete/buggy futex implementations to appear.

Also, silently degrading to spinlocks isn't very safe.
Not all schedulers guarantee progress if the waiter spins.
At least add some delay or a yield to that fallback waiting loop.

>
> It would be possible to try to distinguish the causes of futex failure
> and handle the unaligned case specially, but this would put more code
> in hot paths, impacting size and possibly performance in valid
> programs for the sake of catching a non-security bug in invalid ones.
> This does not seem like a useful tradeoff.

I've proposed sending SIGBUS from the syscall when the futex address is
unaligned (in the LKML thread, see the link above).

>
> Assuming the buggy program actually calls pthread_mutex_init rather
> than just using an uninitialized/zero-initialized mutex object at
> a misaligned address, pthread_mutex_init (and likewise other pthread
> object init functions) could possibly trap on the error (with no
> syscall, just looking for a misaligned address mod _Alignof() the
> object type) to catch it. I'm not sure if this is worthwhile though
> since, while being UB, it doesn't seem to be UB with any security
> impact.

Yeah, I'm more worried about debuggability and CO2 emissions =)

>
> Rich
>