From: Rich Felker
Subject: Inherent race condition in linux robust_list system
Date: Thu, 9 Apr 2015 23:31:54 -0400
Message-ID: <20150410033154.GA27410@brightrain.aerifal.cx>
To: musl@lists.openwall.com
Cc: libc-alpha@sourceware.org

While working on some of the code handling robust_list for robust (and
other owner-tracked) mutexes in musl, I've come across a race condition
that's inherent in the kernel's design for robust_list. There is no way
to eliminate it with the current API, and I see no way to eliminate it
without requiring a syscall to unlock robust mutexes.

The procedure for unlocking a robust_list-tracked mutex looks like this:

1. Store the address of the mutex to be unlocked in the robust_list
   "pending" slot.
2. Remove the mutex from the robust_list linked list.
3. Unlock the mutex.
4. Clear the "pending" slot in the robust_list.
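In code, the whole sequence looks roughly like the sketch below. This is
a minimal illustration, not the actual musl or glibc implementation: the
struct layout, field names, and the self_robust_head(), robust_list_remove()
and futex_wake_one() helpers are hypothetical stand-ins, futex_offset is
assumed to have been set when the list head was registered via
set_robust_list(), and the PI variants and error handling are omitted.

#include <stdatomic.h>
#include <linux/futex.h>  /* struct robust_list{,_head}, FUTEX_WAITERS */

struct robust_mutex {
    _Atomic int futex;        /* owner tid, possibly ORed with FUTEX_WAITERS */
    struct robust_list list;  /* node linked into this thread's robust_list */
};

/* Hypothetical helpers, not real libc or kernel interfaces: */
extern struct robust_list_head *self_robust_head(void); /* this thread's head */
extern void robust_list_remove(struct robust_list *node);
extern void futex_wake_one(_Atomic int *futex);  /* futex(FUTEX_WAKE, 1) */

void robust_mutex_unlock(struct robust_mutex *m)
{
    struct robust_list_head *h = self_robust_head();

    /* 1. Advertise the mutex in the "pending" slot so the kernel can
     *    still find it if this process dies mid-unlock. */
    h->list_op_pending = &m->list;

    /* 2. Remove the mutex from the robust_list linked list. */
    robust_list_remove(&m->list);

    /* 3. Unlock the mutex. From this instant, another process can
     *    acquire it, destroy it, and reuse the memory. */
    int old = atomic_exchange(&m->futex, 0);

    /* 4. Clear the "pending" slot. If the process dies between steps
     *    3 and 4, the kernel cannot tell whether step 3 happened. */
    h->list_op_pending = 0;

    if (old & FUTEX_WAITERS)
        futex_wake_one(&m->futex);
}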
The purpose of the pending slot is so that the kernel can handle the
case where a process dies asynchronously after removing the mutex from
the linked list but before unlocking it; in that case it treats the
mutex as if it were still in the list. But the kernel has no way of
knowing whether such asynchronous process death occurs before or after
step 3; it only knows that it occurs somewhere between steps 2 and 4.

This is very bad. As soon as step 3 takes place, another process can
take ownership of the mutex, and if it knows it's the last user, it can
unlock and destroy the mutex and then reuse the same memory for a new
purpose (imagine a shared-memory heap managed by a malloc-like
allocator, which would be a good application for robust mutexes). Now,
if the new use happens to store a value matching the tid of the thread
whose process is dying at the offset where the mutex owner would be
stored, the kernel misinterprets the new data stored there as a mutex
belonging to the dying process, and happily proceeds to corrupt it!

Fixing this does not look easy. The obvious way is to make clearing the
pending slot of the robust_list effectively atomic with unlocking the
mutex, by doing them together in a (futex) syscall, but that would
require a syscall every time a robust mutex is unlocked. An alternate
approach would be enlarging the robust_list structure to include a PC
range during which the pending slot is valid. This would avoid a
syscall, but would require the atomic unlock to be performed in asm (to
provide labels for the PC range). I do not see any way to fix it
without kernel changes.

Please note that this issue is distinct from glibc bug #14485, which is
easily fixable and does not affect musl. The issue I'm describing here
is much harder to fix because it involves legal reuse of the same
shared-memory mapping in which the robust mutex existed, rather than
reuse of the same virtual address range by a new mapping.

Rich
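P.S. For reference, here is a much-simplified sketch of the value-based
check the kernel applies to each list entry (and to the pending slot)
when a task dies, loosely modeled on the kernel's handle_futex_death();
the real code uses user-memory accessors, an atomic cmpxchg, and retry
loops, all elided here, and futex_wake_user() is a hypothetical stand-in
for the kernel's wakeup path. It shows why the pending slot is
ambiguous: any word that happens to equal the dead task's tid is
indistinguishable from a still-held robust mutex, even if it is really
unrelated data in reused memory.

#include <stdint.h>
#include <linux/futex.h>  /* FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED */

/* Hypothetical stand-in for the kernel waking waiters on a user address. */
extern void futex_wake_user(uint32_t *uaddr, int nr);

void handle_death_sketch(uint32_t *uaddr, uint32_t dead_tid)
{
    uint32_t val = *uaddr;  /* really a get_user() from the dead task's mm */

    /* The only evidence the kernel has is the value itself: if the tid
     * field matches the dead task, it assumes this is a robust mutex
     * the task still holds -- even if userspace already completed step
     * 3 and the word is now unrelated data that merely equals dead_tid. */
    if ((val & FUTEX_TID_MASK) == dead_tid) {
        /* Mark the lock as having a dead owner, preserving the waiters
         * bit (really an atomic cmpxchg on user memory). */
        *uaddr = (val & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
        if (val & FUTEX_WAITERS)
            futex_wake_user(uaddr, 1);
    }
}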