From: Rich Felker
Subject: Inherent race condition in linux robust_list system
Date: Thu, 9 Apr 2015 23:31:54 -0400
Message-ID: <20150410033154.GA27410@brightrain.aerifal.cx>
To: musl@lists.openwall.com
Cc: libc-alpha@sourceware.org

While working on some of the code handling robust_list for robust (and
other owner-tracked) mutexes in musl, I've come across a race condition
that's inherent in the kernel's design for robust_list. There is no way
to eliminate it with the current API, and I see no way to eliminate it
without requiring a syscall to unlock robust mutexes.

The procedure for unlocking a robust_list-tracked mutex looks like this:

1. Store the address of the mutex to be unlocked in the robust_list
   "pending" slot.
2. Remove the mutex from the robust_list linked list.
3. Unlock the mutex.
4. Clear the "pending" slot in the robust_list.
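In code, the whole sequence looks roughly like the sketch below. This is
a minimal illustration, not the actual musl or glibc implementation: the
struct layout, field names, and the self_robust_head(), robust_list_remove()
and futex_wake_one() helpers are hypothetical stand-ins, futex_offset is
assumed to have been set when the list head was registered via
set_robust_list(), and the PI variants and error handling are omitted.

#include <stdatomic.h>
#include <linux/futex.h>  /* struct robust_list{,_head}, FUTEX_WAITERS */

struct robust_mutex {
    _Atomic int futex;        /* owner tid, possibly ORed with FUTEX_WAITERS */
    struct robust_list list;  /* node linked into this thread's robust_list */
};

/* Hypothetical helpers, not real libc or kernel interfaces: */
extern struct robust_list_head *self_robust_head(void); /* this thread's head */
extern void robust_list_remove(struct robust_list *node);
extern void futex_wake_one(_Atomic int *futex);  /* futex(FUTEX_WAKE, 1) */

void robust_mutex_unlock(struct robust_mutex *m)
{
    struct robust_list_head *h = self_robust_head();

    /* 1. Advertise the mutex in the "pending" slot so the kernel can
     *    still find it if this process dies mid-unlock. */
    h->list_op_pending = &m->list;

    /* 2. Remove the mutex from the robust_list linked list. */
    robust_list_remove(&m->list);

    /* 3. Unlock the mutex. From this instant, another process can
     *    acquire it, destroy it, and reuse the memory. */
    int old = atomic_exchange(&m->futex, 0);

    /* 4. Clear the "pending" slot. If the process dies between steps
     *    3 and 4, the kernel cannot tell whether step 3 happened. */
    h->list_op_pending = 0;

    if (old & FUTEX_WAITERS)
        futex_wake_one(&m->futex);
}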
The purpose of the pending slot is so that the kernel can handle the
case where a process dies asynchronously after removing the mutex from
the linked list but before unlocking it; in that case it treats the
mutex as if it were still in the list. But the kernel has no way of
knowing whether such asynchronous process death occurs before or after
step 3; it only knows that it occurs somewhere between steps 2 and 4.

This is very bad. As soon as step 3 takes place, another process can
take ownership of the mutex, and if it knows it's the last user, it can
unlock and destroy the mutex and then reuse the same memory for a new
purpose (imagine a shared-memory heap managed by a malloc-like
allocator, which would be a good application for robust mutexes). Now,
if the new use happens to store a value matching the tid of the thread
whose process is dying at the offset where the mutex owner would be
stored, the kernel misinterprets the new data stored there as a mutex
belonging to the dying process, and happily proceeds to corrupt it!

Fixing this does not look easy. The obvious way is to make clearing the
pending slot of the robust_list effectively atomic with unlocking the
mutex, by doing them together in a (futex) syscall, but that would
require a syscall every time a robust mutex is unlocked. An alternate
approach would be enlarging the robust_list structure to include a PC
range during which the pending slot is valid. This would avoid a
syscall, but would require the atomic unlock to be performed in asm (to
provide labels for the PC range). I do not see any way to fix it
without kernel changes.

Please note that this issue is distinct from glibc bug #14485, which is
easily fixable and does not affect musl. The issue I'm describing here
is much harder to fix because it involves legal reuse of the same
shared-memory mapping in which the robust mutex existed, rather than
reuse of the same virtual address range by a new mapping.

Rich
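P.S. For reference, here is a much-simplified sketch of the value-based
check the kernel applies to each list entry (and to the pending slot)
when a task dies, loosely modeled on the kernel's handle_futex_death();
the real code uses user-memory accessors, an atomic cmpxchg, and retry
loops, all elided here, and futex_wake_user() is a hypothetical stand-in
for the kernel's wakeup path. It shows why the pending slot is
ambiguous: any word that happens to equal the dead task's tid is
indistinguishable from a still-held robust mutex, even if it is really
unrelated data in reused memory.

#include <stdint.h>
#include <linux/futex.h>  /* FUTEX_TID_MASK, FUTEX_WAITERS, FUTEX_OWNER_DIED */

/* Hypothetical stand-in for the kernel waking waiters on a user address. */
extern void futex_wake_user(uint32_t *uaddr, int nr);

void handle_death_sketch(uint32_t *uaddr, uint32_t dead_tid)
{
    uint32_t val = *uaddr;  /* really a get_user() from the dead task's mm */

    /* The only evidence the kernel has is the value itself: if the tid
     * field matches the dead task, it assumes this is a robust mutex
     * the task still holds -- even if userspace already completed step
     * 3 and the word is now unrelated data that merely equals dead_tid. */
    if ((val & FUTEX_TID_MASK) == dead_tid) {
        /* Mark the lock as having a dead owner, preserving the waiters
         * bit (really an atomic cmpxchg on user memory). */
        *uaddr = (val & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
        if (val & FUTEX_WAITERS)
            futex_wake_user(uaddr, 1);
    }
}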