From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11383 Path: news.gmane.org!.POSTED!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: Use-after-free in __unlock Date: Thu, 1 Jun 2017 11:57:53 -0400 Message-ID: <20170601155753.GK1627@brightrain.aerifal.cx> References: Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: blaine.gmane.org 1496332688 24738 195.159.176.226 (1 Jun 2017 15:58:08 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Thu, 1 Jun 2017 15:58:08 +0000 (UTC) User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-11396-gllmg-musl=m.gmane.org@lists.openwall.com Thu Jun 01 17:58:05 2017 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1dGSU7-00068t-D9 for gllmg-musl@m.gmane.org; Thu, 01 Jun 2017 17:58:03 +0200 Original-Received: (qmail 25659 invoked by uid 550); 1 Jun 2017 15:58:06 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 25632 invoked from network); 1 Jun 2017 15:58:05 -0000 Content-Disposition: inline In-Reply-To: Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:11383 Archived-At: On Thu, Jun 01, 2017 at 10:32:37AM -0500, Alex Crichton wrote: > Hello! I personally work on the rust-lang/rust compiler [1] and one of the > platforms we run CI for is x86_64 Linux with musl as a libc. We've got a > longstanding issue [2] of spurious segfaults in musl binaries on our CI, and > one of our contributors managed to get a stack trace and I think we've tracked > down the bug! > > I believe there's a use-after-free in the `__unlock` function when used with > threading in musl (still present on master as far as I can tell). The source of > the unlock function looks like: > > void __unlock(volatile int *l) > { > if (l[0]) { > a_store(l, 0); > if (l[1]) __wake(l, 1, 1); > } > } > > The problem I believe I'm seeing is that after `a_store` finishes, the memory > behind the lock, `l`, is deallocated. This means that the later access of > `l[1]` causes a use after free, and I believe the spurious segfaults we're > seeing on our CI. The reproduction we've got is the sequence of events: > > * Thread A starts thread B > * Thread A calls `pthread_detach` on the return value of `pthread_create` for > thread B. > * The implementation of `pthread_detach` does its business and eventually calls > `__unlock(t->exitlock)`. > * Meanwhile, thread B exits. > * Thread B sees that `t->exitlock` is unlocked and deallocates the `pthread_t` > memory. > * Thread a comes back to access `l[1]` (what I think is `t->exitlock[1]` and > segfaults as this memory has been freed. Thanks for finding and reporting this. Indeed, __lock and __unlock are not made safe for use on dynamic-lifetime objects, and I was wrongly thinking they were only used for static-lifetime ones (various libc-internal locks). For infrequently-used locks, which these seem to be, I see no reason to optimize for contention by having a separate waiter count; simply having a single atomic int whose states are "unlocked", "locked with no waiters", and "locked maybe with waiters" is a simple solution and slightly shrinks the relevant structures anyway. It's possible that this is the right solution for all places where __lock and __unlock are used, but we should probably do a review and see which ones might reasonably be subject to high contention (where spurious FUTEX_WAKE is very costly). Rich