From: Joakim Sindholt
Subject: Re: Use-after-free in __unlock
Date: Fri, 2 Jun 2017 07:48:36 +0200
To: musl@lists.openwall.com
Message-ID: <20170602054835.GB1214367@wirbelwind>
In-Reply-To: <20170601155753.GK1627@brightrain.aerifal.cx>

On Thu, Jun 01, 2017 at 11:57:53AM -0400, Rich Felker wrote:
> On Thu, Jun 01, 2017 at 10:32:37AM -0500, Alex Crichton wrote:
> > Hello! I personally work on the rust-lang/rust compiler [1] and one of
> > the platforms we run CI for is x86_64 Linux with musl as a libc. We've
> > got a longstanding issue [2] of spurious segfaults in musl binaries on
> > our CI, and one of our contributors managed to get a stack trace, and I
> > think we've tracked down the bug!
> >
> > I believe there's a use-after-free in the `__unlock` function when used
> > with threading in musl (still present on master as far as I can tell).
> > The source of the unlock function looks like:
> >
> >     void __unlock(volatile int *l)
> >     {
> >         if (l[0]) {
> >             a_store(l, 0);
> >             if (l[1]) __wake(l, 1, 1);
> >         }
> >     }
> >
> > The problem I believe I'm seeing is that after `a_store` finishes, the
> > memory behind the lock, `l`, is deallocated. This means that the later
> > access of `l[1]` is a use-after-free, and I believe it is the cause of
> > the spurious segfaults we're seeing on our CI. The reproduction we've
> > got is this sequence of events:
> >
> > * Thread A starts thread B.
> > * Thread A calls `pthread_detach` on the return value of
> >   `pthread_create` for thread B.
> > * The implementation of `pthread_detach` does its business and
> >   eventually calls `__unlock(t->exitlock)`.
> > * Meanwhile, thread B exits.
> > * Thread B sees that `t->exitlock` is unlocked and deallocates the
> >   `pthread_t` memory.
> > * Thread A comes back to access `l[1]` (which I think is
> >   `t->exitlock[1]`) and segfaults because this memory has been freed.
>
> Thanks for finding and reporting this. Indeed, __lock and __unlock are
> not made safe for use on dynamic-lifetime objects, and I was wrongly
> thinking they were only used for static-lifetime ones (various
> libc-internal locks).
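
(For what it's worth, the pattern described in the report should boil
down to something like the following stress test -- just a rough sketch
I put together from the description, not the actual Rust CI workload:

    #include <pthread.h>

    /* thread B: returns immediately, so its exit path races with the
     * pthread_detach call made by thread A */
    static void *worker(void *arg)
    {
        (void)arg;
        return 0;
    }

    int main(void)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_t t;
            /* thread A starts thread B */
            if (pthread_create(&t, 0, worker, 0)) break;
            /* detach while B may already be exiting */
            pthread_detach(t);
        }
        return 0;
    }

With the current __unlock, the detaching thread can read l[1] after the
exiting thread has already freed the pthread_t, as described above.)
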
> For infrequently-used locks, which these seem to be, I see no reason
> to optimize for contention by having a separate waiter count; simply
> having a single atomic int whose states are "unlocked", "locked with
> no waiters", and "locked maybe with waiters" is a simple solution and
> slightly shrinks the relevant structures anyway. It's possible that
> this is the right solution for all places where __lock and __unlock
> are used, but we should probably do a review and see which ones might
> reasonably be subject to high contention (where spurious FUTEX_WAKE is
> very costly).

Wouldn't this be the time to consider Jens' lock? [1] It retains the
waiter count and locked state in a single int and should perform really
well. It would at least ensure that this sort of thing doesn't happen
again.

[1] https://hal.inria.fr/hal-01236734
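
For reference, the single-int scheme Rich describes would look roughly
like this (only a sketch of the idea, written with musl-style helpers
a_cas, a_swap, __wait and __wake and made-up names, not an actual
patch): 0 means unlocked, 1 means locked with no waiters, 2 means
locked with possible waiters.

    static void lock3(volatile int *l)
    {
        if (a_cas(l, 0, 1)) {
            /* contended: mark "maybe waiters" and sleep until it changes */
            while (a_swap(l, 2)) __wait(l, 0, 2, 1);
        }
    }

    static void unlock3(volatile int *l)
    {
        /* after this swap the lock word is never read again, so it is
         * safe even if another thread frees it immediately */
        if (a_swap(l, 0) == 2) __wake(l, 1, 1);
    }

The cost is at most one spurious FUTEX_WAKE when a lock was acquired
through the contended path, which is exactly what the "locked maybe
with waiters" state implies.
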
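
(Jens' actual algorithm is in the paper linked above; purely as an
illustration of the "waiter count and locked state in a single int"
idea, one possible encoding keeps the lock flag in the sign bit and the
number of interested threads in the remaining bits. Again this uses
musl-style helpers and made-up names, so treat it as a sketch only:

    #include <limits.h>

    static void lock1(volatile int *l)
    {
        /* uncontended fast path: lock bit plus ourselves in the count */
        if (!a_cas(l, 0, INT_MIN + 1)) return;
        a_fetch_add(l, 1);                  /* count ourselves as a waiter */
        for (;;) {
            int v = *l;
            if (v < 0) __wait(l, 0, v, 1);  /* locked: sleep on this value */
            else if (a_cas(l, v, v + INT_MIN) == v) return; /* take the lock */
        }
    }

    static void unlock1(volatile int *l)
    {
        /* one atomic op drops the lock bit and our own count; the old
         * value tells us whether anyone is left to wake */
        if (a_fetch_add(l, -(INT_MIN + 1)) != INT_MIN + 1) __wake(l, 1, 1);
    }

Unlock is again a single atomic operation followed only by a futex
wake, so the lock word is never dereferenced after the point where
another thread may have freed it.)
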