From: Alex Crichton
To: musl@lists.openwall.com
Subject: Use-after-free in __unlock
Date: Thu, 1 Jun 2017 10:32:37 -0500

Hello!

I work on the rust-lang/rust compiler [1], and one of the platforms we run CI for is x86_64 Linux with musl as the libc. We've had a longstanding issue [2] of spurious segfaults in musl binaries on our CI, and one of our contributors managed to get a stack trace, so I think we've tracked down the bug!

I believe there's a use-after-free in the `__unlock` function when used with threading in musl (still present on master as far as I can tell). The source of the unlock function looks like:

void __unlock(volatile int *l)
{
	if (l[0]) {
		a_store(l, 0);
		if (l[1]) __wake(l, 1, 1);
	}
}

The problem I believe I'm seeing is that after `a_store` finishes, the memory behind the lock, `l`, can be deallocated by another thread. The later read of `l[1]` is then a use-after-free, and I believe it is the cause of the spurious segfaults we're seeing on our CI. The reproduction we've got is the following sequence of events:

* Thread A starts thread B.
* Thread A calls `pthread_detach` on the return value of `pthread_create` for thread B.
* The implementation of `pthread_detach` does its business and eventually calls `__unlock(t->exitlock)`.
* Meanwhile, thread B exits.
* Thread B sees that `t->exitlock` is unlocked and deallocates the `pthread_t` memory.
* Thread A comes back to read `l[1]` (which I believe is `t->exitlock[1]`) and segfaults because that memory has been freed.

I was trying to get a good, reliable reproduction, but that unfortunately ended up being difficult! I was unable to reproduce with an unmodified musl, but with a small tweak I got it pretty deterministic: add a call to `sched_yield()` after the `a_store` in `__unlock`, i.e. manually introduce a thread yield to widen the race window.
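For concreteness, the instrumented function looks roughly like this (the `sched_yield()` call and comment are purely my local edit for reproduction purposes, not upstream code; `a_store` and `__wake` are musl internals from pthread_impl.h):

void __unlock(volatile int *l)
{
	if (l[0]) {
		a_store(l, 0);
		/* local edit: give another thread a chance to observe the
		 * unlocked state and free the memory behind `l` before the
		 * read of l[1] below */
		sched_yield();
		if (l[1]) __wake(l, 1, 1);
	}
}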
With that change in place, the following program segfaults pretty quickly:

#include <assert.h>
#include <pthread.h>
#include <sched.h>

void *child(void *arg) {
	/* yield so the parent's pthread_detach can race with our exit */
	sched_yield();
	return arg;
}

int main() {
	while (1) {
		pthread_t mychild;
		assert(pthread_create(&mychild, NULL, child, NULL) == 0);
		assert(pthread_detach(mychild) == 0);
	}
}

I compiled this all locally by running:

$ git clone git://git.musl-libc.org/musl
$ cd musl
$ git rev-parse HEAD
179766aa2ef06df854bc1d9616bf6f00ce49b7f9
$ CFLAGS='-g' ./configure --prefix=$HOME/musl-tmp
# edit `src/thread/__lock.c` to add the new call to `sched_yield`
$ make -j10
$ make install
$ $HOME/musl-tmp/bin/musl-gcc foo.c -static
$ ./a.out
zsh: segmentation fault  ./a.out
$ gdb ./a.out ./core*
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040207e in __unlock (l=0x7f2d297f8fc0) at src/thread/__lock.c:15
15		if (l[1]) __wake(l, 1, 1);
(gdb) disas
Dump of assembler code for function __unlock:
   0x000000000040205c <+0>:	mov    (%rdi),%eax
   0x000000000040205e <+2>:	test   %eax,%eax
   0x0000000000402060 <+4>:	je     0x4020ac <__unlock+80>
   0x0000000000402062 <+6>:	sub    $0x18,%rsp
   0x0000000000402066 <+10>:	xor    %eax,%eax
   0x0000000000402068 <+12>:	mov    %eax,(%rdi)
   0x000000000040206a <+14>:	lock orl $0x0,(%rsp)
   0x000000000040206f <+19>:	mov    %rdi,0x8(%rsp)
   0x0000000000402074 <+24>:	callq  0x4004f5
   0x0000000000402079 <+29>:	mov    0x8(%rsp),%rdi
=> 0x000000000040207e <+34>:	mov    0x4(%rdi),%eax
   0x0000000000402081 <+37>:	test   %eax,%eax
   0x0000000000402083 <+39>:	je     0x4020a8 <__unlock+76>
   0x0000000000402085 <+41>:	mov    $0xca,%r8d
   0x000000000040208b <+47>:	mov    $0x1,%edx
   0x0000000000402090 <+52>:	mov    $0x81,%esi
   0x0000000000402095 <+57>:	mov    %r8,%rax
   0x0000000000402098 <+60>:	syscall
   0x000000000040209a <+62>:	cmp    $0xffffffffffffffda,%rax
   0x000000000040209e <+66>:	jne    0x4020a8 <__unlock+76>
   0x00000000004020a0 <+68>:	mov    %r8,%rax
   0x00000000004020a3 <+71>:	mov    %rdx,%rsi
   0x00000000004020a6 <+74>:	syscall
   0x00000000004020a8 <+76>:	add    $0x18,%rsp
   0x00000000004020ac <+80>:	retq
End of assembler dump.

The segfaulting instruction is the load of `l[1]`, reading memory that I believe was deallocated by thread B as it exited.

I didn't really see a great fix myself, unfortunately, but assistance with this would be much appreciated! If any more information is needed I'm more than willing to keep investigating locally and provide it, just let me know!

[1]: https://github.com/rust-lang/rust
[2]: https://github.com/rust-lang/rust/issues/38618
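P.S. The closest I got to a fix idea is the sketch below, along the lines of the "futexes are tricky" style of mutex: keep the waiter state in the same word as the lock, so that unlocking is a single atomic swap and the unlocking thread never reads `*l` after the release. This is completely untested and may well miss a reason musl keeps the waiter count in a separate word; `a_cas`, `a_swap`, `__wait` and `__wake` here are musl's internal atomic/futex helpers:

/* lock states: 0 = unlocked, 1 = locked, 2 = locked with possible waiters */

void __lock(volatile int *l)
{
	int c = a_cas(l, 0, 1);         /* fast path: uncontended acquire */
	while (c) {
		/* mark the lock contended, then sleep while *l == 2 */
		if (c == 2 || a_cas(l, 1, 2) != 0)
			__wait(l, 0, 2, 1);
		/* reacquire as "locked, maybe waiters" */
		c = a_cas(l, 0, 2);
	}
}

void __unlock(volatile int *l)
{
	/* a single atomic access both releases the lock and reports
	 * whether anyone was waiting; *l is never read after the release,
	 * and a FUTEX_WAKE on freed memory only fails with EFAULT rather
	 * than faulting in userspace */
	if (a_swap(l, 0) == 2)
		__wake(l, 1, 1);
}

The cost is an occasional spurious wake (a thread that acquires the lock via the contended path always leaves state 2 behind), but the second read of `l` that races with the memory being freed is gone.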