mailing list of musl libc
* Use-after-free in __unlock
@ 2017-06-01 15:32 Alex Crichton
  2017-06-01 15:42 ` Alexander Monakov
  2017-06-01 15:57 ` Rich Felker
  0 siblings, 2 replies; 6+ messages in thread
From: Alex Crichton @ 2017-06-01 15:32 UTC (permalink / raw)
  To: musl

Hello! I personally work on the rust-lang/rust compiler [1] and one of the
platforms we run CI for is x86_64 Linux with musl as a libc. We've got a
longstanding issue [2] of spurious segfaults in musl binaries on our CI, and
one of our contributors managed to get a stack trace and I think we've tracked
down the bug!

I believe there's a use-after-free in the `__unlock` function when used with
threading in musl (still present on master as far as I can tell). The source of
the unlock function looks like:

    void __unlock(volatile int *l)
    {
        if (l[0]) {
            a_store(l, 0);
            if (l[1]) __wake(l, 1, 1);
        }
    }

The problem I believe I'm seeing is that after `a_store` finishes, the memory
behind the lock, `l`, can be deallocated. This means that the later access of
`l[1]` is a use-after-free, which I believe causes the spurious segfaults we're
seeing on our CI. The reproduction we've got boils down to this sequence of
events:

* Thread A starts thread B
* Thread A calls `pthread_detach` on the return value of `pthread_create` for
  thread B.
* The implementation of `pthread_detach` does its business and eventually calls
  `__unlock(t->exitlock)`.
* Meanwhile, thread B exits.
* Thread B sees that `t->exitlock` is unlocked and deallocates the `pthread_t`
  memory.
* Thread A comes back to access `l[1]` (which I think is `t->exitlock[1]`) and
  segfaults, as this memory has been freed. A toy model of this timeline is
  sketched below.
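
Here's a self-contained toy model of that timeline (purely illustrative, not
musl's actual pthread internals; `struct obj` and `waiter_then_free` are just
made-up names for the sketch):

    #include <pthread.h>
    #include <sched.h>
    #include <stdlib.h>

    /* An object whose embedded lock doubles as the signal that the
     * object may be freed, mimicking the pthread exitlock situation. */
    struct obj { volatile int lk[2]; };  /* lk[0]: lock word, lk[1]: waiters */

    static void *waiter_then_free(void *arg) {
      struct obj *o = arg;
      while (o->lk[0]) sched_yield();    /* wait for the release store */
      free(o);                           /* "thread B" ends the object's life */
      return 0;
    }

    int main() {
      struct obj *o = calloc(1, sizeof *o);
      o->lk[0] = 1;                      /* start out "locked" */
      pthread_t b;
      pthread_create(&b, NULL, waiter_then_free, o);

      /* "thread A": the unsafe unlock shape from __unlock */
      o->lk[0] = 0;                      /* release; B may free o any time now */
      sched_yield();                     /* widen the window, as in my repro */
      if (o->lk[1]) { /* would wake a waiter */ }  /* use-after-free read */

      pthread_join(b, NULL);
      return 0;
    }

When thread B wins the race, that last read of `o->lk[1]` touches memory that
has already been freed, which is the same shape as the crash above.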

I was trying to get a good, reliable reproduction, but that unfortunately
turned out to be difficult! I was unable to reproduce it easily with an
unmodified musl library, but a small tweak makes it pretty deterministic. If
you add a call to `sched_yield()` after the `a_store` in the `__unlock`
function (basically just manually introduce a thread yield), then the
following program will segfault pretty quickly:

    #include <assert.h>
    #include <pthread.h>
    #include <sched.h>

    void *child(void *arg) {
      sched_yield();
      return arg;
    }

    int main() {
      while (1) {
        pthread_t mychild;
        assert(pthread_create(&mychild, NULL, child, NULL) == 0);
        assert(pthread_detach(mychild) == 0);
      }
    }

I compiled this all locally by running:

    $ git clone git://git.musl-libc.org/musl
    $ cd musl
    $ git rev-parse HEAD
    179766aa2ef06df854bc1d9616bf6f00ce49b7f9
    $ CFLAGS='-g' ./configure --prefix=$HOME/musl-tmp
    # edit `src/thread/__lock.c` to have a new call to `sched_yield`
    $ make -j10
    $ make install
    $ $HOME/musl-tmp/bin/musl-gcc foo.c -static
    $ ./a.out
    zsh: segmentation fault  ./a.out
    $ gdb ./a.out ./core*
    Core was generated by `./a.out'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x000000000040207e in __unlock (l=0x7f2d297f8fc0) at src/thread/__lock.c:15
    15                      if (l[1]) __wake(l, 1, 1);
    (gdb) disas
    Dump of assembler code for function __unlock:
       0x000000000040205c <+0>:     mov    (%rdi),%eax
       0x000000000040205e <+2>:     test   %eax,%eax
       0x0000000000402060 <+4>:     je     0x4020ac <__unlock+80>
       0x0000000000402062 <+6>:     sub    $0x18,%rsp
       0x0000000000402066 <+10>:    xor    %eax,%eax
       0x0000000000402068 <+12>:    mov    %eax,(%rdi)
       0x000000000040206a <+14>:    lock orl $0x0,(%rsp)
       0x000000000040206f <+19>:    mov    %rdi,0x8(%rsp)
       0x0000000000402074 <+24>:    callq  0x4004f5 <sched_yield>
       0x0000000000402079 <+29>:    mov    0x8(%rsp),%rdi
    => 0x000000000040207e <+34>:    mov    0x4(%rdi),%eax
       0x0000000000402081 <+37>:    test   %eax,%eax
       0x0000000000402083 <+39>:    je     0x4020a8 <__unlock+76>
       0x0000000000402085 <+41>:    mov    $0xca,%r8d
       0x000000000040208b <+47>:    mov    $0x1,%edx
       0x0000000000402090 <+52>:    mov    $0x81,%esi
       0x0000000000402095 <+57>:    mov    %r8,%rax
       0x0000000000402098 <+60>:    syscall
       0x000000000040209a <+62>:    cmp    $0xffffffffffffffda,%rax
       0x000000000040209e <+66>:    jne    0x4020a8 <__unlock+76>
       0x00000000004020a0 <+68>:    mov    %r8,%rax
       0x00000000004020a3 <+71>:    mov    %rdx,%rsi
       0x00000000004020a6 <+74>:    syscall
       0x00000000004020a8 <+76>:    add    $0x18,%rsp
       0x00000000004020ac <+80>:    retq
    End of assembler dump.

The segfaulting instruction is the load of `l[1]` which I believe was
deallocated by thread B when it was exiting.

I didn't really see a great fix myself, unfortunately, but assistance with this
would be much appreciated! If any more information is needed, I'm more than
willing to keep investigating locally and provide it; just let me know!

[1]: https://github.com/rust-lang/rust
[2]: https://github.com/rust-lang/rust/issues/38618



* Re: Use-after-free in __unlock
  2017-06-01 15:32 Use-after-free in __unlock Alex Crichton
@ 2017-06-01 15:42 ` Alexander Monakov
  2017-06-01 15:57 ` Rich Felker
  1 sibling, 0 replies; 6+ messages in thread
From: Alexander Monakov @ 2017-06-01 15:42 UTC (permalink / raw)
  To: musl

This is probably due to a logic error (recently mentioned either on this list
or on #musl IRC) in pthread_create.c: the handling of self->startlock between
__pthread_create and (static) start() is inconsistent.

Alexander



* Re: Use-after-free in __unlock
  2017-06-01 15:32 Use-after-free in __unlock Alex Crichton
  2017-06-01 15:42 ` Alexander Monakov
@ 2017-06-01 15:57 ` Rich Felker
  2017-06-01 16:16   ` Rich Felker
  2017-06-02  5:48   ` Joakim Sindholt
  1 sibling, 2 replies; 6+ messages in thread
From: Rich Felker @ 2017-06-01 15:57 UTC (permalink / raw)
  To: musl

On Thu, Jun 01, 2017 at 10:32:37AM -0500, Alex Crichton wrote:
> Hello! I personally work on the rust-lang/rust compiler [1] and one of the
> platforms we run CI for is x86_64 Linux with musl as a libc. We've got a
> longstanding issue [2] of spurious segfaults in musl binaries on our CI, and
> one of our contributors managed to get a stack trace and I think we've tracked
> down the bug!
> 
> I believe there's a use-after-free in the `__unlock` function when used with
> threading in musl (still present on master as far as I can tell). The source of
> the unlock function looks like:
> 
>     void __unlock(volatile int *l)
>     {
>         if (l[0]) {
>             a_store(l, 0);
>             if (l[1]) __wake(l, 1, 1);
>         }
>     }
> 
> The problem I believe I'm seeing is that after `a_store` finishes, the memory
> behind the lock, `l`, can be deallocated. This means that the later access of
> `l[1]` is a use-after-free, which I believe causes the spurious segfaults we're
> seeing on our CI. The reproduction we've got boils down to this sequence of
> events:
> 
> * Thread A starts thread B
> * Thread A calls `pthread_detach` on the return value of `pthread_create` for
>   thread B.
> * The implementation of `pthread_detach` does its business and eventually calls
>   `__unlock(t->exitlock)`.
> * Meanwhile, thread B exits.
> * Thread B sees that `t->exitlock` is unlocked and deallocates the `pthread_t`
>   memory.
> * Thread A comes back to access `l[1]` (which I think is `t->exitlock[1]`) and
>   segfaults, as this memory has been freed.

Thanks for finding and reporting this. Indeed, __lock and __unlock are
not made safe for use on dynamic-lifetime objects, and I was wrongly
thinking they were only used for static-lifetime ones (various
libc-internal locks).

For infrequently-used locks, which these seem to be, I see no reason
to optimize for contention by having a separate waiter count; simply
having a single atomic int whose states are "unlocked", "locked with
no waiters", and "locked maybe with waiters" is a simple solution and
slightly shrinks the relevant structures anyway. It's possible that
this is the right solution for all places where __lock and __unlock
are used, but we should probably do a review and see which ones might
reasonably be subject to high contention (where spurious FUTEX_WAKE is
very costly).
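
For concreteness, a minimal sketch of such a three-state lock could look like
the classic futex mutex from Drepper's "Futexes Are Tricky" (written here with
C11 atomics and raw futex syscalls purely for illustration; these helper names
are not musl's internal API):

    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void futex_wait(atomic_int *a, int expect)
    {
        syscall(SYS_futex, a, FUTEX_WAIT_PRIVATE, expect, 0, 0, 0);
    }

    static void futex_wake_one(atomic_int *a)
    {
        syscall(SYS_futex, a, FUTEX_WAKE_PRIVATE, 1, 0, 0, 0);
    }

    /* old-style compare-and-swap that returns the previous value */
    static int cas(atomic_int *p, int expect, int desire)
    {
        atomic_compare_exchange_strong(p, &expect, desire);
        return expect;
    }

    /* states: 0 = unlocked, 1 = locked/no waiters, 2 = locked/maybe waiters */
    static void lock3(atomic_int *l)
    {
        int c = cas(l, 0, 1);                /* uncontended fast path */
        if (c == 0) return;
        do {
            /* make sure the word reads 2 before sleeping, so the unlocker
             * knows a wake may be needed */
            if (c == 2 || cas(l, 1, 2) != 0)
                futex_wait(l, 2);
        } while ((c = cas(l, 0, 2)) != 0);   /* reacquire directly as state 2 */
    }

    static void unlock3(atomic_int *l)
    {
        /* a single atomic exchange both releases the lock and reports
         * whether anyone may be waiting */
        if (atomic_exchange(l, 0) == 2)
            futex_wake_one(l);
    }

The relevant point for this bug is the unlock side: after the exchange the
lock word is never read again, so it stays safe even if another thread
immediately frees the object containing it.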

Rich



* Re: Use-after-free in __unlock
  2017-06-01 15:57 ` Rich Felker
@ 2017-06-01 16:16   ` Rich Felker
  2017-06-02  5:48   ` Joakim Sindholt
  1 sibling, 0 replies; 6+ messages in thread
From: Rich Felker @ 2017-06-01 16:16 UTC (permalink / raw)
  To: musl

On Thu, Jun 01, 2017 at 11:57:53AM -0400, Rich Felker wrote:
> Thanks for finding and reporting this. Indeed, __lock and __unlock are
> not made safe for use on dynamic-lifetime objects, and I was wrongly
> thinking they were only used for static-lifetime ones (various
> libc-internal locks).
> 
> For infrequently-used locks, which these seem to be, I see no reason
> to optimize for contention by having a separate waiter count; simply
> having a single atomic int whose states are "unlocked", "locked with
> no waiters", and "locked maybe with waiters" is a simple solution and
> slightly shrinks the relevant structures anyway. It's possible that
> this is the right solution for all places where __lock and __unlock
> are used, but we should probably do a review and see which ones might
> reasonably be subject to high contention (where spurious FUTEX_WAKE is
> very costly).

Quick analysis (based on a search for '.' or '->' inside __unlock/UNLOCK args):

Places where __lock/LOCK is used incorrectly:

- pthread struct
- DIR objects (maybe)

And places where it's used correctly:

- tz
- random
- syslog
- static-linked no-free version of malloc
- gettext
- locale
- pthread_atfork
- sem_open
- synccall
- atexit & variants
- open file list

Of the latter, these seem like things you could reasonably hammer
where false contention would be costly:

- tz
- random
- gettext

There are others where you could hammer, but they already involve much
heavier syscalls than futex so it's unlikely to matter what lock
contention tracking method is used.

I actually don't think the current "incorrect" use in DIR objects
matters, since there's no way for thread B to legally close a dir until
_after_ thread A's call that took the lock returns and thread A
synchronizes with thread B. That is, mere use of __lock/__unlock in
dynamic-storage objects is not actually wrong; the problem is specific
to the pthread exit lock and the related logic concerning object
lifetime. So a local fix may actually be more appropriate.

Rich



* Re: Use-after-free in __unlock
  2017-06-01 15:57 ` Rich Felker
  2017-06-01 16:16   ` Rich Felker
@ 2017-06-02  5:48   ` Joakim Sindholt
  2017-06-02 10:11     ` Jens Gustedt
  1 sibling, 1 reply; 6+ messages in thread
From: Joakim Sindholt @ 2017-06-02  5:48 UTC (permalink / raw)
  To: musl

On Thu, Jun 01, 2017 at 11:57:53AM -0400, Rich Felker wrote:
> On Thu, Jun 01, 2017 at 10:32:37AM -0500, Alex Crichton wrote:
> > Hello! I personally work on the rust-lang/rust compiler [1] and one of the
> > platforms we run CI for is x86_64 Linux with musl as a libc. We've got a
> > longstanding issue [2] of spurious segfaults in musl binaries on our CI, and
> > one of our contributors managed to get a stack trace and I think we've tracked
> > down the bug!
> > 
> > I believe there's a use-after-free in the `__unlock` function when used with
> > threading in musl (still present on master as far as I can tell). The source of
> > the unlock function looks like:
> > 
> >     void __unlock(volatile int *l)
> >     {
> >         if (l[0]) {
> >             a_store(l, 0);
> >             if (l[1]) __wake(l, 1, 1);
> >         }
> >     }
> > 
> > The problem I believe I'm seeing is that after `a_store` finishes, the memory
> > behind the lock, `l`, can be deallocated. This means that the later access of
> > `l[1]` is a use-after-free, which I believe causes the spurious segfaults we're
> > seeing on our CI. The reproduction we've got boils down to this sequence of
> > events:
> > 
> > * Thread A starts thread B
> > * Thread A calls `pthread_detach` on the return value of `pthread_create` for
> >   thread B.
> > * The implementation of `pthread_detach` does its business and eventually calls
> >   `__unlock(t->exitlock)`.
> > * Meanwhile, thread B exits.
> > * Thread B sees that `t->exitlock` is unlocked and deallocates the `pthread_t`
> >   memory.
> > * Thread A comes back to access `l[1]` (which I think is `t->exitlock[1]`) and
> >   segfaults, as this memory has been freed.
> 
> Thanks for finding and reporting this. Indeed, __lock and __unlock are
> not made safe for use on dynamic-lifetime objects, and I was wrongly
> thinking they were only used for static-lifetime ones (various
> libc-internal locks).
> 
> For infrequently-used locks, which these seem to be, I see no reason
> to optimize for contention by having a separate waiter count; simply
> having a single atomic int whose states are "unlocked", "locked with
> no waiters", and "locked maybe with waiters" is a simple solution and
> slightly shrinks the relevant structures anyway. It's possible that
> this is the right solution for all places where __lock and __unlock
> are used, but we should probably do a review and see which ones might
> reasonably be subject to high contention (where spurious FUTEX_WAKE is
> very costly).

Wouldn't this be the time to consider Jens' lock?[1]

It retains the waiter count and locked state in a single int and should
perform really well.
It would at least ensure that this sort of thing doesn't happen again.

[1] https://hal.inria.fr/hal-01236734



* Re: Use-after-free in __unlock
  2017-06-02  5:48   ` Joakim Sindholt
@ 2017-06-02 10:11     ` Jens Gustedt
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Gustedt @ 2017-06-02 10:11 UTC (permalink / raw)
  To: musl


Hello Joakim,

On Fri, 2 Jun 2017 07:48:36 +0200 Joakim Sindholt
<opensource@zhasha.com> wrote:

> Wouldn't this be the time to consider Jens' lock?[1]

Thanks for the suggestion. I think this algorithm would in fact be
well suited as a replacement for the internal lock. For the problem that
originated this thread, this algorithm is safer, because it never
dereferences the pointer to the lock after the lock is released; it
only passes the pointer to a futex_wake syscall. So at worst there
could be a spurious wakeup for some completely unrelated lock that
happens to be allocated at the same address, but no dereferencing of a
deallocated variable.

The current implementation is closely intertwined with the implementation
of stdatomic. While I would still like to maintain my long-time goal of
integrating the whole package into musl, it might make sense to work on a
more direct implementation of just the lock algorithm as a first phase.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



Thread overview: 6+ messages
2017-06-01 15:32 Use-after-free in __unlock Alex Crichton
2017-06-01 15:42 ` Alexander Monakov
2017-06-01 15:57 ` Rich Felker
2017-06-01 16:16   ` Rich Felker
2017-06-02  5:48   ` Joakim Sindholt
2017-06-02 10:11     ` Jens Gustedt
