mailing list of musl libc
 help / color / mirror / code / Atom feed
* seccomp causes pthread_join() to hang
@ 2019-06-25 23:18 Radostin Stoyanov
  2019-06-25 23:26 ` Rich Felker
  0 siblings, 1 reply; 6+ messages in thread
From: Radostin Stoyanov @ 2019-06-25 23:18 UTC (permalink / raw)
  To: musl; +Cc: dalias

Hello,

In the test suite of CRIU [1] we have noticed an interesting bug which 
is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af ("track all 
live threads in an AS-safe, fully-consistent linked list") [2].

When seccomp is used in a multithreaded application it may cause 
pthread_join() to hang.

This is a minimal application to reproduce the issue:


#include <errno.h>
#include <seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>

static void *fn()
{
     scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
     if (!ctx) {
         perror("seccomp_init");
         goto err;
     }

     if (seccomp_load(ctx) < 0) {
         perror("seccomp_load");
         goto err;
     }

     /* This should cause SIG_KILL */
     getpid();
err:
     return (void *)1;
}

int main()
{
     pthread_t t1;

     if (pthread_create(&t1, NULL, fn, NULL)) {
         perror("pthread_create");
         return -1;
     }

     if (pthread_join(t1, NULL)) {
         perror("pthread_join");
         return -1;
     }

     return 0;
}


Expected behaviour: Thread t1 should receive SIG_KILL and the main 
thread should return 0.
Actual behaviour: pthread_join() hangs.
Reproducibility: Always
Regression: Yes


This bug can be reproduced with Alpine 3.10 ($ docker run -it 
alpine:3.10 sh).


[1] https://criu.org/
[2] 
https://git.musl-libc.org/cgit/musl/commit/?id=8f11e6127fe93093f81a52b15bb1537edc3fc8af


Kind regards,
Radostin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: seccomp causes pthread_join() to hang
  2019-06-25 23:18 seccomp causes pthread_join() to hang Radostin Stoyanov
@ 2019-06-25 23:26 ` Rich Felker
  2019-06-26  7:30   ` Radostin Stoyanov
  0 siblings, 1 reply; 6+ messages in thread
From: Rich Felker @ 2019-06-25 23:26 UTC (permalink / raw)
  To: musl

On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote:
> Hello,
> 
> In the test suite of CRIU [1] we have noticed an interesting bug
> which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af
> ("track all live threads in an AS-safe, fully-consistent linked
> list") [2].
> 
> When seccomp is used in a multithreaded application it may cause
> pthread_join() to hang.
> 
> This is a minimal application to reproduce the issue:
> 
> 
> #include <errno.h>
> #include <seccomp.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <pthread.h>
> #include <unistd.h>
> 
> static void *fn()
> {
>     scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
>     if (!ctx) {
>         perror("seccomp_init");
>         goto err;
>     }
> 
>     if (seccomp_load(ctx) < 0) {
>         perror("seccomp_load");
>         goto err;
>     }
> 
>     /* This should cause SIG_KILL */
>     getpid();
> err:
>     return (void *)1;
> }
> 
> int main()
> {
>     pthread_t t1;
> 
>     if (pthread_create(&t1, NULL, fn, NULL)) {
>         perror("pthread_create");
>         return -1;
>     }
> 
>     if (pthread_join(t1, NULL)) {
>         perror("pthread_join");
>         return -1;
>     }
> 
>     return 0;
> }
> 
> 
> Expected behaviour: Thread t1 should receive SIG_KILL and the main
> thread should return 0.
> Actual behaviour: pthread_join() hangs.
> Reproducibility: Always
> Regression: Yes
> 
> 
> This bug can be reproduced with Alpine 3.10 ($ docker run -it
> alpine:3.10 sh).

A fundamental property of the pthread API, and part of why threads are
a much better primitive than processes for lots of purposes, is that
threads are not killable; only whole processes are. Any configuration
that results in a thread being terminated out from under the process
has all sorts of extremely dangerous conditions with memory/locks
being left in inconsistent state, tid reuse while the application
thinks the old thread is still alive, etc., and fundamentally can't be
supported. What you're seeing is exposure of a serious existing
problem with this seccomp usage, not a regression.

Rich


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Re: seccomp causes pthread_join() to hang
  2019-06-25 23:26 ` Rich Felker
@ 2019-06-26  7:30   ` Radostin Stoyanov
  2019-06-26 11:25     ` Szabolcs Nagy
  2019-06-26 15:33     ` Rich Felker
  0 siblings, 2 replies; 6+ messages in thread
From: Radostin Stoyanov @ 2019-06-26  7:30 UTC (permalink / raw)
  To: musl, Rich Felker; +Cc: Andrei Vagin (C), gorcunov

On 26/06/2019 00:26, Rich Felker wrote:
> On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote:
>> Hello,
>>
>> In the test suite of CRIU [1] we have noticed an interesting bug
>> which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af
>> ("track all live threads in an AS-safe, fully-consistent linked
>> list") [2].
>>
>> When seccomp is used in a multithreaded application it may cause
>> pthread_join() to hang.
>>
>> This is a minimal application to reproduce the issue:
>>
>>
>> #include <errno.h>
>> #include <seccomp.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <pthread.h>
>> #include <unistd.h>
>>
>> static void *fn()
>> {
>>      scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
>>      if (!ctx) {
>>          perror("seccomp_init");
>>          goto err;
>>      }
>>
>>      if (seccomp_load(ctx) < 0) {
>>          perror("seccomp_load");
>>          goto err;
>>      }
>>
>>      /* This should cause SIG_KILL */
>>      getpid();
>> err:
>>      return (void *)1;
>> }
>>
>> int main()
>> {
>>      pthread_t t1;
>>
>>      if (pthread_create(&t1, NULL, fn, NULL)) {
>>          perror("pthread_create");
>>          return -1;
>>      }
>>
>>      if (pthread_join(t1, NULL)) {
>>          perror("pthread_join");
>>          return -1;
>>      }
>>
>>      return 0;
>> }
>>
>>
>> Expected behaviour: Thread t1 should receive SIG_KILL and the main
>> thread should return 0.
>> Actual behaviour: pthread_join() hangs.
>> Reproducibility: Always
>> Regression: Yes
>>
>>
>> This bug can be reproduced with Alpine 3.10 ($ docker run -it
>> alpine:3.10 sh).
> A fundamental property of the pthread API, and part of why threads are
> a much better primitive than processes for lots of purposes, is that
> threads are not killable; only whole processes are.
 From the man page of seccomp(2):

     SECCOMP_RET_KILL_PROCESS: This value results in immediate 
termination of the process, with a core dump. ...

     SECCOMP_RET_KILL_THREAD (or SECCOMP_RET_KILL): This  value results 
in immediate termination of the thread that made the system call. The 
system call is not executed. Other threads in the same thread group will 
continue to execute. ...

>   Any configuration
> that results in a thread being terminated out from under the process
> has all sorts of extremely dangerous conditions with memory/locks
> being left in inconsistent state, tid reuse while the application
> thinks the old thread is still alive, etc., and fundamentally can't be
> supported. What you're seeing is exposure of a serious existing
> problem with this seccomp usage, not a regression.
I wrote "Regression: Yes" because this bug was recently introduced and 
it does not occur in previous versions.

IMHO causing pthread_join() to hang when a thread has been terminated is 
not expected behaviour, at least because the man page for 
pthread_join(3) states:

     The pthread_join() function waits for the thread specified by 
thread to terminate. If that thread has already terminated, then 
pthread_join() returns immediately.

and indeed prior commit 8f11e612 pthread_join() returns immediately.

Radostin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Re: seccomp causes pthread_join() to hang
  2019-06-26  7:30   ` Radostin Stoyanov
@ 2019-06-26 11:25     ` Szabolcs Nagy
  2019-06-26 13:43       ` Radostin Stoyanov
  2019-06-26 15:33     ` Rich Felker
  1 sibling, 1 reply; 6+ messages in thread
From: Szabolcs Nagy @ 2019-06-26 11:25 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker, Andrei Vagin (C), gorcunov

* Radostin Stoyanov <rstoyanov1@gmail.com> [2019-06-26 08:30:34 +0100]:
> On 26/06/2019 00:26, Rich Felker wrote:
> >   Any configuration
> > that results in a thread being terminated out from under the process
> > has all sorts of extremely dangerous conditions with memory/locks
> > being left in inconsistent state, tid reuse while the application
> > thinks the old thread is still alive, etc., and fundamentally can't be
> > supported. What you're seeing is exposure of a serious existing
> > problem with this seccomp usage, not a regression.
> I wrote "Regression: Yes" because this bug was recently introduced and it
> does not occur in previous versions.
> 
> IMHO causing pthread_join() to hang when a thread has been terminated is not
> expected behaviour, at least because the man page for pthread_join(3)
> states:

the point is that if *any* libc api is used in the killed thread
or a libc api is used to create that thread fundamentally breaks
assumptions the c runtime may rely on and thus *any* libc call
after the kill is undefined.

so it's not just pthread_join that's broken but *everything*.

this affects glibc too and old musl too, even if you may only
observe the particlar pthread_join problem with a current musl.

if the killed thread was in a signal handler that interrupted
arbitrary libc operation then it obviously breaks everything,
but even without that the libc will hold onto thread specific
internal state and whenever that is used it can cause problems
(in case of musl it is used in pthread_join, glibc uses it e.g.
for set*id operations)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Re: seccomp causes pthread_join() to hang
  2019-06-26 11:25     ` Szabolcs Nagy
@ 2019-06-26 13:43       ` Radostin Stoyanov
  0 siblings, 0 replies; 6+ messages in thread
From: Radostin Stoyanov @ 2019-06-26 13:43 UTC (permalink / raw)
  To: musl, Rich Felker, nsz; +Cc: Andrei Vagin (C), gorcunov

On 26/06/2019 12:25, Szabolcs Nagy wrote:
> * Radostin Stoyanov <rstoyanov1@gmail.com> [2019-06-26 08:30:34 +0100]:
>> On 26/06/2019 00:26, Rich Felker wrote:
>>>    Any configuration
>>> that results in a thread being terminated out from under the process
>>> has all sorts of extremely dangerous conditions with memory/locks
>>> being left in inconsistent state, tid reuse while the application
>>> thinks the old thread is still alive, etc., and fundamentally can't be
>>> supported. What you're seeing is exposure of a serious existing
>>> problem with this seccomp usage, not a regression.
>> I wrote "Regression: Yes" because this bug was recently introduced and it
>> does not occur in previous versions.
>>
>> IMHO causing pthread_join() to hang when a thread has been terminated is not
>> expected behaviour, at least because the man page for pthread_join(3)
>> states:
> the point is that if *any* libc api is used in the killed thread
> or a libc api is used to create that thread fundamentally breaks
> assumptions the c runtime may rely on and thus *any* libc call
> after the kill is undefined.
>
> so it's not just pthread_join that's broken but *everything*.
>
> this affects glibc too and old musl too, even if you may only
> observe the particlar pthread_join problem with a current musl.
>
> if the killed thread was in a signal handler that interrupted
> arbitrary libc operation then it obviously breaks everything,
> but even without that the libc will hold onto thread specific
> internal state and whenever that is used it can cause problems
> (in case of musl it is used in pthread_join, glibc uses it e.g.
> for set*id operations)
Thank you for the explanation Rich and Szabolcs!

The test case we have for CRIU is essentially loading a seccomp filter, 
performing a checkpoint/restore and then verifies that the seccomp 
filter was restored.

Assuming that behaviour of pthread_join is undefined when the thread has 
been terminated by seccomp, we can refactor the test case to work around 
the issue.

Radostin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Re: seccomp causes pthread_join() to hang
  2019-06-26  7:30   ` Radostin Stoyanov
  2019-06-26 11:25     ` Szabolcs Nagy
@ 2019-06-26 15:33     ` Rich Felker
  1 sibling, 0 replies; 6+ messages in thread
From: Rich Felker @ 2019-06-26 15:33 UTC (permalink / raw)
  To: musl

On Wed, Jun 26, 2019 at 08:30:34AM +0100, Radostin Stoyanov wrote:
> On 26/06/2019 00:26, Rich Felker wrote:
> >On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote:
> >>Hello,
> >>
> >>In the test suite of CRIU [1] we have noticed an interesting bug
> >>which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af
> >>("track all live threads in an AS-safe, fully-consistent linked
> >>list") [2].
> >>
> >>When seccomp is used in a multithreaded application it may cause
> >>pthread_join() to hang.
> >>
> >>This is a minimal application to reproduce the issue:
> >>
> >>
> >>#include <errno.h>
> >>#include <seccomp.h>
> >>#include <stdio.h>
> >>#include <stdlib.h>
> >>#include <string.h>
> >>#include <pthread.h>
> >>#include <unistd.h>
> >>
> >>static void *fn()
> >>{
> >>     scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
> >>     if (!ctx) {
> >>         perror("seccomp_init");
> >>         goto err;
> >>     }
> >>
> >>     if (seccomp_load(ctx) < 0) {
> >>         perror("seccomp_load");
> >>         goto err;
> >>     }
> >>
> >>     /* This should cause SIG_KILL */
> >>     getpid();
> >>err:
> >>     return (void *)1;
> >>}
> >>
> >>int main()
> >>{
> >>     pthread_t t1;
> >>
> >>     if (pthread_create(&t1, NULL, fn, NULL)) {
> >>         perror("pthread_create");
> >>         return -1;
> >>     }
> >>
> >>     if (pthread_join(t1, NULL)) {
> >>         perror("pthread_join");
> >>         return -1;
> >>     }
> >>
> >>     return 0;
> >>}
> >>
> >>
> >>Expected behaviour: Thread t1 should receive SIG_KILL and the main
> >>thread should return 0.
> >>Actual behaviour: pthread_join() hangs.
> >>Reproducibility: Always
> >>Regression: Yes
> >>
> >>
> >>This bug can be reproduced with Alpine 3.10 ($ docker run -it
> >>alpine:3.10 sh).
> >A fundamental property of the pthread API, and part of why threads are
> >a much better primitive than processes for lots of purposes, is that
> >threads are not killable; only whole processes are.
> From the man page of seccomp(2):
> 
>     SECCOMP_RET_KILL_PROCESS: This value results in immediate
> termination of the process, with a core dump. ...
> 
>     SECCOMP_RET_KILL_THREAD (or SECCOMP_RET_KILL): This  value
> results in immediate termination of the thread that made the system
> call. The system call is not executed. Other threads in the same
> thread group will continue to execute. ...

OK, that's really good to know, that they're separate so you can use
KILL_PROCESS safely.

> >  Any configuration
> >that results in a thread being terminated out from under the process
> >has all sorts of extremely dangerous conditions with memory/locks
> >being left in inconsistent state, tid reuse while the application
> >thinks the old thread is still alive, etc., and fundamentally can't be
> >supported. What you're seeing is exposure of a serious existing
> >problem with this seccomp usage, not a regression.
> I wrote "Regression: Yes" because this bug was recently introduced
> and it does not occur in previous versions.
> 
> IMHO causing pthread_join() to hang when a thread has been
> terminated is not expected behaviour, at least because the man page
> for pthread_join(3) states:
> 
>     The pthread_join() function waits for the thread specified by
> thread to terminate. If that thread has already terminated, then
> pthread_join() returns immediately.
> 
> and indeed prior commit 8f11e612 pthread_join() returns immediately.

...with the process in an unrecoverably broken state, just in ways you
don't notice. For example, any owner-tracked mutexes or FILEs it owned
when it died will be linked into a linked list whose head is in its
pthread structure, which was deallocted when you called pthread_join.

There are also various places where a lock is held on an individual
thread or the thread list to ensure that it doesn't exit (and its tid
isn't reused) until the lock is released. Killing it out from under
the program breaks this invariant and can cause signals to be sent to
wrong threads/processes or other malfunctions.

This simply is not, and fundamentally cannot be, supported usage.

Rich


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-06-26 15:33 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-25 23:18 seccomp causes pthread_join() to hang Radostin Stoyanov
2019-06-25 23:26 ` Rich Felker
2019-06-26  7:30   ` Radostin Stoyanov
2019-06-26 11:25     ` Szabolcs Nagy
2019-06-26 13:43       ` Radostin Stoyanov
2019-06-26 15:33     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).