* seccomp causes pthread_join() to hang @ 2019-06-25 23:18 Radostin Stoyanov 2019-06-25 23:26 ` Rich Felker 0 siblings, 1 reply; 6+ messages in thread From: Radostin Stoyanov @ 2019-06-25 23:18 UTC (permalink / raw) To: musl; +Cc: dalias Hello, In the test suite of CRIU [1] we have noticed an interesting bug which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af ("track all live threads in an AS-safe, fully-consistent linked list") [2]. When seccomp is used in a multithreaded application it may cause pthread_join() to hang. This is a minimal application to reproduce the issue: #include <errno.h> #include <seccomp.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <pthread.h> #include <unistd.h> static void *fn() { scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL); if (!ctx) { perror("seccomp_init"); goto err; } if (seccomp_load(ctx) < 0) { perror("seccomp_load"); goto err; } /* This should cause SIG_KILL */ getpid(); err: return (void *)1; } int main() { pthread_t t1; if (pthread_create(&t1, NULL, fn, NULL)) { perror("pthread_create"); return -1; } if (pthread_join(t1, NULL)) { perror("pthread_join"); return -1; } return 0; } Expected behaviour: Thread t1 should receive SIG_KILL and the main thread should return 0. Actual behaviour: pthread_join() hangs. Reproducibility: Always Regression: Yes This bug can be reproduced with Alpine 3.10 ($ docker run -it alpine:3.10 sh). [1] https://criu.org/ [2] https://git.musl-libc.org/cgit/musl/commit/?id=8f11e6127fe93093f81a52b15bb1537edc3fc8af Kind regards, Radostin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: seccomp causes pthread_join() to hang 2019-06-25 23:18 seccomp causes pthread_join() to hang Radostin Stoyanov @ 2019-06-25 23:26 ` Rich Felker 2019-06-26 7:30 ` Radostin Stoyanov 0 siblings, 1 reply; 6+ messages in thread From: Rich Felker @ 2019-06-25 23:26 UTC (permalink / raw) To: musl On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote: > Hello, > > In the test suite of CRIU [1] we have noticed an interesting bug > which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af > ("track all live threads in an AS-safe, fully-consistent linked > list") [2]. > > When seccomp is used in a multithreaded application it may cause > pthread_join() to hang. > > This is a minimal application to reproduce the issue: > > > #include <errno.h> > #include <seccomp.h> > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <pthread.h> > #include <unistd.h> > > static void *fn() > { > scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL); > if (!ctx) { > perror("seccomp_init"); > goto err; > } > > if (seccomp_load(ctx) < 0) { > perror("seccomp_load"); > goto err; > } > > /* This should cause SIG_KILL */ > getpid(); > err: > return (void *)1; > } > > int main() > { > pthread_t t1; > > if (pthread_create(&t1, NULL, fn, NULL)) { > perror("pthread_create"); > return -1; > } > > if (pthread_join(t1, NULL)) { > perror("pthread_join"); > return -1; > } > > return 0; > } > > > Expected behaviour: Thread t1 should receive SIG_KILL and the main > thread should return 0. > Actual behaviour: pthread_join() hangs. > Reproducibility: Always > Regression: Yes > > > This bug can be reproduced with Alpine 3.10 ($ docker run -it > alpine:3.10 sh). A fundamental property of the pthread API, and part of why threads are a much better primitive than processes for lots of purposes, is that threads are not killable; only whole processes are. Any configuration that results in a thread being terminated out from under the process has all sorts of extremely dangerous conditions with memory/locks being left in inconsistent state, tid reuse while the application thinks the old thread is still alive, etc., and fundamentally can't be supported. What you're seeing is exposure of a serious existing problem with this seccomp usage, not a regression. Rich ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: seccomp causes pthread_join() to hang 2019-06-25 23:26 ` Rich Felker @ 2019-06-26 7:30 ` Radostin Stoyanov 2019-06-26 11:25 ` Szabolcs Nagy 2019-06-26 15:33 ` Rich Felker 0 siblings, 2 replies; 6+ messages in thread From: Radostin Stoyanov @ 2019-06-26 7:30 UTC (permalink / raw) To: musl, Rich Felker; +Cc: Andrei Vagin (C), gorcunov On 26/06/2019 00:26, Rich Felker wrote: > On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote: >> Hello, >> >> In the test suite of CRIU [1] we have noticed an interesting bug >> which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af >> ("track all live threads in an AS-safe, fully-consistent linked >> list") [2]. >> >> When seccomp is used in a multithreaded application it may cause >> pthread_join() to hang. >> >> This is a minimal application to reproduce the issue: >> >> >> #include <errno.h> >> #include <seccomp.h> >> #include <stdio.h> >> #include <stdlib.h> >> #include <string.h> >> #include <pthread.h> >> #include <unistd.h> >> >> static void *fn() >> { >> scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL); >> if (!ctx) { >> perror("seccomp_init"); >> goto err; >> } >> >> if (seccomp_load(ctx) < 0) { >> perror("seccomp_load"); >> goto err; >> } >> >> /* This should cause SIG_KILL */ >> getpid(); >> err: >> return (void *)1; >> } >> >> int main() >> { >> pthread_t t1; >> >> if (pthread_create(&t1, NULL, fn, NULL)) { >> perror("pthread_create"); >> return -1; >> } >> >> if (pthread_join(t1, NULL)) { >> perror("pthread_join"); >> return -1; >> } >> >> return 0; >> } >> >> >> Expected behaviour: Thread t1 should receive SIG_KILL and the main >> thread should return 0. >> Actual behaviour: pthread_join() hangs. >> Reproducibility: Always >> Regression: Yes >> >> >> This bug can be reproduced with Alpine 3.10 ($ docker run -it >> alpine:3.10 sh). > A fundamental property of the pthread API, and part of why threads are > a much better primitive than processes for lots of purposes, is that > threads are not killable; only whole processes are. From the man page of seccomp(2): SECCOMP_RET_KILL_PROCESS: This value results in immediate termination of the process, with a core dump. ... SECCOMP_RET_KILL_THREAD (or SECCOMP_RET_KILL): This value results in immediate termination of the thread that made the system call. The system call is not executed. Other threads in the same thread group will continue to execute. ... > Any configuration > that results in a thread being terminated out from under the process > has all sorts of extremely dangerous conditions with memory/locks > being left in inconsistent state, tid reuse while the application > thinks the old thread is still alive, etc., and fundamentally can't be > supported. What you're seeing is exposure of a serious existing > problem with this seccomp usage, not a regression. I wrote "Regression: Yes" because this bug was recently introduced and it does not occur in previous versions. IMHO causing pthread_join() to hang when a thread has been terminated is not expected behaviour, at least because the man page for pthread_join(3) states: The pthread_join() function waits for the thread specified by thread to terminate. If that thread has already terminated, then pthread_join() returns immediately. and indeed prior commit 8f11e612 pthread_join() returns immediately. Radostin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: seccomp causes pthread_join() to hang 2019-06-26 7:30 ` Radostin Stoyanov @ 2019-06-26 11:25 ` Szabolcs Nagy 2019-06-26 13:43 ` Radostin Stoyanov 2019-06-26 15:33 ` Rich Felker 1 sibling, 1 reply; 6+ messages in thread From: Szabolcs Nagy @ 2019-06-26 11:25 UTC (permalink / raw) To: musl; +Cc: Rich Felker, Andrei Vagin (C), gorcunov * Radostin Stoyanov <rstoyanov1@gmail.com> [2019-06-26 08:30:34 +0100]: > On 26/06/2019 00:26, Rich Felker wrote: > > Any configuration > > that results in a thread being terminated out from under the process > > has all sorts of extremely dangerous conditions with memory/locks > > being left in inconsistent state, tid reuse while the application > > thinks the old thread is still alive, etc., and fundamentally can't be > > supported. What you're seeing is exposure of a serious existing > > problem with this seccomp usage, not a regression. > I wrote "Regression: Yes" because this bug was recently introduced and it > does not occur in previous versions. > > IMHO causing pthread_join() to hang when a thread has been terminated is not > expected behaviour, at least because the man page for pthread_join(3) > states: the point is that if *any* libc api is used in the killed thread or a libc api is used to create that thread fundamentally breaks assumptions the c runtime may rely on and thus *any* libc call after the kill is undefined. so it's not just pthread_join that's broken but *everything*. this affects glibc too and old musl too, even if you may only observe the particlar pthread_join problem with a current musl. if the killed thread was in a signal handler that interrupted arbitrary libc operation then it obviously breaks everything, but even without that the libc will hold onto thread specific internal state and whenever that is used it can cause problems (in case of musl it is used in pthread_join, glibc uses it e.g. for set*id operations) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: seccomp causes pthread_join() to hang 2019-06-26 11:25 ` Szabolcs Nagy @ 2019-06-26 13:43 ` Radostin Stoyanov 0 siblings, 0 replies; 6+ messages in thread From: Radostin Stoyanov @ 2019-06-26 13:43 UTC (permalink / raw) To: musl, Rich Felker, nsz; +Cc: Andrei Vagin (C), gorcunov On 26/06/2019 12:25, Szabolcs Nagy wrote: > * Radostin Stoyanov <rstoyanov1@gmail.com> [2019-06-26 08:30:34 +0100]: >> On 26/06/2019 00:26, Rich Felker wrote: >>> Any configuration >>> that results in a thread being terminated out from under the process >>> has all sorts of extremely dangerous conditions with memory/locks >>> being left in inconsistent state, tid reuse while the application >>> thinks the old thread is still alive, etc., and fundamentally can't be >>> supported. What you're seeing is exposure of a serious existing >>> problem with this seccomp usage, not a regression. >> I wrote "Regression: Yes" because this bug was recently introduced and it >> does not occur in previous versions. >> >> IMHO causing pthread_join() to hang when a thread has been terminated is not >> expected behaviour, at least because the man page for pthread_join(3) >> states: > the point is that if *any* libc api is used in the killed thread > or a libc api is used to create that thread fundamentally breaks > assumptions the c runtime may rely on and thus *any* libc call > after the kill is undefined. > > so it's not just pthread_join that's broken but *everything*. > > this affects glibc too and old musl too, even if you may only > observe the particlar pthread_join problem with a current musl. > > if the killed thread was in a signal handler that interrupted > arbitrary libc operation then it obviously breaks everything, > but even without that the libc will hold onto thread specific > internal state and whenever that is used it can cause problems > (in case of musl it is used in pthread_join, glibc uses it e.g. > for set*id operations) Thank you for the explanation Rich and Szabolcs! The test case we have for CRIU is essentially loading a seccomp filter, performing a checkpoint/restore and then verifies that the seccomp filter was restored. Assuming that behaviour of pthread_join is undefined when the thread has been terminated by seccomp, we can refactor the test case to work around the issue. Radostin ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Re: seccomp causes pthread_join() to hang 2019-06-26 7:30 ` Radostin Stoyanov 2019-06-26 11:25 ` Szabolcs Nagy @ 2019-06-26 15:33 ` Rich Felker 1 sibling, 0 replies; 6+ messages in thread From: Rich Felker @ 2019-06-26 15:33 UTC (permalink / raw) To: musl On Wed, Jun 26, 2019 at 08:30:34AM +0100, Radostin Stoyanov wrote: > On 26/06/2019 00:26, Rich Felker wrote: > >On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote: > >>Hello, > >> > >>In the test suite of CRIU [1] we have noticed an interesting bug > >>which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af > >>("track all live threads in an AS-safe, fully-consistent linked > >>list") [2]. > >> > >>When seccomp is used in a multithreaded application it may cause > >>pthread_join() to hang. > >> > >>This is a minimal application to reproduce the issue: > >> > >> > >>#include <errno.h> > >>#include <seccomp.h> > >>#include <stdio.h> > >>#include <stdlib.h> > >>#include <string.h> > >>#include <pthread.h> > >>#include <unistd.h> > >> > >>static void *fn() > >>{ > >> scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL); > >> if (!ctx) { > >> perror("seccomp_init"); > >> goto err; > >> } > >> > >> if (seccomp_load(ctx) < 0) { > >> perror("seccomp_load"); > >> goto err; > >> } > >> > >> /* This should cause SIG_KILL */ > >> getpid(); > >>err: > >> return (void *)1; > >>} > >> > >>int main() > >>{ > >> pthread_t t1; > >> > >> if (pthread_create(&t1, NULL, fn, NULL)) { > >> perror("pthread_create"); > >> return -1; > >> } > >> > >> if (pthread_join(t1, NULL)) { > >> perror("pthread_join"); > >> return -1; > >> } > >> > >> return 0; > >>} > >> > >> > >>Expected behaviour: Thread t1 should receive SIG_KILL and the main > >>thread should return 0. > >>Actual behaviour: pthread_join() hangs. > >>Reproducibility: Always > >>Regression: Yes > >> > >> > >>This bug can be reproduced with Alpine 3.10 ($ docker run -it > >>alpine:3.10 sh). > >A fundamental property of the pthread API, and part of why threads are > >a much better primitive than processes for lots of purposes, is that > >threads are not killable; only whole processes are. > From the man page of seccomp(2): > > SECCOMP_RET_KILL_PROCESS: This value results in immediate > termination of the process, with a core dump. ... > > SECCOMP_RET_KILL_THREAD (or SECCOMP_RET_KILL): This value > results in immediate termination of the thread that made the system > call. The system call is not executed. Other threads in the same > thread group will continue to execute. ... OK, that's really good to know, that they're separate so you can use KILL_PROCESS safely. > > Any configuration > >that results in a thread being terminated out from under the process > >has all sorts of extremely dangerous conditions with memory/locks > >being left in inconsistent state, tid reuse while the application > >thinks the old thread is still alive, etc., and fundamentally can't be > >supported. What you're seeing is exposure of a serious existing > >problem with this seccomp usage, not a regression. > I wrote "Regression: Yes" because this bug was recently introduced > and it does not occur in previous versions. > > IMHO causing pthread_join() to hang when a thread has been > terminated is not expected behaviour, at least because the man page > for pthread_join(3) states: > > The pthread_join() function waits for the thread specified by > thread to terminate. If that thread has already terminated, then > pthread_join() returns immediately. > > and indeed prior commit 8f11e612 pthread_join() returns immediately. ...with the process in an unrecoverably broken state, just in ways you don't notice. For example, any owner-tracked mutexes or FILEs it owned when it died will be linked into a linked list whose head is in its pthread structure, which was deallocted when you called pthread_join. There are also various places where a lock is held on an individual thread or the thread list to ensure that it doesn't exit (and its tid isn't reused) until the lock is released. Killing it out from under the program breaks this invariant and can cause signals to be sent to wrong threads/processes or other malfunctions. This simply is not, and fundamentally cannot be, supported usage. Rich ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-06-26 15:33 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-25 23:18 seccomp causes pthread_join() to hang Radostin Stoyanov 2019-06-25 23:26 ` Rich Felker 2019-06-26 7:30 ` Radostin Stoyanov 2019-06-26 11:25 ` Szabolcs Nagy 2019-06-26 13:43 ` Radostin Stoyanov 2019-06-26 15:33 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).