abort() fails to terminate PID 1 process

mailing list of musl libc

* abort() fails to terminate PID 1 process
@ 2016-06-18 20:32 Karl Böhlmark
  2016-06-19  1:20 ` nathan
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Karl Böhlmark @ 2016-06-18 20:32 UTC (permalink / raw)
  To: musl

Hi!

After running alpine-linux based docker containers for a while we noticed
some problematic behaviour when one of our services had a memory leak
causing the process to abort.

Instead of getting abnormal process termination we were seeing the process
hanging at 100% cpu.

A minimal reproduction of this issue is to run

#include <stdlib.h>
int main ()
{
abort();
}

with "unshare --fork --pid" so that it runs as PID 1 in its own PID
namespace.

Would it be reasonable to add a fallback strategy in abort() for
terminating processes when the signals don't have any effect?

Karl

* Re: abort() fails to terminate PID 1 process
  2016-06-18 20:32 abort() fails to terminate PID 1 process Karl Böhlmark
@ 2016-06-19  1:20 ` nathan
  2016-06-20  9:02 ` Igmar Palsenberg
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: nathan @ 2016-06-19  1:20 UTC (permalink / raw)
  To: musl

It appears that raise(3) only returns after signal handling, and hence the
infinite loop should only be reached if we're hitting this failure case.
Perhaps we should replace it with asm("ud2") and the equivalent on non-x86
arches, causing a SIGILL, which will definitely abort the process.
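
A minimal sketch of that idea, assuming GCC or Clang: `__builtin_trap()`
emits the target's trapping instruction (ud2 on x86), which avoids
per-architecture asm. The helper name `crash_in_child` is hypothetical; it
forks so that the parent can observe which signal ended the child. It does
not test the PID 1 case.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: execute a trapping instruction in a child process.
 * Returns the number of the signal that killed the child, or -1 if the
 * child could not be started or was not killed by a signal. */
int crash_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Give the usual trap signals their default (fatal) action. */
        signal(SIGILL, SIG_DFL);
        signal(SIGTRAP, SIG_DFL);
        __builtin_trap();           /* GCC/Clang builtin; does not return */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The exact signal depends on the target (SIGILL is typical on x86), so a
caller should only rely on the child having been killed by some signal.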

On Sat, Jun 18, 2016 at 4:32 PM Karl Böhlmark <karl.bohlmark@gmail.com>
wrote:

> Hi!
>
> After running alpine-linux based docker containers for a while we noticed
> some problematic behaviour when one of our services had a memory leak
> causing the process to abort.
>
> Instead of getting abnormal process termination we were seeing the process
> hanging at 100% cpu.
>
> A minimal reproduction of this issue is to run
>
> #include <stdlib.h>
> int main ()
> {
> abort();
> }
>
> with "unshare --fork --pid" so that it runs as PID 1 in it's own PID
> namespace.
>
> Would it be reasonable to add a fallback strategy in abort() for
> terminating processes when the signals don't have any effect?
>
> Karl
>

* Re: abort() fails to terminate PID 1 process
  2016-06-18 20:32 abort() fails to terminate PID 1 process Karl Böhlmark
  2016-06-19  1:20 ` nathan
@ 2016-06-20  9:02 ` Igmar Palsenberg
  2016-06-20 10:04   ` Szabolcs Nagy
  2016-06-20 10:29 ` Natanael Copa
  2016-07-03 22:03 ` Rich Felker
  3 siblings, 1 reply; 18+ messages in thread
From: Igmar Palsenberg @ 2016-06-20  9:02 UTC (permalink / raw)
  To: musl

> After running alpine-linux based docker containers for a while we noticed
> some problematic behaviour when one of our services had a memory leak
> causing the process to abort.
> Instead of getting abnormal process termination we were seeing the process
> hanging at 100% cpu.
> 
> A minimal reproduction of this issue is to run
> 
> #include <stdlib.h>
> int main ()
> {
> abort();
> }
> 
> with "unshare --fork --pid" so that it runs as PID 1 in it's own PID
> namespace.
> 
> Would it be reasonable to add a fallback strategy in abort() for terminating
> processes when the signals don't have any effect?

This is a bad idea.

First, processes can install handlers, which might
instruct the kernel to ignore the signal. SIGABRT can be ignored. I don't
expect my process to be SIGILL'ed next because of this (SIGILL can also be
ignored).
Libc should NOT mess with this kind of thing; that's up to the
application.

Second, the behaviour you're seeing is due to the kernel's special PID 1
handling: it ignores signals sent to pid 1 for which an explicit handler
has not been installed.
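
To make "explicit handler" concrete, here is a minimal sketch of installing
one with sigaction(). The names `on_term`, `install_term_handler` and
`term_was_received` are hypothetical, and SIGTERM is only an example
signal. The sketch shows the difference between default disposition and an
installed handler; it does not reproduce the PID namespace behaviour itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t is safe to write in a signal handler. */
static volatile sig_atomic_t got_term;

static void on_term(int sig)
{
    (void)sig;
    got_term = 1;
}

/* Replace the default disposition of SIGTERM with an explicit handler.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
int install_term_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, 0);
}

/* Report whether the handler has run. */
int term_was_received(void)
{
    return got_term;
}
```

After install_term_handler() succeeds, a raise(SIGTERM) in the same process
runs on_term() instead of applying the default action.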

Remedy : Fix your application. Better : Fix your whole setup, if you need 
these changes, it's broken by design.

Igmar

* Re: abort() fails to terminate PID 1 process
  2016-06-20  9:02 ` Igmar Palsenberg
@ 2016-06-20 10:04   ` Szabolcs Nagy
  2016-06-20 12:00     ` Igmar Palsenberg
  0 siblings, 1 reply; 18+ messages in thread
From: Szabolcs Nagy @ 2016-06-20 10:04 UTC (permalink / raw)
  To: musl

* Igmar Palsenberg <igmar@palsenberg.com> [2016-06-20 11:02:15 +0200]:
> > #include <stdlib.h>
> > int main ()
> > {
> > abort();
> > }
> > 
> > with "unshare --fork --pid" so that it runs as PID 1 in it's own PID
> > namespace.
> > 
> > Would it be reasonable to add a fallback strategy in abort() for terminating
> > processes when the signals don't have any effect?
> 
> This is a bad idea.
> 
> First, processes kan install handlers, which might 
> instruct the kernel to ignore the signal. SIGABORT can be ignored. I don't 

abort() should terminate the process even if SIGABRT is ignored.

> expect my process to be SIGILL'ed next because of this (which, can also be 
> ignored).
> Libc should NOT mess with these kind of things, that's up to the 
> application.

the glibc fallbacks are

change signal mask and set default handling for SIGABRT
raise(SIGABRT);
"abort instruction" (segfault, sigtrap or sigill depending on target)
_exit(127);
infinite loop

http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=155d70b0647e848f1d40fc0e3b15a2914d7145c0;hb=HEAD

on x86 glibc, pid 1 would terminate with SIGSEGV
(unless there is a segfault handler).

the musl logic is explained in

http://git.musl-libc.org/cgit/musl/commit/?id=2557d0ba47286ed3e868f8ddc9dbed0942fe99dc

neither of them is correct because it is not possible to
exit with the right status in general.

SIGKILL can only be ignored by pid 1 whose exit status is
not supposed to be observable so musl may want to have a
fallback after it since the pid namespace thing is nowadays
widely abused on linux.

> 
> Second the behaviour you're seeing is due to the kernel's special PID 1 
> handling : It ignores signals send to pid 1 for which an explicit handler 
> has nog been installed.
> 
> Remedy : Fix your application. Better : Fix your whole setup, if you need 
> these changes, it's broken by design.
> 
> 
> 
> Igmar


* Re: abort() fails to terminate PID 1 process
  2016-06-18 20:32 abort() fails to terminate PID 1 process Karl Böhlmark
  2016-06-19  1:20 ` nathan
  2016-06-20  9:02 ` Igmar Palsenberg
@ 2016-06-20 10:29 ` Natanael Copa
  2016-07-03 22:03 ` Rich Felker
  3 siblings, 0 replies; 18+ messages in thread
From: Natanael Copa @ 2016-06-20 10:29 UTC (permalink / raw)
  To: Karl Böhlmark; +Cc: musl

On Sat, 18 Jun 2016 22:32:23 +0200
Karl Böhlmark <karl.bohlmark@gmail.com> wrote:

> Hi!
> 
> After running alpine-linux based docker containers for a while we noticed
> some problematic behaviour when one of our services had a memory leak
> causing the process to abort.
> 
> Instead of getting abnormal process termination we were seeing the process
> hanging at 100% cpu.
> 
> A minimal reproduction of this issue is to run
> 
> #include <stdlib.h>
> int main ()
> {
> abort();
> }
> 
> with "unshare --fork --pid" so that it runs as PID 1 in it's own PID
> namespace.
> 
> Would it be reasonable to add a fallback strategy in abort() for
> terminating processes when the signals don't have any effect?

A workaround is to run your service under a minimalistic init like tini
https://github.com/krallin/tini

Then your application will no longer run as pid 1.
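
What such an init does can be sketched in a few lines of C. This is not
tini's code; the function name `supervise` and the 126/127 error codes are
assumptions made for the illustration, and signal forwarding (which a real
init also needs) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv as a child process, reap every process
 * that exits, and report how the main child ended. The service runs as an
 * ordinary pid, so abort() in it terminates it normally. */
int supervise(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return 126;                     /* could not fork */
    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    for (;;) {
        int status;
        pid_t pid = wait(&status);      /* also reaps adopted orphans */
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            return 126;                 /* no children left */
        }
        if (pid != child)
            continue;                   /* some other process exited */
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);  /* shell-style convention */
    }
}
```

A main() that ends with `return supervise(argv + 1);` would then act as the
container's PID 1 and pass the service's fate on as its own exit status.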

-nc


* Re: abort() fails to terminate PID 1 process
  2016-06-20 10:04   ` Szabolcs Nagy
@ 2016-06-20 12:00     ` Igmar Palsenberg
  2016-06-20 19:41       ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: Igmar Palsenberg @ 2016-06-20 12:00 UTC (permalink / raw)
  To: musl


> > First, processes kan install handlers, which might 
> > instruct the kernel to ignore the signal. SIGABORT can be ignored. I don't 
> 
> abort() should terminate the process even if SIGABRT is ignored.

That rule doesn't apply to pid 1 by default. Pid 1 should be a proper init
system, not a full-blown application that makes the system blow up on
every error.
 
> > expect my process to be SIGILL'ed next because of this (which, can also be 
> > ignored).
> > Libc should NOT mess with these kind of things, that's up to the 
> > application.
> 
> the glibc fallbacks are
> 
> change signal mask and set default handling for SIGABRT
> raise(SIGABRT);
> "abort instruction" (segfault, sigtrap or sigill depending on target)
> _exit(127);
> infinite loop

Pid 1 is an exception to all of this. 

> http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=155d70b0647e848f1d40fc0e3b15a2914d7145c0;hb=HEAD
> 
> on x86 glibc, pid 1 would terminate with SIGSEGV
> (unless there is a segfault handler).
> 
> the musl logic is explained in
> 
> http://git.musl-libc.org/cgit/musl/commit/?id=2557d0ba47286ed3e868f8ddc9dbed0942fe99dc
> 
> neither of them is correct because it is not possible to
> exit with the right status in general.
> 
> SIGKILL can only be ignored by pid 1 whose exit status is
> not supposed to be observable so musl may want to have a
> fallback after it since the pid namespace thing is nowadays
> widely abused on linux.

Well, normally abort() does some signal magic, and then raises again. 
Which is what POSIX mandates I think.

If you're pid 1 however, you should behave like one.



Igmar


* Re: abort() fails to terminate PID 1 process
  2016-06-20 12:00     ` Igmar Palsenberg
@ 2016-06-20 19:41       ` Rich Felker
  2016-07-03 10:43         ` Igmar Palsenberg
  0 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2016-06-20 19:41 UTC (permalink / raw)
  To: musl

On Mon, Jun 20, 2016 at 02:00:42PM +0200, Igmar Palsenberg wrote:
> 
> > > First, processes kan install handlers, which might 
> > > instruct the kernel to ignore the signal. SIGABORT can be ignored. I don't 
> > 
> > abort() should terminate the process even if SIGABRT is ignored.
> 
> That rule doesn't apply to pid 1 by default. Pid 1 should be a proper init 
> system, not a full blows application that makes the system blow up on 
> every error.

abort is specified to terminate the process no matter what. For it to
ever be able to return is a serious bug since both the compiler and
the programmer can assume any code after abort() is unreachable. At
present musl avoids this worst-case failure (wrongfully returning)
with an infinite loop, but that's just a fail-safe. The intent is that
it terminate, and in particular, terminate abnormally as specified,
which we don't do enough to guarantee (SIGKILL is not "abnormal"
termination). So there's definitely work to be done to fix this. It's
an issue I've been aware of for a long time but the kernel makes it
painful to reliably produce abnormal termination without race
conditions.

> > > expect my process to be SIGILL'ed next because of this (which, can also be 
> > > ignored).
> > > Libc should NOT mess with these kind of things, that's up to the 
> > > application.
> > 
> > the glibc fallbacks are
> > 
> > change signal mask and set default handling for SIGABRT
> > raise(SIGABRT);
> > "abort instruction" (segfault, sigtrap or sigill depending on target)
> > _exit(127);
> > infinite loop
> 
> Pid 1 is an exception to all of this. 
> 
> > http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=155d70b0647e848f1d40fc0e3b15a2914d7145c0;hb=HEAD
> > 
> > on x86 glibc, pid 1 would terminate with SIGSEGV
> > (unless there is a segfault handler).
> > 
> > the musl logic is explained in
> > 
> > http://git.musl-libc.org/cgit/musl/commit/?id=2557d0ba47286ed3e868f8ddc9dbed0942fe99dc
> > 
> > neither of them is correct because it is not possible to
> > exit with the right status in general.
> > 
> > SIGKILL can only be ignored by pid 1 whose exit status is
> > not supposed to be observable so musl may want to have a
> > fallback after it since the pid namespace thing is nowadays
> > widely abused on linux.
> 
> Well, normally abort() does some signal magic, and then raises again. 
> Which is what POSIX mandates I think.

To make this work reliably I think we need to make abort() take a lock
that precludes further calls to sigaction prior to re-raising SIGABRT
and resetting the disposition. But there are all sorts of
complications to deal with. For example if another thread performs
posix_spawn or fork and exec concurrent with abort() munging the
disposition of SIGABRT, the child process could start with the wrong
disposition for SIGABRT, which would be non-conforming. Finding ways
to fix all places where the wrong behavior may be observable is a
nontrivial problem.

> If you're pid 1 however, you should behave like one.

I tend to agree, but if you're libc you should also behave as
specified, and currently we don't in this regard.

Rich

* Re: abort() fails to terminate PID 1 process
  2016-06-20 19:41       ` Rich Felker
@ 2016-07-03 10:43         ` Igmar Palsenberg
  2016-07-03 13:58           ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: Igmar Palsenberg @ 2016-07-03 10:43 UTC (permalink / raw)
  To: musl


> > That rule doesn't apply to pid 1 by default. Pid 1 should be a proper init 
> > system, not a full blows application that makes the system blow up on 
> > every error.
> 
> abort is specified to terminate the process no matter what.

Yes. But as mentioned: pid 1 is an exception to this.

> For it to
> ever be able to return is a serious bug since both the compiler and
> the programmer can assume any code after abort() is unreachable.

This specific case talked about pid 1. pid 1 has kernel protection, normal 
userspace processes don't. In that case, the normal assumptions don't hold 
up.

> At
> present musl avoids this worst-case failure (wrongfully returning)
> with an infinite loop, but that's just a fail-safe. The intent is that
> it terminate, and in particular, terminate abnormally as specified,
> which we don't do enough to guarantee (SIGKILL is not "abnormal"
> termination). So there's definitely work to be done to fix this. It's
> an issue I've been aware of for a long time but the kernel makes it
> painful to reliably produce abnormal termination without race
> conditions.

Can this even be reproduced under normal circumstances (i.e. not as pid 1)?
If yes, then I agree: it's a bug. If no, then it isn't. If people have a
broken container init system, then it breaks and they keep the pieces.
 

> > Well, normally abort() does some signal magic, and then raises again. 
> > Which is what POSIX mandates I think.
> 
> To make this work reliably I think we need to make abort() take a lock
> the precludes further calls to sigaction prior to re-raising SIGABRT
> and resetting the disposition. But there are all sorts of
> complications to deal with. For example if another thread performs
> posix_spawn for fork and exec concurrent with abort() munging the
> disposition of SIGABRT, the child process could start with the wrong
> disposition for SIGABRT, which would be non-conforming. Finding ways
> to fix all places where the wrong behavior may be observable is a
> nontrivial problem.

Does the whole guaranteed termination also include threaded programs?
 
> > If you're pid 1 however, you should behave like one.
> 
> I tend to agree, but if you're libc you should also behave as
> specified, and currently we don't in this regard.

Sure, but as mentioned: normal rules don't apply to pid 1.



Igmar


* Re: abort() fails to terminate PID 1 process
  2016-07-03 10:43         ` Igmar Palsenberg
@ 2016-07-03 13:58           ` Rich Felker
  2016-07-03 19:58             ` Laurent Bercot
  2016-07-04 13:37             ` Igmar Palsenberg
  0 siblings, 2 replies; 18+ messages in thread
From: Rich Felker @ 2016-07-03 13:58 UTC (permalink / raw)
  To: musl

On Sun, Jul 03, 2016 at 12:43:59PM +0200, Igmar Palsenberg wrote:
> 
> > > That rule doesn't apply to pid 1 by default. Pid 1 should be a proper init 
> > > system, not a full blows application that makes the system blow up on 
> > > every error.
> > 
> > abort is specified to terminate the process no matter what.
> 
> Yes. But like mentioned : pid 1 is an exception to this.
> 
> > For it to
> > ever be able to return is a serious bug since both the compiler and
> > the programmer can assume any code after abort() is unreachable.
> 
> This specific case talked about pid 1. pid 1 has kernel protection, normal 
> userspace processes don't. In that case, the normal assumptions don't hold 
> up.

Whether you realize it or not, what you're saying is equivalent to
saying that it's UB for a process that runs as pid 1 to call abort().
There is no basis for such a claim.

A vague "pid 1 is special" rule (which the standard does not support
except in a few very specific places where an implementation-defined
set of processes are permitted to be treated in specific special ways)
does not imply "calling a function whose behavior is well-defined can
legitimately lead to runaway code execution if the pid is 1".

> > At
> > present musl avoids this worst-case failure (wrongfully returning)
> > with an infinite loop, but that's just a fail-safe. The intent is that
> > it terminate, and in particular, terminate abnormally as specified,
> > which we don't do enough to guarantee (SIGKILL is not "abnormal"
> > termination). So there's definitely work to be done to fix this. It's
> > an issue I've been aware of for a long time but the kernel makes it
> > painful to reliably produce abnormal termination without race
> > conditions.
> 
> Can this even be reproduced under normal circumstances (aka : not pid 1) ? 
> If thes, then I agree : It's a bug. If no : Then not. If people have a 
> broken container init system, then it breaks and they keep the pieces.

Yes.

> > > Well, normally abort() does some signal magic, and then raises again. 
> > > Which is what POSIX mandates I think.
> > 
> > To make this work reliably I think we need to make abort() take a lock
> > the precludes further calls to sigaction prior to re-raising SIGABRT
> > and resetting the disposition. But there are all sorts of
> > complications to deal with. For example if another thread performs
> > posix_spawn for fork and exec concurrent with abort() munging the
> > disposition of SIGABRT, the child process could start with the wrong
> > disposition for SIGABRT, which would be non-conforming. Finding ways
> > to fix all places where the wrong behavior may be observable is a
> > nontrivial problem.
> 
> Does the whole guaranteed termination also includes threaded programs ?

Of course. The fact that you're asking such basic questions tells me
that you're bikeshedding this based on negative opinions of certain
container usage cases and not offering constructive input based on
what the specification actually requires.

Rich


* Re: abort() fails to terminate PID 1 process
  2016-07-03 13:58           ` Rich Felker
@ 2016-07-03 19:58             ` Laurent Bercot
  2016-07-03 20:01               ` Rich Felker
  2016-07-04 13:38               ` Igmar Palsenberg
  2016-07-04 13:37             ` Igmar Palsenberg
  1 sibling, 2 replies; 18+ messages in thread
From: Laurent Bercot @ 2016-07-03 19:58 UTC (permalink / raw)
  To: musl

On 03/07/2016 15:58, Rich Felker wrote:
> Whether you realize it or not, what you're saying is equivalent to
> saying that it's UB for a process that runs as pid 1 to call abort().
> There is no basis for such a claim.

  There's no basis in the specification, but in practice, on Linux at least,
a process that runs as pid 1 outside of a container and that exits - whether
normally or via abort() or anything else - will cause a kernel panic. So
treating that case as UB is defensible, at least until musl is ported to an
OS where pid 1 death is less dramatic.

-- 
  Laurent

* Re: abort() fails to terminate PID 1 process
  2016-07-03 19:58             ` Laurent Bercot
@ 2016-07-03 20:01               ` Rich Felker
  2016-07-03 20:20                 ` Laurent Bercot
  2016-07-04 13:38               ` Igmar Palsenberg
  1 sibling, 1 reply; 18+ messages in thread
From: Rich Felker @ 2016-07-03 20:01 UTC (permalink / raw)
  To: musl

On Sun, Jul 03, 2016 at 09:58:45PM +0200, Laurent Bercot wrote:
> On 03/07/2016 15:58, Rich Felker wrote:
> >Whether you realize it or not, what you're saying is equivalent to
> >saying that it's UB for a process that runs as pid 1 to call abort().
> >There is no basis for such a claim.
> 
>  There's no basis in the specification, but in practice, on Linux at least,
> a process that runs as pid 1 outside of a container and that exits - whether
> normally or via abort() or anything else - will cause a kernel panic. So
> treating that case as UB is defensible, at least until musl is ported to an
> OS where pid 1 death is less dramatic.

No. Halting the system safely (which kernel panic does) is completely
different from runaway wrong-code execution, and the only reason we
don't have runaway wrong-code execution right now is because I built
in the for(;;) safety in case termination failed.

Rich


* Re: abort() fails to terminate PID 1 process
  2016-07-03 20:01               ` Rich Felker
@ 2016-07-03 20:20                 ` Laurent Bercot
  2016-07-03 20:24                   ` Rich Felker
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Bercot @ 2016-07-03 20:20 UTC (permalink / raw)
  To: musl

On 03/07/2016 22:01, Rich Felker wrote:
> No. Halting the system safely (which kernel panic does) is completely
> different from runaway wrong-code execution, and the only reason we
> don't have runaway wrong-code execution right now is because I built
> in the for(;;) safety in case termination failed.

  Halting the system, no matter how safely, is also completely different from
cleanly terminating the aborting process (while not impacting other processes
as is supposed to be guaranteed by Unix). At this point, we're wildly outside
the realm of specification anyway, and I find it acceptable to say that pid 1
abort (or any kind of death for that matter) is UB. Your choice of
implementation for abort() is good and safe, but I think it's just QoI,
not something you're bound to do by a standard.

-- 
  Laurent

* Re: abort() fails to terminate PID 1 process
  2016-07-03 20:20                 ` Laurent Bercot
@ 2016-07-03 20:24                   ` Rich Felker
  0 siblings, 0 replies; 18+ messages in thread
From: Rich Felker @ 2016-07-03 20:24 UTC (permalink / raw)
  To: musl

On Sun, Jul 03, 2016 at 10:20:46PM +0200, Laurent Bercot wrote:
> On 03/07/2016 22:01, Rich Felker wrote:
> >No. Halting the system safely (which kernel panic does) is completely
> >different from runaway wrong-code execution, and the only reason we
> >don't have runaway wrong-code execution right now is because I built
> >in the for(;;) safety in case termination failed.
> 
>  Halting the system, no matter how safely, is also completely different from
> cleanly terminating the aborting process (while not impacting other processes
> as is supposed to be guaranteed by Unix). At this point, we're wildly outside
> the realm of specification anyway, and I find it acceptable to say that pid 1
> abort (or any kind of death for that matter) is UB. Your choice of
> implementation for abort() is good and safe, but I think it's just QoI,
> not something you're bound to do by a standard.

Halting the system when init exits is functionally equivalent to
having a hidden parent process provided by the OS that performs a halt
when its child (pid 1) exits. There's nothing fishy going on there. On
the other hand, having random code start executing would be clearly
wrong well beyond mere QoI.

Rich


* Re: abort() fails to terminate PID 1 process
  2016-06-18 20:32 abort() fails to terminate PID 1 process Karl Böhlmark
                   ` (2 preceding siblings ...)
  2016-06-20 10:29 ` Natanael Copa
@ 2016-07-03 22:03 ` Rich Felker
  3 siblings, 0 replies; 18+ messages in thread
From: Rich Felker @ 2016-07-03 22:03 UTC (permalink / raw)
  To: musl

On Sat, Jun 18, 2016 at 10:32:23PM +0200, Karl Böhlmark wrote:
> Hi!
> 
> After running alpine-linux based docker containers for a while we noticed
> some problematic behaviour when one of our services had a memory leak
> causing the process to abort.
> 
> Instead of getting abnormal process termination we were seeing the process
> hanging at 100% cpu.
> 
> A minimal reproduction of this issue is to run
> 
> #include <stdlib.h>
> int main ()
> {
> abort();
> }
> 
> with "unshare --fork --pid" so that it runs as PID 1 in it's own PID
> namespace.
> 
> Would it be reasonable to add a fallback strategy in abort() for
> terminating processes when the signals don't have any effect?

I've improved the fallback strategy. It's not perfect yet but should
mitigate your issue:

https://git.musl-libc.org/cgit/musl/commit/?id=0c8bc102f287d3993751d80ba2dffb01e0c8bc7f

Rich


* Re: abort() fails to terminate PID 1 process
  2016-07-03 13:58           ` Rich Felker
  2016-07-03 19:58             ` Laurent Bercot
@ 2016-07-04 13:37             ` Igmar Palsenberg
  2016-07-05  3:07               ` Rich Felker
  1 sibling, 1 reply; 18+ messages in thread
From: Igmar Palsenberg @ 2016-07-04 13:37 UTC (permalink / raw)
  To: musl



> Whether you realize it or not, what you're saying is equivalent to
> saying that it's UB for a process that runs as pid 1 to call abort().
> There is no basis for such a claim.
> 
> A vague "pid 1 is special" rule (which the standard does not support
> except in a few very specific places where an implementation-defined
> set of processes are permitted to be treated in specific special ways)
> does not imply "calling a function whose behavior is well-defined can
> legitimately lead to runaway code execution if the pid is 1".

But doesn't "behavior is well-defined" also imply that the function
behaves as it should? If it doesn't, doesn't the "well-defined" no longer
apply? I call it UB in this case.

The standard also says a process can't ignore SIGKILL, but on pid 1 it
has no effect. I pretty much call that UB myself.

> > Can this even be reproduced under normal circumstances (aka : not pid 1) ? 
> > If thes, then I agree : It's a bug. If no : Then not. If people have a 
> > broken container init system, then it breaks and they keep the pieces.
> 
> Yes.

If it can be reproduced when not pid 1, then I agree, it's a bug.

> > > > Well, normally abort() does some signal magic, and then raises again. 
> > > > Which is what POSIX mandates I think.
> > > 
> > > To make this work reliably I think we need to make abort() take a lock
> > > the precludes further calls to sigaction prior to re-raising SIGABRT
> > > and resetting the disposition. But there are all sorts of
> > > complications to deal with. For example if another thread performs
> > > posix_spawn for fork and exec concurrent with abort() munging the
> > > disposition of SIGABRT, the child process could start with the wrong
> > > disposition for SIGABRT, which would be non-conforming. Finding ways
> > > to fix all places where the wrong behavior may be observable is a
> > > nontrivial problem.
> > 
> > Does the whole guaranteed termination also includes threaded programs ?
> 
> Of course. The fact that you're asking such basic questions tells me
> that you're bikeshedding this based on negative opinions of certain
> container usage cases and not offering constructive input based on
> what the specification actually requires.

I've seen this behave differently in practice, that's why I'm asking. I
never debugged that one issue; it just "disappeared" at a certain point,
and I could never reproduce it afterwards.

I'm not bikeshedding container usage, I'm just seeing broken
implementations in the wild (which usually do get fixed rapidly).




Igmar


* Re: abort() fails to terminate PID 1 process
  2016-07-03 19:58             ` Laurent Bercot
  2016-07-03 20:01               ` Rich Felker
@ 2016-07-04 13:38               ` Igmar Palsenberg
  1 sibling, 0 replies; 18+ messages in thread
From: Igmar Palsenberg @ 2016-07-04 13:38 UTC (permalink / raw)
  To: musl



> On 03/07/2016 15:58, Rich Felker wrote:
> > Whether you realize it or not, what you're saying is equivalent to
> > saying that it's UB for a process that runs as pid 1 to call abort().
> > There is no basis for such a claim.
> 
>  There's no basis in the specification, but in practice, on Linux at least,
> a process that runs as pid 1 outside of a container and that exits - whether
> normally or via abort() or anything else - will cause a kernel panic. So
> treating that case as UB is defensible, at least until musl is ported to an
> OS where pid 1 death is less dramatic.

The old HP system we had at the university also panicked, if I remember
correctly. To be honest, I have no sane idea what it should do otherwise.



Igmar


* Re: abort() fails to terminate PID 1 process
  2016-07-04 13:37             ` Igmar Palsenberg
@ 2016-07-05  3:07               ` Rich Felker
  2016-07-30 21:24                 ` Igmar Palsenberg
  0 siblings, 1 reply; 18+ messages in thread
From: Rich Felker @ 2016-07-05  3:07 UTC (permalink / raw)
  To: musl

On Mon, Jul 04, 2016 at 03:37:35PM +0200, Igmar Palsenberg wrote:
> 
> 
> > Whether you realize it or not, what you're saying is equivalent to
> > saying that it's UB for a process that runs as pid 1 to call abort().
> > There is no basis for such a claim.
> > 
> > A vague "pid 1 is special" rule (which the standard does not support
> > except in a few very specific places where an implementation-defined
> > set of processes are permitted to be treated in specific special ways)
> > does not imply "calling a function whose behavior is well-defined can
> > legitimately lead to runaway code execution if the pid is 1".
> 
> But doesn't "bevavior is well-defined" also imply that that function 
> behaves as it should ? If it doesn't, doesn't the "well-defined" no longer 
> apply ? I call it UB in this case.

"Behavior is well-defined" means the specification tells what it does
and does not leave it implementation-defined, unspecified, or
undefined -- neither by explicitly saying so, nor by omission.

> The standard also says a process can't ignore a SIGKILL, but on pid 1, it 
> has no effect. I pretty much call that UB myself.

You keep using that word. I do not think it means what you think it
means.

If anything what you're arguing is that the Linux kernel has a bug,
since the behavior of raising SIGKILL is specified and Linux does not
do what the spec says (for pid 1). That does not mean it's undefined
but rather that the implementation is behaving contrary to the defined
behavior.

Rich


* Re: abort() fails to terminate PID 1 process
  2016-07-05  3:07               ` Rich Felker
@ 2016-07-30 21:24                 ` Igmar Palsenberg
  0 siblings, 0 replies; 18+ messages in thread
From: Igmar Palsenberg @ 2016-07-30 21:24 UTC (permalink / raw)
  To: musl


> > > does not imply "calling a function whose behavior is well-defined can
> > > legitimately lead to runaway code execution if the pid is 1".
> > 
> > But doesn't "bevavior is well-defined" also imply that that function 
> > behaves as it should ? If it doesn't, doesn't the "well-defined" no longer 
> > apply ? I call it UB in this case.
> 
> "Behavior is well-defined" means the specification tells what it does
> and does not leave it implementation-defined, unspecified, or
> undefined -- neither by explicitly saying so, nor by omission.

Yeah, indeed. Sending signals is pretty well defined I assume.
 
> > The standard also says a process can't ignore a SIGKILL, but on pid 1, it 
> > has no effect. I pretty much call that UB myself.
> 
> You keep using that word. I do not think it means what you think it
> means.
> 
> If anything what you're arguing is that the Linux kernel has a bug,
> since the behavior of raising SIGKILL is specified and Linux does not
> do what the spec says (for pid 1). That does not mean it's undefined
> but rather that the implementation is behaving contrary to the defined
> behavior.

I wouldn't call it a bug, since it's documented behaviour. I have no idea
what to call this to be honest, assuming that it even has a formal name.



Igmar

