From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: abort() fails to terminate PID 1 process
Date: Mon, 20 Jun 2016 15:41:10 -0400
Message-ID: <20160620194110.GM10893@brightrain.aerifal.cx>
References: <20160620100443.GV22574@port70.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Mon, Jun 20, 2016 at 02:00:42PM +0200, Igmar Palsenberg wrote:
>
> > > First, processes can install handlers, which might
> > > instruct the kernel to ignore the signal. SIGABRT can be ignored. I don't
> >
> > abort() should terminate the process even if SIGABRT is ignored.
>
> That rule doesn't apply to pid 1 by default.
> Pid 1 should be a proper init
> system, not a full-blown application that makes the system blow up on
> every error.

abort is specified to terminate the process no matter what. For it to
ever be able to return is a serious bug since both the compiler and
the programmer can assume any code after abort() is unreachable.

At present musl avoids this worst-case failure (wrongfully returning)
with an infinite loop, but that's just a fail-safe. The intent is that
it terminate, and in particular, terminate abnormally as specified,
which we don't do enough to guarantee (SIGKILL is not "abnormal"
termination). So there's definitely work to be done to fix this. It's
an issue I've been aware of for a long time but the kernel makes it
painful to reliably produce abnormal termination without race
conditions.

> > > expect my process to be SIGILL'ed next because of this (which, can also be
> > > ignored).
> > > Libc should NOT mess with these kind of things, that's up to the
> > > application.
> >
> > the glibc fallbacks are
> >
> > change signal mask and set default handling for SIGABRT
> > raise(SIGABRT);
> > "abort instruction" (segfault, sigtrap or sigill depending on target)
> > _exit(127);
> > infinite loop
>
> Pid 1 is an exception to all of this.
>
> > http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=155d70b0647e848f1d40fc0e3b15a2914d7145c0;hb=HEAD
> >
> > on x86 glibc, pid 1 would terminate with SIGSEGV
> > (unless there is a segfault handler).
> >
> > the musl logic is explained in
> >
> > http://git.musl-libc.org/cgit/musl/commit/?id=2557d0ba47286ed3e868f8ddc9dbed0942fe99dc
> >
> > neither of them is correct because it is not possible to
> > exit with the right status in general.
> >
> > SIGKILL can only be ignored by pid 1 whose exit status is
> > not supposed to be observable so musl may want to have a
> > fallback after it since the pid namespace thing is nowadays
> > widely abused on linux.
>
> Well, normally abort() does some signal magic, and then raises again.
> Which is what POSIX mandates I think.

To make this work reliably I think we need to make abort() take a lock
that precludes further calls to sigaction prior to re-raising SIGABRT
and resetting the disposition. But there are all sorts of
complications to deal with. For example if another thread performs
posix_spawn or fork and exec concurrent with abort() munging the
disposition of SIGABRT, the child process could start with the wrong
disposition for SIGABRT, which would be non-conforming. Finding ways
to fix all places where the wrong behavior may be observable is a
nontrivial problem.

> If you're pid 1 however, you should behave like one.

I tend to agree, but if you're libc you should also behave as
specified, and currently we don't in this regard.

Rich
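
For readers following the thread, the fallback chain discussed above
(reset the disposition, unblock and raise SIGABRT, then fall through to
harsher measures) can be sketched roughly as below. This is only an
illustration under stated assumptions, not glibc's or musl's actual
code: the names sketch_abort and status_of_aborting_child are
hypothetical, the target-specific "abort instruction" step is left out,
and SIGKILL is used as the next fallback as a guess at the musl-style
approach mentioned in the message. It does nothing about the
sigaction/fork race described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of an abort() fallback chain; never returns. */
void sketch_abort(void)
{
    struct sigaction sa;
    sigset_t set;

    /* Step 1: restore the default disposition for SIGABRT.
     * Assumption: no lock is taken here, so another thread could still
     * call sigaction() or fork between this point and raise(). */
    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, NULL);

    /* Step 2: make sure SIGABRT is not blocked, then raise it.
     * (Single-threaded sketch; threaded code would use pthread_sigmask.) */
    sigemptyset(&set);
    sigaddset(&set, SIGABRT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    raise(SIGABRT);

    /* Step 3: only reached if SIGABRT did not terminate us, e.g. for
     * pid 1 in a pid namespace. SIGKILL is not "abnormal" termination. */
    raise(SIGKILL);

    /* Step 4: last resorts, plain exit and then a fail-safe loop. */
    _exit(127);
    for (;;) { }
}

/* Demonstration helper: run sketch_abort() in a child that ignores
 * SIGABRT and return the wait status observed by the parent. */
int status_of_aborting_child(void)
{
    int status = -1;
    pid_t pid = fork();

    assert(pid >= 0);
    if (pid == 0) {
        signal(SIGABRT, SIG_IGN);   /* the case under discussion */
        sketch_abort();
    }
    waitpid(pid, &status, 0);
    return status;
}
```

For an ordinary (non-pid-1) child, the parent should see termination by
SIGABRT even though the child had set the signal to be ignored. Inside
a pid namespace, where the process is pid 1, steps 1 and 2 have no
effect and the later fallbacks are what matter.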