From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: abort() fails to terminate PID 1 process
Date: Sun, 3 Jul 2016 09:58:47 -0400
Message-ID: <20160703135846.GF15995@brightrain.aerifal.cx>
References: <20160620100443.GV22574@port70.net> <20160620194110.GM10893@brightrain.aerifal.cx>
To: musl@lists.openwall.com

On Sun, Jul 03, 2016 at 12:43:59PM +0200, Igmar Palsenberg wrote:
> > > That rule doesn't apply to pid 1 by default. Pid 1 should be a proper init
> > > system, not a full-blown application that makes the system blow up on
> > > every error.
> >
> > abort is specified to terminate the process no matter what.
>
> Yes. But as mentioned: pid 1 is an exception to this.
>
> > For it to
> > ever be able to return is a serious bug since both the compiler and
> > the programmer can assume any code after abort() is unreachable.
>
> This specific case talked about pid 1. pid 1 has kernel protection, normal
> userspace processes don't. In that case, the normal assumptions don't hold
> up.

Whether you realize it or not, what you're saying is equivalent to
saying that it's UB for a process that runs as pid 1 to call abort().
There is no basis for such a claim. A vague "pid 1 is special" rule
(which the standard does not support except in a few very specific
places where an implementation-defined set of processes are permitted
to be treated in specific special ways) does not imply "calling a
function whose behavior is well-defined can legitimately lead to
runaway code execution if the pid is 1".

> > At
> > present musl avoids this worst-case failure (wrongfully returning)
> > with an infinite loop, but that's just a fail-safe. The intent is that
> > it terminate, and in particular, terminate abnormally as specified,
> > which we don't do enough to guarantee (SIGKILL is not "abnormal"
> > termination). So there's definitely work to be done to fix this. It's
> > an issue I've been aware of for a long time but the kernel makes it
> > painful to reliably produce abnormal termination without race
> > conditions.
>
> Can this even be reproduced under normal circumstances (aka: not pid 1)?
> If yes, then I agree: it's a bug. If no: then not. If people have a
> broken container init system, then it breaks and they keep the pieces.

Yes.

> > > Well, normally abort() does some signal magic, and then raises again.
> > > Which is what POSIX mandates I think.
> >
> > To make this work reliably I think we need to make abort() take a lock
> > that precludes further calls to sigaction prior to re-raising SIGABRT
> > and resetting the disposition. But there are all sorts of
> > complications to deal with.
> > For example if another thread performs
> > posix_spawn or fork and exec concurrent with abort() munging the
> > disposition of SIGABRT, the child process could start with the wrong
> > disposition for SIGABRT, which would be non-conforming. Finding ways
> > to fix all places where the wrong behavior may be observable is a
> > nontrivial problem.
>
> Does the whole guaranteed termination also include threaded programs?

Of course. The fact that you're asking such basic questions tells me
that you're bikeshedding this based on negative opinions of certain
container usage cases and not offering constructive input based on
what the specification actually requires.

Rich
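
For readers following the thread, the sequence discussed above (raise
SIGABRT, reset its disposition to the default, raise it again, then fall
back to a fail-safe that never returns) can be sketched in C roughly as
below. This is only an illustrative sketch under stated assumptions, not
musl's implementation: sketch_abort and abort_signal_with_sigabrt_ignored
are hypothetical names, the code assumes a single-threaded caller, and it
takes no lock around the reset, so it has exactly the sigaction race
described in the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a libc abort(); NOT musl's actual code. */
_Noreturn void sketch_abort(void)
{
    struct sigaction dfl;
    sigset_t set;

    /* Step 1: deliver SIGABRT under whatever disposition the program set. */
    raise(SIGABRT);

    /* Step 2: still running, so SIGABRT was caught, ignored or blocked.
     * Reset it to the default action, unblock it, and raise it again.
     * Another thread calling sigaction() or fork()/posix_spawn() in this
     * window is the race the thread talks about; no lock is taken here. */
    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    sigaction(SIGABRT, &dfl, NULL);
    sigemptyset(&set);
    sigaddset(&set, SIGABRT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);
    raise(SIGABRT);

    /* Step 3: fail-safe for the case where the signal still did not
     * terminate the process (the pid 1 situation in this thread).
     * SIGKILL is not "abnormal" termination, and may not be delivered
     * either, so the final loop guarantees only that we never return. */
    raise(SIGKILL);
    for (;;)
        pause();
}

/* Runs sketch_abort() in a child that ignores SIGABRT. Returns the
 * signal that terminated the child, or -1 if it did not die by signal. */
int abort_signal_with_sigabrt_ignored(void)
{
    int status;
    pid_t waited;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGABRT, SIG_IGN);   /* make step 1 a no-op */
        sketch_abort();
    }
    waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Called from an ordinary (non-pid-1) process, the helper is expected to
report SIGABRT, since step 2 undoes the SIG_IGN disposition before the
second raise. It does not exercise the pid 1 case, where step 3 is what
keeps the function from returning.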