From: "nathan@nathan7.eu"
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: abort() fails to terminate PID 1 process
Date: Sun, 19 Jun 2016 01:20:02 +0000
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com

It appears that raise(3) returns only after the signal has been handled,
so the infinite loop in abort() should be reached only when we hit this
failure case. Perhaps we should replace the loop with asm("ud2"), and the
equivalent on non-x86 arches, causing a SIGILL, which will definitely
abort the process.

On Sat, Jun 18, 2016 at 4:32 PM Karl Böhlmark <karl.bohlmark@gmail.com> wrote:

> Hi!
>
> After running Alpine Linux-based Docker containers for a while, we
> noticed some problematic behaviour when one of our services had a
> memory leak that caused the process to abort.
>
> Instead of getting abnormal process termination, we were seeing the
> process hang at 100% CPU.
>
> A minimal reproduction of this issue is to run
>
> #include <stdlib.h>
> int main ()
> {
>     abort();
> }
>
> with "unshare --fork --pid" so that it runs as PID 1 in its own PID
> namespace.
>
> Would it be reasonable to add a fallback strategy in abort() for
> terminating processes when the signals don't have any effect?
>
> Karl
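
For illustration only, here is a rough sketch of the kind of fallback discussed in this thread. It is not musl's code. The names abort_with_fallback() and child_termination_signal() are made up for this example, and it assumes GCC or Clang, using __builtin_trap() as a portable stand-in for a hand-written asm("ud2") (on x86 these compilers typically emit ud2 for it). Where the builtin is unavailable, the sketch falls back to SIGKILL and then _exit().

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical name: try SIGABRT first, then force termination some other way. */
_Noreturn void abort_with_fallback(void)
{
    /* Normal path: with the default disposition this call does not return. */
    raise(SIGABRT);

    /* Only reached if SIGABRT had no effect or a handler returned. */
#if defined(__GNUC__) || defined(__clang__)
    /* Execute a trapping instruction instead of spinning forever. */
    __builtin_trap();
#else
    /* Assumed fallback for other compilers: SIGKILL, then a plain exit. */
    raise(SIGKILL);
    for (;;)
        _exit(127);
#endif
}

/* Demo helper (hypothetical): run the function in a forked child and
 * return the number of the signal that terminated it, 0 if it exited
 * normally, or -1 on error. */
int child_termination_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        abort_with_fallback();
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

To exercise the PID 1 case from the original report, a small main() that calls abort_with_fallback() could be run under "unshare --fork --pid", as in Karl's reproduction.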