From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: Musl incompatibility with Docker and AWS's C5 class
Date: Thu, 15 Mar 2018 10:52:47 -0400
Message-ID: <20180315145247.GE1436@brightrain.aerifal.cx>

On Thu, Mar 15, 2018 at 09:37:28AM -0400, Ryan Wilson-Perkin wrote:
> Hey musl-devs,
>
> Yesterday we tested out the new C5 instance class that AWS offers using our
> Alpine-based images and discovered that we would get a segfault whenever we
> ran `npm install`. Tracing the code, it appeared to be happening due to the
> use of node's "process.setuid" and "process.setgid" commands, either of
> which would cause a segfault.
>
> We're running Alpine containers inside Docker on EC2, and the smallest
> thing I can provide to reproduce this issue would be to run the following
> on a C5 EC2 instance:
>
> docker run -it node:9-alpine sh -c "node -e 'process.setgid(0)'"
>
> A core dump provided the following limited information:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> warning: Unexpected size of section `.reg-xstate/26' in core file.
> #0  __cp_end () at src/thread/x86_64/syscall_cp.s:29
> 29      src/thread/x86_64/syscall_cp.s: No such file or directory.
> [Current thread is 1 (LWP 26)]
> (gdb) bt
> #0  __cp_end () at src/thread/x86_64/syscall_cp.s:29
> #1  0x00007fd6161eecd8 in __syscall_cp_c (nr=202, u=<optimized out>,
>     v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>,
>     z=0) at src/thread/pthread_cancel.c:35
> #2  0x00007fd6161ee2f5 in __timedwait_cp (addr=addr@entry=0x5612e9ebf820,
>     val=val@entry=-1, clk=clk@entry=0, at=at@entry=0x0,
>     priv=<optimized out>) at src/thread/__timedwait.c:31
> #3  0x00007fd6161f0e2c in sem_timedwait (sem=0x5612e9ebf820, at=0x0) at
>     src/thread/sem_timedwait.c:23
> #4  0x00007fd615d7a5a4 in uv_sem_wait () from /usr/lib/libuv.so.1
> #5  0x00005612e94dc00c in node::DebugSignalThreadMain(void*) ()
> #6  0x00007fd6161ef665 in start (p=0x7fd616424ab0) at
>     src/thread/pthread_create.c:145
> #7  0x00007fd6161f13e4 in __clone () at src/thread/x86_64/clone.s:21
> Backtrace stopped: frame did not save the PC

Changing uids/gids in a multithreaded process involves synchronizing all
the threads with a signal.
Based on the information, my guess is that the stack for at least one
thread is barely large enough, and when the signal arrives, creation of
the signal frame (in the kernel) overflows the stack and the kernel
generates SIGSEGV for the process.

One approach to test whether this is the case, and to mitigate it, is
to LD_PRELOAD a library that calls pthread_setattr_default_np from a
constructor to set a larger default thread stack size.

If that turns out to be the problem, the Alpine node package should
probably be patched to increase the stack size. We may also be
increasing the default in musl somewhat (from 80k to 128k or so) in the
near future; if so, that would likely be enough to solve your problem
here.

Rich
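
A minimal sketch of such a preload library follows (the file name,
build command, and the 512k figure are illustrative choices, not from
the original mail; pthread_setattr_default_np is the nonstandard
extension provided by both musl and glibc):

/* stacksize.c - sketch of the LD_PRELOAD approach described above. */
#define _GNU_SOURCE
#include <pthread.h>

/* Runs when the DSO is loaded via LD_PRELOAD, before main(), and raises
 * the default stack size used for threads created afterwards. */
__attribute__((constructor))
static void raise_default_thread_stack(void)
{
	pthread_attr_t a;
	if (pthread_attr_init(&a)) return;
	pthread_attr_setstacksize(&a, 512*1024); /* e.g. 512k instead of the 80k default */
	pthread_setattr_default_np(&a);
	pthread_attr_destroy(&a);
}

Build it with something like "cc -shared -fPIC -o stacksize.so
stacksize.c" and run the failing command with
LD_PRELOAD=/path/to/stacksize.so. If the segfault goes away, that
points to thread stack exhaustion from the signal frame as guessed
above.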