From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11768
Path: news.gmane.org!.POSTED!not-for-mail
From: Bobby Bingham
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: possible bug in setjmp implementation for ppc64
Date: Tue, 1 Aug 2017 00:10:43 -0500
Message-ID: <20170801051042.GA14914@dora.lan>
References: <1501520360.0.593167188853569@go.bunnymail.go> <20170731203007.GB1627@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Trace: blaine.gmane.org 1501564259 21926 195.159.176.226 (1 Aug 2017 05:10:59 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Tue, 1 Aug 2017 05:10:59 +0000 (UTC)
User-Agent: Mutt/1.8.3 (2017-05-23)
To: musl@lists.openwall.com
Original-X-From: musl-return-11781-gllmg-musl=m.gmane.org@lists.openwall.com Tue Aug 01 07:10:56 2017
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from )
	id 1dcPSG-0005HG-R5
	for gllmg-musl@m.gmane.org; Tue, 01 Aug 2017 07:10:52 +0200
Original-Received: (qmail 26174 invoked by uid 550); 1 Aug 2017 05:10:56 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-ID:
Original-Received: (qmail 26150 invoked from network); 1 Aug 2017 05:10:56 -0000
Content-Disposition: inline
In-Reply-To: <20170731203007.GB1627@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:11768
Archived-At:

On Mon, Jul 31, 2017 at 04:30:07PM -0400, Rich Felker wrote:
> On Mon, Jul 31, 2017 at 10:06:51PM +0200, felix.winkelmann@bevuta.com wrote:
> > Hi!
> >
> > I think I may have come across a bug in musl on PPC64(le), and the folks
> > on the #musl IRC channel directed me here.
> > I'm not totally sure whether
> > the problem is caused by my misunderstanding of C library functions or whether
> > it is a plain bug in the musl implementation of setjmp(3).
> >
> > In our project[1] we use setjmp to establish a global trampoline
> > and allocate small objects on the stack using alloca (see [2] for
> > more information about the compilation strategy used). I was able to reduce
> > the code that crashes to the following:
> >
> > ---
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <setjmp.h>
> > #include <alloca.h>
> >
> > jmp_buf jb;
> >
> > int foo = 99;
> > int c = 0;
> >
> > void bar()
> > {
> >     c++;
> >     longjmp(jb, 1);
> > }
> >
> > int main()
> > {
> >     setjmp(jb);
> >     char *p = alloca(256);
> >     memset(p, 0, 256);
> >     printf("%d\n", foo);
> >
> >     if(c < 10) bar();
> >
> >     exit(0);
> > }
> > ---
> >
> > When executing the longjmp, the code that restores $r2 (TOC) after the call
> > to setjmp reads invalid data, because the memset apparently clobbered
> > the stack frame - i.e. the pointer returned by alloca points into a part
> > of the stack frame that is still in use.
> >
> > I tried this on arm, x86_64 and ppc64 with glibc and it seems to work fine,
> > but crashes when linked with musl (running Alpine Linux on a VM)
> >
> > If you need more information, please feel free to ask. You can also keep
> > me CC'd, since I'd be interested in knowing more about the details.
>
> It looks to me like we have a bug here, but it's one where I or
> someone else needs to read and understand the PPC64 ELFv2 ABI document
> to fully understand what's going on and make a fix. I'll try to get to
> it soon, or I'm happy if someone else wants to. I don't just want to
> cargo-cult whatever glibc is doing, though; a fix should be
> accompanied by an understanding of why it's right.

I think I can explain what's happening.

The TOC pointer is constant within a given dynamic module (the main
executable or a library), but needs to be adjusted at cross-module calls.
Each function has two entry points in the ELFv2 ABI. The entry point
for intra-module calls can assume r2 is already set up correctly. The
entry point for inter-module calls starts two instructions earlier and
adjusts r2 before falling through to the intra-module entry point.

Normally, r2 is supposed to be preserved across calls. For intra-module
calls, there's no problem. For inter-module calls, the PLT stub saves
the caller's r2 value to a slot in the caller's stack frame that's
required to be reserved for it, at r1+24. The linker then inserts code
in the caller to restore the value from the stack immediately after the
call.

So what's happening here is that the value of r2 that setjmp saves and
that longjmp restores is the TOC pointer for libc, as set up by the PLT
stub. It's not the value of r2 that the caller had. But that's normally
fine -- after the second return from setjmp, the caller will restore its
TOC pointer from the stack, where it had been saved by the PLT stub when
it originally called setjmp. But in this example, gcc decides to
allocate the 256 bytes on top of the part of the stack where the setjmp
PLT stub had saved the TOC pointer, so it gets clobbered.

The problem is that static linking and dynamic linking need to work
differently. With dynamic linking, we can fix this by changing setjmp
to read the caller's TOC pointer from the reserved slot in the caller's
stack frame, and longjmp to restore it to the stack instead of to r2.
But with static linking, there's no PLT stub or code added by the linker
to restore the TOC pointer from the stack, so we need to save/restore
from/to r2, not the TOC slot in the caller's stack frame.

I think this requires either having different versions of
setjmp/longjmp for static and dynamic libc, or increasing the size of
jmp_buf so we can always save/restore both r2 and the value on the
stack, but the latter would be an ABI change.

--
Bobby
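
To make the dynamic-linking sequence described above a bit more concrete,
here is a toy model in portable C. It is only a sketch under stated
assumptions, not musl's code and not real PPC64 assembly: the names
(toy_cpu, toy_jmp_buf, toy_second_return_ok), the extra toc_slot field and
the TOC values are all made up for illustration, and the "stack frame" is
just a byte array with the TOC save slot at offset 24.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: the caller's frame is a byte array based at "r1",
 * with the TOC save slot at r1+24 as described in the message. */
enum { FRAME_SIZE = 64, TOC_SAVE_OFFSET = 24 };

#define CALLER_TOC UINT64_C(0x1111)  /* made-up TOC value of the executable */
#define LIBC_TOC   UINT64_C(0x2222)  /* made-up TOC value of libc */

struct toy_cpu {
    uint64_t r2;                      /* TOC pointer register */
    unsigned char frame[FRAME_SIZE];  /* caller's stack frame */
};

struct toy_jmp_buf {
    uint64_t r2;        /* the r2 value seen inside setjmp */
    uint64_t toc_slot;  /* hypothetical extra field: caller's saved TOC */
};

static uint64_t load_toc_slot(const struct toy_cpu *cpu)
{
    uint64_t v;
    assert(TOC_SAVE_OFFSET + sizeof v <= FRAME_SIZE);
    memcpy(&v, cpu->frame + TOC_SAVE_OFFSET, sizeof v);
    return v;
}

static void store_toc_slot(struct toy_cpu *cpu, uint64_t v)
{
    assert(TOC_SAVE_OFFSET + sizeof v <= FRAME_SIZE);
    memcpy(cpu->frame + TOC_SAVE_OFFSET, &v, sizeof v);
}

/* Replays a cross-module setjmp call, a clobber of the TOC save slot, and a
 * longjmp. Returns 1 if the caller ends up with its own TOC pointer in r2
 * after the second return from setjmp, 0 otherwise. */
int toy_second_return_ok(int save_slot_too)
{
    struct toy_cpu cpu = { .r2 = CALLER_TOC };
    struct toy_jmp_buf jb = { 0 };

    /* PLT stub: save the caller's r2 at r1+24, then switch to libc's TOC. */
    store_toc_slot(&cpu, cpu.r2);
    cpu.r2 = LIBC_TOC;

    /* setjmp body: r2 here is libc's TOC, not the caller's. */
    jb.r2 = cpu.r2;
    if (save_slot_too)
        jb.toc_slot = load_toc_slot(&cpu);

    /* First return: linker-inserted code reloads r2 from the slot. */
    cpu.r2 = load_toc_slot(&cpu);

    /* alloca + memset in the caller overwrite the save slot. */
    memset(cpu.frame + TOC_SAVE_OFFSET, 0, sizeof(uint64_t));

    /* longjmp: restore r2, and optionally rewrite the slot as well. */
    cpu.r2 = jb.r2;
    if (save_slot_too)
        store_toc_slot(&cpu, jb.toc_slot);

    /* Second return: the same linker-inserted reload runs again. */
    cpu.r2 = load_toc_slot(&cpu);

    return cpu.r2 == CALLER_TOC;
}
```

In this model, toy_second_return_ok(0) replays the failing sequence and
toy_second_return_ok(1) replays the variant where longjmp also rewrites the
slot. The static-linking case is not modelled: there is no PLT stub and no
reload from the slot there, so restoring r2 alone is what matters.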