From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: dynamic linker bootstrap/rcrt changes
Date: Wed, 16 Sep 2015 01:36:04 -0400
Message-ID: <20150916053604.GT17773@brightrain.aerifal.cx>
In-Reply-To: <20150911064504.GA21467@brightrain.aerifal.cx>

On Fri, Sep 11, 2015 at 02:45:04AM -0400, Rich Felker wrote:
> Working on static-PIE and FDPIC has shown the current approach (symbol
> name lookup of "__dls2") to getting from stage 1 to stage 2 is not
> what we should be doing.
> It requires -rdynamic for static PIE to work,
> which is clunky and potentially bloated for large programs, and for
> FDPIC the symbol lookup does not produce a callable function pointer
> but rather an actual code address.
>
> What I'd like to do is punt on having _[dl]start_c make the call into
> __dls2 and instead have it return, leaving the calling asm again
> responsible for chaining into the next stage. This brings back a small
> asm burden I'd tried to eliminate, but it reduces code size and
> eliminates the above problems.
>
> One way we might could mitigate the asm burden is by having the crt
> asm leave an extra N words below the original sp (argv-1) when making
> the calls. This would give us space to pass state from stage 1 to
> stage 2 (and possibly beyond) without the need for per-arch asm to
> shuffle around argument registers and individual stack slots. This
> would make it so each stage could take a single argument, orig_sp.

I've devised what seems to be a good way to fix the current problems
(-rdynamic requirement for static-PIE and lack of FDPIC support)
without doing any of the above to complicate (or even change) the
contract between crt_arch.h and the function it calls (aside from
adding extra args for FDPIC).

The basic idea is to perform the lookup of the stage-2 function via a
small asm macro provided by the arch's reloc.h. Doing it in asm is
necessary as a barrier to prevent the load from being hoisted before
the self-relocation code and to avoid generating PIC code that could
not safely run before there's a working GOT.
On i386, it looks like this:

diff --git a/arch/i386/reloc.h b/arch/i386/reloc.h
index b52ef40..ea06e41 100644
--- a/arch/i386/reloc.h
+++ b/arch/i386/reloc.h
@@ -14,3 +14,7 @@
 
 #define CRTJMP(pc,sp) __asm__ __volatile__( \
 	"mov %1,%%esp ; jmp *%0" : : "r"(pc), "r"(sp) : "memory" )
+
+#define GETFUNCSYM(fp, sym, got) __asm__ ( \
+	".hidden " #sym " ; call 1f ; 1: addl $" #sym "-.,(%%esp) ; pop %0" \
+	: "=r"(*fp) : : "memory" )
diff --git a/src/ldso/dlstart.c b/src/ldso/dlstart.c
index 3aaa200..eb919ab 100644
--- a/src/ldso/dlstart.c
+++ b/src/ldso/dlstart.c
@@ -74,6 +74,7 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 		*rel_addr = (size_t)base + rel[2];
 	}
 
+#ifndef GETFUNCSYM
 	const char *strings = (void *)(base + dyn[DT_STRTAB]);
 	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);
 
@@ -85,6 +86,11 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 			break;
 	}
 	((stage2_func)(base + syms[i].st_value))(base, sp);
+#else
+	stage2_func dls2;
+	GETFUNCSYM(&dls2, __dls2, base+dyn[DT_PLTGOT]);
+	dls2(base, sp);
+#endif
 }
 
 #endif

The got-pointer argument is provided to the macro in case it's needed
on some archs; on many, DT_PLTGOT is not even set and thus not usable.
This should only be needed on FDPIC, where it won't actually be
base+... but rather an address that's properly adjusted for the
segment. For example on SH-FDPIC it looks like:

#define GETFUNCSYM(fp, sym, got) __asm__ ( \
	"mov.l 1f,%0 ; add %1,%0 ; bra 2f ; nop ; .align 2 \n" \
	"1: .long " #sym "@GOTOFFFUNCDESC \n2:" \
	: "=&r"(*fp) : "r"(got) : "memory" )

The point of returning the function address instead of just making a
call from asm is that we want the compiler to generate a tail-call, so
that the (considerable on a NOMMU scale) stack space used by the
self-relocation code gets freed before transferring control to the
main program.

So far I have this working on i386 and sheb-nofpu-fdpic (for the
latter, with a highly-modified, fdpic-oriented dlstart.c) but it
should not be hard to adapt to other archs.
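To give a rough idea of what adapting to another arch might involve,
here is a minimal sketch assuming x86_64 ELF, where a single
RIP-relative lea can produce the address without touching the GOT.
This is not part of the patch above and has not been tested in the
real bootstrap path. demo_dls2 and demo_stage1 are hypothetical
stand-ins for __dls2 and _dlstart_c, and the #else branch is only a
portable fallback so the sketch builds on other targets.

```c
#include <stddef.h>

/* Hypothetical stand-in for the stage-2 entry point type. */
typedef int (*stage2_func)(unsigned char *base, size_t *sp);

/* Hypothetical stand-in for __dls2; returns a value so the
 * result of the indirect call can be observed. */
int demo_dls2(unsigned char *base, size_t *sp)
{
	(void)base;
	return (int)sp[0] + 1;
}

#if defined(__x86_64__) && defined(__ELF__)
/* Assumed x86_64 form: mark the symbol hidden so the reference
 * binds locally, then take its address PC-relative. The "memory"
 * clobber keeps the compiler from moving memory accesses across
 * the asm. The got argument is unused here. */
#define GETFUNCSYM(fp, sym, got) __asm__ ( \
	".hidden " #sym " ; lea " #sym "(%%rip),%0" \
	: "=r"(*fp) : : "memory" )
#else
/* Portable fallback for building the sketch elsewhere. It gives
 * no ordering guarantee and may go through the GOT, so it would
 * not be suitable for real stage-1 code. */
#define GETFUNCSYM(fp, sym, got) ((void)(got), *(fp) = (sym))
#endif

/* Hypothetical stand-in for the tail of _dlstart_c: look up the
 * next stage, then call it as the last action of the function. */
int demo_stage1(unsigned char *base, size_t *sp)
{
	stage2_func next;
	GETFUNCSYM(&next, demo_dls2, base);
	return next(base, sp);
}
```

Calling demo_stage1(base, sp) with sp[0] == 41 returns 42 through the
looked-up pointer. Because the call is the last thing demo_stage1
does, an optimizing compiler is free to turn it into a jump, though
that is not guaranteed at every optimization level.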
If it seems ready to commit, I can leave the #ifndef GETFUNCSYM
fallback in dlstart.c in place during the transition so that not all
archs have to be converted in a single commit.

Rich