mailing list of musl libc
* dynamic linker bootstrap/rcrt changes
From: Rich Felker @ 2015-09-11  6:45 UTC (permalink / raw)
  To: musl

Working on static-PIE and FDPIC has shown the current approach (symbol
name lookup of "__dls2") to getting from stage 1 to stage 2 is not
what we should be doing. It requires -rdynamic for static PIE to work,
which is clunky and potentially bloated for large programs, and for
FDPIC the symbol lookup does not produce a callable function pointer
but rather an actual code address.

What I'd like to do is punt on having _[dl]start_c make the call into
__dls2 and instead have it return, leaving the calling asm again
responsible for chaining into the next stage. This brings back a small
asm burden I'd tried to eliminate, but it reduces code size and
eliminates the above problems.

One way we might mitigate the asm burden is by having the crt
asm leave an extra N words below the original sp (argv-1) when making
the calls. This would give us space to pass state from stage 1 to
stage 2 (and possibly beyond) without the need for per-arch asm to
shuffle around argument registers and individual stack slots. This
would make it so each stage could take a single argument, orig_sp.
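
As a rough illustration of the reserved-slot idea, here is a minimal
sketch in portable C. The names STATE_WORDS, stage1 and stage2 and the
slot layout are hypothetical, not existing musl code, and the caller is
assumed to have left STATE_WORDS writable words below orig_sp, which in
a real implementation would be the job of the crt asm.

```c
#include <stddef.h>

/* Hypothetical: number of scratch words the crt asm would reserve
 * below the original stack pointer before calling into C. */
#define STATE_WORDS 2

/* Stage 1 stores what it computed in the words just below orig_sp.
 * The choice of slots (-1 for base, -2 for dynv) is an assumption. */
static void stage1(size_t *orig_sp, size_t base, size_t dynv)
{
	orig_sp[-1] = base;
	orig_sp[-2] = dynv;
}

/* Stage 2 takes orig_sp as its only argument and recovers the state
 * from the same slots; here it just combines the two values. */
static size_t stage2(size_t *orig_sp)
{
	return orig_sp[-1] + orig_sp[-2];
}
```

A caller can try this with an ordinary array standing in for the
stack, pointing orig_sp at element STATE_WORDS so that the two slots
below it are valid.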

Rich



* Re: dynamic linker bootstrap/rcrt changes
From: Rich Felker @ 2015-09-16  5:36 UTC (permalink / raw)
  To: musl

On Fri, Sep 11, 2015 at 02:45:04AM -0400, Rich Felker wrote:
> Working on static-PIE and FDPIC has shown the current approach (symbol
> name lookup of "__dls2") to getting from stage 1 to stage 2 is not
> what we should be doing. It requires -rdynamic for static PIE to work,
> which is clunky and potentially bloated for large programs, and for
> FDPIC the symbol lookup does not produce a callable function pointer
> but rather an actual code address.
> 
> What I'd like to do is punt on having _[dl]start_c make the call into
> __dls2 and instead have it return, leaving the calling asm again
> responsible for chaining into the next stage. This brings back a small
> asm burden I'd tried to eliminate, but it reduces code size and
> eliminates the above problems.
> 
> One way we might mitigate the asm burden is by having the crt
> asm leave an extra N words below the original sp (argv-1) when making
> the calls. This would give us space to pass state from stage 1 to
> stage 2 (and possibly beyond) without the need for per-arch asm to
> shuffle around argument registers and individual stack slots. This
> would make it so each stage could take a single argument, orig_sp.

I've devised what seems to be a good way to fix the current problems
(-rdynamic requirement for static-PIE and lack of FDPIC support)
without doing any of the above to complicate (or even change) the
contract between crt_arch.h and the function it calls (aside from
adding extra args for FDPIC).

The basic idea is to perform the lookup of the stage-2 function via a
small asm macro provided by the arch's reloc.h. Doing it in asm is
necessary as a barrier to prevent the load from being hoisted before
the self-relocation code and to avoid generating PIC code that could
not safely run before there's a working GOT. On i386, it looks like
this:

diff --git a/arch/i386/reloc.h b/arch/i386/reloc.h
index b52ef40..ea06e41 100644
--- a/arch/i386/reloc.h
+++ b/arch/i386/reloc.h
@@ -14,3 +14,7 @@
 
 #define CRTJMP(pc,sp) __asm__ __volatile__( \
 	"mov %1,%%esp ; jmp *%0" : : "r"(pc), "r"(sp) : "memory" )
+
+#define GETFUNCSYM(fp, sym, got) __asm__ ( \
+	".hidden " #sym " ; call 1f ; 1: addl $" #sym "-.,(%%esp) ; pop %0" \
+	: "=r"(*fp) : : "memory" )
diff --git a/src/ldso/dlstart.c b/src/ldso/dlstart.c
index 3aaa200..eb919ab 100644
--- a/src/ldso/dlstart.c
+++ b/src/ldso/dlstart.c
@@ -74,6 +74,7 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 		*rel_addr = (size_t)base + rel[2];
 	}
 
+#ifndef GETFUNCSYM
 	const char *strings = (void *)(base + dyn[DT_STRTAB]);
 	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);
 
@@ -85,6 +86,11 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 			break;
 	}
 	((stage2_func)(base + syms[i].st_value))(base, sp);
+#else
+	stage2_func dls2;
+	GETFUNCSYM(&dls2, __dls2, base+dyn[DT_PLTGOT]);
+	dls2(base, sp);
+#endif
 }
 
 #endif
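
The #ifndef GETFUNCSYM switch in the patch above can be mimicked in
hosted C to show the two paths side by side. This is only a sketch
under stated assumptions: mock_syms, fake_dls2, lookup_by_name and
get_stage2 are hypothetical names, the symbol table is a plain array
rather than DT_SYMTAB/DT_STRTAB, and the GETFUNCSYM stand-in is
ordinary C because in a hosted program relocations have already been
applied.

```c
#include <stddef.h>
#include <string.h>

typedef int (*stage2_func)(int);

/* Stand-in for the real stage 2 entry point (hypothetical body). */
static int fake_dls2(int x) { return x + 1; }

/* Mock symbol table; the real fallback walks DT_SYMTAB/DT_STRTAB. */
struct mock_sym { const char *name; stage2_func addr; };
static const struct mock_sym mock_syms[] = {
	{ "other", 0 },
	{ "__dls2", fake_dls2 },
};

/* Fallback path: find the entry point by comparing symbol names. */
static stage2_func lookup_by_name(const char *name)
{
	for (size_t i = 0; i < sizeof mock_syms / sizeof mock_syms[0]; i++)
		if (!strcmp(mock_syms[i].name, name))
			return mock_syms[i].addr;
	return 0;
}

/* Hosted stand-in for an arch-provided macro. No asm is needed here;
 * the got argument is accepted and ignored. */
#define GETFUNCSYM(fp, sym, got) (*(fp) = (sym))

static stage2_func get_stage2(void)
{
#ifndef GETFUNCSYM
	return lookup_by_name("__dls2");
#else
	stage2_func f;
	GETFUNCSYM(&f, fake_dls2, 0);
	return f;
#endif
}
```

Removing the #define makes get_stage2 fall back to the name lookup,
which is the property that would let archs be converted one at a time.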

The got-pointer argument is provided to the macro in case it's needed
on some archs; on many, DT_PLTGOT is not even set and thus not usable.
This should only be needed on FDPIC, where it won't actually be
base+... but rather an address that's properly adjusted for the
segment. For example on SH-FDPIC it looks like:

#define GETFUNCSYM(fp, sym, got) __asm__ ( \
	"mov.l 1f,%0 ; add %1,%0 ; bra 2f ; nop ; .align 2 \n" \
	"1:     .long " #sym "@GOTOFFFUNCDESC \n2:" \
	: "=&r"(*fp) : "r"(got) : "memory" )

The point of returning the function address instead of just making a
call from asm is that we want the compiler to generate a tail-call, so
that the (considerable on a NOMMU scale) stack space used by the
self-relocation code gets freed before transferring control to the
main program.
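
The shape that makes a tail call possible can be sketched in plain C
as follows. The names fake_stage2, get_next and stage1 are
hypothetical, and whether the compiler actually turns the final call
into a jump depends on the target and optimization level; C itself
gives no such guarantee.

```c
#include <stddef.h>

typedef size_t (*stage2_func)(size_t base, size_t *sp);

/* Stand-in for the next stage (hypothetical body). */
static size_t fake_stage2(size_t base, size_t *sp)
{
	return base + *sp;
}

/* Stand-in for the address lookup; it only returns the pointer and
 * does not make the call itself. */
static stage2_func get_next(void)
{
	return fake_stage2;
}

static size_t stage1(size_t base, size_t *sp)
{
	/* Imagine large locals used for self-relocation living here. */
	stage2_func next = get_next();

	/* The call is the last thing stage1 does and its result is
	 * returned unchanged, so an optimizing compiler may emit a jump
	 * and release stage1's frame before the next stage runs. */
	return next(base, sp);
}
```

Had the lookup helper performed the call itself, the frame of stage1
would have to stay live until the next stage returned.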

So far I have this working on i386 and sheb-nofpu-fdpic (for the
latter, with a highly-modified, fdpic-oriented dlstart.c) but it
should not be hard to adapt to other archs. If it seems ready to
commit, I can leave the above #ifdef in place during transition so
that not all archs have to transition in a single commit.

Rich

