mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: dynamic linker bootstrap/rcrt changes
Date: Wed, 16 Sep 2015 01:36:04 -0400
Message-ID: <20150916053604.GT17773@brightrain.aerifal.cx>
In-Reply-To: <20150911064504.GA21467@brightrain.aerifal.cx>

On Fri, Sep 11, 2015 at 02:45:04AM -0400, Rich Felker wrote:
> Working on static-PIE and FDPIC has shown the current approach (symbol
> name lookup of "__dls2") to getting from stage 1 to stage 2 is not
> what we should be doing. It requires -rdynamic for static PIE to work,
> which is clunky and potentially bloated for large programs, and for
> FDPIC the symbol lookup does not produce a callable function pointer
> but rather an actual code address.
> 
> What I'd like to do is punt on having _[dl]start_c make the call into
> __dls2 and instead have it return, leaving the calling asm again
> responsible for chaining into the next stage. This brings back a small
> asm burden I'd tried to eliminate, but it reduces code size and
> eliminates the above problems.
> 
> One way we could mitigate the asm burden is by having the crt
> asm leave an extra N words below the original sp (argv-1) when making
> the calls. This would give us space to pass state from stage 1 to
> stage 2 (and possibly beyond) without the need for per-arch asm to
> shuffle around argument registers and individual stack slots. This
> would make it so each stage could take a single argument, orig_sp.

I've devised what seems to be a good way to fix the current problems
(-rdynamic requirement for static-PIE and lack of FDPIC support)
without doing any of the above to complicate (or even change) the
contract between crt_arch.h and the function it calls (aside from
adding extra args for FDPIC).

The basic idea is to perform the lookup of the stage-2 function via a
small asm macro provided by the arch's reloc.h. Doing it in asm is
necessary as a barrier to prevent the load from being hoisted before
the self-relocation code and to avoid generating PIC code that could
not safely run before there's a working GOT. On i386, it looks like
this:

diff --git a/arch/i386/reloc.h b/arch/i386/reloc.h
index b52ef40..ea06e41 100644
--- a/arch/i386/reloc.h
+++ b/arch/i386/reloc.h
@@ -14,3 +14,7 @@
 
 #define CRTJMP(pc,sp) __asm__ __volatile__( \
 	"mov %1,%%esp ; jmp *%0" : : "r"(pc), "r"(sp) : "memory" )
+
+#define GETFUNCSYM(fp, sym, got) __asm__ ( \
+	".hidden " #sym " ; call 1f ; 1: addl $" #sym "-.,(%%esp) ; pop %0" \
+	: "=r"(*fp) : : "memory" )
diff --git a/src/ldso/dlstart.c b/src/ldso/dlstart.c
index 3aaa200..eb919ab 100644
--- a/src/ldso/dlstart.c
+++ b/src/ldso/dlstart.c
@@ -74,6 +74,7 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 		*rel_addr = (size_t)base + rel[2];
 	}
 
+#ifndef GETFUNCSYM
 	const char *strings = (void *)(base + dyn[DT_STRTAB]);
 	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);
 
@@ -85,6 +86,11 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 			break;
 	}
 	((stage2_func)(base + syms[i].st_value))(base, sp);
+#else
+	stage2_func dls2;
+	GETFUNCSYM(&dls2, __dls2, base+dyn[DT_PLTGOT]);
+	dls2(base, sp);
+#endif
 }
 
 #endif
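
For comparison, the name-based path that stays under #ifndef GETFUNCSYM amounts to a linear scan of the dynamic symbol table, which is why the target symbol has to be exported (hence -rdynamic for static PIE). A minimal sketch, assuming a simplified symbol record: `struct sym_sketch` and `find_sym_sketch` are hypothetical names, not musl's, and real pre-relocation code generally cannot rely on library calls like strcmp, which is used here only for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol record; a real ELF Sym has more fields. */
struct sym_sketch {
	size_t name_off; /* offset of the symbol's name in the string table */
	size_t value;    /* symbol address relative to the load base */
};

/* Scan the table for a symbol by name.
 * Returns its index, or (size_t)-1 if no entry matches. */
size_t find_sym_sketch(const struct sym_sketch *syms, size_t nsyms,
                       const char *strings, const char *name)
{
	for (size_t i = 0; i < nsyms; i++)
		if (!strcmp(strings + syms[i].name_off, name))
			return i;
	return (size_t)-1;
}
```

With GETFUNCSYM the address is instead computed relative to the current position (or the GOT), so the symbol can be hidden and no table walk is needed.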

The got-pointer argument is provided to the macro in case it's needed
on some archs; on many, DT_PLTGOT is not even set and thus not usable.
This should only be needed on FDPIC, where it won't actually be
base+... but rather an address that's properly adjusted for the
segment. For example on SH-FDPIC it looks like:

#define GETFUNCSYM(fp, sym, got) __asm__ ( \
	"mov.l 1f,%0 ; add %1,%0 ; bra 2f ; nop ; .align 2 \n" \
	"1:     .long " #sym "@GOTOFFFUNCDESC \n2:" \
	: "=&r"(*fp) : "r"(got) : "memory" )

The point of returning the function address instead of just making a
call from asm is that we want the compiler to generate a tail-call, so
that the (considerable on a NOMMU scale) stack space used by the
self-relocation code gets freed before transferring control to the
main program.
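
A rough, portable illustration of that shape follows. Everything here is a stand-in: `stage1_sketch`, `stage2_sketch` and `GETFUNCSYM_SKETCH` are hypothetical names, and the macro only mimics the barrier aspect using an empty GCC/Clang asm statement; it does not do the position-relative address computation the real per-arch macros perform. Whether the final call actually becomes a jump depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical signature modeled on the (base, sp) call in the patch. */
typedef size_t (*stage2_fn)(size_t base, size_t *sp);

/* Stand-in for the next stage; combines its arguments so the call is observable. */
static size_t stage2_sketch(size_t base, size_t *sp)
{
	return base + sp[0];
}

/* Stand-in for GETFUNCSYM (GCC/Clang only): stores the address through fp,
 * then passes it through an empty asm with a "memory" clobber so the
 * compiler treats the value as opaque and cannot move memory accesses
 * across this point. */
#define GETFUNCSYM_SKETCH(fp, sym) do { \
	*(fp) = (sym); \
	__asm__ __volatile__("" : "+r"(*(fp)) : : "memory"); \
} while (0)

size_t stage1_sketch(size_t base, size_t *sp)
{
	{
		/* Scratch space standing in for the stack used by self-relocation;
		 * it goes out of scope before the hand-off below. */
		size_t scratch[64];
		for (size_t i = 0; i < 64; i++)
			scratch[i] = base + i;
		sp[0] += scratch[1] - scratch[0]; /* adds 1 */
	}

	stage2_fn next;
	GETFUNCSYM_SKETCH(&next, stage2_sketch);

	/* Call in tail position: nothing in this frame is needed afterwards,
	 * so an optimizing compiler may release the frame and emit a jump. */
	return next(base, sp);
}
```

Had the macro performed the call itself from inside the asm, the frame of the calling function would have stayed live for the whole lifetime of the next stage.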

So far I have this working on i386 and sheb-nofpu-fdpic (for the
latter, with a highly-modified, fdpic-oriented dlstart.c) but it
should not be hard to adapt to other archs. If it seems ready to
commit, I can leave the above #ifdef in place during transition so
that not all archs have to transition in a single commit.

Rich

