mailing list of musl libc
* Dynamic linker changes
@ 2015-04-05 22:30 Rich Felker
  2015-04-05 22:55 ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2015-04-05 22:30 UTC (permalink / raw)
  To: musl

As part of the dynamic linker overhaul project for ssp-enabled
libc.so, I'd like to make some somewhat unrelated changes to the
dynamic linker. Some aspects of these are just general improvements,
but most of them eliminate implementation snags I'm foreseeing in the
early-relocation code. Anyway, here they are:


Revisiting how we find load base address:

If the dynamic linker is invoked as the PT_INTERP for a program, it
gets its own base address as AT_BASE in auxv. But if it's invoked
directly, AT_BASE is empty, and we presently round down the AT_PHDR
address to a page boundary and assume that's the load base. This is an
ugly hack and not guaranteed to be correct (although it should be with
any reasonable linker).

A better approach is having the asm entry point for the dynamic linker
compute the address of _DYNAMIC using its known PC-relative offset and
pass this into the C code. The C code can then find the base-relative
location of _DYNAMIC via PT_DYNAMIC in the program headers, and the
difference between these two values (absolute address of _DYNAMIC and
base-relative address of _DYNAMIC) is the base.
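
The subtraction described above can be sketched as follows. This is only
an illustration: sketch_phdr, SKETCH_PT_DYNAMIC and find_base are
hypothetical names made up for the example (real code would use the Phdr
type and PT_DYNAMIC from elf.h), and only the arithmetic reflects the
proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ELF program header; only the two
 * fields the calculation needs are modeled. */
struct sketch_phdr {
	uint32_t p_type;
	uintptr_t p_vaddr;   /* base-relative address of the segment */
};

#define SKETCH_PT_DYNAMIC 2   /* same numeric value as PT_DYNAMIC */

/* dynamic_addr is the absolute address of _DYNAMIC, as the asm entry
 * point would compute it PC-relatively. Returns 0 if no PT_DYNAMIC
 * header is found. */
uintptr_t find_base(uintptr_t dynamic_addr,
                    const struct sketch_phdr *ph, size_t phnum)
{
	for (size_t i = 0; i < phnum; i++)
		if (ph[i].p_type == SKETCH_PT_DYNAMIC)
			return dynamic_addr - ph[i].p_vaddr;
	return 0;
}
```

For instance, with _DYNAMIC at absolute address 0x7f001e00 and a
PT_DYNAMIC header whose p_vaddr is 0x1e00, the base comes out as
0x7f000000.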


Revisiting how ld.so skips argv entries:

Presently, when invoked as a command, ld.so uses an ugly hack for
stripping the beginning of argv[] before passing it to the main
program entry point. It replaces slots with (char*)-1 and the calling
asm is responsible for skipping over these before passing execution to
the main program's entry point. This requires a lot of ugly
arch-specific asm, and often this asm does not get tested early on
since invocation of ld.so as a command is not a commonly used feature.

A better approach would be making the C part of the dynamic linker
never return, but instead call longjmp to pass execution to the main
program's entry point. Provided we tell the dynamic linker where the
PC and SP registers are located in jmp_buf, all it needs to do is
store the AT_ENTRY address into the PC slot and the updated start
address of the argv array in the SP slot, then call longjmp.


Stripping down entry point asm further:

Like the way crt_arch.h and crt1.c work for the main program entry
point now, almost all asm can be eliminated from the dynamic linker
entry point. All that's needed is some minimal asm to align SP and put
the original SP value (and now, also the address of _DYNAMIC) in
argument registers/slots and tail-call to the C code. The C code can
be responsible for extracting argc out of the ELF argv array.
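
As a rough sketch of what "extracting argc out of the ELF argv array"
could look like in C, assuming the usual entry stack layout (argc, the
argv pointers, a null word, the envp pointers, a null word, then the
auxv pairs). The start_info struct and parse_entry_stack are hypothetical
names for this example, and the pointer slots are kept as size_t words
here, where real code would cast them to char **.

```c
#include <stddef.h>

/* Hypothetical result struct for this sketch. Each argv/envp entry is
 * one word holding a pointer value. */
struct start_info {
	int argc;
	size_t *argv;
	size_t *envp;
	size_t *auxv;
};

/* sp is the original stack pointer at process entry, pointing at the
 * argc word. Assumes pointers and size_t have the same size. */
void parse_entry_stack(size_t *sp, struct start_info *out)
{
	size_t i;
	out->argc = (int)sp[0];
	out->argv = sp + 1;
	out->envp = out->argv + out->argc + 1;   /* skip argv's null terminator */
	for (i = 0; out->envp[i]; i++);          /* walk to envp's null terminator */
	out->auxv = out->envp + i + 1;
}
```

The asm side then only has to pass the untouched SP value; nothing about
argc or argv needs to be decoded before entering C.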





* Re: Dynamic linker changes
  2015-04-05 22:30 Dynamic linker changes Rich Felker
@ 2015-04-05 22:55 ` Rich Felker
  2015-04-08 23:19   ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2015-04-05 22:55 UTC (permalink / raw)
  To: musl

On Sun, Apr 05, 2015 at 06:30:31PM -0400, Rich Felker wrote:
> As part of the dynamic linker overhaul project for ssp-enabled
> libc.so, I'd like to make some somewhat unrelated changes to the
> dynamic linker. Some aspects of these are just general improvements,
> but most of them eliminate implementation snags I'm foreseeing in the
> early-relocation code. Anyway, here they are:
> 
> 
> Revisiting how we find load base address:
> [...]
> Revisiting how ld.so skips argv entries:
> [...]
> Stripping down entry point asm further:
> [...]

Here's the draft code for what runs before libc.so/ldso itself is
relocated:

void *__dlstart_c(uintptr_t sp, uintptr_t dynamic)
{
	size_t i, aux[AUX_CNT] = {0}, dyn[DYN_CNT] = {0};
	struct dso self = {0};

	int argc = *(size_t *)sp;
	char **argv = (void *)(sp + sizeof(size_t));

	for (i=argc+1; argv[i]; i++);
	size_t *auxv = (void *)(argv+i+1);

	decode_vec(auxv, aux, AUX_CNT);

	if (!aux[AT_BASE]) {
		size_t phnum = aux[AT_PHNUM];
		size_t phentsize = aux[AT_PHENT];
		Phdr *ph = (void *)aux[AT_PHDR];
		for (i=phnum; i--; ph = (void *)((char *)ph + phentsize)) {
			if (ph->p_type == PT_DYNAMIC) {
				aux[AT_BASE] = dynamic - ph->p_vaddr;
				break;
			}
		}
	}

	self.dynv = (void *)dynamic;
	self.base = (void *)aux[AT_BASE];
	decode_dyn(&self);
	reloc_dso(&self, &self);

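	/* Resolve __dynlink in the freshly relocated libc/ldso and call it. */
	struct symdef def = find_sym(&self, "__dynlink", 1);
	void *(*dynlink)(int, char **, size_t *) =
		(void *)((size_t)self.base + def.sym->st_value);
	return dynlink(argc, argv, auxv);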
}

I'm hand-waving at some additional changes that need to be made:
reloc_dso is like the current reloc_all but it takes an additional
argument for the root dso to search for symbols (rather than using a
global var) and it doesn't call mprotect for relro (two reasons -- 1,
that would preclude patching up libc's PLT relocs later, and 2,
mprotect can't be called yet since it's external and we're not
assuming external calls can be made without relocating GOT/PLT).
Anyway this requires some changes several levels of function calls
down to get rid of global data, but that's a big cleanup win anyway.

Also, the core symbol lookup code is calling strcmp, which is
external. That needs to be replaced with a call to a static function,
which is no problem. Right now the real strcmp isn't even optimized.
If we eventually do want to optimize it though, this may introduce
some additional complexity in the dynamic linker to use the simple C
strcmp at early load time and switch to an optimized one later. Of
course just leaving the call to strcmp won't break with the current
assumption of -Bsymbolic-functions, but I'd like to eliminate that
assumption and treat it as an optimization rather than a semantic
necessity.

Rich



* Re: Dynamic linker changes
  2015-04-05 22:55 ` Rich Felker
@ 2015-04-08 23:19   ` Rich Felker
  2015-04-11 20:21     ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2015-04-08 23:19 UTC (permalink / raw)
  To: musl

So far I've been hitting some ugly obstacles:

1. If we want the new code to eliminate the need for platform-specific
__reloc_self functions for mips/microblaze/powerpc/etc., the first
phase needs to perform relative relocations without making _any_
function calls, since even calls to static functions go through
(anonymous) got entries. That's not such a bad idea if it would
eliminate this ugly per-arch code, but...

2. The current method of making relocations generic rather than
arch-specific uses a function/switch statement to remap them. This is
not going to be usable for solving item 1 above in an arch-agnostic
manner. So perhaps we should replace this function with a set of
macros. Rather than having generic values for the REL_* macros and
mapping the R_{arch}_* values to them with switch, we could do
something like (example for microblaze):

#define REL_SYMBOLIC R_MICROBLAZE_32
#define REL_GOT R_MICROBLAZE_GLOB_DAT
...

However it's not clear to me whether the mapping from arch-specific
reloc types to generic ones is one-to-one in all archs, so we'd
potentially have to have extra macros like REL_SYMBOLIC_2.

One benefit of this approach, on the other hand, is that the switch in
the portable reloc processing code can have #ifdef around each case
and eliminate code for relocations a given arch doesn't support.
(Technically the compiler's dead code elimination could already do this
after inlining the mapping function or performing inter-procedural range
analysis, but I seriously doubt it does so.)
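
A minimal sketch of the macro approach with #ifdef around each case
might look like this. The R_DEMO_* numbers are made up for the example
and do not belong to any real arch, and apply_reloc is a hypothetical
function, not the actual reloc processing code.

```c
#include <stddef.h>

/* Hypothetical arch header: generic names map straight onto made-up
 * arch relocation numbers. */
#define R_DEMO_32        1
#define R_DEMO_RELATIVE  8
#define REL_SYMBOLIC     R_DEMO_32
#define REL_RELATIVE     R_DEMO_RELATIVE
/* This pretend arch has no copy relocations, so REL_COPY is not defined. */

/* Portable code: each case is compiled only if the arch defines it.
 * Returns 0 on success, -1 for an unsupported type. */
int apply_reloc(int type, size_t *slot, size_t base, size_t sym_val, size_t addend)
{
	switch (type) {
#ifdef REL_SYMBOLIC
	case REL_SYMBOLIC:
		*slot = sym_val + addend;
		return 0;
#endif
#ifdef REL_RELATIVE
	case REL_RELATIVE:
		*slot = base + addend;
		return 0;
#endif
#ifdef REL_COPY
	case REL_COPY:
		/* would copy the symbol's data; omitted in this sketch */
		return 0;
#endif
	default:
		return -1;
	}
}
```

Because the case labels are the arch's own constants, no remapping
function is called, and the REL_COPY branch is simply absent from the
object code on this pretend arch.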

3. The original plan was to have one early-ldso-relocation step and
avoid all possible GOT/globals use and everything after that free to
use arbitrary global data and symbols, with a single barrier in
between to prevent reordering of GOT loads before they're relocated.
This seems impractical since it's hard, due to issue 1, to do symbolic
relocations without being able to make function calls.

Instead I'd like to treat the early-ldso-relocation process as two
steps. The first is generalizing and making arch-agnostic the work
mips, microblaze, and powerpc are doing now to get to a state where
all non-symbolic global accesses are safe. The second would be a
separate function call from the asm (or chained from the first if
there's an obvious way to do it) that performs symbolic relocations on
itself. It would end by (as proposed in the sketch before) doing a
symbol lookup and final call into the code that will set up the dso
chain, load dependencies, perform all remaining relocations, and pass
control to the program's entry point.

Rich



* Re: Dynamic linker changes
  2015-04-08 23:19   ` Rich Felker
@ 2015-04-11 20:21     ` Rich Felker
  2015-04-12  1:59       ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2015-04-11 20:21 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2972 bytes --]

On Wed, Apr 08, 2015 at 07:19:11PM -0400, Rich Felker wrote:
> 3. The original plan was to have one early-ldso-relocation step and
> avoid all possible GOT/globals use and everything after that free to
> use arbitrary global data and symbols, with a single barrier in
> between to prevent reordering of GOT loads before they're relocated.
> This seems impractical since it's hard, due to issue 1, to do symbolic
> relocations without being able to make function calls.
> 
> Instead I'd like to treat the early-ldso-relocation process as two
> steps. The first is generalizing and making arch-agnostic the work
> mips, microblaze, and powerpc are doing now to get to a state where
> all non-symbolic global accesses are safe. The second would be a
> separate function call from the asm (or chained from the first if
> there's an obvious way to do it) that performs symbolic relocations on
> itself. It would end by (as proposed in the sketch before) doing a
> symbol lookup and final call into the code that will set up the dso
> chain, load dependencies, perform all remaining relocations, and pass
> control to the program's entry point.

I've got the first working draft of the above design, and it's three
stages:

1. Perform relative relocations on ldso/libc itself referencing
   nothing but its arguments and the data they point to.

2. Set up a dso structure for ldso/libc and perform symbolic
   relocations on it using nothing but static functions/data from
   dynlink.c.

3. Do nearly everything the old __dynlink did, but with the ldso dso
   structure already set up and fully usable (not depending on
   -Bsymbolic-functions and arch-specific __reloc_self to make it
   almost-fully-usable like we did before).
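
To make stage 1 concrete, here is a simplified sketch of applying
relative relocations from a REL-format table (two words per entry,
addend stored in place). It is a simulation under stated assumptions:
the image is modeled as an array of words, SKETCH_REL_RELATIVE is a
made-up type number, and the type word is compared directly, where the
real code extracts the type from the info word and also handles RELA
tables.

```c
#include <stddef.h>

#define SKETCH_REL_RELATIVE 8   /* hypothetical relocation type number */

/* image:     the loaded object, viewed as an array of words
 * load_base: the value to add to each relative slot
 * rel:       table of { byte offset into image, type } pairs
 * rel_size:  size of the table in bytes
 * The body touches only its arguments and what they point to, and
 * makes no function calls. */
void apply_relative(size_t *image, size_t load_base,
                    const size_t *rel, size_t rel_size)
{
	for (; rel_size; rel += 2, rel_size -= 2 * sizeof(size_t)) {
		if (rel[1] != SKETCH_REL_RELATIVE) continue;
		size_t *slot = image + rel[0] / sizeof(size_t);
		*slot += load_base;   /* in-place addend plus base */
	}
}
```

Entries of any other type are left alone here; in the three-stage design
they are what stage 2 handles once function calls are safe.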

Currently, stage 1 calls into stage 2 and 3 via very primitive
symbol-lookup code. This has some trade-offs.

Pros: The dynamic linker entry point asm does not need to be aware of
the details of the dynamic linking process. It just calls one function
with minimal args (original SP and &_DYNAMIC) and uses the return
value as a jump destination (along with a simple SP-fixup trick).

Cons: Stage 1 is coupled with the rest of the dynamic linking process.
This is somewhat unfortunate since the stage 1 code, minus this last
symbol lookup step but including the entry point asm prior to calling
stage 1, is _exactly_ what would be needed for "static PIE" Rcrt1.o.
It could be made to work 'unmodified' for static PIE by having the
source for Rcrt1.o provide its own definitions of the stage 2 and 3
functions, but since stage 1 looks them up by name at runtime,
stripping dynamic symbol names (which should in principle work for
static PIE) would break it.
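
The name-based lookup that stage 1 relies on can be sketched like this.
sketch_sym and lookup_by_name are hypothetical names for the example,
the symbol entry is reduced to two fields, and the explicit nsyms bound
is an assumption added so the sketch can report a missing name.

```c
#include <stddef.h>

/* Hypothetical, simplified symbol entry: only the fields used here. */
struct sketch_sym {
	size_t st_name;    /* offset of the name in the string table */
	size_t st_value;   /* base-relative address of the symbol */
};

/* Returns the index of the symbol called name, or -1 if it is absent.
 * The comparison is an inline loop, so no external function such as
 * strcmp is called. */
long lookup_by_name(const struct sketch_sym *syms, size_t nsyms,
                    const char *strings, const char *name)
{
	for (size_t i = 0; i < nsyms; i++) {
		const char *s = strings + syms[i].st_name;
		const char *n = name;
		while (*s && *s == *n) s++, n++;
		if (*s == *n) return (long)i;
	}
	return -1;
}
```

If the dynamic symbol names were stripped, a lookup like this would find
nothing, which is the static PIE problem described above.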

I'm attaching a diff with the work so far for comments. It's
unfinished (only i386 and mips are implemented so far; mips was chosen
because it's the one arch that needs ugly arch-specific relocations
and I had to check and make sure they work right in the new design)
but seems to work.

Rich

[-- Attachment #2: dynlink_g2_v1.diff --]
[-- Type: text/plain, Size: 22695 bytes --]

diff --git a/arch/i386/reloc.h b/arch/i386/reloc.h
index eaf5aae..95c6022 100644
--- a/arch/i386/reloc.h
+++ b/arch/i386/reloc.h
@@ -3,31 +3,14 @@
 
 #define LDSO_ARCH "i386"
 
-static int remap_rel(int type)
-{
-	switch(type) {
-	case R_386_32:
-		return REL_SYMBOLIC;
-	case R_386_PC32:
-		return REL_OFFSET;
-	case R_386_GLOB_DAT:
-		return REL_GOT;
-	case R_386_JMP_SLOT:
-		return REL_PLT;
-	case R_386_RELATIVE:
-		return REL_RELATIVE;
-	case R_386_COPY:
-		return REL_COPY;
-	case R_386_TLS_DTPMOD32:
-		return REL_DTPMOD;
-	case R_386_TLS_DTPOFF32:
-		return REL_DTPOFF;
-	case R_386_TLS_TPOFF:
-		return REL_TPOFF;
-	case R_386_TLS_TPOFF32:
-		return REL_TPOFF_NEG;
-	case R_386_TLS_DESC:
-		return REL_TLSDESC;
-	}
-	return 0;
-}
+#define REL_SYMBOLIC    R_386_32
+#define REL_OFFSET      R_386_PC32
+#define REL_GOT         R_386_GLOB_DAT
+#define REL_PLT         R_386_JMP_SLOT
+#define REL_RELATIVE    R_386_RELATIVE
+#define REL_COPY        R_386_COPY
+#define REL_DTPMOD      R_386_TLS_DTPMOD32
+#define REL_DTPOFF      R_386_TLS_DTPOFF32
+#define REL_TPOFF       R_386_TLS_TPOFF
+#define REL_TPOFF_NEG   R_386_TLS_TPOFF32
+#define REL_TLSDESC     R_386_TLS_DESC
diff --git a/arch/mips/reloc.h b/arch/mips/reloc.h
index 4b81d32..f77f9d1 100644
--- a/arch/mips/reloc.h
+++ b/arch/mips/reloc.h
@@ -18,37 +18,12 @@
 
 #define TPOFF_K (-0x7000)
 
-static int remap_rel(int type)
-{
-	switch(type) {
-	case R_MIPS_REL32:
-		return REL_SYM_OR_REL;
-	case R_MIPS_JUMP_SLOT:
-		return REL_PLT;
-	case R_MIPS_COPY:
-		return REL_COPY;
-	case R_MIPS_TLS_DTPMOD32:
-		return REL_DTPMOD;
-	case R_MIPS_TLS_DTPREL32:
-		return REL_DTPOFF;
-	case R_MIPS_TLS_TPREL32:
-		return REL_TPOFF;
-	}
-	return 0;
-}
-
-void __reloc_self(int c, size_t *a, size_t *dynv, size_t *got)
-{
-	char *base;
-	size_t t[20], n;
-	for (a+=c+1; *a; a++);
-	for (a++; *a; a+=2) if (*a<20) t[*a] = a[1];
-	base = (char *)t[AT_BASE];
-	if (!base) base = (char *)(t[AT_PHDR] & -t[AT_PAGESZ]);
-	for (a=dynv; *a; a+=2) if (*a-0x70000000UL<20) t[*a&31] = a[1];
-	n = t[DT_MIPS_LOCAL_GOTNO - 0x70000000];
-	for (a=got; n; a++, n--) *a += (size_t)base;
-}
+#define REL_SYM_OR_REL  R_MIPS_REL32
+#define REL_PLT         R_MIPS_JUMP_SLOT
+#define REL_COPY        R_MIPS_COPY
+#define REL_DTPMOD      R_MIPS_TLS_DTPMOD32
+#define REL_DTPOFF      R_MIPS_TLS_DTPREL32
+#define REL_TPOFF       R_MIPS_TLS_TPREL32
 
 static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stride);
 
@@ -68,7 +43,7 @@ static void do_arch_relocs(struct dso *this, struct dso *head)
 			got = dynv[i+1];
 	}
 	i = dyn[DT_MIPS_LOCAL_GOTNO-0x70000000];
-	if (this->shortname && !strcmp(this->shortname, "libc.so")) {
+	if (this->rel_early_relative) {
 		got += sizeof(size_t) * i;
 	} else {
 		for (; i; i--, got+=sizeof(size_t))
@@ -79,11 +54,11 @@ static void do_arch_relocs(struct dso *this, struct dso *head)
 	for (; i; i--, got+=sizeof(size_t), sym++) {
 		rel[0] = got;
 		rel[1] = sym-this->syms << 8 | R_MIPS_JUMP_SLOT;
-		*(size_t *)(base+got) = 0;
 		do_relocs(this, rel, sizeof rel, 2);
 	}
 }
 
+#define NEED_MIPS_GOT_RELOCS
 #define NEED_ARCH_RELOCS 1
 #define DYNAMIC_IS_RO 1
 #define ARCH_SYM_REJECT_UND(s) (!((s)->st_other & STO_MIPS_PLT))
diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index f6ed801..7f19686 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -35,7 +35,7 @@ typedef Elf32_Sym Sym;
 typedef Elf64_Ehdr Ehdr;
 typedef Elf64_Phdr Phdr;
 typedef Elf64_Sym Sym;
-#define R_TYPE(x) ((x)&0xffffffff)
+#define R_TYPE(x) ((x)&0x7fffffff)
 #define R_SYM(x) ((x)>>32)
 #endif
 
@@ -88,6 +88,7 @@ struct dso {
 	volatile int new_dtv_idx, new_tls_idx;
 	struct td_index *td_index;
 	struct dso *fini_next;
+	int rel_early_relative, rel_update_got;
 	char *shortname;
 	char buf[];
 };
@@ -97,9 +98,10 @@ struct symdef {
 	struct dso *dso;
 };
 
+/* These enum constants provide unmatchable default values for
+ * any relocation type the arch does not use. */
 enum {
-	REL_ERR,
-	REL_SYMBOLIC,
+	REL_SYMBOLIC = -100,
 	REL_GOT,
 	REL_PLT,
 	REL_RELATIVE,
@@ -107,7 +109,6 @@ enum {
 	REL_OFFSET32,
 	REL_COPY,
 	REL_SYM_OR_REL,
-	REL_TLS, /* everything past here is TLS */
 	REL_DTPMOD,
 	REL_DTPOFF,
 	REL_TPOFF,
@@ -117,6 +118,10 @@ enum {
 
 #include "reloc.h"
 
+#define IS_RELATIVE(x) ( \
+	(R_TYPE(x) == REL_RELATIVE) || \
+	(R_TYPE(x) == REL_SYM_OR_REL && !R_SYM(x)) )
+
 int __init_tp(void *);
 void __init_libc(char **, char *);
 
@@ -129,7 +134,8 @@ static struct builtin_tls {
 } builtin_tls[1];
 #define MIN_TLS_ALIGN offsetof(struct builtin_tls, pt)
 
-static struct dso *head, *tail, *ldso, *fini_head;
+static struct dso ldso;
+static struct dso *head, *tail, *fini_head;
 static char *env_path, *sys_path;
 static unsigned long long gencnt;
 static int runtime;
@@ -148,9 +154,17 @@ struct debug *_dl_debug_addr = &debug;
 #define AUX_CNT 38
 #define DYN_CNT 34
 
+static int dl_strcmp(const char *l, const char *r)
+{
+	for (; *l==*r && *l; l++, r++);
+	return *(unsigned char *)l - *(unsigned char *)r;
+}
+#define strcmp(l,r) dl_strcmp(l,r)
+
 static void decode_vec(size_t *v, size_t *a, size_t cnt)
 {
-	memset(a, 0, cnt*sizeof(size_t));
+	size_t i;
+	for (i=0; i<cnt; i++) a[i] = 0;
 	for (; v[0]; v+=2) if (v[0]<cnt) {
 		a[0] |= 1ULL<<v[0];
 		a[v[0]] = v[1];
@@ -276,8 +290,6 @@ static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
 	return def;
 }
 
-#define NO_INLINE_ADDEND (1<<REL_COPY | 1<<REL_GOT | 1<<REL_PLT)
-
 ptrdiff_t __tlsdesc_static(), __tlsdesc_dynamic();
 
 static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stride)
@@ -288,7 +300,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 	Sym *sym;
 	const char *name;
 	void *ctx;
-	int astype, type;
+	int type;
 	int sym_index;
 	struct symdef def;
 	size_t *reloc_addr;
@@ -297,14 +309,8 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 	size_t addend;
 
 	for (; rel_size; rel+=stride, rel_size-=stride*sizeof(size_t)) {
-		astype = R_TYPE(rel[1]);
-		if (!astype) continue;
-		type = remap_rel(astype);
-		if (!type) {
-			error("Error relocating %s: unsupported relocation type %d",
-				dso->name, astype);
-			continue;
-		}
+		if (dso->rel_early_relative && IS_RELATIVE(rel[1])) continue;
+		type = R_TYPE(rel[1]);
 		sym_index = R_SYM(rel[1]);
 		reloc_addr = (void *)(base + rel[0]);
 		if (sym_index) {
@@ -324,14 +330,19 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 			def.dso = dso;
 		}
 
+		int gotplt = (type == REL_GOT || type == REL_PLT);
+		if (dso->rel_update_got && !gotplt) continue;
+
 		addend = stride>2 ? rel[2]
-			: (1<<type & NO_INLINE_ADDEND) ? 0
+			: gotplt ? 0
 			: *reloc_addr;
 
 		sym_val = def.sym ? (size_t)def.dso->base+def.sym->st_value : 0;
 		tls_val = def.sym ? def.sym->st_value : 0;
 
 		switch(type) {
+		case 0:
+			break;
 		case REL_OFFSET:
 			addend -= (size_t)reloc_addr;
 		case REL_SYMBOLIC:
@@ -395,6 +406,10 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 #endif
 			}
 			break;
+		default:
+			error("Error relocating %s: unsupported relocation type %d",
+				dso->name, type);
+			continue;
 		}
 	}
 }
@@ -711,22 +726,22 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 					if (!(reported & mask)) {
 						reported |= mask;
 						dprintf(1, "\t%s => %s (%p)\n",
-							name, ldso->name,
-							ldso->base);
+							name, ldso.name,
+							ldso.base);
 					}
 				}
 				is_self = 1;
 			}
 		}
 	}
-	if (!strcmp(name, ldso->name)) is_self = 1;
+	if (!strcmp(name, ldso.name)) is_self = 1;
 	if (is_self) {
-		if (!ldso->prev) {
-			tail->next = ldso;
-			ldso->prev = tail;
-			tail = ldso->next ? ldso->next : ldso;
+		if (!ldso.prev) {
+			tail->next = &ldso;
+			ldso.prev = tail;
+			tail = ldso.next ? ldso.next : &ldso;
 		}
-		return ldso;
+		return &ldso;
 	}
 	if (strchr(name, '/')) {
 		pathname = name;
@@ -752,13 +767,13 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
 			if (!sys_path) {
 				char *prefix = 0;
 				size_t prefix_len;
-				if (ldso->name[0]=='/') {
+				if (ldso.name[0]=='/') {
 					char *s, *t, *z;
-					for (s=t=z=ldso->name; *s; s++)
+					for (s=t=z=ldso.name; *s; s++)
 						if (*s=='/') z=t, t=s;
-					prefix_len = z-ldso->name;
+					prefix_len = z-ldso.name;
 					if (prefix_len < PATH_MAX)
-						prefix = ldso->name;
+						prefix = ldso.name;
 				}
 				if (!prefix) {
 					prefix = "";
@@ -1121,31 +1136,136 @@ static void update_tls_size()
 	tls_align);
 }
 
-void *__dynlink(int argc, char **argv)
+void *__dlstart_c(size_t *sp, size_t *dynv)
 {
-	size_t aux[AUX_CNT] = {0};
+	size_t i, aux[AUX_CNT], dyn[DYN_CNT];
+
+	int argc = *sp;
+	char **argv = (void *)(sp+1);
+
+	for (i=argc+1; argv[i]; i++);
+	size_t *auxv = (void *)(argv+i+1);
+
+	for (i=0; i<AUX_CNT; i++) aux[i] = 0;
+	for (i=0; auxv[i]; i+=2) if (auxv[i]<AUX_CNT) {
+		aux[0] |= 1U<<auxv[i];
+		aux[auxv[i]] = auxv[i+1];
+	}
+
+	for (i=0; i<DYN_CNT; i++) dyn[i] = 0;
+	for (i=0; dynv[i]; i+=2) if (dynv[i]<DYN_CNT) {
+		dyn[dynv[i]] = dynv[i+1];
+	}
+
+	unsigned char *base = (void *)aux[AT_BASE];
+	if (!base) {
+		size_t phnum = aux[AT_PHNUM];
+		size_t phentsize = aux[AT_PHENT];
+		Phdr *ph = (void *)aux[AT_PHDR];
+		for (i=phnum; i--; ph = (void *)((char *)ph + phentsize)) {
+			if (ph->p_type == PT_DYNAMIC) {
+				base = (void *)((size_t)dynv - ph->p_vaddr);
+				break;
+			}
+		}
+	}
+
+#ifdef NEED_MIPS_GOT_RELOCS
+	size_t local_cnt = 0;
+	size_t *got = (void *)(base + dyn[DT_PLTGOT]);
+	for (i=0; dynv[i]; i+=2) if (dynv[i]==DT_MIPS_LOCAL_GOTNO)
+		local_cnt = dynv[i+1];
+	for (i=0; i<local_cnt; i++) got[i] += (size_t)base;
+#endif
+
+	/* The use of the reloc_info structure and nested loops is a trick
+	 * to work around the fact that we can't necessarily make function
+	 * calls yet. Each struct in the array serves like the arguments
+	 * to a function call. */
+	struct reloc_info {
+		void *rel;
+		size_t size;
+		size_t stride;
+	} reloc_info[] = {
+		{ base+dyn[DT_JMPREL], dyn[DT_PLTRELSZ], 2+(dyn[DT_PLTREL]==DT_RELA) },
+		{ base+dyn[DT_REL], dyn[DT_RELSZ], 2 },
+		{ base+dyn[DT_RELA], dyn[DT_RELASZ], 3 },
+		{ 0, 0, 0 }
+	};
+
+	for (i=0; reloc_info[i].rel; i++) {
+		size_t *rel = reloc_info[i].rel;
+		size_t rel_size = reloc_info[i].size;
+		size_t stride = reloc_info[i].stride;
+		for (; rel_size; rel+=stride, rel_size-=stride*sizeof(size_t)) {
+			if (!IS_RELATIVE(rel[1])) continue;
+			size_t *rel_addr = (void *)(base + rel[0]);
+			size_t addend = stride==3 ? rel[2] : *rel_addr;
+			*rel_addr = (size_t)base + addend;
+		}
+	}
+
+	const char *strings = (void *)(base + dyn[DT_STRTAB]);
+	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);
+
+	/* Call dynamic linker stage-2, __dls2 */
+	for (i=0; ;i++) {
+		const char *s = strings + syms[i].st_name;
+		if (s[0]=='_' && s[1]=='_' && s[2]=='d'
+		 && s[3]=='l' && s[4]=='s' && s[5]=='2' && !s[6])
+			break;
+	}
+	((void (*)(unsigned char *))(base + syms[i].st_value))(base);
+
+	/* Call dynamic linker stage-3, __dls3 */
+	for (i=0; ;i++) {
+		const char *s = strings + syms[i].st_name;
+		if (s[0]=='_' && s[1]=='_' && s[2]=='d'
+		 && s[3]=='l' && s[4]=='s' && s[5]=='3' && !s[6])
+			break;
+	}
+	return ((void *(*)(size_t *, size_t *, size_t *))
+		(base + syms[i].st_value))(sp, auxv, aux);
+}
+
+void __dls2(unsigned char *base)
+{
+	Ehdr *ehdr = (void *)base;
+	ldso.base = base;
+	ldso.name = ldso.shortname = "libc.so";
+	ldso.global = 1;
+	ldso.phnum = ehdr->e_phnum;
+	ldso.phdr = (void *)(base + ehdr->e_phoff);
+	ldso.phentsize = ehdr->e_phentsize;
+	ldso.rel_early_relative = 1;
+	kernel_mapped_dso(&ldso);
+	decode_dyn(&ldso);
+
+	head = &ldso;
+	reloc_all(&ldso);
+
+	ldso.relocated = 0;
+	ldso.rel_update_got = 1;
+}
+
+void *__dls3(size_t *sp, size_t *auxv, size_t *aux)
+{
+	static struct dso app, vdso;
 	size_t i;
-	Phdr *phdr;
-	Ehdr *ehdr;
-	static struct dso builtin_dsos[3];
-	struct dso *const app = builtin_dsos+0;
-	struct dso *const lib = builtin_dsos+1;
-	struct dso *const vdso = builtin_dsos+2;
 	char *env_preload=0;
 	size_t vdso_base;
-	size_t *auxv;
+	int argc = *sp;
+	char **argv = (void *)(sp+1);
+	char **argv_orig = argv;
 	char **envp = argv+argc+1;
 	void *initial_tls;
 
-	/* Find aux vector just past environ[] */
-	for (i=argc+1; argv[i]; i++)
-		if (!memcmp(argv[i], "LD_LIBRARY_PATH=", 16))
-			env_path = argv[i]+16;
-		else if (!memcmp(argv[i], "LD_PRELOAD=", 11))
-			env_preload = argv[i]+11;
-	auxv = (void *)(argv+i+1);
-
-	decode_vec(auxv, aux, AUX_CNT);
+	for (i=0; envp[i]; i++) {
+		if (!memcmp(envp[i], "LD_LIBRARY_PATH=", 16))
+			env_path = envp[i]+16;
+		else if (!memcmp(envp[i], "LD_PRELOAD=", 11))
+			env_preload = envp[i]+11;
+	}
 
 	/* Only trust user/env if kernel says we're not suid/sgid */
 	if ((aux[0]&0x7800)!=0x7800 || aux[AT_UID]!=aux[AT_EUID]
@@ -1157,54 +1277,36 @@ void *__dynlink(int argc, char **argv)
 	libc.page_size = aux[AT_PAGESZ];
 	libc.auxv = auxv;
 
-	/* If the dynamic linker was invoked as a program itself, AT_BASE
-	 * will not be set. In that case, we assume the base address is
-	 * the start of the page containing the PHDRs; I don't know any
-	 * better approach... */
-	if (!aux[AT_BASE]) {
-		aux[AT_BASE] = aux[AT_PHDR] & -PAGE_SIZE;
-		aux[AT_PHDR] = aux[AT_PHENT] = aux[AT_PHNUM] = 0;
-	}
-
-	/* The dynamic linker load address is passed by the kernel
-	 * in the AUX vector, so this is easy. */
-	lib->base = (void *)aux[AT_BASE];
-	lib->name = lib->shortname = "libc.so";
-	lib->global = 1;
-	ehdr = (void *)lib->base;
-	lib->phnum = ehdr->e_phnum;
-	lib->phdr = (void *)(aux[AT_BASE]+ehdr->e_phoff);
-	lib->phentsize = ehdr->e_phentsize;
-	kernel_mapped_dso(lib);
-	decode_dyn(lib);
-
-	if (aux[AT_PHDR]) {
+	/* If the main program was already loaded by the kernel,
+	 * AT_PHDR will point to some location other than the dynamic
+	 * linker's program headers. */
+	if (aux[AT_PHDR] != (size_t)ldso.phdr) {
 		size_t interp_off = 0;
 		size_t tls_image = 0;
 		/* Find load address of the main program, via AT_PHDR vs PT_PHDR. */
-		app->phdr = phdr = (void *)aux[AT_PHDR];
-		app->phnum = aux[AT_PHNUM];
-		app->phentsize = aux[AT_PHENT];
+		Phdr *phdr = app.phdr = (void *)aux[AT_PHDR];
+		app.phnum = aux[AT_PHNUM];
+		app.phentsize = aux[AT_PHENT];
 		for (i=aux[AT_PHNUM]; i; i--, phdr=(void *)((char *)phdr + aux[AT_PHENT])) {
 			if (phdr->p_type == PT_PHDR)
-				app->base = (void *)(aux[AT_PHDR] - phdr->p_vaddr);
+				app.base = (void *)(aux[AT_PHDR] - phdr->p_vaddr);
 			else if (phdr->p_type == PT_INTERP)
 				interp_off = (size_t)phdr->p_vaddr;
 			else if (phdr->p_type == PT_TLS) {
 				tls_image = phdr->p_vaddr;
-				app->tls_len = phdr->p_filesz;
-				app->tls_size = phdr->p_memsz;
-				app->tls_align = phdr->p_align;
+				app.tls_len = phdr->p_filesz;
+				app.tls_size = phdr->p_memsz;
+				app.tls_align = phdr->p_align;
 			}
 		}
-		if (app->tls_size) app->tls_image = (char *)app->base + tls_image;
-		if (interp_off) lib->name = (char *)app->base + interp_off;
+		if (app.tls_size) app.tls_image = (char *)app.base + tls_image;
+		if (interp_off) ldso.name = (char *)app.base + interp_off;
 		if ((aux[0] & (1UL<<AT_EXECFN))
 		    && strncmp((char *)aux[AT_EXECFN], "/proc/", 6))
-			app->name = (char *)aux[AT_EXECFN];
+			app.name = (char *)aux[AT_EXECFN];
 		else
-			app->name = argv[0];
-		kernel_mapped_dso(app);
+			app.name = argv[0];
+		kernel_mapped_dso(&app);
 	} else {
 		int fd;
 		char *ldname = argv[0];
@@ -1231,6 +1333,7 @@ void *__dynlink(int argc, char **argv)
 			}
 			argv[-1] = (void *)-1;
 		}
+		argv[-1] = (void *)(argc - (argv-argv_orig));
 		if (!argv[0]) {
 			dprintf(2, "musl libc\n"
 				"Version %s\n"
@@ -1246,96 +1349,88 @@ void *__dynlink(int argc, char **argv)
 			_exit(1);
 		}
 		runtime = 1;
-		ehdr = (void *)map_library(fd, app);
+		Ehdr *ehdr = (void *)map_library(fd, &app);
 		if (!ehdr) {
 			dprintf(2, "%s: %s: Not a valid dynamic program\n", ldname, argv[0]);
 			_exit(1);
 		}
 		runtime = 0;
 		close(fd);
-		lib->name = ldname;
-		app->name = argv[0];
-		aux[AT_ENTRY] = (size_t)app->base + ehdr->e_entry;
+		ldso.name = ldname;
+		app.name = argv[0];
+		aux[AT_ENTRY] = (size_t)app.base + ehdr->e_entry;
 		/* Find the name that would have been used for the dynamic
 		 * linker had ldd not taken its place. */
 		if (ldd_mode) {
-			for (i=0; i<app->phnum; i++) {
-				if (app->phdr[i].p_type == PT_INTERP)
-					lib->name = (void *)(app->base
-						+ app->phdr[i].p_vaddr);
+			for (i=0; i<app.phnum; i++) {
+				if (app.phdr[i].p_type == PT_INTERP)
+					ldso.name = (void *)(app.base
+						+ app.phdr[i].p_vaddr);
 			}
-			dprintf(1, "\t%s (%p)\n", lib->name, lib->base);
+			dprintf(1, "\t%s (%p)\n", ldso.name, ldso.base);
 		}
 	}
-	if (app->tls_size) {
-		app->tls_id = tls_cnt = 1;
+	if (app.tls_size) {
+		app.tls_id = tls_cnt = 1;
 #ifdef TLS_ABOVE_TP
-		app->tls_offset = 0;
-		tls_offset = app->tls_size
-			+ ( -((uintptr_t)app->tls_image + app->tls_size)
-			& (app->tls_align-1) );
+		app.tls_offset = 0;
+		tls_offset = app.tls_size
+			+ ( -((uintptr_t)app.tls_image + app.tls_size)
+			& (app.tls_align-1) );
 #else
-		tls_offset = app->tls_offset = app->tls_size
-			+ ( -((uintptr_t)app->tls_image + app->tls_size)
-			& (app->tls_align-1) );
+		tls_offset = app.tls_offset = app.tls_size
+			+ ( -((uintptr_t)app.tls_image + app.tls_size)
+			& (app.tls_align-1) );
 #endif
-		tls_align = MAXP2(tls_align, app->tls_align);
+		tls_align = MAXP2(tls_align, app.tls_align);
 	}
-	app->global = 1;
-	decode_dyn(app);
+	app.global = 1;
+	decode_dyn(&app);
 
 	/* Attach to vdso, if provided by the kernel */
 	if (search_vec(auxv, &vdso_base, AT_SYSINFO_EHDR)) {
-		ehdr = (void *)vdso_base;
-		vdso->phdr = phdr = (void *)(vdso_base + ehdr->e_phoff);
-		vdso->phnum = ehdr->e_phnum;
-		vdso->phentsize = ehdr->e_phentsize;
+		Ehdr *ehdr = (void *)vdso_base;
+		Phdr *phdr = vdso.phdr = (void *)(vdso_base + ehdr->e_phoff);
+		vdso.phnum = ehdr->e_phnum;
+		vdso.phentsize = ehdr->e_phentsize;
 		for (i=ehdr->e_phnum; i; i--, phdr=(void *)((char *)phdr + ehdr->e_phentsize)) {
 			if (phdr->p_type == PT_DYNAMIC)
-				vdso->dynv = (void *)(vdso_base + phdr->p_offset);
+				vdso.dynv = (void *)(vdso_base + phdr->p_offset);
 			if (phdr->p_type == PT_LOAD)
-				vdso->base = (void *)(vdso_base - phdr->p_vaddr + phdr->p_offset);
+				vdso.base = (void *)(vdso_base - phdr->p_vaddr + phdr->p_offset);
 		}
-		vdso->name = "";
-		vdso->shortname = "linux-gate.so.1";
-		vdso->global = 1;
-		decode_dyn(vdso);
-		vdso->prev = lib;
-		lib->next = vdso;
+		vdso.name = "";
+		vdso.shortname = "linux-gate.so.1";
+		vdso.global = 1;
+		vdso.relocated = 1;
+		decode_dyn(&vdso);
+		vdso.prev = &ldso;
+		ldso.next = &vdso;
 	}
 
-	/* Initial dso chain consists only of the app. We temporarily
-	 * append the dynamic linker/libc so we can relocate it, then
-	 * restore the initial chain in preparation for loading third
-	 * party libraries (preload/needed). */
-	head = tail = app;
-	ldso = lib;
-	app->next = lib;
-	reloc_all(lib);
-	app->next = 0;
-
-	/* PAST THIS POINT, ALL LIBC INTERFACES ARE FULLY USABLE. */
+	/* Initial dso chain consists only of the app. */
+	head = tail = &app;
 
 	/* Donate unused parts of app and library mapping to malloc */
-	reclaim_gaps(app);
-	reclaim_gaps(lib);
+	reclaim_gaps(&app);
+	reclaim_gaps(&ldso);
 
 	/* Load preload/needed libraries, add their symbols to the global
-	 * namespace, and perform all remaining relocations. The main
-	 * program must be relocated LAST since it may contain copy
-	 * relocations which depend on libraries' relocations. */
+	 * namespace, and perform all remaining relocations. */
 	if (env_preload) load_preload(env_preload);
-	load_deps(app);
-	make_global(app);
+	load_deps(&app);
+	make_global(&app);
 
 #ifndef DYNAMIC_IS_RO
-	for (i=0; app->dynv[i]; i+=2)
-		if (app->dynv[i]==DT_DEBUG)
-			app->dynv[i+1] = (size_t)&debug;
+	for (i=0; app.dynv[i]; i+=2)
+		if (app.dynv[i]==DT_DEBUG)
+			app.dynv[i+1] = (size_t)&debug;
 #endif
 
-	reloc_all(app->next);
-	reloc_all(app);
+	/* The main program must be relocated LAST since it may contain
+	 * copy relocations which depend on libraries' relocations. */
+	reloc_all(app.next);
+	reloc_all(&app);
 
 	update_tls_size();
 	if (libc.tls_size > sizeof builtin_tls) {
@@ -1359,14 +1454,13 @@ void *__dynlink(int argc, char **argv)
 
 	/* Switch to runtime mode: any further failures in the dynamic
 	 * linker are a reportable failure rather than a fatal startup
-	 * error. If the dynamic loader (dlopen) will not be used, free
-	 * all memory used by the dynamic linker. */
+	 * error. */
 	runtime = 1;
 
 	debug.ver = 1;
 	debug.bp = _dl_debug_state;
 	debug.head = head;
-	debug.base = lib->base;
+	debug.base = ldso.base;
 	debug.state = 0;
 	_dl_debug_state();
 
@@ -1375,6 +1469,7 @@ void *__dynlink(int argc, char **argv)
 	errno = 0;
 	do_init_fini(tail);
 
+	sp[-1] = (size_t)(argv-1);
 	return (void *)aux[AT_ENTRY];
 }
 
diff --git a/src/ldso/i386/start.s b/src/ldso/i386/start.s
index c37a1fa..63a20ee 100644
--- a/src/ldso/i386/start.s
+++ b/src/ldso/i386/start.s
@@ -1,22 +1,17 @@
 .text
+.hidden _DYNAMIC
+.hidden __dlstart_c
 .global _dlstart
 _dlstart:
 	xor %ebp,%ebp
-	pop %edi
-	mov %esp,%esi
+	mov %esp,%ebx
+	push %ebx
 	and $-16,%esp
-	push %ebp
-	push %ebp
-	push %esi
-	push %edi
-	call __dynlink
-	mov %esi,%esp
-1:	dec %edi
-	pop %esi
-	cmp $-1,%esi
-	jz 1b
-	inc %edi
-	push %esi
-	push %edi
-	xor %edx,%edx
+	push %ebx
+	push %ebx
+	call 1f
+1:	addl $_DYNAMIC-1b, (%esp)
+	push %ebx
+	call __dlstart_c
+	mov -4(%ebx),%esp
 	jmp *%eax
diff --git a/src/ldso/mips/start.s b/src/ldso/mips/start.s
index 0cadbf8..4de1d34 100644
--- a/src/ldso/mips/start.s
+++ b/src/ldso/mips/start.s
@@ -1,5 +1,5 @@
 .hidden _DYNAMIC
-.hidden __reloc_self
+.hidden __dlstart_c
 .set noreorder
 .set nomacro
 .global _dlstart
@@ -11,36 +11,20 @@ _dlstart:
 	nop
 2:	.gpword 2b
 	.gpword _DYNAMIC
-	.gpword __reloc_self
+	.gpword __dlstart_c
 1:	lw $gp, 0($ra)
 	subu $gp, $ra, $gp
 
-	lw $4, 0($sp)
-	addiu $5, $sp, 4
-	lw $6, 4($ra)
-	addu $6, $6, $gp
-	addiu $7, $gp, -0x7ff0
-	subu $sp, $sp, 16
+	move $4, $sp
+	lw $5, 4($ra)
+	add $5, $5, $gp
 	lw $25, 8($ra)
 	add $25, $25, $gp
-	jalr $25
-	nop
 
-	lw $25, %call16(__dynlink)($gp)
-	lw $4, 16($sp)
-	addiu $5, $sp, 20
+	move $16, $sp
+	and $sp, $sp, -8
 	jalr $25
-	nop
+	 subu $sp, $sp, 32
 
-	add $sp, $sp, 16
-	li $6, -1
-	lw $4, ($sp)
-1:	lw $5, 4($sp)
-	bne $5, $6, 2f
-	nop
-	addu $sp, $sp, 4
-	addu $4, $4, -1
-	b 1b
-	nop
-2:	sw $4, ($sp)
 	jr $2
+	 lw $sp, -4($16)


* Re: Dynamic linker changes
  2015-04-11 20:21     ` Rich Felker
@ 2015-04-12  1:59       ` Rich Felker
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2015-04-12  1:59 UTC (permalink / raw)
  To: musl

On Sat, Apr 11, 2015 at 04:21:52PM -0400, Rich Felker wrote:
> On Wed, Apr 08, 2015 at 07:19:11PM -0400, Rich Felker wrote:
> > 3. The original plan was to have one early-ldso-relocation step and
> > avoid all possible GOT/globals use and everything after that free to
> > use arbitrary global data and symbols, with a single barrier in
> > between to prevent reordering of GOT loads before they're relocated.
> > This seems impractical since it's hard, due to issue 1, to do symbolic
> > relocations without being able to make function calls.
> > 
> > Instead I'd like to treat the early-ldso-relocation process as two
> > steps. The first is generalizing and making arch-agnostic the work
> > mips, microblaze, and powerpc are doing now to get to a state where
> > all non-symbolic global accesses are safe. The second would be a
> > separate function call from the asm (or chained from the first if
> > there's an obvious way to do it) that performs symbolic relocations on
> > itself. It would end by (as proposed in the sketch before) doing a
> > symbol lookup and final call into the code that will set up the dso
> > chain, load dependencies, perform all remaining relocations, and pass
> > control to the program's entry point.
> 
> I've got the first working draft of the above design, and it's three
> stages:
> 
> 1. Perform relative relocations on ldso/libc itself referencing
>    nothing but its arguments and the data they point to.
> 
> 2. Set up a dso structure for ldso/libc and perform symbolic
>    relocations on it using nothing but static functions/data from
>    dynlink.c.
> 
> 3. Do nearly everything the old __dynlink did, but with the ldso dso
>    structure already set up and fully usable (not depending on
>    -Bsymbolic-functions and arch-specific __reloc_self to make it
>    almost-fully-usable like we did before).
> 
> Currently, stage 1 calls into stage 2 and 3 via very primitive
> symbol-lookup code. This has some trade-offs.
> 
> Pros: The dynamic linker entry point asm does not need to be aware of
> the details of the dynamic linking process. It just calls one function
> with minimal args (original SP and &_DYNAMIC) and uses the return
> value as a jump destination (along with a simple SP-fixup trick).
> 
> Cons: Stage 1 is coupled with the rest of the dynamic linking process.
> This is somewhat unfortunate since the stage 1 code, minus this last
> symbol lookup step but including the entry point asm prior to calling
> stage 1, is _exactly_ what would be needed for "static PIE" Rcrt1.o.
> It could be made to work 'unmodified' for static PIE by having the
> source for Rcrt1.o provide its own definitions of the stage 2 and 3
> functions, but since stage 1 looks them up by name at runtime,
> stripping dynamic symbol names (which should in principle work for
> static PIE) would break it.
> 
> I'm attaching a diff with the work so far for comments. It's
> unfinished (only i386 and mips are implemented so far; mips was chosen
> because it's the one arch that needs ugly arch-specific relocations
> and I had to check and make sure they work right in the new design)
> but seems to work.
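
For readers following along, stage 1 as described in the quoted text (relative relocations only, touching nothing but the arguments and the data they point to) can be sketched roughly as below. This is a minimal sketch, not the code from the attached diff: it assumes REL-style tables and i386-style type extraction (low 8 bits of the info word), and the names stage1_rel, stage1_demo and the X_* constants are hypothetical stand-ins for what <elf.h> and the per-arch headers would provide. A real stage 1 would also have to handle RELA tables and arch-specific cases such as the mips GOT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for DT_NULL, DT_REL, DT_RELSZ and the i386
 * relative relocation type; real code would take them from <elf.h>. */
enum { X_DT_NULL = 0, X_DT_REL = 17, X_DT_RELSZ = 18, X_R_RELATIVE = 8 };

/* Apply relative relocations to an image loaded at base, using only
 * the two arguments and the memory they point to. */
void stage1_rel(unsigned char *base, const size_t *dynv)
{
	size_t rel_off = 0, rel_size = 0;

	/* Scan the dynamic vector (tag/value pairs) for the REL table. */
	for (size_t i = 0; dynv[i] != X_DT_NULL; i += 2) {
		if (dynv[i] == X_DT_REL) rel_off = dynv[i+1];
		if (dynv[i] == X_DT_RELSZ) rel_size = dynv[i+1];
	}

	/* Each entry is an (offset, info) pair; skip anything that is
	 * not a relative relocation. */
	size_t *rel = (size_t *)(base + rel_off);
	for (; rel_size >= 2*sizeof *rel; rel += 2, rel_size -= 2*sizeof *rel) {
		if ((rel[1] & 0xff) != X_R_RELATIVE) continue;
		*(size_t *)(base + rel[0]) += (size_t)base;
	}
}

/* Self-check on a fake six-word "image"; returns 1 if the word that
 * had no relocation was left alone. */
int stage1_demo(void)
{
	size_t image[6] = {0};
	image[2] = 4*sizeof(size_t);  /* one entry: relocate word 4 */
	image[3] = X_R_RELATIVE;
	image[4] = 0x10;              /* link-time (base-relative) value */
	image[5] = 0x20;              /* no relocation refers to this */
	size_t dynv[] = { X_DT_REL, 2*sizeof(size_t),
	                  X_DT_RELSZ, 2*sizeof(size_t), X_DT_NULL, 0 };

	stage1_rel((unsigned char *)image, dynv);
	assert(image[4] == 0x10 + (size_t)image);
	return image[5] == 0x20;
}
```

The point of the restriction is visible in the sketch: stage1_rel reads no globals and calls nothing, so it does not depend on the GOT having been relocated yet.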

OK, so some big ideas for resolving this:

Let's get rid of all the ldso/*/start.s files.

Instead, I want to reuse crt_arch.h, which requires making the
following changes to it:

1. Fix any existing cases where crt_arch.h uses addressing methods
   that would not work prior to relocations. This is probably only
   mips.

2. Add the ability for the calling file to rename the _start function
   via a macro defined before including crt_arch.h.

3. Add conditional loading of &_DYNAMIC in the second arg slot before
   calling __cstart.

At this point, we have minimal entry point asm that can be reused by
crt1.o, Scrt1.o, dlstart.lo, and the future rcrt1.o for static PIE.
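
As a reminder of why &_DYNAMIC is worth passing (item 3), the opening message of this thread derives the load base as the difference between the runtime address of _DYNAMIC and its base-relative address from PT_DYNAMIC. A rough sketch of that computation follows; struct mini_phdr, X_PT_DYNAMIC, find_base and find_base_demo are hypothetical names used only for illustration, and real code would use the Phdr type and PT_DYNAMIC from <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down program header with only the two fields
 * this computation needs. */
struct mini_phdr {
	uint32_t p_type;
	size_t p_vaddr;
};

enum { X_PT_LOAD = 1, X_PT_DYNAMIC = 2 };

/* Load base = runtime address of _DYNAMIC minus its link-time p_vaddr.
 * Returns 0 if no PT_DYNAMIC header is found. */
size_t find_base(const struct mini_phdr *ph, size_t phnum, size_t dyn_addr)
{
	for (size_t i = 0; i < phnum; i++)
		if (ph[i].p_type == X_PT_DYNAMIC)
			return dyn_addr - ph[i].p_vaddr;
	return 0;
}

/* Self-check with made-up addresses; returns the computed base. */
size_t find_base_demo(void)
{
	struct mini_phdr ph[] = {
		{ X_PT_LOAD, 0 },
		{ X_PT_DYNAMIC, 0x1f00 },
	};
	size_t base = find_base(ph, 2, 0x40001f00);
	assert(base == 0x40000000);
	return base;
}
```

Nothing here depends on AT_BASE, which is the case the page-rounding hack was covering for.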

Then dlstart.c would look like this:

#ifndef START
#define START "_dl_start"
#endif
#define NEED_DYNAMIC
#include "crt_arch.h"

void __cstart(size_t *sp, size_t *dynv)
{
	/* body of existing stage 1 */
}
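
Since __cstart receives the original stack pointer, it may help to spell out the SP-fixup trick the patch relies on: the C code stores the adjusted stack pointer in the word just below the original SP (sp[-1] = (size_t)(argv-1) in the diff), and the entry asm reloads it from there (mov -4(%ebx),%esp on i386, where the slot was reserved by the earlier push). The sketch below simulates that on an ordinary array. The helper names are hypothetical, and rewriting argc into argv[-1] is an assumption about how the skipped entries are accounted for, not something shown in the hunks above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: drop the first `skip` argv entries and return
 * the stack pointer the main program should start with. Assumes the
 * new argc is written into the slot just below the new argv[0]. */
size_t *skip_args(size_t *sp, size_t skip)
{
	size_t argc = sp[0];
	size_t *argv = sp + 1;

	argv += skip;
	argv[-1] = argc - skip;
	return argv - 1;
}

/* Mirrors `sp[-1] = (size_t)(argv-1)` from the patch: leave the new SP
 * one word below the original SP, where the entry asm reloads it. */
void publish_new_sp(size_t *sp, size_t *new_sp)
{
	sp[-1] = (size_t)new_sp;
}

/* Self-check on a fake stack; returns 1 if the published value is the
 * address of the new argc slot. */
int sp_fixup_demo(void)
{
	/* stack[0] is the spare word below the original SP; then argc=3,
	 * three fake argv values, the argv terminator, the envp terminator. */
	size_t stack[7] = { 0, 3, 101, 102, 103, 0, 0 };
	size_t *sp = stack + 1;

	size_t *new_sp = skip_args(sp, 1);
	publish_new_sp(sp, new_sp);

	assert(new_sp[0] == 2 && new_sp[1] == 102);
	return stack[0] == (size_t)(stack + 2);
}
```

With this arrangement the asm never has to scan for skipped slots; it only loads one word and jumps.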

And the future rcrt1.c would look like this:

#define START "_start"
#include "dlstart.c"

/* stage 2 and 3 functions */

where stage 2 is empty, and stage 3 looks like the __cstart in crt1.c.
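
The START override in the two snippets is just the usual default-unless-defined pattern, with the name kept as a string literal so the arch header can paste it into asm text. A small sketch of the mechanism, collapsed into one file for illustration; the prologue string is hypothetical, since the contents of crt_arch.h are not shown in this thread.

```c
#include <assert.h>
#include <string.h>

/* What the including file (rcrt1.c above) would do first. */
#define START "_start"

/* What the included file (dlstart.c above) does: supply a default only
 * if the includer did not already choose a name. */
#ifndef START
#define START "_dl_start"
#endif

/* Hypothetical: the kind of asm prologue an arch header could build by
 * string-literal concatenation around START. */
static const char entry_prologue[] = ".global " START "\n" START ":\n";

const char *entry_prologue_text(void)
{
	return entry_prologue;
}

/* Self-check; returns 1 if the includer's choice of name won. */
int start_macro_demo(void)
{
	assert(strstr(entry_prologue_text(), "_dl_start") == NULL);
	return strcmp(entry_prologue_text(), ".global _start\n_start:\n") == 0;
}
```

Removing the first #define makes the same file produce the dynamic linker's entry name instead, which is the whole of the difference between the two users.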

Rich



end of thread, other threads:[~2015-04-12  1:59 UTC | newest]

Thread overview: 5+ messages
2015-04-05 22:30 Dynamic linker changes Rich Felker
2015-04-05 22:55 ` Rich Felker
2015-04-08 23:19   ` Rich Felker
2015-04-11 20:21     ` Rich Felker
2015-04-12  1:59       ` Rich Felker
