On Wed, Apr 08, 2015 at 07:19:11PM -0400, Rich Felker wrote:
> 3. The original plan was to have one early-ldso-relocation step and
> avoid all possible GOT/globals use and everything after that free to
> use arbitrary global data and symbols, with a single barrier in
> between to prevent reordering of GOT loads before they're relocated.
> This seems impractical since it's hard, due to issue 1, to do symbolic
> relocations without being able to make function calls.
> 
> Instead I'd like to treat the early-ldso-relocation process as two
> steps. The first is generalizing and making arch-agnostic the work
> mips, microblaze, and powerpc are doing now to get to a state where
> all non-symbolic global accesses are safe. The second would be a
> separate function call from the asm (or chained from the first if
> there's an obvious way to do it) that performs symbolic relocations on
> itself. It would end by (as proposed in the sketch before) doing a
> symbol lookup and final call into the code that will set up the dso
> chain, load dependencies, perform all remaining relocations, and pass
> control to the program's entry point.

I've got the first working draft of the above design, and it's three
stages:

1. Perform relative relocations on ldso/libc itself referencing
   nothing but its arguments and the data they point to.

2. Set up a dso structure for ldso/libc and perform symbolic
   relocations on it using nothing but static functions/data from
   dynlink.c.

3. Do nearly everything the old __dynlink did, but with the ldso dso
   structure already set up and fully usable (not depending on
   -Bsymbolic-functions and arch-specific __reloc_self to make it
   almost-fully-usable like we did before).
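
To make stage 1 concrete, here is a rough sketch of a
relative-relocation loop. It is not the code from the attached diff:
struct rel_entry, SKETCH_R_RELATIVE and apply_relative are made-up
names, and the low-byte type encoding is an assumption that matches
i386 REL entries. The point is only that the loop touches nothing but
its arguments and the memory they point to.

```c
#include <stddef.h>

/* Hypothetical stand-in for an ELF REL entry; real code would use
 * Elf32_Rel/Elf64_Rel from <elf.h>. */
struct rel_entry {
    size_t offset; /* location to patch, relative to the load base */
    size_t info;   /* relocation type in the low byte (assumed encoding) */
};

/* Assumed value for this sketch; it equals R_386_RELATIVE on i386. */
#define SKETCH_R_RELATIVE 8

/* Add the load base to every word named by a relative relocation.
 * Uses only its arguments and the data they point to: no globals,
 * no GOT access, no calls to other functions. */
void apply_relative(unsigned char *base, const struct rel_entry *rel,
                    size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if ((rel[i].info & 0xff) != SKETCH_R_RELATIVE)
            continue; /* symbolic relocations are left for stage 2 */
        size_t *slot = (size_t *)(base + rel[i].offset);
        *slot += (size_t)base;
    }
}
```

Called on an image loaded at base, this adds base to each word named
by a relative entry and skips everything else.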

Currently, stage 1 calls into stages 2 and 3 via very primitive
symbol-lookup code. This has some trade-offs.

Pros: The dynamic linker entry point asm does not need to be aware of
the details of the dynamic linking process. It just calls one function
with minimal args (original SP and &_DYNAMIC) and uses the return
value as a jump destination (along with a simple SP-fixup trick).
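
For illustration: since &_DYNAMIC arrives as an argument, the called
function could find its own relocation tables by scanning that array.
This is a sketch under the assumption that the dynamic section is
viewed as size_t (tag, value) pairs ending at tag 0 (DT_NULL);
decode_dynamic and DYN_CNT are hypothetical names, not taken from the
diff.

```c
#include <stddef.h>

/* Assumed bound for this sketch: only tags below it are recorded. */
#define DYN_CNT 32

/* Walk the (tag, value) pairs until the terminating tag 0 and store
 * each value at index `tag` in out[], which must have DYN_CNT slots.
 * Tags at or above DYN_CNT are ignored; unseen tags read back as 0. */
void decode_dynamic(const size_t *dyn, size_t *out)
{
    for (size_t i = 0; i < DYN_CNT; i++)
        out[i] = 0;
    for (; dyn[0]; dyn += 2)
        if (dyn[0] < DYN_CNT)
            out[dyn[0]] = dyn[1];
}
```

With the standard tag numbering, out[17] (DT_REL) and out[18]
(DT_RELSZ) would then hold the REL table's address and its size in
bytes.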

Cons: Stage 1 is coupled with the rest of the dynamic linking process.
This is somewhat unfortunate since the stage 1 code, minus this last
symbol lookup step but including the entry point asm prior to calling
stage 1, is _exactly_ what would be needed for "static PIE" Rcrt1.o.
It could be made to work 'unmodified' for static PIE by having the
source for Rcrt1.o provide its own definitions of the stage 2 and 3
functions, but since stage 1 looks them up by name at runtime,
stripping dynamic symbol names (which should in principle work for
static PIE) would break it.
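
To show where the dependency on names comes from, here is a sketch of
what a very primitive lookup could look like: a linear scan comparing
strings, with no hash table. The struct and function names are
hypothetical, and the symbol entry is simplified compared to a real
Elf32_Sym/Elf64_Sym.

```c
#include <stddef.h>

/* Hypothetical, simplified symbol entry. */
struct sym_entry {
    size_t name;  /* offset of the name in the string table */
    size_t value; /* symbol address relative to the load base */
};

/* Byte-wise comparison, so the lookup does not rely on strcmp. */
int names_equal(const char *a, const char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Linear scan over the symbol table. Returns the value of the first
 * symbol whose name matches, or 0 if there is no match. */
size_t find_sym(const struct sym_entry *syms, size_t nsyms,
                const char *strings, const char *name)
{
    for (size_t i = 0; i < nsyms; i++)
        if (names_equal(strings + syms[i].name, name))
            return syms[i].value;
    return 0;
}
```

If the name strings are stripped, there is nothing left for
names_equal to compare against, which is the breakage described above.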

I'm attaching a diff with the work so far for comments. It's
unfinished (only i386 and mips are implemented so far; mips was chosen
because it's the one arch that needs ugly arch-specific relocations
and I had to check and make sure they work right in the new design)
but seems to work.

Rich