On Wed, Apr 08, 2015 at 07:19:11PM -0400, Rich Felker wrote:
> 3. The original plan was to have one early-ldso-relocation step and
> avoid all possible GOT/globals use and everything after that free to
> use arbitrary global data and symbols, with a single barrier in
> between to prevent reordering of GOT loads before they're relocated.
> This seems impractical since it's hard, due to issue 1, to do symbolic
> relocations without being able to make function calls.
>
> Instead I'd like to treat the early-ldso-relocation process as two
> steps. The first is generalizing and making arch-agnostic the work
> mips, microblaze, and powerpc are doing now to get to a state where
> all non-symbolic global accesses are safe. The second would be a
> separate function call from the asm (or chained from the first if
> there's an obvious way to do it) that performs symbolic relocations on
> itself. It would end by (as proposed in the sketch before) doing a
> symbol lookup and final call into the code that will set up the dso
> chain, load dependencies, perform all remaining relocations, and pass
> control to the program's entry point.

I've got the first working draft of the above design, and it's three
stages:

1. Perform relative relocations on ldso/libc itself referencing
   nothing but its arguments and the data they point to.

2. Set up a dso structure for ldso/libc and perform symbolic
   relocations on it using nothing but static functions/data from
   dynlink.c.

3. Do nearly everything the old __dynlink did, but with the ldso dso
   structure already set up and fully usable (not depending on
   -Bsymbolic-functions and arch-specific __reloc_self to make it
   almost-fully-usable like we did before).

Currently, stage 1 calls into stage 2 and 3 via very primitive
symbol-lookup code. This has some trade-offs.

Pros: The dynamic linker entry point asm does not need to be aware of
the details of the dynamic linking process. It just calls one function
with minimal args (original SP and &_DYNAMIC) and uses the return
value as a jump destination (along with a simple SP-fixup trick).

Cons: Stage 1 is coupled with the rest of the dynamic linking process.
This is somewhat unfortunate since the stage 1 code, minus this last
symbol lookup step but including the entry point asm prior to calling
stage 1, is _exactly_ what would be needed for "static PIE" Rcrt1.o.
It could be made to work 'unmodified' for static PIE by having the
source for Rcrt1.o provide its own definitions of the stage 2 and 3
functions, but since stage 1 looks them up by name at runtime,
stripping dynamic symbol names (which should in principle work for
static PIE) would break it.

I'm attaching a diff with the work so far for comments. It's
unfinished (only i386 and mips are implemented so far; mips was chosen
because it's the one arch that needs ugly arch-specific relocations
and I had to check and make sure they work right in the new design)
but seems to work.

Rich
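
For readers unfamiliar with relative relocations, the core of what
stage 1 does can be illustrated roughly as follows. This is only a
minimal sketch and is not taken from the attached diff: the names
sketch_rel, SKETCH_R_RELATIVE and sketch_apply_relative are
hypothetical, the record layout is a simplified REL-style entry
(addend stored in place), and the real code would first have to locate
the relocation table by walking _DYNAMIC. The point it shows is that
the loop touches only its arguments and the memory they point to, with
no globals and no function calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified REL-style record: the addend is stored in
 * the word being relocated. Not the layout used by the actual patch. */
struct sketch_rel {
    size_t offset; /* offset of the word to patch, relative to the load base */
    size_t info;   /* relocation type in the low 8 bits (ELF32-style) */
};

/* Illustrative type code; 8 is the value of R_386_RELATIVE. */
#define SKETCH_R_RELATIVE 8

/* Add the load base to every word named by a relative relocation.
 * Uses only its arguments and the memory they point to, so it is
 * safe to run before any GOT or global access has been fixed up. */
static void sketch_apply_relative(unsigned char *base,
                                  const struct sketch_rel *rel,
                                  size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Skip anything that is not a relative relocation; symbolic
         * relocations are left for a later stage. */
        if ((rel[i].info & 0xff) != SKETCH_R_RELATIVE)
            continue;
        size_t *where = (size_t *)(base + rel[i].offset);
        *where += (size_t)base;
    }
}
```

Symbolic relocations are deliberately skipped here, which matches the
split described above: they need symbol lookup and so belong to
stage 2.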