mailing list of musl libc
 help / color / mirror / code / Atom feed
* ppc soft-float regression
@ 2015-05-17  8:03 Waldemar Brodkorb
  2015-05-17 10:02 ` Felix Janda
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Waldemar Brodkorb @ 2015-05-17  8:03 UTC (permalink / raw)
  To: musl

Hi,

as mentioned the days on IRC.
1.1.9 produces a lot of segfaults on Qemu-PPC with Soft-Float.
1.1.8 was fine.

Execute Qemu as this:
qemu-system-ppc -nographic -M bamboo -device e1000,netdev=adk0 \
-netdev user,id=adk0 -kernel qemu-ppc-initramfs-kernel
-initrd qemu-ppc-musl-initramfs

wbx@chrom:~/musl-ppc $ qemu-system-ppc --version                                                                                                               
QEMU emulator version 2.3.0, Copyright (c) 2003-2008 Fabrice Bellard

The system stopps at boot. strg-d to continue the boot with
/etc/init.d/rcS.

Strace is available in /usr/sbin/strace.

Get the kernels and initramfs from here:
http://openadk.org/musl-ppc/

Any ideas?

best regards
 Waldemar


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17  8:03 ppc soft-float regression Waldemar Brodkorb
@ 2015-05-17 10:02 ` Felix Janda
  2015-05-17 16:37   ` Rich Felker
  2015-05-17 13:06 ` Felix Janda
  2015-05-17 16:35 ` Rich Felker
  2 siblings, 1 reply; 26+ messages in thread
From: Felix Janda @ 2015-05-17 10:02 UTC (permalink / raw)
  To: musl

Waldemar Brodkorb wrote:
> Hi,
> 
> as mentioned the days on IRC.
> 1.1.9 produces a lot of segfaults on Qemu-PPC with Soft-Float.
> 1.1.8 was fine.
> 
> Execute Qemu as this:
> qemu-system-ppc -nographic -M bamboo -device e1000,netdev=adk0 \
> -netdev user,id=adk0 -kernel qemu-ppc-initramfs-kernel
> -initrd qemu-ppc-musl-initramfs
> 
> wbx@chrom:~/musl-ppc $ qemu-system-ppc --version
> QEMU emulator version 2.3.0, Copyright (c) 2003-2008 Fabrice Bellard
> 
> The system stopps at boot. strg-d to continue the boot with
> /etc/init.d/rcS.
> 
> Strace is available in /usr/sbin/strace.
> 
> Get the kernels and initramfs from here:
> http://openadk.org/musl-ppc/
> 
> Any ideas?

Command line parsing in busybox seems to be really broken. For example

mount -t proc proc /proc

has no effect and

dmesg | head -n 1

segfaults. git bisect says

# first bad commit: [f3ddd173806fd5c60b3f034528ca24542aecc5b9] dynamic linker bootstrap overhaul

Felix


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17  8:03 ppc soft-float regression Waldemar Brodkorb
  2015-05-17 10:02 ` Felix Janda
@ 2015-05-17 13:06 ` Felix Janda
  2015-05-17 16:35 ` Rich Felker
  2 siblings, 0 replies; 26+ messages in thread
From: Felix Janda @ 2015-05-17 13:06 UTC (permalink / raw)
  To: musl

Waldemar Brodkorb wrote:
> Hi,
> 
> as mentioned the days on IRC.
> 1.1.9 produces a lot of segfaults on Qemu-PPC with Soft-Float.
> 1.1.8 was fine.
> 
> Execute Qemu as this:
> qemu-system-ppc -nographic -M bamboo -device e1000,netdev=adk0 \
> -netdev user,id=adk0 -kernel qemu-ppc-initramfs-kernel
> -initrd qemu-ppc-musl-initramfs
> 
> wbx@chrom:~/musl-ppc $ qemu-system-ppc --version                                                                                                               
> QEMU emulator version 2.3.0, Copyright (c) 2003-2008 Fabrice Bellard
> 
> The system stopps at boot. strg-d to continue the boot with
> /etc/init.d/rcS.
> 
> Strace is available in /usr/sbin/strace.
> 
> Get the kernels and initramfs from here:
> http://openadk.org/musl-ppc/
> 
> Any ideas?

I can reproduce with your binaries, but not with self-built ones.

Your binaries seem to have an executable stack:

   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
         filesz 0x00000000 memsz 0x00000000 flags rwx

Make sure that your binaries are compiled with -msecure-plt and
-Wl,--secure-plt. (The musl gcc patches should enable these options
automatically.)

Felix


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17  8:03 ppc soft-float regression Waldemar Brodkorb
  2015-05-17 10:02 ` Felix Janda
  2015-05-17 13:06 ` Felix Janda
@ 2015-05-17 16:35 ` Rich Felker
  2015-05-17 17:20   ` Szabolcs Nagy
  2 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-17 16:35 UTC (permalink / raw)
  To: musl

On Sun, May 17, 2015 at 10:03:21AM +0200, Waldemar Brodkorb wrote:
> Hi,
> 
> as mentioned the days on IRC.
> 1.1.9 produces a lot of segfaults on Qemu-PPC with Soft-Float.
> 1.1.8 was fine.

Soft-float has never been a supported configuration for PowerPC, as
you can see from the fact that there's no separate dynamic linker name
for it. I'm surprised it ever seemed to work -- surely setjmp/longjmp
would be broken since they save/restore FPU registers. I don't think
it would be hard to add though, and I'd welcome patches for it. What's
needed is basically:

- detection in configure, setting $SUBARCH there
- dynamic linker name variants in arch/powerpc/reloc.h
- separate subarch dir for soft-float in src/setjmp

If you add soft-float it would probably make sense to add
little-endian variant at the same time, if that's useful to anyone,
since once you do the subarch work there's hardly any more work to
make an endian variant too.

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17 10:02 ` Felix Janda
@ 2015-05-17 16:37   ` Rich Felker
  2015-05-17 17:50     ` Felix Janda
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-17 16:37 UTC (permalink / raw)
  To: musl

On Sun, May 17, 2015 at 12:02:19PM +0200, Felix Janda wrote:
> Waldemar Brodkorb wrote:
> > Hi,
> > 
> > as mentioned the days on IRC.
> > 1.1.9 produces a lot of segfaults on Qemu-PPC with Soft-Float.
> > 1.1.8 was fine.
> > 
> > Execute Qemu as this:
> > qemu-system-ppc -nographic -M bamboo -device e1000,netdev=adk0 \
> > -netdev user,id=adk0 -kernel qemu-ppc-initramfs-kernel
> > -initrd qemu-ppc-musl-initramfs
> > 
> > wbx@chrom:~/musl-ppc $ qemu-system-ppc --version
> > QEMU emulator version 2.3.0, Copyright (c) 2003-2008 Fabrice Bellard
> > 
> > The system stopps at boot. strg-d to continue the boot with
> > /etc/init.d/rcS.
> > 
> > Strace is available in /usr/sbin/strace.
> > 
> > Get the kernels and initramfs from here:
> > http://openadk.org/musl-ppc/
> > 
> > Any ideas?
> 
> Command line parsing in busybox seems to be really broken. For example
> 
> mount -t proc proc /proc
> 
> has no effect and
> 
> dmesg | head -n 1
> 
> segfaults. git bisect says
> 
> # first bad commit: [f3ddd173806fd5c60b3f034528ca24542aecc5b9] dynamic linker bootstrap overhaul

Did you observe a regression here yourself, or just with Waldemar's
binaries which you said were broken? If you think this commit really
caused regressions on ppc I'll look into it but I don't want to waste
time doing that if it's not broken.

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17 16:35 ` Rich Felker
@ 2015-05-17 17:20   ` Szabolcs Nagy
  0 siblings, 0 replies; 26+ messages in thread
From: Szabolcs Nagy @ 2015-05-17 17:20 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1629 bytes --]

* Rich Felker <dalias@libc.org> [2015-05-17 12:35:21 -0400]:
> On Sun, May 17, 2015 at 10:03:21AM +0200, Waldemar Brodkorb wrote:
> > Hi,
> > 
> > as mentioned the days on IRC.
> > 1.1.9 produces a lot of segfaults on Qemu-PPC with Soft-Float.
> > 1.1.8 was fine.
> 
> Soft-float has never been a supported configuration for PowerPC, as
> you can see from the fact that there's no separate dynamic linker name
> for it. I'm surprised it ever seemed to work -- surely setjmp/longjmp
> would be broken since they save/restore FPU registers. I don't think
> it would be hard to add though, and I'd welcome patches for it. What's
> needed is basically:
> 
> - detection in configure, setting $SUBARCH there
> - dynamic linker name variants in arch/powerpc/reloc.h
> - separate subarch dir for soft-float in src/setjmp
> 
> If you add soft-float it would probably make sense to add
> little-endian variant at the same time, if that's useful to anyone,
> since once you do the subarch work there's hardly any more work to
> make an endian variant too.
> 

note that the current musl-cross powerpc patches don't do
the include reordering so the gcc includes will be the
first in the path.  this is fixed in

http://gcc.gnu.org/ml/gcc-patches/2015-04/msg01640.html

if there is a ppc soft-float abi musl may support then please
invent an abi name for it so i can update the gcc patch before
it gets committed.

(i think powerpc{el}{-sf} and powerpc64{el}{-sf} should work:
it is consistent with the naming for mips).

attached the musl dynamic linker name patch for "el" support,
similar would be needed for -sf (+setjmp.s etc updates).

[-- Attachment #2: powerpcel.diff --]
[-- Type: text/x-diff, Size: 833 bytes --]

diff --git a/arch/powerpc/reloc.h b/arch/powerpc/reloc.h
index aa5f8c9..4292cea 100644
--- a/arch/powerpc/reloc.h
+++ b/arch/powerpc/reloc.h
@@ -1,4 +1,12 @@
-#define LDSO_ARCH "powerpc"
+#include <endian.h>
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define ENDIAN_SUFFIX "el"
+#else
+#define ENDIAN_SUFFIX ""
+#endif
+
+#define LDSO_ARCH "powerpc" ENDIAN_SUFFIX
 
 #define TPOFF_K (-0x7000)
 
diff --git a/configure b/configure
index 143dc92..4e1a1fe 100755
--- a/configure
+++ b/configure
@@ -491,6 +491,9 @@ fi
 test "$ARCH" = "microblaze" && trycppif __MICROBLAZEEL__ "$t" \
 && SUBARCH=${SUBARCH}el
 
+test "$ARCH" = "powerpc" && trycppif __LITTLE_ENDIAN__ "$t" \
+&& SUBARCH=${SUBARCH}el
+
 if test "$ARCH" = "sh" ; then
 trycppif __BIG_ENDIAN__ "$t" && SUBARCH=${SUBARCH}eb
 if trycppif "__SH_FPU_ANY__ || __SH4__" "$t" ; then


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17 16:37   ` Rich Felker
@ 2015-05-17 17:50     ` Felix Janda
  2015-05-17 18:15       ` Felix Janda
  0 siblings, 1 reply; 26+ messages in thread
From: Felix Janda @ 2015-05-17 17:50 UTC (permalink / raw)
  To: musl

Rich Felker wrote:
> On Sun, May 17, 2015 at 12:02:19PM +0200, Felix Janda wrote:
[..]
> > segfaults. git bisect says
> > 
> > # first bad commit: [f3ddd173806fd5c60b3f034528ca24542aecc5b9] dynamic linker bootstrap overhaul
> 
> Did you observe a regression here yourself, or just with Waldemar's
> binaries which you said were broken? If you think this commit really
> caused regressions on ppc I'll look into it but I don't want to waste
> time doing that if it's not broken.

The commit is just the point where Waldemar's binary stopped working.
So not really a bug in musl.

Felix


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17 17:50     ` Felix Janda
@ 2015-05-17 18:15       ` Felix Janda
  2015-05-17 19:56         ` Felix Janda
  0 siblings, 1 reply; 26+ messages in thread
From: Felix Janda @ 2015-05-17 18:15 UTC (permalink / raw)
  To: musl

Felix Janda wrote:
> Rich Felker wrote:
> > On Sun, May 17, 2015 at 12:02:19PM +0200, Felix Janda wrote:
> [..]
> > > segfaults. git bisect says
> > > 
> > > # first bad commit: [f3ddd173806fd5c60b3f034528ca24542aecc5b9] dynamic linker bootstrap overhaul
> > 
> > Did you observe a regression here yourself, or just with Waldemar's
> > binaries which you said were broken? If you think this commit really
> > caused regressions on ppc I'll look into it but I don't want to waste
> > time doing that if it's not broken.
> 
> The commit is just the point where Waldemar's binary stopped working.
> So not really a bug in musl.

Actually, in my test ppc chroot bash (and more generally programs linked
against ncurses) starts segfaulting (with a null-pointer access) after
upgrading to musl-1.1.9.

Felix


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17 18:15       ` Felix Janda
@ 2015-05-17 19:56         ` Felix Janda
  2015-05-18 18:39           ` Felix Janda
  0 siblings, 1 reply; 26+ messages in thread
From: Felix Janda @ 2015-05-17 19:56 UTC (permalink / raw)
  To: musl

Felix Janda wrote:
> Felix Janda wrote:
> > Rich Felker wrote:
> > > On Sun, May 17, 2015 at 12:02:19PM +0200, Felix Janda wrote:
> > [..]
> > > > segfaults. git bisect says
> > > > 
> > > > # first bad commit: [f3ddd173806fd5c60b3f034528ca24542aecc5b9] dynamic linker bootstrap overhaul
> > > 
> > > Did you observe a regression here yourself, or just with Waldemar's
> > > binaries which you said were broken? If you think this commit really
> > > caused regressions on ppc I'll look into it but I don't want to waste
> > > time doing that if it's not broken.
> > 
> > The commit is just the point where Waldemar's binary stopped working.
> > So not really a bug in musl.
> 
> Actually, in my test ppc chroot bash (and more generally programs linked
> against ncurses) starts segfaulting (with a null-pointer access) after
> upgrading to musl-1.1.9.

The program

#define _GNU_SOURCE
#include <errno.h>

int main(void) {
	return program_invocation_short_name[0];
}

starts segfaulting with the above commit.

Felix


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-17 19:56         ` Felix Janda
@ 2015-05-18 18:39           ` Felix Janda
  2015-05-18 20:10             ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Felix Janda @ 2015-05-18 18:39 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 521 bytes --]

Something seems to be wrong with the powerpc copy relocations.

Attached is a binary for testing (qemu-ppc works fine). It is compiled
from


#include <stdio.h>

extern char **environ;

int main(int argc, char **argv)
{
	printf("%p %p\n", environ, &argv[argc + 1]);
	return 0;
}


Output before dynamic linker overhaul:

0xf6f4f830 0xf6f4f830

and after:

0 0xf6f4f830


It seems to me that the dynamic linker indeed copies environ, which is
0 at that point, but that __init_libc uses the old location of environ.

Felix

[-- Attachment #2: a.out --]
[-- Type: application/octet-stream, Size: 6308 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-18 18:39           ` Felix Janda
@ 2015-05-18 20:10             ` Rich Felker
  2015-05-18 20:14               ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-18 20:10 UTC (permalink / raw)
  To: musl

On Mon, May 18, 2015 at 08:39:30PM +0200, Felix Janda wrote:
> Something seems to be wrong with the powerpc copy relocations.
> 
> Attached is a binary for testing (qemu-ppc works fine). It is compiled
> from
> 
> 
> #include <stdio.h>
> 
> extern char **environ;
> 
> int main(int argc, char **argv)
> {
> 	printf("%p %p\n", environ, &argv[argc + 1]);
> 	return 0;
> }
> 
> 
> Output before dynamic linker overhaul:
> 
> 0xf6f4f830 0xf6f4f830
> 
> and after:
> 
> 0 0xf6f4f830
> 
> 
> It seems to me that the dynamic linker indeed copies environ, which is
> 0 at that point, but that __init_libc uses the old location of environ.

OK I've looked at this and I understand what's happening. PowerPC does
not have a separate relocation type for GOT entries; instead it uses
the same relocation type used for address constants global data. These
do not get re-processed after the main program and libraries are
added, because unlike GOT slots, they have addends, and if the addend
is inline (using REL rather than RELA) then it's already been
clobbered by the early relocation phase and can't easily be recovered.

I see three possible solutions:

1. Treat R_PPC_ADDR32 as a GOT relocation instead of a regular
   symbolic relocation in data. This would suppress the addend (giving
   wrong address) if inline addends (REL) were used, but in practice
   powerpc aways uses RELA. I consider this a hack, and perhaps risky,
   since in principle someone could make powerpc binaries with REL.

2. Re-process not just GOT type relocs, but also any RELA
   (non-inline-addend) relocs again on the second pass. This would
   work as long as powerpc only uses RELA, and if REL is ever used,
   the worst that would happen is the current bug (losing environ,
   etc.) rather than silently wrong relocations in global data. This
   approach is not a hack, but I consider it something of an
   incomplete fix.

3. Re-process all symbolic relocations. For REL-type (inline addend),
   we have to recover the original addend, which can be done by
   calling find_sym again, but using ldso instead of the current
   library chain head as the context to search for the symbol in, then
   subtracting the resulting address to get back the original addend.

I like the third solution best, even though it incurs a small code
size cost and a performance cost for archs using REL, because it's
completely robust against any weird ways some archs might end up using
relocations. The expected number of such relocations is tiny anyway;
on my i386 builds it's 14.

If option 3 proves to be difficult or costly, however, we could
consider option 2 as a temporary measure to get powerpc working. It
wouldn't even need to be reverted, because option 3 includes/subsumes
the work that would be done for option 2.

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-18 20:10             ` Rich Felker
@ 2015-05-18 20:14               ` Rich Felker
  2015-05-18 22:07                 ` Felix Janda
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-18 20:14 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]

On Mon, May 18, 2015 at 04:10:43PM -0400, Rich Felker wrote:
> OK I've looked at this and I understand what's happening. PowerPC does
> not have a separate relocation type for GOT entries; instead it uses
> the same relocation type used for address constants global data. These
> do not get re-processed after the main program and libraries are
> added, because unlike GOT slots, they have addends, and if the addend
> is inline (using REL rather than RELA) then it's already been
> clobbered by the early relocation phase and can't easily be recovered.
> 
> I see three possible solutions:
> 
> 1. Treat R_PPC_ADDR32 as a GOT relocation instead of a regular
>    symbolic relocation in data. This would suppress the addend (giving
>    wrong address) if inline addends (REL) were used, but in practice
>    powerpc aways uses RELA. I consider this a hack, and perhaps risky,
>    since in principle someone could make powerpc binaries with REL.
> 
> 2. Re-process not just GOT type relocs, but also any RELA
>    (non-inline-addend) relocs again on the second pass. This would
>    work as long as powerpc only uses RELA, and if REL is ever used,
>    the worst that would happen is the current bug (losing environ,
>    etc.) rather than silently wrong relocations in global data. This
>    approach is not a hack, but I consider it something of an
>    incomplete fix.
> 
> 3. Re-process all symbolic relocations. For REL-type (inline addend),
>    we have to recover the original addend, which can be done by
>    calling find_sym again, but using ldso instead of the current
>    library chain head as the context to search for the symbol in, then
>    subtracting the resulting address to get back the original addend.
> 
> I like the third solution best, even though it incurs a small code
> size cost and a performance cost for archs using REL, because it's
> completely robust against any weird ways some archs might end up using
> relocations. The expected number of such relocations is tiny anyway;
> on my i386 builds it's 14.
> 
> If option 3 proves to be difficult or costly, however, we could
> consider option 2 as a temporary measure to get powerpc working. It
> wouldn't even need to be reverted, because option 3 includes/subsumes
> the work that would be done for option 2.

Attached is a patch to implement option 2. I'll probably commit it
soon anyway but here is it in case you want to test sooner. I verified
it fixes the test program on powerpc for me.

Rich

[-- Attachment #2: reprocess_rela.diff --]
[-- Type: text/plain, Size: 473 bytes --]

diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index 7c92ef6..93595a0 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -281,7 +281,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 		}
 
 		int gotplt = (type == REL_GOT || type == REL_PLT);
-		if (dso->rel_update_got && !gotplt) continue;
+		if (dso->rel_update_got && !gotplt && stride==2) continue;
 
 		addend = stride>2 ? rel[2]
 			: gotplt || type==REL_COPY ? 0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-18 20:14               ` Rich Felker
@ 2015-05-18 22:07                 ` Felix Janda
  2015-05-22  6:23                   ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Felix Janda @ 2015-05-18 22:07 UTC (permalink / raw)
  To: musl

Rich Felker wrote:
> On Mon, May 18, 2015 at 04:10:43PM -0400, Rich Felker wrote:
> > OK I've looked at this and I understand what's happening. PowerPC does
> > not have a separate relocation type for GOT entries; instead it uses
> > the same relocation type used for address constants global data. These
> > do not get re-processed after the main program and libraries are
> > added, because unlike GOT slots, they have addends, and if the addend
> > is inline (using REL rather than RELA) then it's already been
> > clobbered by the early relocation phase and can't easily be recovered.
> > 
> > I see three possible solutions:
> > 
> > 1. Treat R_PPC_ADDR32 as a GOT relocation instead of a regular
> >    symbolic relocation in data. This would suppress the addend (giving
> >    wrong address) if inline addends (REL) were used, but in practice
> >    powerpc aways uses RELA. I consider this a hack, and perhaps risky,
> >    since in principle someone could make powerpc binaries with REL.
> > 
> > 2. Re-process not just GOT type relocs, but also any RELA
> >    (non-inline-addend) relocs again on the second pass. This would
> >    work as long as powerpc only uses RELA, and if REL is ever used,
> >    the worst that would happen is the current bug (losing environ,
> >    etc.) rather than silently wrong relocations in global data. This
> >    approach is not a hack, but I consider it something of an
> >    incomplete fix.
> > 
> > 3. Re-process all symbolic relocations. For REL-type (inline addend),
> >    we have to recover the original addend, which can be done by
> >    calling find_sym again, but using ldso instead of the current
> >    library chain head as the context to search for the symbol in, then
> >    subtracting the resulting address to get back the original addend.
> > 
> > I like the third solution best, even though it incurs a small code
> > size cost and a performance cost for archs using REL, because it's
> > completely robust against any weird ways some archs might end up using
> > relocations. The expected number of such relocations is tiny anyway;
> > on my i386 builds it's 14.
> > 
> > If option 3 proves to be difficult or costly, however, we could
> > consider option 2 as a temporary measure to get powerpc working. It
> > wouldn't even need to be reverted, because option 3 includes/subsumes
> > the work that would be done for option 2.
> 
> Attached is a patch to implement option 2. I'll probably commit it
> soon anyway but here is it in case you want to test sooner. I verified
> it fixes the test program on powerpc for me.

Thanks for the quick fix! The new commit fixes also the other segfaults
I've seen.

Felix


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-18 22:07                 ` Felix Janda
@ 2015-05-22  6:23                   ` Rich Felker
  2015-05-24  3:08                     ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-22  6:23 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3034 bytes --]

On Tue, May 19, 2015 at 12:07:31AM +0200, Felix Janda wrote:
> Rich Felker wrote:
> > On Mon, May 18, 2015 at 04:10:43PM -0400, Rich Felker wrote:
> > > OK I've looked at this and I understand what's happening. PowerPC does
> > > not have a separate relocation type for GOT entries; instead it uses
> > > the same relocation type used for address constants global data. These
> > > do not get re-processed after the main program and libraries are
> > > added, because unlike GOT slots, they have addends, and if the addend
> > > is inline (using REL rather than RELA) then it's already been
> > > clobbered by the early relocation phase and can't easily be recovered.
> > > 
> > > I see three possible solutions:
> > > 
> > > 1. Treat R_PPC_ADDR32 as a GOT relocation instead of a regular
> > >    symbolic relocation in data. This would suppress the addend (giving
> > >    wrong address) if inline addends (REL) were used, but in practice
> > >    powerpc aways uses RELA. I consider this a hack, and perhaps risky,
> > >    since in principle someone could make powerpc binaries with REL.
> > > 
> > > 2. Re-process not just GOT type relocs, but also any RELA
> > >    (non-inline-addend) relocs again on the second pass. This would
> > >    work as long as powerpc only uses RELA, and if REL is ever used,
> > >    the worst that would happen is the current bug (losing environ,
> > >    etc.) rather than silently wrong relocations in global data. This
> > >    approach is not a hack, but I consider it something of an
> > >    incomplete fix.
> > > 
> > > 3. Re-process all symbolic relocations. For REL-type (inline addend),
> > >    we have to recover the original addend, which can be done by
> > >    calling find_sym again, but using ldso instead of the current
> > >    library chain head as the context to search for the symbol in, then
> > >    subtracting the resulting address to get back the original addend.
> > > 
> > > I like the third solution best, even though it incurs a small code
> > > size cost and a performance cost for archs using REL, because it's
> > > completely robust against any weird ways some archs might end up using
> > > relocations. The expected number of such relocations is tiny anyway;
> > > on my i386 builds it's 14.
> > > 
> > > If option 3 proves to be difficult or costly, however, we could
> > > consider option 2 as a temporary measure to get powerpc working. It
> > > wouldn't even need to be reverted, because option 3 includes/subsumes
> > > the work that would be done for option 2.
> > 
> > Attached is a patch to implement option 2. I'll probably commit it
> > soon anyway but here is it in case you want to test sooner. I verified
> > it fixes the test program on powerpc for me.
> 
> Thanks for the quick fix! The new commit fixes also the other segfaults
> I've seen.

Attached is a patch that finishes the job by completing option 3. I
haven't tested it much yet so I'll hold off on committing it for a
while but it seems to work fine (not break anything) on i386.

Rich

[-- Attachment #2: reprocess_all_syms.diff --]
[-- Type: text/plain, Size: 887 bytes --]

diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index 93595a0..485bd4f 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -280,12 +280,17 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 			def.dso = dso;
 		}
 
-		int gotplt = (type == REL_GOT || type == REL_PLT);
-		if (dso->rel_update_got && !gotplt && stride==2) continue;
-
-		addend = stride>2 ? rel[2]
-			: gotplt || type==REL_COPY ? 0
-			: *reloc_addr;
+		if (stride > 2) {
+			addend = rel[2];
+		} else if (type==REL_GOT || type==REL_PLT || type==REL_COPY) {
+			addend = 0;
+		} else {
+			addend = *reloc_addr;
+			if (dso->rel_update_got) {
+				struct symdef old = find_sym(&ldso, name, 0);
+				addend -= (size_t)ldso.base+old.sym->st_value;
+			}
+		}
 
 		sym_val = def.sym ? (size_t)def.dso->base+def.sym->st_value : 0;
 		tls_val = def.sym ? def.sym->st_value : 0;

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-22  6:23                   ` Rich Felker
@ 2015-05-24  3:08                     ` Rich Felker
  2015-05-25  0:36                       ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-24  3:08 UTC (permalink / raw)
  To: musl

On Fri, May 22, 2015 at 02:23:46AM -0400, Rich Felker wrote:
> On Tue, May 19, 2015 at 12:07:31AM +0200, Felix Janda wrote:
> > Rich Felker wrote:
> > > On Mon, May 18, 2015 at 04:10:43PM -0400, Rich Felker wrote:
> > > > OK I've looked at this and I understand what's happening. PowerPC does
> > > > not have a separate relocation type for GOT entries; instead it uses
> > > > the same relocation type used for address constants global data. These
> > > > do not get re-processed after the main program and libraries are
> > > > added, because unlike GOT slots, they have addends, and if the addend
> > > > is inline (using REL rather than RELA) then it's already been
> > > > clobbered by the early relocation phase and can't easily be recovered.
> > > > 
> > > > I see three possible solutions:
> > > > 
> > > > 1. Treat R_PPC_ADDR32 as a GOT relocation instead of a regular
> > > >    symbolic relocation in data. This would suppress the addend (giving
> > > >    wrong address) if inline addends (REL) were used, but in practice
> > > >    powerpc aways uses RELA. I consider this a hack, and perhaps risky,
> > > >    since in principle someone could make powerpc binaries with REL.
> > > > 
> > > > 2. Re-process not just GOT type relocs, but also any RELA
> > > >    (non-inline-addend) relocs again on the second pass. This would
> > > >    work as long as powerpc only uses RELA, and if REL is ever used,
> > > >    the worst that would happen is the current bug (losing environ,
> > > >    etc.) rather than silently wrong relocations in global data. This
> > > >    approach is not a hack, but I consider it something of an
> > > >    incomplete fix.
> > > > 
> > > > 3. Re-process all symbolic relocations. For REL-type (inline addend),
> > > >    we have to recover the original addend, which can be done by
> > > >    calling find_sym again, but using ldso instead of the current
> > > >    library chain head as the context to search for the symbol in, then
> > > >    subtracting the resulting address to get back the original addend.
> > > > 
> > > > I like the third solution best, even though it incurs a small code
> > > > size cost and a performance cost for archs using REL, because it's
> > > > completely robust against any weird ways some archs might end up using
> > > > relocations. The expected number of such relocations is tiny anyway;
> > > > on my i386 builds it's 14.
> > > > 
> > > > If option 3 proves to be difficult or costly, however, we could
> > > > consider option 2 as a temporary measure to get powerpc working. It
> > > > wouldn't even need to be reverted, because option 3 includes/subsumes
> > > > the work that would be done for option 2.
> > > 
> > > Attached is a patch to implement option 2. I'll probably commit it
> > > soon anyway but here is it in case you want to test sooner. I verified
> > > it fixes the test program on powerpc for me.
> > 
> > Thanks for the quick fix! The new commit fixes also the other segfaults
> > I've seen.
> 
> Attached is a patch that finishes the job by completing option 3. I
> haven't tested it much yet so I'll hold off on committing it for a
> while but it seems to work fine (not break anything) on i386.
> 
> diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
> index 93595a0..485bd4f 100644
> --- a/src/ldso/dynlink.c
> +++ b/src/ldso/dynlink.c
> @@ -280,12 +280,17 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
>  			def.dso = dso;
>  		}
>  
> -		int gotplt = (type == REL_GOT || type == REL_PLT);
> -		if (dso->rel_update_got && !gotplt && stride==2) continue;
> -
> -		addend = stride>2 ? rel[2]
> -			: gotplt || type==REL_COPY ? 0
> -			: *reloc_addr;
> +		if (stride > 2) {
> +			addend = rel[2];
> +		} else if (type==REL_GOT || type==REL_PLT || type==REL_COPY) {
> +			addend = 0;
> +		} else {
> +			addend = *reloc_addr;
> +			if (dso->rel_update_got) {
> +				struct symdef old = find_sym(&ldso, name, 0);
> +				addend -= (size_t)ldso.base+old.sym->st_value;
> +			}
> +		}

Actually I'm not happy with this patch as-is. It's only valid for
REL_SYMBOLIC (or REL_SYM_OR_REL with a symbol) type relocations,
because it's assuming that the value at reloc_addr is sym_val+addend.
We could restrict reprocessing to just those types, but there are a
number of other reloc types that could theoretically arise and that we
should be treating correctly. REL_OFFSET/REL_OFFSET32 probably should
not appear in libc.so (or anything without TEXTRELs), but if we need
to support them, we would also need to adjust by (size_t)reloc_addr.
What's more important, though, are TLS-type relocations which in
principle could appear if libgcc.a is emulating floating point
environment for softfloat via TLS. REL_DTPOFF and REL_TLSDESC are
probably the only ones that would be valid here (only GD model is
valid in shared libraries) and REL_DTPOFF is trivial to reverse and
extract an addend, but REL_TLSDESC is relatively complex to handle.

Sure we could just do REL_SYMBOLIC for now, but if we can't yet solve
the problem in a future-proof way, I'm not sure there's much value in
committing the patch at this point, since there's no present issue
it's fixing.

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-24  3:08                     ` Rich Felker
@ 2015-05-25  0:36                       ` Rich Felker
  2015-05-25  6:31                         ` Jens Gustedt
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-25  0:36 UTC (permalink / raw)
  To: musl

On Sat, May 23, 2015 at 11:08:09PM -0400, Rich Felker wrote:
> > Attached is a patch that finishes the job by completing option 3. I
> > haven't tested it much yet so I'll hold off on committing it for a
> > while but it seems to work fine (not break anything) on i386.
> > 
> > diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
> > index 93595a0..485bd4f 100644
> > --- a/src/ldso/dynlink.c
> > +++ b/src/ldso/dynlink.c
> > @@ -280,12 +280,17 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
> >  			def.dso = dso;
> >  		}
> >  
> > -		int gotplt = (type == REL_GOT || type == REL_PLT);
> > -		if (dso->rel_update_got && !gotplt && stride==2) continue;
> > -
> > -		addend = stride>2 ? rel[2]
> > -			: gotplt || type==REL_COPY ? 0
> > -			: *reloc_addr;
> > +		if (stride > 2) {
> > +			addend = rel[2];
> > +		} else if (type==REL_GOT || type==REL_PLT || type==REL_COPY) {
> > +			addend = 0;
> > +		} else {
> > +			addend = *reloc_addr;
> > +			if (dso->rel_update_got) {
> > +				struct symdef old = find_sym(&ldso, name, 0);
> > +				addend -= (size_t)ldso.base+old.sym->st_value;
> > +			}
> > +		}
> 
> Actually I'm not happy with this patch as-is. It's only valid for
> REL_SYMBOLIC (or REL_SYM_OR_REL with a symbol) type relocations,
> because it's assuming that the value at reloc_addr is sym_val+addend.
> We could restrict reprocessing to just those types, but there are a
> number of other reloc types that could theoretically arise and that we
> should be treating correctly. REL_OFFSET/REL_OFFSET32 probably should
> not appear in libc.so (or anything without TEXTRELs), but if we need
> to support them, we would also need to adjust by (size_t)reloc_addr.
> What's more important, though, are TLS-type relocations which in
> principle could appear if libgcc.a is emulating floating point
> environment for softfloat via TLS. REL_DTPOFF and REL_TLSDESC are
> probably the only ones that would be valid here (only GD model is
> valid in shared libraries) and REL_DTPOFF is trivial to reverse and
> extract an addend, but REL_TLSDESC is relatively complex to handle.
> 
> Sure we could just do REL_SYMBOLIC for now, but if we can't yet solve
> the problem in a future-proof way, I'm not sure there's much value in
> committing the patch at this point, since there's no present issue
> it's fixing.

I think I have a potential solution. What I've wanted to do is backup
the original addends before stage 2 relocation, either by constructing
a RELA table to replace the REL or just backing up the addends
out-of-line and having special logic in do_relocs to pull them instead
of the inline addends (the latter takes 1/3 the space, which is
appealing). Unfortunately, this requires allocation of storage, which
pessimizes archs where this whole issue doesn't matter -- extra
syscalls (mmap) and/or large enough static storage to be safely above
the possible size of the addends.

There's a simple alternative I just came up with though: have
dlstart.c compute the number of REL entries that need their addends
saved and allocate a VLA on its stack for stages 2 and 3 to use. While
the number of addends could be significant, it's many orders of
magnitude smaller than the smallest practical stack sizes we could
actually run with, so it's perfectly safe to put it on the stack.

Here're the basic changes I'd like to make to dlstart.c to implement
this:

1. Remove processing of DT_JMPREL REL/RELA table. All entries in this
   table are necessarily JMP_SLOT type, and thus symbolic, so there's
   nothing stage 1 can do with them anyway. Also, being JMP_SLOT type,
   they have implicit addends of zero if they're REL-type, so there's
   no need to save addends.

2. Remove the loop in dlstart.c that works like a fake function call 3
   times to process DT_JMPREL, DT_REL, and DT_RELA. Instead we just
   need 2 iterations, and now the stride is constant in each, so they
   should simplify down a lot more inline.

3. During the loop that processes DT_REL, count the number of
   non-relative relocations (ones we skip at this stage), then make a
   VLA this size and pass its address to __dls2 as a second argument.

4. Have the do_relocs in stage 2 save addends in this provided array
   before overwriting them, and save its address for use by stage 3.

5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
   from this array instead of of from inline.

Steps 1 and 2 are purely code removal/simplification and should be
done regardless of whether we move forward on the above program, I
think. Steps 3-5 add some complexity but hardly any code, just a few
lines here and there.

Comments?

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25  0:36                       ` Rich Felker
@ 2015-05-25  6:31                         ` Jens Gustedt
  2015-05-25  6:57                           ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Jens Gustedt @ 2015-05-25  6:31 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2614 bytes --]

Am Sonntag, den 24.05.2015, 20:36 -0400 schrieb Rich Felker:
> There's a simple alternative I just came up with though: have
> dlstart.c compute the number of REL entries that need their addends
> saved and allocate a VLA on its stack for stages 2 and 3 to use. While
> the number of addends could be significant, it's many orders of
> magnitude smaller than the smallest practical stack sizes we could
> actually run with, so it's perfectly safe to put it on the stack.
> 
> Here're the basic changes I'd like to make to dlstart.c to implement
> this:
> 
> 1. Remove processing of DT_JMPREL REL/RELA table. All entries in this
>    table are necessarily JMP_SLOT type, and thus symbolic, so there's
>    nothing stage 1 can do with them anyway. Also, being JMP_SLOT type,
>    they have implicit addends of zero if they're REL-type, so there's
>    no need to save addends.
> 
> 2. Remove the loop in dlstart.c that works like a fake function call 3
>    times to process DT_JMPREL, DT_REL, and DT_RELA. Instead we just
>    need 2 iterations, and now the stride is constant in each, so they
>    should simplify down a lot more inline.
> 
> 3. During the loop that processes DT_REL, count the number of
>    non-relative relocations (ones we skip at this stage), then make a
>    VLA this size and pass its address to __dls2 as a second argument.
> 
> 4. Have the do_relocs in stage 2 save addends in this provided array
>    before overwriting them, and save its address for use by stage 3.
> 
> 5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
>    from this array instead of of from inline.
> 
> Steps 1 and 2 are purely code removal/simplification and should be
> done regardless of whether we move forward on the above program, I
> think. Steps 3-5 add some complexity but hardly any code, just a few
> lines here and there.
> 
> Comments?

I like it.

The thing that is a bit critical here, is the VLA. Not because it is a
VLA as such, but because it is a dynamic allocation on the stack. We
already have a similar strategy in pthread_create for TLS. The
difference is that there we have

 - a sanity check
 - an alternative strategy if the sanity check fails

Would there be a possibility to have both here, too?

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25  6:31                         ` Jens Gustedt
@ 2015-05-25  6:57                           ` Rich Felker
  2015-05-25  7:44                             ` Jens Gustedt
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-25  6:57 UTC (permalink / raw)
  To: musl

On Mon, May 25, 2015 at 08:31:29AM +0200, Jens Gustedt wrote:
> Am Sonntag, den 24.05.2015, 20:36 -0400 schrieb Rich Felker:
> > There's a simple alternative I just came up with though: have
> > dlstart.c compute the number of REL entries that need their addends
> > saved and allocate a VLA on its stack for stages 2 and 3 to use. While
> > the number of addends could be significant, it's many orders of
> > magnitude smaller than the smallest practical stack sizes we could
> > actually run with, so it's perfectly safe to put it on the stack.
> > 
> > Here're the basic changes I'd like to make to dlstart.c to implement
> > this:
> > 
> > 1. Remove processing of DT_JMPREL REL/RELA table. All entries in this
> >    table are necessarily JMP_SLOT type, and thus symbolic, so there's
> >    nothing stage 1 can do with them anyway. Also, being JMP_SLOT type,
> >    they have implicit addends of zero if they're REL-type, so there's
> >    no need to save addends.
> > 
> > 2. Remove the loop in dlstart.c that works like a fake function call 3
> >    times to process DT_JMPREL, DT_REL, and DT_RELA. Instead we just
> >    need 2 iterations, and now the stride is constant in each, so they
> >    should simplify down a lot more inline.
> > 
> > 3. During the loop that processes DT_REL, count the number of
> >    non-relative relocations (ones we skip at this stage), then make a
> >    VLA this size and pass its address to __dls2 as a second argument.
> > 
> > 4. Have the do_relocs in stage 2 save addends in this provided array
> >    before overwriting them, and save its address for use by stage 3.
> > 
> > 5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
> >    from this array instead of of from inline.
> > 
> > Steps 1 and 2 are purely code removal/simplification and should be
> > done regardless of whether we move forward on the above program, I
> > think. Steps 3-5 add some complexity but hardly any code, just a few
> > lines here and there.
> > 
> > Comments?
> 
> I like it.
> 
> The thing that is a bit critical here, is the VLA. Not because it is a
> VLA as such, but because it is a dynamic allocation on the stack. We
> already have a similar strategy in pthread_create for TLS. The
> difference is that there we have
> 
>  - a sanity check
>  - an alternative strategy if the sanity check fails
> 
> Would there be a possibility to have both here, too?

I'm not sure what you're referring to in pthread_create. It has no
VLA. Maybe you mean the choice of whether to put TLS in a
caller-provided thread stack, but I don't think that's an analogous
situaton. In that one, the provided stack size is known and TLS is
arbitrarily large (under the control of the app and loaded libs). Here
(dlstart), the stack size is not known and the size of the addend
table is not specifically known, but it's fixed at ld-time for libc.so
and the same for every run, and it's orders of magnitude smaller than
any usable stack (e.g. 14 entries on i386).

There are two potential failure modes here:

1. We run out of stack because RLIMIT_STACK is insanely small. In that
   case we'll quickly crash somewhere else. If there's a risk of
   getting off the stack into other memory, that risk would already
   exist in other places.

2. We run out of stack because the REL table is HUGE. This is a static
   condition for the libc.so ELF file and would not change from run to
   run. If something went wrong in the build process to cause this, it
   needs to be fixed.

So I'm skeptical of the need for a fallback.

If we do want/need a fallback, it would need to be of the following
form:

1. Count addends that need to be saved.

2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
   produce VLA of size 1 and pass a null pointer instead of a pointer
   to the VLA.

3. In stage 2, if the addend-buffer pointer is null, allocate storage
   for it with __syscall(SYS_mmap, ...) (we can't call mmap yet).
   Abort on failure.

4. In stage 3, if the addend-buffer was allocated donate it to malloc
   after we're done using it.

I would probably prefer just having it crash/abort in step 3, since
this condition obviously indicates a broken build. It's better to
catch such an issue and fix it than to have a non-robust libc.so that
breaks fail-safety for no-third-party-libs binaries by depending on
allocation.

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25  6:57                           ` Rich Felker
@ 2015-05-25  7:44                             ` Jens Gustedt
  2015-05-25 13:26                               ` Szabolcs Nagy
  2015-05-25 21:45                               ` Rich Felker
  0 siblings, 2 replies; 26+ messages in thread
From: Jens Gustedt @ 2015-05-25  7:44 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 5250 bytes --]

Am Montag, den 25.05.2015, 02:57 -0400 schrieb Rich Felker:
> On Mon, May 25, 2015 at 08:31:29AM +0200, Jens Gustedt wrote:
> > Am Sonntag, den 24.05.2015, 20:36 -0400 schrieb Rich Felker:
> > > There's a simple alternative I just came up with though: have
> > > dlstart.c compute the number of REL entries that need their addends
> > > saved and allocate a VLA on its stack for stages 2 and 3 to use. While
> > > the number of addends could be significant, it's many orders of
> > > magnitude smaller than the smallest practical stack sizes we could
> > > actually run with, so it's perfectly safe to put it on the stack.
> > > 
> > > Here're the basic changes I'd like to make to dlstart.c to implement
> > > this:
> > > 
> > > 1. Remove processing of DT_JMPREL REL/RELA table. All entries in this
> > >    table are necessarily JMP_SLOT type, and thus symbolic, so there's
> > >    nothing stage 1 can do with them anyway. Also, being JMP_SLOT type,
> > >    they have implicit addends of zero if they're REL-type, so there's
> > >    no need to save addends.
> > > 
> > > 2. Remove the loop in dlstart.c that works like a fake function call 3
> > >    times to process DT_JMPREL, DT_REL, and DT_RELA. Instead we just
> > >    need 2 iterations, and now the stride is constant in each, so they
> > >    should simplify down a lot more inline.
> > > 
> > > 3. During the loop that processes DT_REL, count the number of
> > >    non-relative relocations (ones we skip at this stage), then make a
> > >    VLA this size and pass its address to __dls2 as a second argument.
> > > 
> > > 4. Have the do_relocs in stage 2 save addends in this provided array
> > >    before overwriting them, and save its address for use by stage 3.
> > > 
> > > 5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
> > >    from this array instead of of from inline.
> > > 
> > > Steps 1 and 2 are purely code removal/simplification and should be
> > > done regardless of whether we move forward on the above program, I
> > > think. Steps 3-5 add some complexity but hardly any code, just a few
> > > lines here and there.
> > > 
> > > Comments?
> > 
> > I like it.
> > 
> > The thing that is a bit critical here, is the VLA. Not because it is a
> > VLA as such, but because it is a dynamic allocation on the stack. We
> > already have a similar strategy in pthread_create for TLS. The
> > difference is that there we have
> > 
> >  - a sanity check
> >  - an alternative strategy if the sanity check fails
> > 
> > Would there be a possibility to have both here, too?
> 
> I'm not sure what you're referring to in pthread_create. It has no
> VLA. Maybe you mean the choice of whether to put TLS in a
> caller-provided thread stack,

that's what I meant

> but I don't think that's an analogous
> situaton. In that one, the provided stack size is known and TLS is
> arbitrarily large (under the control of the app and loaded libs). Here
> (dlstart), the stack size is not known and the size of the addend
> table is not specifically known, but it's fixed at ld-time for libc.so
> and the same for every run, and it's orders of magnitude smaller than
> any usable stack (e.g. 14 entries on i386).

> There are two potential failure modes here:
> 
> 1. We run out of stack because RLIMIT_STACK is insanely small. In that
>    case we'll quickly crash somewhere else. If there's a risk of
>    getting off the stack into other memory, that risk would already
>    exist in other places.
> 
> 2. We run out of stack because the REL table is HUGE. This is a static
>    condition for the libc.so ELF file and would not change from run to
>    run. If something went wrong in the build process to cause this, it
>    needs to be fixed.
> 
> So I'm skeptical of the need for a fallback.
> 
> If we do want/need a fallback, it would need to be of the following
> form:
> 
> 1. Count addends that need to be saved.
> 
> 2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
>    produce VLA of size 1 and pass a null pointer instead of a pointer
>    to the VLA.

Ok, so we have a simple sanity check, great.

> 3. In stage 2, if the addend-buffer pointer is null, allocate storage
>    for it with __syscall(SYS_mmap, ...) (we can't call mmap yet).
>    Abort on failure.
> 
> 4. In stage 3, if the addend-buffer was allocated donate it to malloc
>    after we're done using it.
> 
> I would probably prefer just having it crash/abort in step 3,

me too

> since
> this condition obviously indicates a broken build. It's better to
> catch such an issue and fix it than to have a non-robust libc.so that
> breaks fail-safety for no-third-party-libs binaries by depending on
> allocation.

Observe that I said "alternative strategy" not "fallback".

"Crash early" to be that strategy sounds completely ok to me.

Thanks

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25  7:44                             ` Jens Gustedt
@ 2015-05-25 13:26                               ` Szabolcs Nagy
  2015-05-25 13:40                                 ` Alexander Monakov
  2015-05-25 21:45                               ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: Szabolcs Nagy @ 2015-05-25 13:26 UTC (permalink / raw)
  To: musl

* Jens Gustedt <jens.gustedt@inria.fr> [2015-05-25 09:44:44 +0200]:
> Am Montag, den 25.05.2015, 02:57 -0400 schrieb Rich Felker:
> > 2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
> >    produce VLA of size 1 and pass a null pointer instead of a pointer
> >    to the VLA.
> 
> Ok, so we have a simple sanity check, great.
> 

btw the stack usage of libc should be documented

currently a libc call always uses less than 10K stack(*)
but the guaranteed limit is not documented (16K limit is
fine i guess).

(*) assuming reasonable compiler, and execl is an
exception where stack usage depends on the number of
arguments.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25 13:26                               ` Szabolcs Nagy
@ 2015-05-25 13:40                                 ` Alexander Monakov
  2015-05-25 14:35                                   ` Szabolcs Nagy
  0 siblings, 1 reply; 26+ messages in thread
From: Alexander Monakov @ 2015-05-25 13:40 UTC (permalink / raw)
  To: musl

On Mon, 25 May 2015, Szabolcs Nagy wrote:
> * Jens Gustedt <jens.gustedt@inria.fr> [2015-05-25 09:44:44 +0200]:
> > Am Montag, den 25.05.2015, 02:57 -0400 schrieb Rich Felker:
> > > 2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
> > >    produce VLA of size 1 and pass a null pointer instead of a pointer
> > >    to the VLA.
> > 
> > Ok, so we have a simple sanity check, great.
> > 
> 
> btw the stack usage of libc should be documented
> 
> currently a libc call always uses less than 10K stack(*)
> but the guaranteed limit is not documented (16K limit is
> fine i guess).

Not true: fmt_fp consumes more that LDBL_MAX_EXP bytes, which is 16384 on x86
and aarch64.  As I recall, there's another function with >16K static stack
usage in the resolver, but I forget where exactly.

Alexander


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25 13:40                                 ` Alexander Monakov
@ 2015-05-25 14:35                                   ` Szabolcs Nagy
  2015-05-25 14:45                                     ` Alexander Monakov
  0 siblings, 1 reply; 26+ messages in thread
From: Szabolcs Nagy @ 2015-05-25 14:35 UTC (permalink / raw)
  To: musl

* Alexander Monakov <amonakov@ispras.ru> [2015-05-25 16:40:46 +0300]:
> > 
> > currently a libc call always uses less than 10K stack(*)
> > but the guaranteed limit is not documented (16K limit is
> > fine i guess).
> 
> Not true: fmt_fp consumes more that LDBL_MAX_EXP bytes, which is 16384 on x86
> and aarch64.  As I recall, there's another function with >16K static stack
> usage in the resolver, but I forget where exactly.

note the /9

	uint32_t big[(LDBL_MANT_DIG+28)/29 + 1          // mantissa expansion
		+ (LDBL_MAX_EXP+LDBL_MANT_DIG+28+8)/9]; // exponent expansion

you can compile musl with -fstack-usage and analyze
the output (i did that once on i386) and verify that
all stack usage is <10K (some functions have vla or
recursion where verification is harder)

(the large worst-case stack users are printf, scanf,
glob, execl, the dynamic loader does not use that much
stack: it keeps some file names and elf header in buffers
but it should use < 3K).


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25 14:35                                   ` Szabolcs Nagy
@ 2015-05-25 14:45                                     ` Alexander Monakov
  0 siblings, 0 replies; 26+ messages in thread
From: Alexander Monakov @ 2015-05-25 14:45 UTC (permalink / raw)
  To: musl



On Mon, 25 May 2015, Szabolcs Nagy wrote:

> * Alexander Monakov <amonakov@ispras.ru> [2015-05-25 16:40:46 +0300]:
> > > 
> > > currently a libc call always uses less than 10K stack(*)
> > > but the guaranteed limit is not documented (16K limit is
> > > fine i guess).
> > 
> > Not true: fmt_fp consumes more that LDBL_MAX_EXP bytes, which is 16384 on x86
> > and aarch64.  As I recall, there's another function with >16K static stack
> > usage in the resolver, but I forget where exactly.
> 
> note the /9
> 
> 	uint32_t big[(LDBL_MANT_DIG+28)/29 + 1          // mantissa expansion
> 		+ (LDBL_MAX_EXP+LDBL_MANT_DIG+28+8)/9]; // exponent expansion
> 
> you can compile musl with -fstack-usage and analyze
> the output (i did that once on i386) and verify that
> all stack usage is <10K (some functions have vla or
> recursion where verification is harder)
> 
> (the large worst-case stack users are printf, scanf,
> glob, execl, the dynamic loader does not use that much
> stack: it keeps some file names and elf header in buffers
> but it should use < 3K).

My mistake.  Most likely I was misremembering.   Looking at my old logs, I see
floatscan.c:decfloat and netlink.c:__netlink_enumerate, which use >8K (but
less than 16K) stack.

Alexander


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25  7:44                             ` Jens Gustedt
  2015-05-25 13:26                               ` Szabolcs Nagy
@ 2015-05-25 21:45                               ` Rich Felker
  2015-05-25 22:46                                 ` Rich Felker
  1 sibling, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-25 21:45 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

On Mon, May 25, 2015 at 09:44:44AM +0200, Jens Gustedt wrote:
> > If we do want/need a fallback, it would need to be of the following
> > form:
> > 
> > 1. Count addends that need to be saved.
> > 
> > 2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
> >    produce VLA of size 1 and pass a null pointer instead of a pointer
> >    to the VLA.
> 
> Ok, so we have a simple sanity check, great.
> 
> > 3. In stage 2, if the addend-buffer pointer is null, allocate storage
> >    for it with __syscall(SYS_mmap, ...) (we can't call mmap yet).
> >    Abort on failure.
> > 
> > 4. In stage 3, if the addend-buffer was allocated donate it to malloc
> >    after we're done using it.
> > 
> > I would probably prefer just having it crash/abort in step 3,
> 
> me too
> 
> > since
> > this condition obviously indicates a broken build. It's better to
> > catch such an issue and fix it than to have a non-robust libc.so that
> > breaks fail-safety for no-third-party-libs binaries by depending on
> > allocation.
> 
> Observe that I said "alternative strategy" not "fallback".
> 
> "Crash early" to be that strategy sounds completely ok to me.

OK, sounds good. Here's the patch I'm testing now, which seems to work
fine. I'll probably commit it soon if there are no objections.

Rich

[-- Attachment #2: save_and_reuse_addends_v1.diff --]
[-- Type: text/plain, Size: 5396 bytes --]

diff --git a/src/internal/dynlink.h b/src/internal/dynlink.h
index 53661d6..8621d2d 100644
--- a/src/internal/dynlink.h
+++ b/src/internal/dynlink.h
@@ -51,7 +51,7 @@ enum {
 #define AUX_CNT 32
 #define DYN_CNT 32
 
-typedef void (*stage2_func)(unsigned char *);
+typedef void (*stage2_func)(unsigned char *, size_t *);
 typedef _Noreturn void (*stage3_func)(size_t *);
 
 #endif
diff --git a/src/ldso/dlstart.c b/src/ldso/dlstart.c
index 5f84465..6af5037 100644
--- a/src/ldso/dlstart.c
+++ b/src/ldso/dlstart.c
@@ -56,12 +56,15 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 		for (i=0; i<local_cnt; i++) got[i] += (size_t)base;
 	}
 
-	size_t *rel, rel_size;
+	size_t *rel, rel_size, symbolic_rel_cnt=0;
 
 	rel = (void *)(base+dyn[DT_REL]);
 	rel_size = dyn[DT_RELSZ];
 	for (; rel_size; rel+=2, rel_size-=2*sizeof(size_t)) {
-		if (!IS_RELATIVE(rel[1])) continue;
+		if (!IS_RELATIVE(rel[1])) {
+			symbolic_rel_cnt++;
+			continue;
+		}
 		size_t *rel_addr = (void *)(base + rel[0]);
 		*rel_addr += (size_t)base;
 	}
@@ -74,6 +77,16 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 		*rel_addr = (size_t)base + rel[2];
 	}
 
+	/* Prepare storage for stages 2 to save clobbered REL
+	 * addends so they can be reused in stage 3. There should
+	 * be very few. If something goes wrong and there are a
+	 * huge number, pass a null pointer to trigger stage 2
+	 * to abort instead of risking stack overflow. */
+	int too_many_addends = symbolic_rel_cnt > 4096;
+	size_t naddends = too_many_addends ? 1 : symbolic_rel_cnt;
+	size_t addends[naddends];
+	size_t *paddends = too_many_addends ? 0 : addends;
+
 	const char *strings = (void *)(base + dyn[DT_STRTAB]);
 	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);
 
@@ -84,7 +97,7 @@ void _dlstart_c(size_t *sp, size_t *dynv)
 		 && s[3]=='l' && s[4]=='s' && s[5]=='2' && !s[6])
 			break;
 	}
-	((stage2_func)(base + syms[i].st_value))(base);
+	((stage2_func)(base + syms[i].st_value))(base, paddends);
 
 	/* Call dynamic linker stage-3, __dls3 */
 	for (i=0; ;i++) {
diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index 93595a0..bfc1c96 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -74,7 +74,6 @@ struct dso {
 	volatile int new_dtv_idx, new_tls_idx;
 	struct td_index *td_index;
 	struct dso *fini_next;
-	int rel_early_relative, rel_update_got;
 	char *shortname;
 	char buf[];
 };
@@ -98,6 +97,7 @@ static struct builtin_tls {
 
 static struct dso ldso;
 static struct dso *head, *tail, *fini_head;
+static size_t *saved_addends;
 static char *env_path, *sys_path;
 static unsigned long long gencnt;
 static int runtime;
@@ -256,9 +256,19 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 	size_t sym_val;
 	size_t tls_val;
 	size_t addend;
+	int skip_relative = 0, reuse_addends = 0, save_slot = 0;
+
+	if (dso == &ldso) {
+		size_t dyn[DYN_CNT];
+		decode_vec(ldso.dynv, dyn, DYN_CNT);
+		/* Only ldso's REL table needs addend saving/reuse. */
+		if (rel == (size_t *)(ldso.base+dyn[DT_REL]))
+			reuse_addends = 1;
+		skip_relative = 1;
+	}
 
 	for (; rel_size; rel+=stride, rel_size-=stride*sizeof(size_t)) {
-		if (dso->rel_early_relative && IS_RELATIVE(rel[1])) continue;
+		if (skip_relative && IS_RELATIVE(rel[1])) continue;
 		type = R_TYPE(rel[1]);
 		sym_index = R_SYM(rel[1]);
 		reloc_addr = (void *)(base + rel[0]);
@@ -280,12 +290,20 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 			def.dso = dso;
 		}
 
-		int gotplt = (type == REL_GOT || type == REL_PLT);
-		if (dso->rel_update_got && !gotplt && stride==2) continue;
-
-		addend = stride>2 ? rel[2]
-			: gotplt || type==REL_COPY ? 0
-			: *reloc_addr;
+		if (stride > 2) {
+			addend = rel[2];
+		} else if (type==REL_GOT || type==REL_PLT|| type==REL_COPY) {
+			addend = 0;
+		} else if (reuse_addends) {
+			/* Save original addend in stage 2 where the dso
+			 * chain consists of just ldso; otherwise read back
+			 * saved addend since the inline one was clobbered. */
+			if (head==&ldso)
+				saved_addends[save_slot] = *reloc_addr;
+			addend = saved_addends[save_slot++];
+		} else {
+			addend = *reloc_addr;
+		}
 
 		sym_val = def.sym ? (size_t)def.dso->base+def.sym->st_value : 0;
 		tls_val = def.sym ? def.sym->st_value : 0;
@@ -879,7 +897,7 @@ static void do_mips_relocs(struct dso *p, size_t *got)
 	size_t i, j, rel[2];
 	unsigned char *base = p->base;
 	i=0; search_vec(p->dynv, &i, DT_MIPS_LOCAL_GOTNO);
-	if (p->rel_early_relative) {
+	if (p==&ldso) {
 		got += i;
 	} else {
 		while (i--) *got++ += (size_t)base;
@@ -1116,16 +1134,16 @@ static void update_tls_size()
  * linker itself, but some of the relocations performed may need to be
  * replaced later due to copy relocations in the main program. */
 
-void __dls2(unsigned char *base)
+void __dls2(unsigned char *base, size_t *addends)
 {
 	Ehdr *ehdr = (void *)base;
+	if (!(saved_addends = addends)) a_crash();
 	ldso.base = base;
 	ldso.name = ldso.shortname = "libc.so";
 	ldso.global = 1;
 	ldso.phnum = ehdr->e_phnum;
 	ldso.phdr = (void *)(base + ehdr->e_phoff);
 	ldso.phentsize = ehdr->e_phentsize;
-	ldso.rel_early_relative = 1;
 	kernel_mapped_dso(&ldso);
 	decode_dyn(&ldso);
 
@@ -1133,7 +1151,6 @@ void __dls2(unsigned char *base)
 	reloc_all(&ldso);
 
 	ldso.relocated = 0;
-	ldso.rel_update_got = 1;
 }
 
 /* Stage 3 of the dynamic linker is called with the dynamic linker/libc

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25 21:45                               ` Rich Felker
@ 2015-05-25 22:46                                 ` Rich Felker
  2015-05-25 23:51                                   ` Rich Felker
  0 siblings, 1 reply; 26+ messages in thread
From: Rich Felker @ 2015-05-25 22:46 UTC (permalink / raw)
  To: musl

On Mon, May 25, 2015 at 05:45:12PM -0400, Rich Felker wrote:
> @@ -74,6 +77,16 @@ void _dlstart_c(size_t *sp, size_t *dynv)
>  		*rel_addr = (size_t)base + rel[2];
>  	}
>  
> +	/* Prepare storage for stages 2 to save clobbered REL
> +	 * addends so they can be reused in stage 3. There should
> +	 * be very few. If something goes wrong and there are a
> +	 * huge number, pass a null pointer to trigger stage 2
> +	 * to abort instead of risking stack overflow. */
> +	int too_many_addends = symbolic_rel_cnt > 4096;
> +	size_t naddends = too_many_addends ? 1 : symbolic_rel_cnt;
> +	size_t addends[naddends];
> +	size_t *paddends = too_many_addends ? 0 : addends;
> +
>  	const char *strings = (void *)(base + dyn[DT_STRTAB]);
>  	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);

This logic could lead to a zero-sized VLA (thus UB); instead, trying:

	int too_many_addends = symbolic_rel_cnt > 4096;
	size_t naddends = too_many_addends ? 0 : symbolic_rel_cnt;
	size_t addends[naddends+1];
	size_t *paddends = too_many_addends ? 0 : addends;

Avoiding the wasteful +1 would involve more conditionals so I think
it's best just avoiding it. Alternatively this might be
simpler/smaller:

	size_t addends[symbolic_rel_cnt & LIMIT-1 | 1];
	size_t *paddends = symbolic_rel_cnt >= LIMIT ? 0 : addends;

Rich


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ppc soft-float regression
  2015-05-25 22:46                                 ` Rich Felker
@ 2015-05-25 23:51                                   ` Rich Felker
  0 siblings, 0 replies; 26+ messages in thread
From: Rich Felker @ 2015-05-25 23:51 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1620 bytes --]

On Mon, May 25, 2015 at 06:46:29PM -0400, Rich Felker wrote:
> On Mon, May 25, 2015 at 05:45:12PM -0400, Rich Felker wrote:
> > @@ -74,6 +77,16 @@ void _dlstart_c(size_t *sp, size_t *dynv)
> >  		*rel_addr = (size_t)base + rel[2];
> >  	}
> >  
> > +	/* Prepare storage for stages 2 to save clobbered REL
> > +	 * addends so they can be reused in stage 3. There should
> > +	 * be very few. If something goes wrong and there are a
> > +	 * huge number, pass a null pointer to trigger stage 2
> > +	 * to abort instead of risking stack overflow. */
> > +	int too_many_addends = symbolic_rel_cnt > 4096;
> > +	size_t naddends = too_many_addends ? 1 : symbolic_rel_cnt;
> > +	size_t addends[naddends];
> > +	size_t *paddends = too_many_addends ? 0 : addends;
> > +
> >  	const char *strings = (void *)(base + dyn[DT_STRTAB]);
> >  	const Sym *syms = (void *)(base + dyn[DT_SYMTAB]);
> 
> This logic could lead to a zero-sized VLA (thus UB); instead, trying:
> 
> 	int too_many_addends = symbolic_rel_cnt > 4096;
> 	size_t naddends = too_many_addends ? 0 : symbolic_rel_cnt;
> 	size_t addends[naddends+1];
> 	size_t *paddends = too_many_addends ? 0 : addends;
> 
> Avoiding the wasteful +1 would involve more conditionals so I think
> it's best just avoiding it. Alternatively this might be
> simpler/smaller:
> 
> 	size_t addends[symbolic_rel_cnt & LIMIT-1 | 1];
> 	size_t *paddends = symbolic_rel_cnt >= LIMIT ? 0 : addends;

Attached is an updated version of the patch with much simpler logic
and the addend buffer moved into stage 2 which is now possible thanks
to commit 768b82c6de24e480267c4c251c440edfc71800e3.

Rich

[-- Attachment #2: save_and_reuse_addends_v3.diff --]
[-- Type: text/plain, Size: 3638 bytes --]

diff --git a/src/ldso/dynlink.c b/src/ldso/dynlink.c
index 3842aeb..42930ad 100644
--- a/src/ldso/dynlink.c
+++ b/src/ldso/dynlink.c
@@ -74,7 +74,6 @@ struct dso {
 	volatile int new_dtv_idx, new_tls_idx;
 	struct td_index *td_index;
 	struct dso *fini_next;
-	int rel_early_relative, rel_update_got;
 	char *shortname;
 	char buf[];
 };
@@ -96,6 +95,9 @@ static struct builtin_tls {
 } builtin_tls[1];
 #define MIN_TLS_ALIGN offsetof(struct builtin_tls, pt)
 
+#define ADDEND_LIMIT 4096
+static size_t *saved_addends, *apply_addends_to;
+
 static struct dso ldso;
 static struct dso *head, *tail, *fini_head;
 static char *env_path, *sys_path;
@@ -256,9 +258,17 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 	size_t sym_val;
 	size_t tls_val;
 	size_t addend;
+	int skip_relative = 0, reuse_addends = 0, save_slot = 0;
+
+	if (dso == &ldso) {
+		/* Only ldso's REL table needs addend saving/reuse. */
+		if (rel == apply_addends_to)
+			reuse_addends = 1;
+		skip_relative = 1;
+	}
 
 	for (; rel_size; rel+=stride, rel_size-=stride*sizeof(size_t)) {
-		if (dso->rel_early_relative && IS_RELATIVE(rel[1])) continue;
+		if (skip_relative && IS_RELATIVE(rel[1])) continue;
 		type = R_TYPE(rel[1]);
 		sym_index = R_SYM(rel[1]);
 		reloc_addr = (void *)(base + rel[0]);
@@ -280,12 +290,20 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
 			def.dso = dso;
 		}
 
-		int gotplt = (type == REL_GOT || type == REL_PLT);
-		if (dso->rel_update_got && !gotplt && stride==2) continue;
-
-		addend = stride>2 ? rel[2]
-			: gotplt || type==REL_COPY ? 0
-			: *reloc_addr;
+		if (stride > 2) {
+			addend = rel[2];
+		} else if (type==REL_GOT || type==REL_PLT|| type==REL_COPY) {
+			addend = 0;
+		} else if (reuse_addends) {
+			/* Save original addend in stage 2 where the dso
+			 * chain consists of just ldso; otherwise read back
+			 * saved addend since the inline one was clobbered. */
+			if (head==&ldso)
+				saved_addends[save_slot] = *reloc_addr;
+			addend = saved_addends[save_slot++];
+		} else {
+			addend = *reloc_addr;
+		}
 
 		sym_val = def.sym ? (size_t)def.dso->base+def.sym->st_value : 0;
 		tls_val = def.sym ? def.sym->st_value : 0;
@@ -879,7 +897,7 @@ static void do_mips_relocs(struct dso *p, size_t *got)
 	size_t i, j, rel[2];
 	unsigned char *base = p->base;
 	i=0; search_vec(p->dynv, &i, DT_MIPS_LOCAL_GOTNO);
-	if (p->rel_early_relative) {
+	if (p==&ldso) {
 		got += i;
 	} else {
 		while (i--) *got++ += (size_t)base;
@@ -1125,15 +1143,29 @@ void __dls2(unsigned char *base, size_t *sp)
 	ldso.phnum = ehdr->e_phnum;
 	ldso.phdr = (void *)(base + ehdr->e_phoff);
 	ldso.phentsize = ehdr->e_phentsize;
-	ldso.rel_early_relative = 1;
 	kernel_mapped_dso(&ldso);
 	decode_dyn(&ldso);
 
+	/* Prepare storage for to save clobbered REL addends so they
+	 * can be reused in stage 3. There should be very few. If
+	 * something goes wrong and there are a huge number, abort
+	 * instead of risking stack overflow. */
+	size_t dyn[DYN_CNT];
+	decode_vec(ldso.dynv, dyn, DYN_CNT);
+	size_t *rel = (void *)(base+dyn[DT_REL]);
+	size_t rel_size = dyn[DT_RELSZ];
+	size_t symbolic_rel_cnt = 0;
+	apply_addends_to = rel;
+	for (; rel_size; rel+=2, rel_size-=2*sizeof(size_t))
+		if (!IS_RELATIVE(rel[1])) symbolic_rel_cnt++;
+	if (symbolic_rel_cnt >= ADDEND_LIMIT) a_crash();
+	size_t addends[symbolic_rel_cnt+1];
+	saved_addends = addends;
+
 	head = &ldso;
 	reloc_all(&ldso);
 
 	ldso.relocated = 0;
-	ldso.rel_update_got = 1;
 
 	/* Call dynamic linker stage-3, __dls3, looking it up
 	 * symbolically as a barrier against moving the address

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-05-25 23:51 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-17  8:03 ppc soft-float regression Waldemar Brodkorb
2015-05-17 10:02 ` Felix Janda
2015-05-17 16:37   ` Rich Felker
2015-05-17 17:50     ` Felix Janda
2015-05-17 18:15       ` Felix Janda
2015-05-17 19:56         ` Felix Janda
2015-05-18 18:39           ` Felix Janda
2015-05-18 20:10             ` Rich Felker
2015-05-18 20:14               ` Rich Felker
2015-05-18 22:07                 ` Felix Janda
2015-05-22  6:23                   ` Rich Felker
2015-05-24  3:08                     ` Rich Felker
2015-05-25  0:36                       ` Rich Felker
2015-05-25  6:31                         ` Jens Gustedt
2015-05-25  6:57                           ` Rich Felker
2015-05-25  7:44                             ` Jens Gustedt
2015-05-25 13:26                               ` Szabolcs Nagy
2015-05-25 13:40                                 ` Alexander Monakov
2015-05-25 14:35                                   ` Szabolcs Nagy
2015-05-25 14:45                                     ` Alexander Monakov
2015-05-25 21:45                               ` Rich Felker
2015-05-25 22:46                                 ` Rich Felker
2015-05-25 23:51                                   ` Rich Felker
2015-05-17 13:06 ` Felix Janda
2015-05-17 16:35 ` Rich Felker
2015-05-17 17:20   ` Szabolcs Nagy

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).