* [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline
       [not found] <20200511101952.1463138-1-npiggin@gmail.com>
@ 2021-01-22 11:27 ` Florian Weimer
  2021-01-22 14:44   ` Rich Felker
  2021-01-22 18:13   ` Raoni Fassina Firmino
  0 siblings, 2 replies; 6+ messages in thread
From: Florian Weimer @ 2021-01-22 11:27 UTC
To: Nicholas Piggin; +Cc: linuxppc-dev, Alan Modra, musl, libc-alpha

* Nicholas Piggin:

> diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S
> index a8cc0409d7d2..bbf68cd01088 100644
> --- a/arch/powerpc/kernel/vdso64/sigtramp.S
> +++ b/arch/powerpc/kernel/vdso64/sigtramp.S
> @@ -6,6 +6,7 @@
>   * Copyright (C) 2004 Benjamin Herrenschmuidt (benh@kernel.crashing.org), IBM Corp.
>   * Copyright (C) 2004 Alan Modra (amodra@au.ibm.com)), IBM Corp.
>   */
> +#include <asm/cache.h>		/* IFETCH_ALIGN_BYTES */
>  #include <asm/processor.h>
>  #include <asm/ppc_asm.h>
>  #include <asm/unistd.h>
> @@ -14,21 +15,17 @@
>
>  	.text
>
> -/* The nop here is a hack.  The dwarf2 unwind routines subtract 1 from
> -   the return address to get an address in the middle of the presumed
> -   call instruction.  Since we don't have a call here, we artificially
> -   extend the range covered by the unwind info by padding before the
> -   real start.  */
> -	nop
>  	.balign 8
> +	.balign IFETCH_ALIGN_BYTES
>  V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
> -.Lsigrt_start = . - 4
> +.Lsigrt_start:
> +	bctrl	/* call the handler */
>  	addi	r1, r1, __SIGNAL_FRAMESIZE
>  	li	r0,__NR_rt_sigreturn
>  	sc
>  .Lsigrt_end:
>  V_FUNCTION_END(__kernel_sigtramp_rt64)
> -/* The ".balign 8" above and the following zeros mimic the old stack
> +/* The .balign 8 above and the following zeros mimic the old stack
>     trampoline layout.  The last magic value is the ucontext pointer,
>     chosen in such a way that older libgcc unwind code returns a zero
>     for a sigcontext pointer.  */

As far as I understand it, this breaks cancellation handling on musl and
future glibc because it is necessary to look at the signal delivery
location to see if a system call sequence has resulted in an action, and
that location is no longer in user code after this change.

We have a glibc test in preparation of our change, and it started
failing:

  Linux 5.10 breaks sigcontext_get_pc on powerpc64
  <https://sourceware.org/bugzilla/show_bug.cgi?id=27223>

Isn't it possible to avoid the return predictor desynchronization by
adding the appropriate hint?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

* Re: [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline
  2021-01-22 11:27 ` [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline Florian Weimer
@ 2021-01-22 14:44   ` Rich Felker
  2021-01-22 18:19     ` Raoni Fassina Firmino
  2021-01-22 18:13   ` Raoni Fassina Firmino
  1 sibling, 1 reply; 6+ messages in thread
From: Rich Felker @ 2021-01-22 14:44 UTC
To: Florian Weimer
Cc: Nicholas Piggin, linuxppc-dev, Alan Modra, musl, libc-alpha

On Fri, Jan 22, 2021 at 12:27:14PM +0100, Florian Weimer wrote:
> * Nicholas Piggin:
>
> > diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S
> > index a8cc0409d7d2..bbf68cd01088 100644
> > --- a/arch/powerpc/kernel/vdso64/sigtramp.S
> > +++ b/arch/powerpc/kernel/vdso64/sigtramp.S
> > @@ -6,6 +6,7 @@
> >   * Copyright (C) 2004 Benjamin Herrenschmuidt (benh@kernel.crashing.org), IBM Corp.
> >   * Copyright (C) 2004 Alan Modra (amodra@au.ibm.com)), IBM Corp.
> >   */
> > +#include <asm/cache.h>		/* IFETCH_ALIGN_BYTES */
> >  #include <asm/processor.h>
> >  #include <asm/ppc_asm.h>
> >  #include <asm/unistd.h>
> > @@ -14,21 +15,17 @@
> >
> >  	.text
> >
> > -/* The nop here is a hack.  The dwarf2 unwind routines subtract 1 from
> > -   the return address to get an address in the middle of the presumed
> > -   call instruction.  Since we don't have a call here, we artificially
> > -   extend the range covered by the unwind info by padding before the
> > -   real start.  */
> > -	nop
> >  	.balign 8
> > +	.balign IFETCH_ALIGN_BYTES
> >  V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
> > -.Lsigrt_start = . - 4
> > +.Lsigrt_start:
> > +	bctrl	/* call the handler */
> >  	addi	r1, r1, __SIGNAL_FRAMESIZE
> >  	li	r0,__NR_rt_sigreturn
> >  	sc
> >  .Lsigrt_end:
> >  V_FUNCTION_END(__kernel_sigtramp_rt64)
> > -/* The ".balign 8" above and the following zeros mimic the old stack
> > +/* The .balign 8 above and the following zeros mimic the old stack
> >     trampoline layout.  The last magic value is the ucontext pointer,
> >     chosen in such a way that older libgcc unwind code returns a zero
> >     for a sigcontext pointer.  */
>
> As far as I understand it, this breaks cancellation handling on musl and
> future glibc because it is necessary to look at the signal delivery
> location to see if a system call sequence has resulted in an action, and
> that location is no longer in user code after this change.
>
> We have a glibc test in preparation of our change, and it started
> failing:
>
>   Linux 5.10 breaks sigcontext_get_pc on powerpc64
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=27223>
>
> Isn't it possible to avoid the return predictor desynchronization by
> adding the appropriate hint?

Maybe I'm missing something but I don't see how this would break musl;
we just inspect the PC in the mcontext, which I don't see any changes
to and which should still point to the next instruction of the
interrupted context. I don't have a test environment though so I'll
have to wait for feedback from ppc users to be sure. Are there any
further details on how it's breaking glibc?

Rich

* Re: [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline
  2021-01-22 14:44 ` Rich Felker
@ 2021-01-22 18:19   ` Raoni Fassina Firmino
  2021-01-22 18:31     ` Rich Felker
  0 siblings, 1 reply; 6+ messages in thread
From: Raoni Fassina Firmino @ 2021-01-22 18:19 UTC
To: Rich Felker
Cc: Florian Weimer, musl, libc-alpha, linuxppc-dev, Nicholas Piggin, Alan Modra

On Fri, Jan 22, 2021 at 09:44:05AM -0500, Rich Felker wrote:
> Maybe I'm missing something but I don't see how this would break musl;
> we just inspect the PC in the mcontext, which I don't see any changes
> to and which should still point to the next instruction of the
> interrupted context. I don't have a test environment though so I'll
> have to wait for feedback from ppc users to be sure. Are there any
> further details on how it's breaking glibc?

For glibc, backtrace() compares the return-address from each stack frame
to the value of `__kernel_sigtramp_rt64` to identify the frame with the
mcontext information, but now the return-address is not the start of the
routine, but the middle of it, so it fails to catch this special frame.

o/
Raoni Fassina

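The exact-match comparison described above can be sketched in C. This is an illustration only, not glibc's actual code: `is_sigtramp_exact` and `handler_return_addr` are hypothetical names, and the 4-byte offset assumes the recorded return address now points just past the single `bctrl` instruction added by the patch (PowerPC instructions are 4 bytes wide).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Width of one PowerPC instruction such as bctrl. */
#define PPC_INSN_BYTES 4u

/* Hypothetical stand-in for the check described in the message: a frame
   is treated as the signal frame only if its return address is exactly
   the address of the trampoline symbol. */
static bool is_sigtramp_exact(uintptr_t return_addr, uintptr_t sigtramp_sym)
{
    return return_addr == sigtramp_sym;
}

/* Illustrative model of the return address seen for the handler frame.
   Without bctrl it is the trampoline symbol itself; with bctrl as the
   first instruction it is the address of the following instruction. */
static uintptr_t handler_return_addr(uintptr_t sigtramp_sym, bool has_bctrl)
{
    assert(sigtramp_sym % PPC_INSN_BYTES == 0); /* instructions are word aligned */
    return has_bctrl ? sigtramp_sym + PPC_INSN_BYTES : sigtramp_sym;
}
```

With a made-up symbol address of 0x1000, `is_sigtramp_exact(handler_return_addr(0x1000, false), 0x1000)` is true, while the same call with `has_bctrl` set to true is false, which is the missed frame reported here.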
* Re: [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline
  2021-01-22 18:19 ` Raoni Fassina Firmino
@ 2021-01-22 18:31   ` Rich Felker
  2021-01-22 18:50     ` Raoni Fassina Firmino
  0 siblings, 1 reply; 6+ messages in thread
From: Rich Felker @ 2021-01-22 18:31 UTC
To: Florian Weimer, musl, libc-alpha, linuxppc-dev, Nicholas Piggin, Alan Modra

On Fri, Jan 22, 2021 at 03:19:22PM -0300, Raoni Fassina Firmino wrote:
> On Fri, Jan 22, 2021 at 09:44:05AM -0500, Rich Felker wrote:
> > Maybe I'm missing something but I don't see how this would break musl;
> > we just inspect the PC in the mcontext, which I don't see any changes
> > to and which should still point to the next instruction of the
> > interrupted context. I don't have a test environment though so I'll
> > have to wait for feedback from ppc users to be sure. Are there any
> > further details on how it's breaking glibc?
>
> For glibc, backtrace() compares the return-address from each stack frame
> to the value of `__kernel_sigtramp_rt64` to identify the frame with the
> mcontext information, but now the return-address is not the start of the
> routine, but the middle of it, so it fails to catch this special frame.

Is there a reason it's backtracing rather than just looking at the
interrupted context (pointed to by the third argument to the signal
handler)?

Rich

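The "third argument" mentioned above is the `void *` context pointer that a handler receives when it is installed with `SA_SIGINFO`. A minimal sketch, assuming a POSIX system: `on_signal`, `install_handler` and `context_seen` are hypothetical names, and extracting the PC from the context is deliberately left out because the `mcontext_t` layout is specific to each architecture and libc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set to 1 by the handler when it was given a non-null context pointer. */
static volatile sig_atomic_t context_seen;

/* Three-argument handler form. ctx points to a ucontext_t describing the
   interrupted context; it is only valid while the handler is running. */
static void on_signal(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)info;
    context_seen = (ctx != NULL);
}

/* Install on_signal for signo; returns 0 on success, -1 on failure. */
static int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;   /* request the three-argument form */
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

Calling `install_handler(SIGUSR1)` followed by `raise(SIGUSR1)` runs the handler before `raise` returns in a single-threaded program, after which `context_seen` is set.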
* Re: [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline
  2021-01-22 18:31 ` Rich Felker
@ 2021-01-22 18:50   ` Raoni Fassina Firmino
  0 siblings, 0 replies; 6+ messages in thread
From: Raoni Fassina Firmino @ 2021-01-22 18:50 UTC
To: Rich Felker
Cc: Florian Weimer, musl, libc-alpha, linuxppc-dev, Nicholas Piggin, Alan Modra

On Fri, Jan 22, 2021 at 01:31:27PM -0500, Rich Felker wrote:
> On Fri, Jan 22, 2021 at 03:19:22PM -0300, Raoni Fassina Firmino wrote:
> > On Fri, Jan 22, 2021 at 09:44:05AM -0500, Rich Felker wrote:
> > > Maybe I'm missing something but I don't see how this would break musl;
> > > we just inspect the PC in the mcontext, which I don't see any changes
> > > to and which should still point to the next instruction of the
> > > interrupted context. I don't have a test environment though so I'll
> > > have to wait for feedback from ppc users to be sure. Are there any
> > > further details on how it's breaking glibc?
> >
> > For glibc, backtrace() compares the return-address from each stack frame
> > to the value of `__kernel_sigtramp_rt64` to identify the frame with the
> > mcontext information, but now the return-address is not the start of the
> > routine, but the middle of it, so it fails to catch this special frame.
>
> Is there a reason it's backtracing rather than just looking at the
> interrupted context (pointed to by the third argument to the signal
> handler)?

The regression is exposed in the backtrace() routine itself, more
precisely when backtrace() is called from inside a signal handler.
What I described is how backtrace() identifies this special situation.
What is failing in glibc is the test for this case.

o/
Raoni Fassina

* [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline
  2021-01-22 11:27 ` [musl] Re: [PATCH v2] powerpc/64/signal: balance return predictor stack in signal trampoline Florian Weimer
  2021-01-22 14:44   ` Rich Felker
@ 2021-01-22 18:13   ` Raoni Fassina Firmino
  1 sibling, 0 replies; 6+ messages in thread
From: Raoni Fassina Firmino @ 2021-01-22 18:13 UTC
To: Florian Weimer
Cc: Nicholas Piggin, musl, libc-alpha, linuxppc-dev, Alan Modra

On Fri, Jan 22, 2021 at 12:27:14PM +0100, Florian Weimer wrote:
> * Nicholas Piggin:
>
> > diff --git a/arch/powerpc/kernel/vdso64/sigtramp.S b/arch/powerpc/kernel/vdso64/sigtramp.S
> > index a8cc0409d7d2..bbf68cd01088 100644
> > --- a/arch/powerpc/kernel/vdso64/sigtramp.S
> > +++ b/arch/powerpc/kernel/vdso64/sigtramp.S
> > @@ -6,6 +6,7 @@
> >   * Copyright (C) 2004 Benjamin Herrenschmuidt (benh@kernel.crashing.org), IBM Corp.
> >   * Copyright (C) 2004 Alan Modra (amodra@au.ibm.com)), IBM Corp.
> >   */
> > +#include <asm/cache.h>		/* IFETCH_ALIGN_BYTES */
> >  #include <asm/processor.h>
> >  #include <asm/ppc_asm.h>
> >  #include <asm/unistd.h>
> > @@ -14,21 +15,17 @@
> >
> >  	.text
> >
> > -/* The nop here is a hack.  The dwarf2 unwind routines subtract 1 from
> > -   the return address to get an address in the middle of the presumed
> > -   call instruction.  Since we don't have a call here, we artificially
> > -   extend the range covered by the unwind info by padding before the
> > -   real start.  */
> > -	nop
> >  	.balign 8
> > +	.balign IFETCH_ALIGN_BYTES
> >  V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
> > -.Lsigrt_start = . - 4
> > +.Lsigrt_start:
> > +	bctrl	/* call the handler */
> >  	addi	r1, r1, __SIGNAL_FRAMESIZE
> >  	li	r0,__NR_rt_sigreturn
> >  	sc
> >  .Lsigrt_end:
> >  V_FUNCTION_END(__kernel_sigtramp_rt64)
> > -/* The ".balign 8" above and the following zeros mimic the old stack
> > +/* The .balign 8 above and the following zeros mimic the old stack
> >     trampoline layout.  The last magic value is the ucontext pointer,
> >     chosen in such a way that older libgcc unwind code returns a zero
> >     for a sigcontext pointer.  */
>
> As far as I understand it, this breaks cancellation handling on musl and
> future glibc because it is necessary to look at the signal delivery
> location to see if a system call sequence has resulted in an action, and
> that location is no longer in user code after this change.
>
> We have a glibc test in preparation of our change, and it started
> failing:
>
>   Linux 5.10 breaks sigcontext_get_pc on powerpc64
>   <https://sourceware.org/bugzilla/show_bug.cgi?id=27223>
>
> Isn't it possible to avoid the return predictor desynchronization by
> adding the appropriate hint?

I also caught this regression; I believe it was introduced in kernel
5.9.

I don't know enough to comment on Florian's suggestion, but I am working
on some possible fixes:

On the kernel side we can keep `__kernel_sigtramp_rt64` in the original
place, after `bctrl`, and add a new symbol so the kernel can jump to the
right place before `bctrl`. This would ensure backward compatibility.

On the other side, this change exposed how fragile `backtrace()` is to
any changes in the trampoline code, which the libc has no control over
in this case. So maybe there is something that can be improved in how
backtrace() decides that the return address points to the trampoline.
My first option is to test a range after `__kernel_sigtramp_rt64` to see
if the address is inside the routine. This would be better if we could
know the size of the function; I know that the vdso.so has this info in
the ELF, but I don't know if it is exposed to glibc.

As Nicholas mentioned in his patch, GDB seems to keep working just fine;
it seems that GDB uses some heuristics to match the code surrounding the
return address to identify that it is the trampoline code. So maybe
another option is to do something similar.

o/
Raoni Fassina

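The range test proposed above might look like the following sketch. It is not taken from glibc: `in_sigtramp_range` is a hypothetical helper, and the caller is assumed to supply the trampoline size, which, as the message notes, glibc may not currently be able to obtain from the vDSO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if return_addr lies in the half-open range
   [start, start + size). The size is an assumption supplied by the
   caller, for example 16 bytes for a trampoline of four instructions. */
static bool in_sigtramp_range(uintptr_t return_addr, uintptr_t start, size_t size)
{
    assert(size > 0);
    /* Unsigned subtraction wraps to a very large value when return_addr
       is below start, so a single comparison covers both bounds. */
    return return_addr - start < size;
}
```

For a made-up trampoline at 0x1000 with a size of 16, both 0x1000 (the old return address) and 0x1004 (just after `bctrl`) are accepted, while 0x1010 and 0x0ffc are rejected, so a check of this kind would tolerate either trampoline layout.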