mailing list of musl libc
* [musl] List-Unsubscribe
@ 2021-12-07  8:33 Quesada Gonzalez, Elena
From: Quesada Gonzalez, Elena @ 2021-12-07  8:33 UTC (permalink / raw)
  To: musl



-----Original Message-----
From: David Edelsohn <dje.gcc@gmail.com>
Sent: Tuesday, December 7, 2021 2:45
To: Rich Felker <dalias@libc.org>
CC: musl@lists.openwall.com; Florian Weimer <fweimer@redhat.com>; Stijn Tintel <stijn@linux-ipv6.be>
Subject: Re: [musl] [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Mon, Dec 6, 2021 at 8:39 PM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Dec 06, 2021 at 08:15:48PM -0500, David Edelsohn wrote:
> > On Mon, Dec 6, 2021 at 7:59 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> > > > * Stijn Tintel:
> > > >
> > > > > diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> > > > > index 37683fda..32853693 100644
> > > > > --- a/src/setjmp/powerpc64/setjmp.s
> > > > > +++ b/src/setjmp/powerpc64/setjmp.s
> > > > > @@ -69,7 +69,17 @@ __setjmp_toc:
> > > > >     stfd 30, 38*8(3)
> > > > >     stfd 31, 39*8(3)
> > > > >
> > > > > -   # 5) store vector registers v20-v31
> > > > > +   # 5) store vector registers v20-v31 if hardware supports AltiVec
> > > > > +   mflr 0
> > > > > +   bl 1f
> > > > > +   .hidden __hwcap
> > > > > +   .long __hwcap-.
> > > > > +1: mflr 4
> > > >
> > > > This de-balances the return stack and probably has quite severe 
> > > > performance impact.  The ISA manual says to use
> > > >
> > > >   bcl 20,31,$+4
> > > >
> > > > and you'll have to store the __hwcap offset somewhere else.
> > >
> > > To begin with, let's change the .s files to .S files and put the 
> > > whole branch logic inside #ifndef __ALTIVEC__ so that it does not 
> > > impact normal builds with an ISA level where Altivec can be 
> > > assumed to be present.
> > >
> > > I'm not sufficiently familiar with the PowerPC ISA to know how bcl 
> > > works, but if there's a less expensive solution along those lines 
> > > that's compatible with all ISA levels, by all means let's use it. 
> > > The same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> >
> > bl = branch and link
> > bcl = branch conditional and link
> >
> > link means place the next instruction address in the link register.
> > Normally a branch and link would be used for a matching "return"
> > instruction, but in this case it is being used to compute a position 
> > independent code address.  As Florian correctly points out, the "bl"
> > will corrupt the link stack in the processor used to predict return 
> > addresses and the recommended sequence is the one that he suggests.
> >
> > bcl 20,31,addr
> >
> > which means branch always; because the condition register bits
> > are irrelevant here, they carry a special value that instructs the
> > processor not to push the address onto the link stack, so that the
> > "calls" and "returns" remain matched.
>
> Thanks. Am I correct in understanding then that we don't need $+4, but 
> can instead use the 1f just as now, with inline .long __hwcap-. -- in 
> other words that "bcl 20,31," is a drop-in replacement for "bl"
> without the link stack impact?

It should work, but it's slightly preferred to use $+4 because one
explicitly wants the address of the next instruction and labels of the
form "1f" are not supported by all assemblers.

Thanks, David
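
As an illustration of the suggestion quoted above (putting the runtime
check inside #ifndef __ALTIVEC__), here is a minimal C sketch of the same
decision. It is not the musl implementation, which does this in assembly
against the hidden __hwcap symbol. The function names are hypothetical,
and the bit value is an assumption taken from the Linux kernel's
PPC_FEATURE_HAS_ALTIVEC definition; it is defined locally so that the
sketch compiles on any architecture.

```c
/* Assumption: AltiVec bit of the AT_HWCAP word on Linux/PowerPC,
 * mirroring PPC_FEATURE_HAS_ALTIVEC from the kernel headers. */
#define SKETCH_PPC_FEATURE_HAS_ALTIVEC 0x10000000UL

/* Hypothetical helper: pure runtime test of the hwcap word. */
int hwcap_has_altivec_bit(unsigned long hwcap)
{
	return (hwcap & SKETCH_PPC_FEATURE_HAS_ALTIVEC) != 0;
}

/* Hypothetical helper: should v20-v31 be saved and restored?
 * When the compiler already targets AltiVec (__ALTIVEC__ defined),
 * the answer is known at build time and the runtime test is skipped. */
int need_vector_save(unsigned long hwcap)
{
#ifdef __ALTIVEC__
	(void)hwcap;	/* unused: AltiVec is assumed present */
	return 1;
#else
	return hwcap_has_altivec_bit(hwcap);
#endif
}
```

On Linux the hwcap word would typically come from getauxval(AT_HWCAP) in
<sys/auxv.h>. When the build already assumes AltiVec, the runtime branch
drops out entirely, which is the point of the #ifndef.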
