Date: Tue, 7 Dec 2021 08:25:09 -0500
From: Rich Felker
To: David Edelsohn
Cc: musl@lists.openwall.com, Florian Weimer, Stijn Tintel
Message-ID: <20211207132509.GO7074@brightrain.aerifal.cx>
References: <20211206234358.2174444-1-stijn@linux-ipv6.be> <87tufljlmv.fsf@oldenburg.str.redhat.com> <20211207005940.GK7074@brightrain.aerifal.cx> <20211207013930.GM7074@brightrain.aerifal.cx>
Subject: Re: [musl] [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Mon, Dec 06, 2021 at 08:44:47PM -0500, David Edelsohn wrote:
> On Mon, Dec 6, 2021 at 8:39 PM Rich Felker wrote:
> >
> > On Mon, Dec 06, 2021 at 08:15:48PM -0500, David Edelsohn wrote:
> > > On Mon, Dec 6, 2021 at 7:59 PM Rich Felker wrote:
> > > >
> > > > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> > > > > * Stijn Tintel:
> > > > >
> > > > > > diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> > > > > > index 37683fda..32853693 100644
> > > > > > --- a/src/setjmp/powerpc64/setjmp.s
> > > > > > +++ b/src/setjmp/powerpc64/setjmp.s
> > > > > > @@ -69,7 +69,17 @@ __setjmp_toc:
> > > > > >  	stfd 30, 38*8(3)
> > > > > >  	stfd 31, 39*8(3)
> > > > > >
> > > > > > -	# 5) store vector registers v20-v31
> > > > > > +	# 5) store vector registers v20-v31 if hardware supports AltiVec
> > > > > > +	mflr 0
> > > > > > +	bl 1f
> > > > > > +	.hidden __hwcap
> > > > > > +	.long __hwcap-.
> > > > > > +1:	mflr 4
> > > > >
> > > > > This de-balances the return stack and probably has quite severe
> > > > > performance impact. The ISA manual says to use
> > > > >
> > > > >   bcl 20,31,$+4
> > > > >
> > > > > and you'll have to store the __hwcap offset somewhere else.
> > > >
> > > > To begin with, let's change the .s files to .S files and put the whole
> > > > branch logic inside #ifndef __ALTIVEC__ so that it does not impact
> > > > normal builds with an ISA level where Altivec can be assumed to be
> > > > present.
> > > >
> > > > I'm not sufficiently familiar with the PowerPC ISA to know how bcl
> > > > works, but if there's a less expensive solution along those lines
> > > > that's compatible with all ISA levels, by all means let's use it. The
> > > > same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> > >
> > > bl = branch and link
> > > bcl = branch conditional and link
> > >
> > > link means place the next instruction address in the link register.
> > > Normally a branch and link would be used for a matching "return"
> > > instruction, but in this case it is being used to compute a position
> > > independent code address. As Florian correctly points out, the "bl"
> > > will corrupt the link stack in the processor used to predict return
> > > addresses and the recommended sequence is the one that he suggests.
> > >
> > > bcl 20,31,addr
> > >
> > > which means branch always and, because the condition register bits are
> > > irrelevant, a special value that instructs the processor to not push
> > > the address onto the link stack so that the "calls" and "returns"
> > > remain matched.
> >
> > Thanks. Am I correct in understanding then that we don't need $+4, but
> > can instead use the 1f just as now, with inline .long __hwcap-. -- in
> > other words that "bcl 20,31," is a drop-in replacement for "bl"
> > without the link stack impact?
>
> It should work, but it's slightly preferred to use $+4 because one
> explicitly wants the address of the next instruction and labels of the

In this case we don't want the address of the next instruction. We
want the address of the constant __hwcap-.

Rich