From: James Y Knight
Date: Tue, 7 Dec 2021 13:27:08 -0500
To: musl@lists.openwall.com
Cc: Rich Felker, Florian Weimer, Stijn Tintel
Subject: Re: [musl] [PATCH] ppc64: check for AltiVec in setjmp/longjmp
References: <20211206234358.2174444-1-stijn@linux-ipv6.be>
 <87tufljlmv.fsf@oldenburg.str.redhat.com>
 <20211207005940.GK7074@brightrain.aerifal.cx>
 <20211207013930.GM7074@brightrain.aerifal.cx>

The important question at hand is whether the hardware treats "next
instruction" as a critical part of the special case. The recommended
sequence is:
  bcl 20,31,$+4
  next-instructions...

But, does the hardware _also_ trigger the expected special-cased effect on
the return stack when jumping to locations other than the next
instruction? E.g. is this OK w.r.t. return-stack?
  bcl 20,31,$+8
  .long __hwcap-.
  next-instructions...

On X86, calling *exactly* the next instruction is how you trigger the
special-case in the return-stack-predictor. But, it sounds like
potentially on PPC, the address is not part of what triggers the
special-case. Is that correct?

On Mon, Dec 6, 2021 at 8:45 PM David Edelsohn <dje.gcc@gmail.com> wrote:

> On Mon, Dec 6, 2021 at 8:39 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Mon, Dec 06, 2021 at 08:15:48PM -0500, David Edelsohn wrote:
> > > On Mon, Dec 6, 2021 at 7:59 PM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> > > > > * Stijn Tintel:
> > > > >
> > > > > > diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> > > > > > index 37683fda..32853693 100644
> > > > > > --- a/src/setjmp/powerpc64/setjmp.s
> > > > > > +++ b/src/setjmp/powerpc64/setjmp.s
> > > > > > @@ -69,7 +69,17 @@ __setjmp_toc:
> > > > > >      stfd 30, 38*8(3)
> > > > > >      stfd 31, 39*8(3)
> > > > > >
> > > > > > -   # 5) store vector registers v20-v31
> > > > > > +   # 5) store vector registers v20-v31 if hardware supports AltiVec
> > > > > > +   mflr 0
> > > > > > +   bl 1f
> > > > > > +   .hidden __hwcap
> > > > > > +   .long __hwcap-.
> > > > > > +1: mflr 4
> > > > >
> > > > > This de-balances the return stack and probably has quite severe
> > > > > performance impact.  The ISA manual says to use
> > > > >
> > > > >   bcl 20,31,$+4
> > > > >
> > > > > and you'll have to store the __hwcap offset somewhere else.
> > > >
> > > > To begin with, let's change the .s files to .S files and put the whole
> > > > branch logic inside #ifndef __ALTIVEC__ so that it does not impact
> > > > normal builds with an ISA level where Altivec can be assumed to be
> > > > present.
> > > >
> > > > I'm not sufficiently familiar with the PowerPC ISA to know how bcl
> > > > works, but if there's a less expensive solution along those lines
> > > > that's compatible with all ISA levels, by all means let's use it. The
> > > > same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> > >
> > > bl = branch and link
> > > bcl = branch conditional and link
> > >
> > > link means place the next instruction address in the link register.
> > > Normally a branch and link would be used for a matching "return"
> > > instruction, but in this case it is being used to compute a position
> > > independent code address.  As Florian correctly points out, the "bl"
> > > will corrupt the link stack in the processor used to predict return
> > > addresses and the recommended sequence is the one that he suggests.
> > >
> > > bcl 20,31,addr
> > >
> > > which means branch always and, because the condition register bits are
> > > irrelevant, a special value that instructs the processor to not push
> > > the address onto the link stack so that the "calls" and "returns"
> > > remain matched.
> >
> > Thanks. Am I correct in understanding then that we don't need $+4, but
> > can instead use the 1f just as now, with inline .long __hwcap-. -- in
> > other words that "bcl 20,31," is a drop-in replacement for "bl"
> > without the link stack impact?
>
> It should work, but it's slightly preferred to use $+4 because one
> explicitly wants the address of the next instruction and labels of the
> form "1f" are not supported by all assemblers.
>
> Thanks, David
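
To make the "de-balancing" concern in the quoted thread concrete, here is a
toy Python model of a return-address (link) stack predictor. It is only a
sketch under stated assumptions: the LinkStack class, the run() helper and
all addresses are hypothetical, and it does not describe any real POWER
core. In particular it simply assumes that a get-PC branch either is or is
not pushed, so it cannot answer whether real hardware keys the special case
on the branch target.

```python
# Toy model of a return-address (link) stack predictor.
# Hypothetical illustration only; not a description of real hardware.

class LinkStack:
    def __init__(self):
        self.stack = []        # predicted return addresses
        self.mispredicts = 0   # returns whose prediction was wrong

    def call(self, return_addr, push=True):
        # push=False models a link-setting branch that the predictor
        # treats as "not a call" (the behaviour the thread attributes
        # to the bcl 20,31 form).
        if push:
            self.stack.append(return_addr)

    def ret(self, actual_target):
        # Predict the return target from the top of the stack.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1


def run(pic_branch_pushes):
    """Two nested calls; the inner function does one get-PC branch."""
    ls = LinkStack()
    ls.call(0x1004)                          # caller: call to outer
    ls.call(0x2004)                          # outer: call to inner
    ls.call(0x3008, push=pic_branch_pushes)  # inner: get-PC branch, never returned from
    ls.ret(0x2004)                           # inner returns to outer
    ls.ret(0x1004)                           # outer returns to caller
    return ls.mispredicts


print(run(True))   # get-PC branch pushed like a plain call
print(run(False))  # get-PC branch not pushed
```

In this model the single unmatched push shifts every later prediction by one
entry, so both enclosing returns mispredict, while the non-pushing variant
keeps calls and returns matched.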