From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general,gmane.comp.gcc.devel
Subject: Re: SH runtime switchable atomics - proposed design
Date: Tue, 19 Jan 2016 15:51:06 -0500
Message-ID: <20160119205105.GQ238@brightrain.aerifal.cx>
References: <20160119202851.GA18720@brightrain.aerifal.cx>
In-Reply-To: <20160119202851.GA18720@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: gcc@gcc.gnu.org
Cc: Oleg Endo, musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="d6Gm4EdcadzBjdND"
User-Agent: Mutt/1.5.21 (2010-09-15)

--d6Gm4EdcadzBjdND
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, Jan 19, 2016 at 03:28:52PM -0500, Rich Felker wrote:
> I've been working on the new version of runtime-selected SH atomics
> for musl, and I think what I've got might be appropriate
> for GCC's generated atomics too. I know Oleg was not very excited
> about doing this on the gcc side from a cost/benefit perspective,
> but I think my approach is actually preferable over inline atomics
> from a code size perspective. It uses a single "cas" function with
> an "SFUNC" type ABI (not standard calling convention) with the
> following constraints:
>
> Inputs:
> - R0: Memory address to operate on
> - R1: Address of implementation function, loaded from a global
> - R2: Comparison value
> - R3: Value to set on success
>
> Outputs:
> - R3: Old value read, ==R2 iff cas succeeded.
>
> Preserved: R0, R2.
>
> Clobbered: R1, PR, T.
>
> This call (performed from __asm__ for musl, but gcc would do it as SH
> "SFUNC") is highly compact/convenient for inlining because it avoids
> clobbering any of the argument registers that are likely to already be
> in use by the caller, and it preserves the important values that are
> likely to be reused after the cas operation.
>
> For J2 and future J4, the function pointer just points to:
>
> 	rts
> 	 cas.l r2,r3,@r0
>
> and the only costs vs an inline cas.l are loading the address of the
> function (done in the caller; involves GOT access) and clobbering R1
> and PR.
>
> This is still a draft design and the version in musl is subject to
> change at any time since it's not a public API/ABI, but I think it
> could turn into something useful to have on the gcc side with a
> -matomic-model=libfunc option or similar. Other ABI considerations for
> gcc use would be where to store the function pointer and how to
> initialize it. To be reasonably efficient with FDPIC the caller needs
> to be responsible for loading the function pointer (and it needs to
> always point to code, not a function descriptor) so that the callee
> does not need a GOT pointer passed in.

Attached is my current draft of the implementations of the cas 'sfunc'
for musl. Forgot to include it before.

Rich

--d6Gm4EdcadzBjdND
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="sh.s"

/* Contract for all versions is same as cas.l r2,r3,@r0
 * pr and r1 are also clobbered (by jsr & r1 as temp).
 * r0,r2,r4-r15 must be preserved.
 * r3 contains result (==r2 iff cas succeeded). */

	.align 2

/* gUSA restartable sequence: r0 = end address, r1 = saved sp,
 * r15 = negative length of the sequence while inside it. */
__sh_cas_gusa:
	mov.l r5,@-r15
	mov.l r4,@-r15
	mov r0,r4
	mova 1f,r0
	mov r15,r1
	mov #(0f-1f),r15
0:	mov.l @r4,r5
	cmp/eq r5,r2
	bf 1f
	mov.l r3,@r4
1:	mov r1,r15
	mov r5,r3
	mov r4,r0
	mov.l @r15+,r4
	rts
	 mov.l @r15+,r5

/* Load-linked/store-conditional loop (movli.l/movco.l). */
__sh_cas_llsc:
	mov r0,r1
	synco
0:	movli.l @r1,r0
	cmp/eq r0,r2
	bf 1f
	mov r3,r0
	movco.l r0,@r1
	bf 0b
	mov r2,r0
1:	synco
	mov r0,r3
	rts
	 mov r1,r0

/* Mask interrupts in sr around a plain load/compare/store. */
__sh_cas_imask:
	mov r0,r1
	stc sr,r0
	mov.l r0,@-r15
	or #0xf0,r0
	ldc r0,sr
	mov.l @r1,r0
	cmp/eq r0,r2
	bf 1f
	mov.l r3,@r1
1:	ldc.l @r15+,sr
	mov r0,r3
	rts
	 mov r1,r0

/* Native cas.l, executed in the rts delay slot. */
__sh_cas_cas_l:
	rts
	 cas.l r2,r3,@r0

--d6Gm4EdcadzBjdND--
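
For readers without an SH toolchain, the contract described above (one cas entry point reached through a global function pointer, returning the old value, which equals the comparison value iff the cas succeeded) can be modelled in portable C11. This is only a rough sketch of the semantics, not the register-level SFUNC ABI: the names cas_fn, cas_impl, cas_native, cas_uniproc, select_cas and fetch_add are all made up for illustration and do not appear in musl or gcc.

```c
#include <stdatomic.h>

/* Hypothetical model of the contract: returns the old value read;
 * the cas succeeded iff the return value equals 'expected'. */
typedef int (*cas_fn)(_Atomic int *p, int expected, int desired);

/* Models a backend with a real atomic compare-and-swap. */
static int cas_native(_Atomic int *p, int expected, int desired)
{
    /* On failure C11 stores the observed value into 'expected';
     * on success 'expected' already holds the old value. */
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

/* Models a plain load/compare/store backend. This is only correct
 * when nothing can interleave between the load and the store; the
 * assembly variants guarantee that by other means, this model does not. */
static int cas_uniproc(_Atomic int *p, int expected, int desired)
{
    int old = atomic_load_explicit(p, memory_order_relaxed);
    if (old == expected)
        atomic_store_explicit(p, desired, memory_order_relaxed);
    return old;
}

/* Global function pointer, assumed to be set once at startup. */
static cas_fn cas_impl = cas_uniproc;

/* Hypothetical selector; a real one would probe the CPU or kernel. */
static void select_cas(int have_native_cas)
{
    cas_impl = have_native_cas ? cas_native : cas_uniproc;
}

/* Caller side: load the pointer, call through it, and retry until
 * the returned old value matches the value that was expected. */
static int fetch_add(_Atomic int *p, int delta)
{
    int old;
    do {
        old = atomic_load_explicit(p, memory_order_relaxed);
    } while (cas_impl(p, old, old + delta) != old);
    return old;
}
```

Returning the old value rather than a success flag mirrors the "R3 == R2 iff cas succeeded" rule, so a retry loop such as fetch_add needs no second load to learn what was actually in memory.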